diff --git a/doc/fast-sync.md b/doc/fast-sync.md
new file mode 100644
index 000000000..55302e290
--- /dev/null
+++ b/doc/fast-sync.md
@@ -0,0 +1,22 @@
+# Fast Sync
+
+In Grin, we call "sync" the process of synchronizing a new node or a node that
+hasn't been keeping up with the chain for a while, and bringing it up to the
+latest known most-worked block. The term Initial Block Download (IBD) is often
+used by other blockchains, but it fits Grin poorly, as Grin typically does not
+download full blocks.
+
+In short, a fast-sync in Grin does the following:
+
+1. Download all block headers, by chunks, on the most worked chain, as
+advertised by other nodes.
+2. Find a header sufficiently back from the chain head. This is called the node
+horizon, as it's the furthest back a node can reorganize its chain onto a new
+fork, should one occur, without triggering another full sync.
+3. Download the full state as it was at the horizon, including the unspent
+output, range proof and kernel data, as well as all corresponding MMRs. This is
+just one large zip file.
+4. Validate the full state.
+5. Download full blocks since the horizon to get to the chain head.
+
+In the rest of this section, we will elaborate on each of those steps.
diff --git a/doc/mmr.md b/doc/mmr.md
new file mode 100644
index 000000000..3d00d5b3b
--- /dev/null
+++ b/doc/mmr.md
@@ -0,0 +1,148 @@
+# Merkle Mountain Ranges
+
+## Structure
+
+Merkle Mountain Ranges [1] are an alternative to Merkle trees [2]. While the
+latter relies on perfectly balanced binary trees, the former can be seen
+either as a list of perfectly balanced binary trees or a single binary tree that
+would have been truncated from the top right. A Merkle Mountain Range (MMR) is
+strictly append-only: elements are added from the left to the right and
+the range fills up accordingly.
+
+The following illustrates a range of 19 nodes, where each node is annotated
+with its order of insertion.
+
+```
+Height
+
+3              14
+             /    \
+            /      \
+           /        \
+          /          \
+2        6            13
+       /   \        /    \
+1     2     5      9     12     17
+     / \   / \    / \   /  \   /  \
+0   0   1 3   4  7   8 10  11 15  16 18
+```
+
+This can be represented as a flat list, here storing the height of each node
+at its position of insertion:
+
+```
+0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18
+0  0  1  0  0  1  2  0  0  1  0  0  1  2  3  0  0  1  0
+```
+
+This structure can be fully described simply from its size (19). It's also
+fairly simple, using fast binary operations, to navigate within a MMR.
+Given a node's position `n`, we can compute its height, the position of its
+parent, its siblings, etc.
+
+## Hashing and Bagging
+
+Just like with Merkle trees, the value of a parent node in a MMR is the hash of
+its 2 children. Grin uses the Blake2b hash function throughout, and always
+prepends the node's position in the MMR before hashing to avoid collisions. So
+for a leaf `l` at index `n` storing data `D` (in the case of an output, the
+data is its Pedersen commitment, for example), we have:
+
+```
+Node(l) = Blake2b(n | D)
+```
+
+And for any parent `p` at index `m`:
+
+```
+Node(p) = Blake2b(m | Node(left_child(p)) | Node(right_child(p)))
+```
+
+Contrary to a Merkle tree, a MMR generally has no single root by construction
+so we need a method to compute one (otherwise it would defeat the purpose of
+using a hash tree). This process is called "bagging the peaks" for reasons
+described in [1].
+
+First, we identify the peaks of the MMR; we will define one method of doing so
+here. We first write another small example MMR but with the indexes written as
+binary (instead of decimal), starting from 1:
+
+```
+Height
+
+2        111
+       /     \
+1     11     110       1010
+     /  \    / \      /    \
+0   1   10 100 101  1000  1001  1011
+```
+
+This MMR has 11 nodes and its peaks are at position 111 (7), 1010 (10) and
+1011 (11). We first notice how the first leftmost peak is always going to be
+the highest and always "all ones" when expressed in binary.
Therefore that
+peak will have a position of the form `2^n - 1` and will always be the
+largest such position that is inside the MMR (its position is no greater than
+the total size). We proceed iteratively for a MMR of size 11:
+
+```
+2^0 - 1 = 0, and 0 <= 11
+2^1 - 1 = 1, and 1 <= 11
+2^2 - 1 = 3, and 3 <= 11
+2^3 - 1 = 7, and 7 <= 11
+2^4 - 1 = 15, and 15 is not <= 11
+```
+
+Therefore the first peak is 7. To find the next peak, we then need to "jump" to
+its right sibling. If that node is not in the MMR (and it won't be), take its
+left child. If that child is not in the MMR either, keep taking its left child
+until we have a node that exists in our MMR. Once we find that next peak,
+keep repeating the process until we're at the last node.
+
+All these operations are very simple. Jumping to the right sibling of a node at
+height `h` is adding `2^(h+1) - 1` to its position. Taking its left child is
+subtracting `2^h`.
+
+Finally, once all the positions of the peaks are known, "bagging" the peaks
+consists of hashing them iteratively from the right, using the total size of
+the MMR as prefix. For a MMR of size N with 3 peaks p1, p2 and p3 we get the
+final top peak:
+
+```
+P = Blake2b(N | Blake2b(N | Node(p3) | Node(p2)) | Node(p1))
+```
+
+## Pruning
+
+In Grin, a lot of the data that gets hashed and stored in MMRs can eventually
+be removed. As this happens, the presence of some leaf hashes in the
+corresponding MMRs becomes unnecessary and their hash can be removed. When
+enough leaves are removed, the presence of their parents may become unnecessary
+as well. We can therefore prune a significant part of a MMR from the removal of
+its leaves.
+
+Pruning a MMR relies on a simple iterative process. `X` is first initialized as
+the leaf we wish to prune.
+
+1. Prune `X`.
+2. If `X` has a sibling (one that has not itself been pruned), stop here.
+3. If `X` has no sibling, assign the parent of `X` as `X` and repeat from
+step 1.
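The navigation and pruning rules above can be sketched in a few lines of Python. This is only an illustration, not Grin's implementation: it uses the 0-based insertion positions of the first (19-node) example, the function names `height`, `sibling_and_parent` and `prune` are hypothetical, and stopping when `X` is a peak is an assumption, since the steps above do not cover that case.

```python
def height(pos):
    """Height of the node at 0-based insertion position `pos`."""
    n = pos + 1  # 1-based, so that leftmost peaks are "all ones" in binary
    while n & (n + 1):  # n is not of the form 2^k - 1
        # Skip over the largest perfect tree that sits to the left of n.
        n -= (1 << (n.bit_length() - 1)) - 1
    return n.bit_length() - 1


def sibling_and_parent(pos):
    """Return (sibling, parent) positions; they may lie outside the MMR."""
    h = height(pos)
    span = (1 << (h + 1)) - 1  # distance between two siblings at height h
    if height(pos + 1) > h:
        # The next inserted node is higher: it is our parent, and we are
        # its right child.
        return pos - span, pos + 1
    # Otherwise we are a left child (or a peak whose sibling does not exist).
    return pos + span, pos + span + 1


def prune(pruned, size, leaf):
    """Apply the iterative pruning steps, recording positions in `pruned`."""
    x = leaf
    while True:
        pruned.add(x)
        sibling, parent = sibling_and_parent(x)
        if parent >= size:
            return  # assumption: x is a peak, there is nothing above it
        if sibling not in pruned:
            return  # x still has a sibling: stop here
        x = parent


size = 19
pruned = set()
for leaf in [0, 3, 4, 8, 16]:
    prune(pruned, size, leaf)

remaining = [p for p in range(size) if p not in pruned]
print(remaining)  # [1, 2, 6, 7, 9, 10, 11, 12, 13, 14, 15, 17, 18]
```

Only leaf 4 triggers a second iteration: its sibling 3 is already gone, so their parent 5 is pruned too, and the process stops there because node 2 is still present.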
+
+To visualize the result, starting from our first MMR example and removing leaves
+[0, 3, 4, 8, 16] leads to the following pruned MMR:
+
+```
+Height
+
+3             14
+            /    \
+           /      \
+          /        \
+         /          \
+2       6            13
+       /            /   \
+1     2            9     12     17
+       \          /     /  \   /
+0       1        7     10  11 15     18
+```
+
+[1] Peter Todd, https://github.com/opentimestamps/opentimestamps-server/blob/master/doc/merkle-mountain-range.md
+[2] https://en.wikipedia.org/wiki/Merkle_tree
diff --git a/doc/state.md b/doc/state.md
new file mode 100644
index 000000000..774da9b37
--- /dev/null
+++ b/doc/state.md
@@ -0,0 +1,63 @@
+# State and Storage
+
+## The Grin State
+
+### Structure
+
+The full state of a Grin chain consists of all the following data:
+
+1. The full unspent output (UTXO) set.
+2. The range proof for each output.
+3. All the transaction kernels.
+4. A MMR for each of the above (with the exception that the output MMR includes
+hashes for *all* outputs, not only the unspent ones).
+
+In addition, all headers in the chain are required to anchor the above state
+with a valid proof of work (the state corresponds to the most worked chain).
+We note that once each range proof is validated and the sum of all kernel
+commitments is computed, range proofs and kernels are not strictly necessary
+for a node to function anymore.
+
+### Validation
+
+With a full Grin state, we can validate the following:
+
+1. Each kernel signature is valid against its commitment (public key). This
+proves the kernel is valid.
+2. The sum of all kernel commitments equals the sum of all UTXO commitments
+minus the total supply. This proves that kernels and output commitments are all
+valid and no coins have unexpectedly been created.
+3. All UTXOs, range proofs and kernel hashes are present in their respective
+MMR and those MMRs hash to a valid root.
+4. A known block header with the most work at a given point in time includes
+the roots of the 3 MMRs. This validates the MMRs and proves that the whole
+state has been produced by the most worked chain.
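Steps 3 and 4 can be illustrated with a minimal Python sketch that rebuilds an MMR from leaf data, bags its peaks from the right with the MMR size as prefix, and compares the result with a root taken from a header. Several details here are assumptions made for illustration and are not Grin's actual serialization: positions are 0-based, `|` is plain byte concatenation, integers are encoded as 8-byte big-endian values, digests are 32 bytes, and the names `build_mmr`, `bag_peaks` and `matches_header` are hypothetical.

```python
import hashlib


def blake2b(*parts):
    """Blake2b over the concatenation of `parts` (32-byte digest assumed)."""
    return hashlib.blake2b(b"".join(parts), digest_size=32).digest()


def enc(n):
    """Assumed integer encoding: 8 bytes, big-endian."""
    return n.to_bytes(8, "big")


def build_mmr(leaves):
    """Hash a non-empty list of leaf data into an MMR.

    Returns (nodes, peaks): node hashes in insertion order and the
    positions of the peaks from left to right.
    """
    nodes = []
    peaks = []  # stack of (height, position)
    for data in leaves:
        pos = len(nodes)
        nodes.append(blake2b(enc(pos), data))
        peaks.append((0, pos))
        # Two adjacent peaks of equal height merge under a new parent.
        while len(peaks) >= 2 and peaks[-1][0] == peaks[-2][0]:
            h, right = peaks.pop()
            _, left = peaks.pop()
            pos = len(nodes)
            nodes.append(blake2b(enc(pos), nodes[left], nodes[right]))
            peaks.append((h + 1, pos))
    return nodes, [p for _, p in peaks]


def bag_peaks(nodes, peaks):
    """Hash the peaks iteratively from the right, prefixed by the MMR size."""
    size = enc(len(nodes))
    acc = nodes[peaks[-1]]
    for p in reversed(peaks[:-1]):
        acc = blake2b(size, acc, nodes[p])
    return acc


def matches_header(leaves, header_root):
    """Check that the leaf data hashes to the root recorded in a header."""
    nodes, peaks = build_mmr(leaves)
    return bag_peaks(nodes, peaks) == header_root


leaves = [bytes([i]) for i in range(11)]  # placeholder leaf data
nodes, peaks = build_mmr(leaves)
root = bag_peaks(nodes, peaks)
print(len(nodes), peaks)
```

With 11 leaves this produces 19 nodes, the same shape as the first example in the MMR document. Changing any single leaf changes the bagged root, so `matches_header` fails for tampered data.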
+
+### MMRs and Pruning
+
+The data used to produce the hashes for leaf nodes in each MMR (in addition to
+their position) is the following:
+
+* The output MMR hashes the feature field and the commitments of all outputs
+since genesis.
+* The range proof MMR hashes the whole range proof data.
+* The kernel MMR hashes all fields of the kernel: feature, fee, lock height,
+excess commitment and excess signature.
+
+Note that all outputs, range proofs and kernels are added in their respective
+MMRs in the order they occur in each block (recall that block data is required
+to be sorted).
+
+As outputs get spent, both their commitment and range proof data can be
+removed. In addition, the corresponding output and range proof MMRs can be
+pruned.
+
+## State Storage
+
+Data storage for outputs, range proofs and kernels in Grin is simple: a plain
+append-only file that's memory-mapped for data access. As outputs get spent,
+a remove log records which positions can be removed. Those positions nicely
+match MMR node positions as they're all inserted in the same order. When the
+remove log gets large, corresponding files can be occasionally compacted by
+rewriting them without the removed pieces (also append-only) and the remove
+log can be emptied. As for MMRs, we need to add a little more complexity.
diff --git a/doc/toc.md b/doc/toc.md
index 143a29d8d..7e250ae1c 100644
--- a/doc/toc.md
+++ b/doc/toc.md
@@ -21,9 +21,9 @@ more widely.
   * Compact Block
 * Chain State and Merkle Mountain Range
   * Motivation
-  * Merkle Mountain Range
-  * State and Storage
-  * Fast Sync
+  * [Merkle Mountain Range](mmr.md)
+  * [State and Storage](state.md)
+  * [Fast Sync](fast-sync.md)
   * Merkle Proofs
 * Proof of Work
   * Cuckoo Cycle