Doc drop, all unfinished drafts

2025-02-01 17:01:09 +03:00 · 2018-06-05 05:49:51 +01:00 · 1bd7ece6e8
commit 1bd7ece6e8
parent 663733e72b
4 changed files with 236 additions and 3 deletions
--- a/doc/fast-sync.md
+++ b/doc/fast-sync.md
@ -0,0 +1,22 @@
+# Fast Sync
+
+In Grin, we call "sync" the process of synchronizing a new node, or a node that
+hasn't been keeping up with the chain for a while, and bringing it up to the
+latest known most-worked block. The term Initial Block Download (IBD) is often
+used by other blockchains, but it is a poor fit for Grin, which typically does
+not download full blocks.
+
+In short, a fast-sync in Grin does the following:
+
+1. Download all block headers, by chunks, on the most worked chain, as
+advertised by other nodes.
+2. Find a header sufficiently far back from the chain head. This is called the
+node horizon, as it's the furthest back a node can reorganize its chain onto a
+new fork, should one occur, without triggering another full sync.
+3. Download the full state as it was at the horizon, including the unspent
+output, range proof and kernel data, as well as all corresponding MMRs. This is
+just one large zip file.
+4. Validate the full state.
+5. Download full blocks since the horizon to get to the chain head.
+
+In the rest of this section, we will elaborate on each of those steps.
--- a/doc/mmr.md
+++ b/doc/mmr.md
@ -0,0 +1,148 @@
+# Merkle Mountain Ranges
+
+## Structure
+
+Merkle Mountain Ranges [1] are an alternative to Merkle trees [2]. While the
+latter rely on perfectly balanced binary trees, the former can be seen
+either as a list of perfectly balanced binary trees or as a single binary tree
+that has been truncated from the top right. A Merkle Mountain Range (MMR) is
+strictly append-only: elements are added from left to right and the range
+fills up accordingly.
+
+This illustrates a range of 19 nodes (11 leaves), where each node is annotated
+with its order of insertion.
+
+```
+Height
+
+3              14
+             /    \
+            /      \
+           /        \
+          /          \
+2        6            13
+       /   \        /    \
+1     2     5      9     12     17
+     / \   / \    / \   /  \   /  \
+0   0   1 3   4  7   8 10  11 15  16 18
+```
+
+This can be represented as a flat list, here storing the height of each node
+at its position of insertion:
+
+```
+0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
+0  0  1  0  0  1  2  0  0  1  0  0  1  2  3  0  0  1  0
+```
+
+This structure can be fully described simply from its size (19). It's also
+fairly simple, using fast binary operations, to navigate within a MMR.
+Given a node's position `n`, we can compute its height, the position of its
+parent, its siblings, etc.
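As a rough illustration of the navigation described above, here is a minimal Python sketch that derives a node's height from its 0-based insertion position alone. The name `node_height` is ours, and the approach (repeatedly jumping to the equivalent node in the leftmost subtree) is one possible method, not necessarily the one Grin implements.

```python
def node_height(n: int) -> int:
    """Height of the node inserted at 0-based position n (illustrative helper)."""
    pos = n + 1  # switch to 1-based positions, where the leftmost peaks are "all ones"
    while True:
        bits = pos.bit_length()
        if pos == (1 << bits) - 1:
            # An "all ones" position is the top of a perfect tree of this height.
            return bits - 1
        # Otherwise jump to the node with the same role in the leftmost
        # perfect tree by subtracting the size of that tree.
        pos -= (1 << (bits - 1)) - 1


# Reproduces the flat list of heights for the 19-node example.
heights = [node_height(n) for n in range(19)]
```

Applied to positions 0 through 18, this yields the same height row as the flat list shown above.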
+
+## Hashing and Bagging
+
+Just like with Merkle trees, the value of a parent node in a MMR is the hash of
+its 2 children. Grin uses the Blake2b hash function throughout, and always
+prepends the node's position in the MMR before hashing to avoid collisions. So
+for a leaf `l` at index `n` storing data `D` (in the case of an output, the
+data is its Pedersen commitment, for example), we have:
+
+```
+Node(l) = Blake2b(n | D)
+```
+
+And for any parent `p` at index `m`:
+
+```
+Node(p) = Blake2b(m | Node(left_child(p)) | Node(right_child(p)))
+```
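The two formulas above can be sketched with Python's standard `hashlib.blake2b`. The byte encoding of the position (8 bytes, big-endian) and the 32-byte digest size are assumptions made for this example only; the text does not specify either, and the helper names are hypothetical.

```python
import hashlib

DIGEST_SIZE = 32  # assumption: the document does not state the digest length


def _pos_bytes(pos: int) -> bytes:
    # Assumption: positions are serialized as 8-byte big-endian integers.
    return pos.to_bytes(8, "big")


def leaf_hash(pos: int, data: bytes) -> bytes:
    """Node(l) = Blake2b(n | D), where `|` is byte concatenation."""
    return hashlib.blake2b(_pos_bytes(pos) + data, digest_size=DIGEST_SIZE).digest()


def parent_hash(pos: int, left: bytes, right: bytes) -> bytes:
    """Node(p) = Blake2b(m | Node(left_child(p)) | Node(right_child(p)))."""
    return hashlib.blake2b(
        _pos_bytes(pos) + left + right, digest_size=DIGEST_SIZE
    ).digest()


# The first three nodes of an MMR: two leaves and their parent.
h0 = leaf_hash(0, b"commitment-a")
h1 = leaf_hash(1, b"commitment-b")
h2 = parent_hash(2, h0, h1)
```

Because the position is part of the preimage, the same data stored at two different positions produces two different hashes.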
+
+Unlike a Merkle tree, a MMR generally has no single root by construction,
+so we need a method to compute one (otherwise it would defeat the purpose of
+using a hash tree). This process is called "bagging the peaks" for reasons
+described in [1].
+
+First, we identify the peaks of the MMR; we will define one method of doing so
+here. We first write another small example MMR but with the indexes written as
+binary (instead of decimal), starting from 1:
+
+```
+Height
+
+2        111
+       /     \
+1     11     110       1010
+     /  \    / \      /    \
+0   1   10 100 101  1000  1001  1011
+```
+
+This MMR has 11 nodes and its peaks are at positions 111 (7), 1010 (10) and
+1011 (11). The leftmost peak is always the highest and is always "all ones" in
+binary, so its position has the form `2^n - 1` and is the largest such
+position inside the MMR (no greater than the total size). Iterating for a MMR
+of size 11:
+
+```
+2^0 - 1 = 0, and 0 <= 11
+2^1 - 1 = 1, and 1 <= 11
+2^2 - 1 = 3, and 3 <= 11
+2^3 - 1 = 7, and 7 <= 11
+2^4 - 1 = 15, and 15 is not <= 11
+```
+
+Therefore the first peak is 7. To find the next peak, we then need to "jump" to
+its right sibling. If that node is not in the MMR (and it won't be), take its
+left child. If that child is not in the MMR either, keep taking its left child
+until we have a node that exists in our MMR. Once we find that next peak,
+keep repeating the process until we're at the last node.
+
+All these operations are very simple. Jumping to the right sibling of a node at
+height `h` is adding `2^(h+1) - 1` to its position. Taking its left child is
+subtracting `2^h`.
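The peak search described above can be sketched in a few lines of Python. Positions are 1-based as in the binary example, the function name `peaks` is ours, and the sketch assumes `size` is a valid MMR size (it does not check for impossible sizes).

```python
def peaks(size: int) -> list[int]:
    """1-based positions of the peaks of an MMR with `size` nodes."""
    if size <= 0:
        return []
    # Leftmost peak: the largest position of the form 2^n - 1 inside the MMR.
    n = (size + 1).bit_length() - 1
    pos, height = (1 << n) - 1, n - 1
    found = [pos]
    while pos < size:
        # Jump to the right sibling of the current peak (same height).
        pos += (1 << (height + 1)) - 1
        # That node is outside the MMR: keep taking the left child until
        # we land on a node that exists.
        while pos > size:
            pos -= 1 << height
            height -= 1
        found.append(pos)
    return found


example_peaks = peaks(11)
```

For the 11-node example this gives positions 7, 10 and 11. For the earlier 19-node range it gives 15, 18 and 19, which are nodes 14, 17 and 18 in that diagram's 0-based numbering.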
+
+Finally, once all the positions of the peaks are known, "bagging" the peaks
+consists of hashing them iteratively from the right, using the total size of
+the MMR as prefix. For a MMR of size N with 3 peaks p1, p2 and p3 we get the
+final top peak:
+
+```
+P = Blake2b(N | Blake2b(N | Node(p3) | Node(p2)) | Node(p1))
+```
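A minimal Python sketch of the bagging formula above, using the standard `hashlib.blake2b`. As before, the 8-byte big-endian encoding of `N` and the 32-byte digest are assumptions for illustration. Returning the lone peak hash unchanged when there is only one peak is also our choice, since the text only covers the multi-peak case.

```python
import hashlib


def bag_peaks(size: int, peak_hashes: list[bytes]) -> bytes:
    """Fold peak hashes from the right, prefixing each step with the MMR size.

    `peak_hashes` is ordered left to right (p1, p2, p3, ...).
    """
    if not peak_hashes:
        raise ValueError("an MMR with at least one node has at least one peak")
    prefix = size.to_bytes(8, "big")  # assumption: 8-byte big-endian size prefix
    acc = peak_hashes[-1]             # start from the rightmost peak
    for peak in reversed(peak_hashes[:-1]):
        # Each step computes Blake2b(N | accumulated | next peak to the left).
        acc = hashlib.blake2b(prefix + acc + peak, digest_size=32).digest()
    return acc


# Three placeholder peak hashes standing in for Node(p1), Node(p2), Node(p3).
p1, p2, p3 = (bytes([i]) * 32 for i in (1, 2, 3))
root = bag_peaks(11, [p1, p2, p3])
```

With three peaks the loop runs twice, matching the nested expression `Blake2b(N | Blake2b(N | Node(p3) | Node(p2)) | Node(p1))`.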
+
+## Pruning
+
+In Grin, a lot of the data that gets hashed and stored in MMRs can eventually
+be removed. As this happens, the presence of some leaf hashes in the
+corresponding MMRs becomes unnecessary and their hashes can be removed. When
+enough leaves are removed, the presence of their parents may become unnecessary
+as well. We can therefore prune a significant part of a MMR from the removal of
+its leaves.
+
+Pruning a MMR relies on a simple iterative process. `X` is first initialized as
+the leaf we wish to prune.
+
+1. Prune `X`.
+2. If `X` has a sibling that has not been pruned, stop here.
+3. Otherwise, assign the parent of `X` as `X` and go back to step 1.
+
+To visualize the result, starting from our first MMR example and removing leaves
+[0, 3, 4, 8, 16] leads to the following pruned MMR:
+
+```
+Height
+
+3             14
+            /    \
+           /      \
+          /        \
+         /          \
+2       6            13
+       /            /   \
+1     2            9     12     17
+       \          /     /  \   /  
+0       1        7     10  11 15     18
+```
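The pruning example above can be reproduced with a short Python sketch. It uses 0-based positions as in the first diagram. All helper names are hypothetical, and treating "has a sibling" as "has a sibling inside the MMR that has not been pruned" is our reading of the steps.

```python
def node_height(n: int) -> int:
    """Height of the node at 0-based position n."""
    pos = n + 1
    while True:
        bits = pos.bit_length()
        if pos == (1 << bits) - 1:
            return bits - 1
        pos -= (1 << (bits - 1)) - 1


def family(n: int) -> tuple[int, int]:
    """Return (parent, sibling) positions of node n (they may lie outside the MMR)."""
    h = node_height(n)
    span = 1 << (h + 1)
    if node_height(n + 1) > h:
        # The next position is higher, so n is a right child and n + 1 its parent.
        return n + 1, n - (span - 1)
    # Otherwise n is a left child.
    return n + span, n + span - 1


def prune(size: int, leaves: list[int]) -> set[int]:
    """Positions removed when pruning `leaves` from an MMR of `size` nodes."""
    pruned: set[int] = set()
    for x in leaves:
        while x < size:
            pruned.add(x)
            parent, sibling = family(x)
            if sibling < size and sibling not in pruned:
                break  # an unpruned sibling still needs the parent
            x = parent
    return pruned


removed = prune(19, [0, 3, 4, 8, 16])
remaining = sorted(set(range(19)) - removed)
```

Here node 5 is pruned in addition to the five leaves, because both of its children (3 and 4) are gone. The remaining positions match the pruned diagram above.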
+
+[1] Peter Todd, https://github.com/opentimestamps/opentimestamps-server/blob/master/doc/merkle-mountain-range.md
+[2] https://en.wikipedia.org/wiki/Merkle_tree
--- a/doc/state.md
+++ b/doc/state.md
@ -0,0 +1,63 @@
+# State and Storage
+
+## The Grin State
+
+### Structure
+
+The full state of a Grin chain consists of all the following data:
+
+1. The full unspent output (UTXO) set.
+2. The range proof for each output.
+3. All the transaction kernels.
+4. A MMR for each of the above (with the exception that the output MMR includes
+hashes for *all* outputs, not only the unspent ones).
+
+In addition, all headers in the chain are required to anchor the above state
+with a valid proof of work (the state corresponds to the most worked chain).
+We note that once each range proof is validated and the sum of all kernel
+commitments is computed, range proofs and kernels are not strictly necessary
+for a node to function anymore.
+
+### Validation
+
+With a full Grin state, we can validate the following:
+
+1. Each kernel signature is valid against its commitment (public key). This
+proves each kernel is valid.
+2. The sum of all kernel commitments equals the sum of all UTXO commitments
+minus the total supply. This proves that kernels and output commitments are all
+valid and no coins have unexpectedly been created.
+3. All UTXO, range proof and kernel hashes are present in their respective
+MMR and those MMRs hash to a valid root.
+4. A known block header with the most work at a given point in time includes
+the roots of the 3 MMRs. This validates the MMRs and proves that the whole
+state has been produced by the most worked chain.
+
+### MMRs and Pruning
+
+The data used to produce the hashes for leaf nodes in each MMR (in addition to
+their position) is the following:
+
+* The output MMR hashes the feature field and the commitments of all outputs
+since genesis.
+* The range proof MMR hashes the whole range proof data.
+* The kernel MMR hashes all fields of the kernel: feature, fee, lock height,
+excess commitment and excess signature.
+
+Note that all outputs, range proofs and kernels are added in their respective
+MMRs in the order they occur in each block (recall that block data is required
+to be sorted).
+
+As outputs get spent, both their commitment and range proof data can be
+removed. In addition, the corresponding output and range proof MMRs can be
+pruned.
+
+## State Storage
+
+Data storage for outputs, range proofs and kernels in Grin is simple: a plain
+append-only file that's memory-mapped for data access. As outputs get spent,
+a remove log maintains which positions can be removed. Those positions nicely
+match MMR node positions as they're all inserted in the same order. When the
+remove log gets large, corresponding files can be occasionally compacted by
+rewriting them without the removed pieces (also append-only) and the remove
+log can be emptied. As for MMRs, we need to add a little more complexity.
--- a/doc/toc.md
+++ b/doc/toc.md
@ -21,9 +21,9 @@ more widely.
 		* Compact Block
 * Chain State and Merkle Mountain Range
 	* Motivation
-	* Merkle Mountain Range
-	* State and Storage
-	* Fast Sync
+	* [Merkle Mountain Range](mmr.md)
+	* [State and Storage](state.md)
+	* [Fast Sync](fast-sync.md)
 	* Merkle Proofs
 * Proof of Work
 	* Cuckoo Cycle