Doc drop, all unfinished drafts

2025-02-08 12:21:09 +03:00 · 2018-06-05 05:49:51 +01:00 · 2018-06-05 05:49:51 +01:00 · 1bd7ece6e8
commit 1bd7ece6e8
parent 663733e72b
4 changed files with 236 additions and 3 deletions
--- a/doc/fast-sync.md
+++ b/doc/fast-sync.md
@ -0,0 +1,22 @@
 # Fast Sync
 In Grin, we call "sync" the process of synchronizing a new node or a node that
 hasn't been keeping up with the chain for a while, and bringing it up to the
 latest known most-worked block. The term Initial Block Download (or IBD) is
 often used by other blockchains, but it is a poor fit for Grin, which typically
 does not download full blocks.
 In short, a fast-sync in Grin does the following:
 1. Download all block headers, by chunks, on the most worked chain, as
 advertised by other nodes.
 2. Find a header sufficiently far back from the chain head. This is called the
 node horizon, as it's the furthest back a node can reorganize its chain onto a
 new fork, should one occur, without triggering another full sync.
 3. Download the full state as it was at the horizon, including the unspent
 output, range proof and kernel data, as well as all corresponding MMRs. This is
 just one large zip file.
 4. Validate the full state.
 5. Download full blocks since the horizon to get to the chain head.
 In the rest of this section, we will elaborate on each of those steps.
--- a/doc/mmr.md
+++ b/doc/mmr.md
@ -0,0 +1,148 @@
 # Merkle Mountain Ranges
 ## Structure
 Merkle Mountain Ranges [1] are an alternative to Merkle trees [2]. While the
 latter relies on perfectly balanced binary trees, the former can be seen
 either as a list of perfectly balanced binary trees or as a single binary tree
 that has been truncated from the top right. A Merkle Mountain Range (MMR) is
 strictly append-only: elements are added from left to right and the range
 fills up accordingly.
 This illustrates a range of 19 nodes (11 leaves and 8 parents), where each node
 is annotated with its order of insertion.
 ```
 Height
 3              14
             /    \
            /      \
           /        \
          /          \
 2        6            13
       /   \        /    \
 1     2     5      9     12     17
     / \   / \    / \   /  \   /  \
 0   0   1 3   4  7   8 10  11 15  16 18
 ```
 This can be represented as a flat list, here storing the height of each node
 at their position of insertion:
 ```
 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
 0  0  1  0  0  1  2  0  0  1  0  0  1  2  3  0  0  1  0
 ```
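The flat list above can be reproduced by simulating appends. The following is a minimal sketch, not Grin's implementation; the function name `mmr_heights` is hypothetical. It relies only on the rule that two adjacent peaks of equal height are merged under a new parent, which is inserted right after them.

```python
def mmr_heights(leaf_count):
    """Heights of all nodes, in insertion order, after appending leaf_count leaves."""
    heights = []  # heights[i] is the height of the node at position i
    peaks = []    # heights of the current peaks, left to right
    for _ in range(leaf_count):
        # A new leaf is always appended at height 0.
        heights.append(0)
        peaks.append(0)
        # Two adjacent peaks of equal height get a parent, inserted next.
        while len(peaks) >= 2 and peaks[-1] == peaks[-2]:
            h = peaks.pop()
            peaks.pop()
            peaks.append(h + 1)
            heights.append(h + 1)
    return heights


print(" ".join(str(h) for h in mmr_heights(11)))
```

Appending 11 leaves yields the 19 heights listed above.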
 This structure can be fully described simply from its size (19). It's also
 fairly simple, using fast binary operations, to navigate within a MMR.
 Given a node's position `n`, we can compute its height, the position of its
 parent, its siblings, etc.
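As an illustration of such navigation, here is a sketch using 0-based positions, as in the diagram above. The function names are hypothetical and this is one possible way to do it, not necessarily the one Grin uses. The height is found by repeatedly jumping to the equivalent node in the leftmost subtree until the 1-based position is "all ones" in binary.

```python
def height(pos):
    """Height of the node at 0-based position pos."""
    n = pos + 1  # 1-based: the leftmost node at each height is all ones
    while n & (n + 1) != 0:
        # Not all ones yet: skip over the full subtree to the left.
        n -= (1 << (n.bit_length() - 1)) - 1
    return n.bit_length() - 1


def parent_and_sibling(pos):
    """Return (parent, sibling) positions; they may lie beyond the MMR size."""
    h = height(pos)
    span = (1 << (h + 1)) - 1  # number of nodes in a full subtree of height h
    if height(pos + 1) == h + 1:
        # pos is a right child: its parent was inserted right after it.
        return pos + 1, pos - span
    # pos is a left child: the sibling subtree sits between it and the parent.
    return pos + span + 1, pos + span


print([height(p) for p in range(19)])
print(parent_and_sibling(5))
```

Callers are expected to compare the returned positions against the MMR size, since a peak has neither a parent nor a sibling inside the MMR.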
 ## Hashing and Bagging
 Just like with Merkle trees, the value of a parent node in a MMR is the hash of
 its 2 children. Grin uses the Blake2b hash function throughout, and always
 prepends the node's position in the MMR before hashing to avoid collisions. So
 for a leaf `l` at index `n` storing data `D` (in the case of an output, the
 data is its Pedersen commitment, for example), we have:
 ```
 Node(l) = Blake2b(n | D)
 ```
 And for any parent `p` at index `m`:
 ```
 Node(p) = Blake2b(m | Node(left_child(p)) | Node(right_child(p)))
 ```
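The two formulas can be sketched with Python's standard `hashlib`. The serialization of the position (8 bytes, big-endian) and the 32-byte digest size are assumptions made for illustration; the text above does not specify them, and the function names are hypothetical.

```python
import hashlib


def _pos_bytes(pos):
    # Assumption: positions are serialized as 8-byte big-endian integers.
    return pos.to_bytes(8, "big")


def leaf_hash(pos, data):
    """Node(l) = Blake2b(n | D) for a leaf at position pos storing data."""
    return hashlib.blake2b(_pos_bytes(pos) + data, digest_size=32).digest()


def parent_hash(pos, left, right):
    """Node(p) = Blake2b(m | Node(left_child) | Node(right_child))."""
    return hashlib.blake2b(_pos_bytes(pos) + left + right, digest_size=32).digest()


# A 3-node MMR: two leaves at positions 0 and 1, their parent at position 2.
l0 = leaf_hash(0, b"commitment-a")
l1 = leaf_hash(1, b"commitment-b")
root = parent_hash(2, l0, l1)
print(root.hex())
```

Because the position is part of the preimage, the same data stored at two different positions produces two different hashes.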
 Unlike a Merkle tree, a MMR generally has no single root by construction,
 so we need a method to compute one (otherwise it would defeat the purpose of
 using a hash tree). This process is called "bagging the peaks" for reasons
 described in [1].
 First, we identify the peaks of the MMR; we will define one method of doing so
 here. We first write another small example MMR but with the indexes written as
 binary (instead of decimal), starting from 1:
 ```
 Height
 2        111
       /     \
 1     11     110       1010
     /  \    / \      /    \
 0   1   10 100 101  1000  1001  1011
 ```
 This MMR has 11 nodes and its peaks are at positions 111 (7), 1010 (10) and
 1011 (11). We first notice that the leftmost peak is always the highest and is
 always "all ones" when expressed in binary. That peak therefore has a position
 of the form `2^n - 1`, and it is always the largest such position that is
 inside the MMR (its position is less than or equal to the total size). We
 proceed iteratively for a MMR of size 11:
 ```
 2^0 - 1 = 0, and 0 <= 11
 2^1 - 1 = 1, and 1 <= 11
 2^2 - 1 = 3, and 3 <= 11
 2^3 - 1 = 7, and 7 <= 11
 2^4 - 1 = 15, and 15 is not <= 11
 ```
 Therefore the first peak is 7. To find the next peak, we then need to "jump" to
 its right sibling. If that node is not in the MMR (and it won't be), take its
 left child. If that child is not in the MMR either, keep taking its left child
 until we have a node that exists in our MMR. Once we find that next peak,
 keep repeating the process until we're at the last node.
 All these operations are very simple. Jumping to the right sibling of a node at
 height `h` is adding `2^(h+1) - 1` to its position. Taking its left child is
 subtracting `2^h`.
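The peak search described above can be sketched as follows, using 1-based positions as in the binary example. The function name `peaks` is hypothetical and the validity check is an addition for illustration: in a well-formed MMR the right sibling of a peak never exists, so finding one means the size is not a valid MMR size.

```python
def peaks(size):
    """1-based positions of the peaks of an MMR with size nodes, left to right."""
    if size == 0:
        return []
    # Leftmost peak: largest position of the form 2^n - 1 inside the MMR.
    height = 0
    while (1 << (height + 2)) - 1 <= size:
        height += 1
    pos = (1 << (height + 1)) - 1
    result = [pos]
    while pos < size:
        # Jump to the right sibling of the current peak (same height).
        pos += (1 << (height + 1)) - 1
        if pos <= size:
            raise ValueError("not a valid MMR size")
        # Keep taking the left child until the node exists in the MMR.
        while pos > size:
            pos -= 1 << height
            height -= 1
        result.append(pos)
    return result


print(peaks(11))
```

For the 11-node example this gives positions 7, 10 and 11. For the 19-node MMR of the first diagram it gives 15, 18 and 19, which are positions 14, 17 and 18 when counted from 0.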
 Finally, once all the positions of the peaks are known, "bagging" the peaks
 consists of hashing them iteratively from the right, using the total size of
 the MMR as prefix. For a MMR of size N with 3 peaks p1, p2 and p3 we get the
 final top peak:
 ```
 P = Blake2b(N | Blake2b(N | Node(p3) | Node(p2)) | Node(p1))
 ```
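A sketch of the bagging step, generalized to any number of peaks. As before, the 8-byte big-endian encoding of the size and the 32-byte digest are assumptions, and `bag_peaks` is a hypothetical name. Returning the lone peak hash unchanged when there is a single peak is also an assumption, since the text only gives the formula for three peaks.

```python
import hashlib


def bag_peaks(size, peak_hashes):
    """Fold peak hashes (given left to right, at least one) from the right."""
    prefix = size.to_bytes(8, "big")  # assumption: 8-byte big-endian size
    acc = peak_hashes[-1]             # start from the rightmost peak
    for node in reversed(peak_hashes[:-1]):
        # Each step computes Blake2b(N | accumulated | next peak to the left).
        acc = hashlib.blake2b(prefix + acc + node, digest_size=32).digest()
    return acc


# Placeholder peak hashes standing in for Node(p1), Node(p2), Node(p3).
p1, p2, p3 = b"\x01" * 32, b"\x02" * 32, b"\x03" * 32
print(bag_peaks(11, [p1, p2, p3]).hex())
```

With three peaks the loop runs twice, first combining `p3` with `p2` and then the result with `p1`, which matches the formula above.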
 ## Pruning
 In Grin, a lot of the data that gets hashed and stored in MMRs can eventually
 be removed. As this happens, the presence of some leaf hashes in the
 corresponding MMRs becomes unnecessary and those hashes can be removed. When
 enough leaves are removed, the presence of their parents may become unnecessary
 as well. We can therefore prune a significant part of a MMR by removing its
 leaves.
 Pruning a MMR relies on a simple iterative process. `X` is first initialized as
 the leaf we wish to prune.
 1. Prune `X`.
 2. If `X` has a sibling (one that has not itself been pruned), stop here.
 3. If `X` has no sibling, assign the parent of `X` as `X` and go back to step 1.
 To visualize the result, starting from our first MMR example and removing leaves
 [0, 3, 4, 8, 16] leads to the following pruned MMR:
 ```
 Height
 3             14
            /    \
           /      \
          /        \
         /          \
 2       6            13
       /            /   \
 1     2            9     12     17
       \          /     /  \   /  
 0       1        7     10  11 15     18
 ```
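The pruning process can be checked against the diagram above with a short sketch. It uses 0-based positions and hypothetical helper names, and it treats "has no sibling" as "the sibling has already been pruned", which is the reading that reproduces the diagram. A peak, which has no parent inside the MMR, ends the loop.

```python
def height(pos):
    """Height of the node at 0-based position pos."""
    n = pos + 1
    while n & (n + 1) != 0:
        n -= (1 << (n.bit_length() - 1)) - 1
    return n.bit_length() - 1


def parent_and_sibling(pos):
    """Return (parent, sibling) positions of the node at pos."""
    h = height(pos)
    span = (1 << (h + 1)) - 1
    if height(pos + 1) == h + 1:
        return pos + 1, pos - span      # pos is a right child
    return pos + span + 1, pos + span   # pos is a left child


def prune(size, leaves):
    """Return the set of positions pruned when removing the given leaves."""
    pruned = set()
    for x in leaves:
        while True:
            pruned.add(x)                       # step 1: prune X
            parent, sibling = parent_and_sibling(x)
            if parent >= size:                  # X is a peak: nothing above it
                break
            if sibling not in pruned:           # step 2: sibling remains, stop
                break
            x = parent                          # step 3: move up to the parent
    return pruned


gone = prune(19, [0, 3, 4, 8, 16])
print(sorted(set(range(19)) - gone))
```

Node 5 disappears along with leaves 3 and 4 because both of its children are gone, while every other removed leaf still has a sibling and so leaves its parent in place.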
 [1] Peter Todd, https://github.com/opentimestamps/opentimestamps-server/blob/master/doc/merkle-mountain-range.md
 [2] https://en.wikipedia.org/wiki/Merkle_tree
--- a/doc/state.md
+++ b/doc/state.md
@ -0,0 +1,63 @@
 # State and Storage
 ## The Grin State
 ### Structure
 The full state of a Grin chain consists of all the following data:
 1. The full unspent output (UTXO) set.
 2. The range proof for each output.
 3. All the transaction kernels.
 4. A MMR for each of the above (with the exception that the output MMR includes
 hashes for *all* outputs, not only the unspent ones).
 In addition, all headers in the chain are required to anchor the above state
 with a valid proof of work (the state corresponds to the most worked chain).
 We note that once each range proof is validated and the sum of all kernel
 commitments is computed, range proofs and kernels are not strictly necessary
 for a node to function anymore.
 ### Validation
 With a full Grin state, we can validate the following:
 1. The kernel signature is valid against its commitment (public key). This
 proves the kernel is valid.
 2. The sum of all kernel commitments equals the sum of all UTXO commitments
 minus the total supply. This proves that kernels and output commitments are all
 valid and no coins have unexpectedly been created.
 3. All UTXO, range proof and kernel hashes are present in their respective
 MMR and those MMRs hash to a valid root.
 4. A known block header with the most work at a given point in time includes
 the roots of the 3 MMRs. This validates the MMRs and proves that the whole
 state has been produced by the most worked chain.
 ### MMRs and Pruning
 The data used to produce the hashes for leaf nodes in each MMR (in addition to
 their position) is the following:
 * The output MMR hashes the feature field and the commitments of all outputs
 since genesis.
 * The range proof MMR hashes the whole range proof data.
 * The kernel MMR hashes all fields of the kernel: feature, fee, lock height,
 excess commitment and excess signature.
 Note that all outputs, range proofs and kernels are added in their respective
 MMRs in the order they occur in each block (recall that block data is required
 to be sorted).
 As outputs get spent, both their commitment and range proof data can be
 removed. In addition, the corresponding output and range proof MMRs can be
 pruned.
 ## State Storage
 Data storage for outputs, range proofs and kernels in Grin is simple: a plain
 append-only file that's memory-mapped for data access. As outputs get spent,
 a remove log maintains which positions can be removed. Those positions nicely
 match MMR node positions as they're all inserted in the same order. When the
 remove log gets large, corresponding files can be occasionally compacted by
 rewriting them without the removed pieces (also append-only) and the remove
 log can be emptied. As for MMRs, we need to add a little more complexity.
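The append, remove log and compaction cycle can be modeled in memory as follows. This is only a sketch: a Python list stands in for the memory-mapped file, the class and attribute names are hypothetical, and the `compacted` list used to keep lookups by original position working after compaction is an assumption that the text above does not describe.

```python
import bisect


class AppendOnlyStore:
    """Toy model of an append-only data file with a remove log."""

    def __init__(self):
        self.data = []           # stand-in for the memory-mapped data file
        self.remove_log = set()  # positions removed since the last compaction
        self.compacted = []      # sorted positions already dropped (assumption)
        self.next_pos = 0        # position the next appended item will get

    def append(self, item):
        pos = self.next_pos
        self.data.append(item)
        self.next_pos += 1
        return pos

    def remove(self, pos):
        # Removal only records the position; the data file is untouched.
        self.remove_log.add(pos)

    def _index(self, pos):
        """Index of pos in data, or None if it has been removed."""
        if pos in self.remove_log:
            return None
        shift = bisect.bisect_left(self.compacted, pos)
        if shift < len(self.compacted) and self.compacted[shift] == pos:
            return None
        return pos - shift  # earlier compacted entries shift the index left

    def get(self, pos):
        i = self._index(pos)
        return None if i is None else self.data[i]

    def compact(self):
        # Rewrite the data without removed items, then empty the remove log.
        live = [self._index(p) for p in range(self.next_pos)]
        self.data = [self.data[i] for i in live if i is not None]
        self.compacted = sorted(set(self.compacted) | self.remove_log)
        self.remove_log.clear()


store = AppendOnlyStore()
for item in ["a", "b", "c", "d", "e"]:
    store.append(item)
store.remove(1)
store.remove(3)
store.compact()
print(store.data, store.get(4))
```

Positions handed out by `append` stay stable across compactions, which mirrors how they line up with MMR node positions in the description above.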
--- a/doc/toc.md
+++ b/doc/toc.md
@ -21,9 +21,9 @@ more widely.
 		* Compact Block
 * Chain State and Merkle Mountain Range
 	* Motivation
-	* Merkle Mountain Range
+	* [Merkle Mountain Range](mmr.md)
-	* State and Storage
+	* [State and Storage](state.md)
-	* Fast Sync
+	* [Fast Sync](fast-sync.md)
 	* Merkle Proofs
 * Proof of Work
 	* Cuckoo Cycle