Doc drop, all unfinished drafts

Ignotus Peverell 2018-06-05 05:49:51 +01:00
parent 663733e72b
commit 1bd7ece6e8
GPG key ID: 99CD25F39F8F8211
4 changed files with 236 additions and 3 deletions

22
doc/fast-sync.md Normal file

@@ -0,0 +1,22 @@
# Fast Sync
In Grin, we call "sync" the process of synchronizing a new node, or a node that
hasn't been keeping up with the chain for a while, and bringing it up to the
latest known most-worked block. The term Initial Block Download (or IBD) is
often used by other blockchains, but it is a poor fit for Grin, which typically
does not download full blocks.
In short, a fast-sync in Grin does the following:
1. Download all block headers, in chunks, on the most worked chain, as
advertised by other nodes.
2. Find a header sufficiently far back from the chain head. This is called the
node horizon, as it's the furthest back a node can reorganize its chain onto a
new fork, should one occur, without triggering another full sync.
3. Download the full state as it was at the horizon, including the unspent
output, range proof and kernel data, as well as all corresponding MMRs. This is
just one large zip file.
4. Validate the full state.
5. Download full blocks since the horizon to get to the chain head.
In the rest of this section, we will elaborate on each of those steps.
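
As a rough orientation before the details, the five steps can be laid out as a
plan computed from the height of the chain head. This is only a sketch:
`plan_fast_sync`, the `horizon_depth` parameter and the returned field names
are hypothetical and not taken from Grin's code, and the depth used in the
usage line is an arbitrary value.

```python
def plan_fast_sync(head_height, horizon_depth):
    """Lay out what a fast sync would fetch (illustrative names only)."""
    # Step 2: the horizon sits `horizon_depth` blocks behind the head,
    # clamped at genesis for chains shorter than the horizon depth.
    horizon = max(head_height - horizon_depth, 0)
    return {
        # Step 1: every header on the most worked chain.
        "headers": range(0, head_height + 1),
        # Steps 3 and 4: the full state is downloaded and validated here.
        "state_at": horizon,
        # Step 5: full blocks after the horizon, up to the chain head.
        "full_blocks": range(horizon + 1, head_height + 1),
    }


# Arbitrary example values, not real network parameters.
plan = plan_fast_sync(head_height=10_000, horizon_depth=1_000)
```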

148
doc/mmr.md Normal file

@@ -0,0 +1,148 @@
# Merkle Mountain Ranges
## Structure
Merkle Mountain Ranges [1] are an alternative to Merkle trees [2]. While the
latter rely on perfectly balanced binary trees, the former can be seen
either as a list of perfectly balanced binary trees or as a single binary tree
that has been truncated from the top right. A Merkle Mountain Range (MMR) is
strictly append-only: elements are added from left to right and the range
fills up accordingly.
This illustrates a range of 19 elements, where each node is annotated with
its order of insertion.
```
Height
3              14
             /    \
            /      \
           /        \
          /          \
2        6            13
       /   \        /    \
1     2     5      9     12     17
     / \   / \    / \   /  \   /  \
0   0   1 3   4  7   8 10  11 15  16 18
```
This can be represented as a flat list, here storing the height of each node
at their position of insertion:
```
0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18
0  0  1  0  0  1  2  0  0  1  0  0  1  2  3  0  0  1  0
```
This structure can be fully described simply from its size (19). It's also
fairly simple, using fast binary operations, to navigate within a MMR.
Given a node's position `n`, we can compute its height, the position of its
parent, its siblings, etc.
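
The following Python sketch shows one way to do this navigation with binary
operations. It is not Grin's implementation. The function names are
illustrative, and positions are 0-based as in the flat list above. The idea is
that, counting from 1, a node whose position is "all ones" in binary is the
top of a perfect subtree, and any other node can be moved to the equivalent
spot in the subtree to its left without changing its height.

```python
def height(pos0):
    """Height of the node at 0-based insertion position `pos0`."""
    pos = pos0 + 1  # the bit tricks below are simpler with 1-based positions
    # Jump left until the position is all ones in binary (2^k - 1).
    while (pos & (pos + 1)) != 0:
        # Size of the largest perfect tree that fits strictly before `pos`.
        pos -= (1 << (pos.bit_length() - 1)) - 1
    return pos.bit_length() - 1


def parent(pos0):
    """0-based position of the parent (not checked against the MMR size)."""
    h = height(pos0)
    if height(pos0 + 1) == h + 1:
        # The next node is one level higher: we are a right child.
        return pos0 + 1
    # Otherwise we are a left child; the parent follows the right subtree.
    return pos0 + (1 << (h + 1))


heights = [height(i) for i in range(19)]
```

Here `heights` reproduces the second row of the flat list for the 19-element
example.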
## Hashing and Bagging
Just like with Merkle trees, the value of a parent node in a MMR is the hash of
its 2 children. Grin uses the Blake2b hash function throughout, and always
prepends the node's position in the MMR before hashing to avoid collisions. So
for a leaf `l` at index `n` storing data `D` (in the case of an output, for
example, the data is its Pedersen commitment), we have:
```
Node(l) = Blake2b(n | D)
```
And for any parent `p` at index `m`:
```
Node(p) = Blake2b(m | Node(left_child(p)) | Node(right_child(p)))
```
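
A minimal Python sketch of these two formulas, using `hashlib.blake2b` from
the standard library. The byte-level details are assumptions made for the
example and are not specified in this document: positions are serialized as
8-byte big-endian integers, `|` is plain concatenation, and the digest is 32
bytes long. The leaf data is a placeholder, not a real commitment.

```python
import hashlib


def _position_prefix(position):
    # Assumption: 8-byte big-endian encoding of the node position.
    return position.to_bytes(8, "big")


def hash_leaf(n, data):
    """Node(l) = Blake2b(n | D) for a leaf at index n storing bytes `data`."""
    return hashlib.blake2b(_position_prefix(n) + data, digest_size=32).digest()


def hash_parent(m, left, right):
    """Node(p) = Blake2b(m | Node(left) | Node(right)) for a parent at index m."""
    return hashlib.blake2b(
        _position_prefix(m) + left + right, digest_size=32
    ).digest()


# Two leaves at positions 0 and 1, and their parent at position 2.
left = hash_leaf(0, b"placeholder data A")
right = hash_leaf(1, b"placeholder data B")
top = hash_parent(2, left, right)
```

Because the position is part of the hashed input, the same data stored at two
different positions produces two different node hashes.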
Contrary to a Merkle tree, a MMR generally has no single root by construction,
so we need a method to compute one (otherwise it would defeat the purpose of
using a hash tree). This process is called "bagging the peaks" for reasons
described in [1].
First, we identify the peaks of the MMR; we will define one method of doing so
here. We first write another small example MMR but with the indexes written as
binary (instead of decimal), starting from 1:
```
Height
2        111
       /     \
1     11     110       1010
     /  \    / \      /    \
0   1   10 100 101  1000  1001  1011
```
This MMR has 11 nodes and its peaks are at positions 111 (7), 1010 (10) and
1011 (11). Notice that the leftmost peak is always the highest, and that its
position is always "all ones" when expressed in binary. That peak therefore
has a position of the form `2^n - 1`, and it is always the largest such
position inside the MMR (positions start from 1, so a position is inside the
MMR when it is no greater than the total size). We proceed iteratively for a
MMR of size 11:
```
2^0 - 1 = 0, and 0 <= 11
2^1 - 1 = 1, and 1 <= 11
2^2 - 1 = 3, and 3 <= 11
2^3 - 1 = 7, and 7 <= 11
2^4 - 1 = 15, and 15 is not <= 11
```
Therefore the first peak is 7. To find the next peak, we then need to "jump" to
its right sibling. If that node is not in the MMR (and it won't be), take its
left child. If that child is not in the MMR either, keep taking left children
until we reach a node that exists in our MMR. Once we find that next peak,
keep repeating the process until we're at the last node.
All these operations are very simple. Jumping to the right sibling of a node at
height `h` means adding `2^(h+1) - 1` to its position. Taking its left child
means subtracting `2^h`. For example, from peak 7 (height 2) the right sibling
is 7 + 7 = 14, which is outside the MMR; its left child is 14 - 4 = 10, which
is inside, so 10 is the next peak.
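
The peak search described above can be sketched in Python as follows. The
function name is illustrative, positions are 1-based as in the binary example,
and the size is assumed to be that of a well-formed MMR (the sketch does not
validate it).

```python
def peaks(size):
    """1-based positions of the peaks of an MMR with `size` nodes."""
    if size < 1:
        return []
    # Leftmost peak: the largest position of the form 2^n - 1 that is
    # no greater than the size. `height` is the height of that peak.
    height = 0
    while (1 << (height + 2)) - 1 <= size:
        height += 1
    pos = (1 << (height + 1)) - 1
    found = [pos]
    while pos < size:
        # Jump to the right sibling of the current peak...
        pos += (1 << (height + 1)) - 1
        # ...and take left children until we are back inside the MMR.
        while pos > size:
            pos -= 1 << height
            height -= 1
        found.append(pos)
    return found


example_peaks = peaks(11)
```

For the 11-node example this yields positions 7, 10 and 11. For the earlier
19-element example it yields 15, 18 and 19, which are nodes 14, 17 and 18 in
that diagram's 0-based numbering.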
Finally, once all the positions of the peaks are known, "bagging" the peaks
consists of hashing them iteratively from the right, using the total size of
the MMR as prefix. For a MMR of size N with 3 peaks p1, p2 and p3 we get the
final top peak:
```
P = Blake2b(N | Blake2b(N | Node(p3) | Node(p2)) | Node(p1))
```
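
A Python sketch of this bagging step, generalized to any number of peaks by
folding from the right. As assumptions for the example only, the size `N` is
serialized as an 8-byte big-endian integer, the digest is 32 bytes, and a
single peak is returned unchanged (the text above does not cover that case).

```python
import hashlib


def bag_peaks(size, peak_hashes):
    """Fold peak hashes (given left to right) into one root, from the right."""
    prefix = size.to_bytes(8, "big")  # assumption: 8-byte big-endian size
    acc = peak_hashes[-1]             # start from the rightmost peak
    for peak in reversed(peak_hashes[:-1]):
        # acc <- Blake2b(N | acc | next peak to the left)
        acc = hashlib.blake2b(prefix + acc + peak, digest_size=32).digest()
    return acc


# Placeholder peak hashes standing in for Node(p1), Node(p2), Node(p3).
p1, p2, p3 = b"\x01" * 32, b"\x02" * 32, b"\x03" * 32
root = bag_peaks(11, [p1, p2, p3])
```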
## Pruning
In Grin, a lot of the data that gets hashed and stored in MMRs can eventually
be removed. As this happens, the presence of some leaf hashes in the
corresponding MMRs becomes unnecessary and their hashes can be removed. When
enough leaves are removed, the presence of their parents may become unnecessary
as well. We can therefore prune a significant part of a MMR from the removal of
its leaves.
Pruning a MMR relies on a simple iterative process. `X` is first initialized as
the leaf we wish to prune.
1. Prune `X`.
2. If `X` has a sibling that has not been pruned, stop here.
3. Otherwise, assign the parent of `X` as `X` and repeat from step 1.
To visualize the result, starting from our first MMR example and removing leaves
[0, 3, 4, 8, 16] leads to the following pruned MMR:
```
Height
3              14
             /    \
            /      \
           /        \
          /          \
2        6            13
       /            /    \
1     2            9     12     17
       \          /     /  \   /
0       1        7     10  11 15     18
```
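
The pruning process can be checked against this example with a short Python
sketch. The helper names are illustrative and positions are 0-based, as in the
diagram. Two choices are assumptions rather than something stated above: "has
a sibling" is read as "its sibling has not been pruned", and a peak (which has
no parent) ends the loop.

```python
def height(pos0):
    """Height of the node at 0-based position `pos0`."""
    pos = pos0 + 1
    while (pos & (pos + 1)) != 0:
        pos -= (1 << (pos.bit_length() - 1)) - 1
    return pos.bit_length() - 1


def parent_and_sibling(pos0, size):
    """Return (parent, sibling) of a node, or None if the node is a peak."""
    h = height(pos0)
    if height(pos0 + 1) == h + 1:
        # Right child: the parent is the next node.
        parent = pos0 + 1
        sibling = pos0 - ((1 << (h + 1)) - 1)
    else:
        # Left child: the sibling sits just before the parent.
        parent = pos0 + (1 << (h + 1))
        sibling = parent - 1
    if parent >= size:
        return None
    return parent, sibling


def prune(pruned, leaf, size):
    """Apply the iterative pruning process, recording positions in `pruned`."""
    x = leaf
    while True:
        pruned.add(x)                       # step 1
        family = parent_and_sibling(x, size)
        if family is None:
            return                          # a peak has no parent: stop
        parent, sibling = family
        if sibling not in pruned:
            return                          # step 2
        x = parent                          # step 3


pruned = set()
for leaf in [0, 3, 4, 8, 16]:
    prune(pruned, leaf, 19)
remaining = [p for p in range(19) if p not in pruned]
```

Node 5 is pruned along with its two children, 3 and 4, which is why it is
absent from the diagram.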
[1] Peter Todd, https://github.com/opentimestamps/opentimestamps-server/blob/master/doc/merkle-mountain-range.md
[2] https://en.wikipedia.org/wiki/Merkle_tree

63
doc/state.md Normal file

@@ -0,0 +1,63 @@
# State and Storage
## The Grin State
### Structure
The full state of a Grin chain consists of all the following data:
1. The full unspent output (UTXO) set.
2. The range proof for each output.
3. All the transaction kernels.
4. A MMR for each of the above (with the exception that the output MMR includes
hashes for *all* outputs, not only the unspent ones).
In addition, all headers in the chain are required to anchor the above state
with a valid proof of work (the state corresponds to the most worked chain).
We note that once each range proof is validated and the sum of all kernel
commitments is computed, range proofs and kernels are not strictly necessary
for a node to function anymore.
### Validation
With a full Grin state, we can validate the following:
1. Each kernel signature is valid against its commitment (public key). This
proves the kernel is valid.
2. The sum of all kernel commitments equals the sum of all UTXO commitments
minus the total supply. This proves that kernels and output commitments are all
valid and no coins have unexpectedly been created.
3. All UTXO, range proof and kernel hashes are present in their respective
MMRs and those MMRs hash to a valid root.
4. A known block header with the most work at a given point in time includes
the roots of the 3 MMRs. This validates the MMRs and proves that the whole
state has been produced by the most worked chain.
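
The second check can be written out as an equation. This is a sketch in the
usual Pedersen-commitment notation, which this document does not define: `G`
and `H` are the two generators, unspent output `j` has blinding factor `r_j`
and value `v_j`, `K_i` is the excess commitment of kernel `i`, and `S` is the
total supply. Any extra terms a particular protocol version may add are
ignored here.

```latex
C_j = r_j G + v_j H, \qquad \sum_{j} v_j = S
\quad\Longrightarrow\quad
\sum_{j} C_j - S\,H = \Big(\sum_{j} r_j\Big) G = \sum_{i} K_i
```

Removing the total supply cancels every `H` term, so what is left of the
outputs is a commitment to zero value, which is exactly what the kernel
excesses add up to.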
### MMRs and Pruning
The data used to produce the hashes for leaf nodes in each MMR (in addition to
their position) is the following:
* The output MMR hashes the feature field and the commitments of all outputs
since genesis.
* The range proof MMR hashes the whole range proof data.
* The kernel MMR hashes all fields of the kernel: feature, fee, lock height,
excess commitment and excess signature.
Note that all outputs, range proofs and kernels are added in their respective
MMRs in the order they occur in each block (recall that block data is required
to be sorted).
As outputs get spent, both their commitment and range proof data can be
removed. In addition, the corresponding output and range proof MMRs can be
pruned.
## State Storage
Data storage for outputs, range proofs and kernels in Grin is simple: a plain
append-only file that's memory-mapped for data access. As outputs get spent,
a remove log maintains which positions can be removed. Those positions nicely
match MMR node positions as they're all inserted in the same order. When the
remove log gets large, corresponding files can be occasionally compacted by
rewriting them without the removed pieces (also append-only) and the remove
log can be emptied. As for MMRs, we need to add a little more complexity.
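
The data-file scheme described above (append, log removals, compact from time
to time) can be sketched in Python with in-memory lists standing in for the
memory-mapped files. The class and method names are hypothetical. Keeping the
original position next to each record, so that lookups still work after
compaction, is a simplification made for this example and not necessarily how
Grin does it.

```python
import bisect


class AppendOnlyStore:
    """Toy model of an append-only data file with a remove log."""

    def __init__(self):
        self.positions = []      # original insertion position of each record
        self.records = []        # stand-in for the memory-mapped data file
        self.remove_log = set()  # positions that may be dropped
        self.next_position = 0

    def append(self, record):
        pos = self.next_position
        self.positions.append(pos)
        self.records.append(record)
        self.next_position += 1
        return pos

    def remove(self, pos):
        # Nothing is rewritten yet; the position is only logged.
        self.remove_log.add(pos)

    def get(self, pos):
        if pos in self.remove_log:
            return None
        i = bisect.bisect_left(self.positions, pos)
        if i < len(self.positions) and self.positions[i] == pos:
            return self.records[i]
        return None

    def compact(self):
        # Rewrite the data without the removed pieces, then empty the log.
        kept = [(p, r) for p, r in zip(self.positions, self.records)
                if p not in self.remove_log]
        self.positions = [p for p, _ in kept]
        self.records = [r for _, r in kept]
        self.remove_log.clear()


store = AppendOnlyStore()
for item in ["output a", "output b", "output c"]:
    store.append(item)
store.remove(1)   # "output b" gets spent
store.compact()
```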


@@ -21,9 +21,9 @@ more widely.
* Compact Block
* Chain State and Merkle Mountain Range
* Motivation
-* Merkle Mountain Range
-* State and Storage
-* Fast Sync
+* [Merkle Mountain Range](mmr.md)
+* [State and Storage](state.md)
+* [Fast Sync](fast-sync.md)
* Merkle Proofs
* Proof of Work
* Cuckoo Cycle