mirror of
https://github.com/mimblewimble/grin.git
synced 2025-01-21 03:21:08 +03:00
Add design document for Merkle tree structures
parent
edc6c62577
commit
7b3095ac12
1 changed files with 157 additions and 0 deletions
157
doc/merkle.md
Normal file
@ -0,0 +1,157 @@
# Merkle Structures

MimbleWimble is designed for users to verify the state of the system given
only pruned data. To achieve this goal, all transaction data is committed
to the blockchain by means of Merkle trees, which should support efficient
updates and serialization even when pruned.

Also, almost all transaction data (inputs, outputs, excesses and excess
proofs) can be summed in some way, so it makes sense to
treat Merkle sum trees as the default option, and address the sums here.

A design goal of Grin is that all structures be as easy to implement and
as simple as possible. MimbleWimble introduces a lot of new cryptography,
so it should be made as easy to understand as possible. Its validation rules
are simple to specify (no scripts) and Grin is written in a language with
very explicit semantics, so simplicity also helps to achieve well-understood
consensus rules.
## Merkle Trees

There are four Merkle trees committed to by each block:
### Total Output Set

Each object is one of two things: a commitment indicating an unspent output
or a NULL marker indicating a spent one. It is a sum-tree over all unspent
outputs (spent ones contribute nothing to the sum). The output set should
reflect the state of the chain *after* the current block has taken effect.

The root sum should be equal to the sum of all excesses since genesis.

Design requirements:

1. Efficient additions and updating from unspent to spent.
2. Efficient proofs that a specific output was spent.
3. Efficient storage of diffs between UTXO roots.
4. Efficient tree storage even with missing data, even with millions of entries.
5. If a node commits to NULL, it has no unspent children and its data should
   eventually be able to be dropped forever.
6. Support serializing and efficient merging of pruned trees from partial
   archival nodes.
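
The entries of the total output set described above can be sketched in Rust as follows. This is only an illustration under stated assumptions: `OutputEntry`, `root_sum` and `spend` are hypothetical names, and a plain integer stands in for a commitment, which in practice is a curve point that can be added in the same way.

```rust
/// Stand-in for a Pedersen commitment. A real commitment is a curve point;
/// a plain integer is used here only so that the sums are easy to follow.
type Commitment = i64;

/// One entry of the total output set (hypothetical type name).
#[derive(Debug, Clone, Copy, PartialEq)]
enum OutputEntry {
    /// An unspent output, represented by its commitment.
    Unspent(Commitment),
    /// The NULL marker left behind by a spent output.
    Spent,
}

impl OutputEntry {
    /// Contribution of this entry to the sum-tree: spent outputs add nothing.
    fn sum(&self) -> Commitment {
        match self {
            OutputEntry::Unspent(c) => *c,
            OutputEntry::Spent => 0,
        }
    }
}

/// Sum that the root would commit to for the given entries.
fn root_sum(entries: &[OutputEntry]) -> Commitment {
    entries.iter().map(OutputEntry::sum).sum()
}

/// Mark the entry at `index` as spent. Returns false if the entry was
/// already spent or the index is out of range.
fn spend(entries: &mut [OutputEntry], index: usize) -> bool {
    match entries.get_mut(index) {
        Some(entry) if *entry != OutputEntry::Spent => {
            *entry = OutputEntry::Spent;
            true
        }
        _ => false,
    }
}
```

Spending replaces an entry in place rather than removing it, so the positions of all other entries stay fixed, which is what requirements 1 and 2 rely on.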
### Output witnesses

This tree mirrors the total output set but has rangeproofs in place of commitments.
It is never updated, only appended to, and does not sum over anything. When an
output is spent it is sufficient to prune its rangeproof from the tree rather
than deleting it.

Design requirements:

1. Support serializing and efficient merging of pruned trees from partial
   archival nodes.
### Inputs and Outputs

Each object is one of two things: an input (unambiguous reference to an old
transaction output), or an output (a (commitment, rangeproof) pair). It is
a sum-tree over the commitments of outputs, and the negatives of the commitments
of inputs.

Input references are hashes of old commitments. It is a consensus rule that
there are never two identical unspent outputs.

The root sum should be equal to the sum of excesses for this block. See the
next section, Excesses.

In general, validators will see either 100% of this Merkle tree or 0% of it,
so it is compatible with any design. Design requirements:

1. Efficient inclusion proofs, for proof-of-publication.
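
The signed sum over a block's inputs and outputs, and its comparison against the block's excesses, might look like the sketch below. The names `TxEntry`, `signed_sum` and `block_balances` are hypothetical, integers stand in for commitments, and the input variant carries the spent commitment directly, whereas the text above says real input references are hashes of old commitments.

```rust
/// Stand-in for a Pedersen commitment (a curve point in practice).
type Commitment = i64;

/// One entry of the per-block inputs-and-outputs tree (hypothetical type).
enum TxEntry {
    /// Reference to an old output. The spent commitment is stored here only
    /// so that the sketch can compute the sum without a lookup.
    Input { spent: Commitment },
    /// A new output. The rangeproof is omitted from this sketch.
    Output { commitment: Commitment },
}

/// Root sum of the tree: outputs count positively, inputs negatively.
fn signed_sum(entries: &[TxEntry]) -> Commitment {
    entries
        .iter()
        .map(|entry| match entry {
            TxEntry::Input { spent } => -*spent,
            TxEntry::Output { commitment } => *commitment,
        })
        .sum()
}

/// Check that the root sum equals the sum of the block's excesses.
fn block_balances(entries: &[TxEntry], excesses: &[Commitment]) -> bool {
    signed_sum(entries) == excesses.iter().sum::<Commitment>()
}
```

With one input of 10 and outputs of 4 and 9, the signed sum is 3, so the block balances against excesses that also sum to 3.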
### Excesses

Each object is of the form (excess, signature). It is a sum-tree over the
excesses.

In general, validators will always see 100% of this tree, so it is not even
necessary to have a Merkle structure at all. However, to support partial
archival nodes in the future we want to support efficient pruning.

Design requirements:

1. Support serializing and efficient merging of pruned trees from partial
   archival nodes.
## Proposed Merkle Structure

**The following design is proposed for all trees: a sum-MMR where every node
sums a count of its children _as well as_ the data it is supposed to sum.
The result is that every node commits to the count of all its children.**

For background, see [MMRs, or Merkle Mountain Ranges](https://github.com/opentimestamps/opentimestamps-server/blob/master/doc/merkle-mountain-range.md).
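
As a rough illustration of the proposal, a node that carries both a child count and a data sum could be modelled as below. This is a sketch, not Grin's actual layout: `SumNode`, `toy_hash`, `leaf` and `parent` are hypothetical names, the sum is an integer stand-in, and the standard library's `DefaultHasher` is used only to keep the example dependency-free. It is not a cryptographic hash and would not be suitable for a real tree.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Node of a sum-MMR (hypothetical layout).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct SumNode {
    /// Number of leaves under this node (1 for a leaf).
    count: u64,
    /// Sum of the summed data under this node (integer stand-in).
    sum: i64,
    /// Hash committing to `count`, `sum` and the children.
    hash: u64,
}

/// Stand-in hash function. NOT cryptographic; illustration only.
fn toy_hash<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Build a leaf from its serialized data and the value it contributes.
fn leaf(data: &[u8], sum: i64) -> SumNode {
    SumNode { count: 1, sum, hash: toy_hash(&(1u64, sum, data)) }
}

/// Combine two siblings: counts and sums add, and the parent hash covers
/// both totals as well as both child hashes.
fn parent(left: &SumNode, right: &SumNode) -> SumNode {
    let count = left.count + right.count;
    let sum = left.sum + right.sum;
    let hash = toy_hash(&(count, sum, left.hash, right.hash));
    SumNode { count, sum, hash }
}
```

Because the count and the sum are hashed into every parent, a verifier holding only a node's hash, count and sum can still reason about the shape and total of the subtree beneath it.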

The six design criteria for the output set are addressed in turn below.
### Efficient insert/updates

Immediate (as is proof-of-inclusion). This is true for any balanced Merkle
tree design.
### Efficient proof-of-spentness

Grin itself does not need proof-of-spentness but it is a good thing to support
in the future for SPV clients.

The children-counts imply an index of each object in the tree, which does not
change because insertions happen only at the far right of the tree.

This allows permanent proof-of-spentness, even if an identical output is later
added to the tree, and prevents false proofs even for identical outputs. These
properties are hard to achieve for a non-insertion-ordered tree.
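
One way to see how the children-counts imply an index: when walking from the top of the structure down to a leaf, every time the path goes right it skips a left sibling, and that sibling's committed count says how many leaves were skipped. The sketch below assumes a hypothetical `Step` type for such a path. In an MMR with several peaks, the peaks to the left of the target could be treated as additional `Right` steps.

```rust
/// One step of a path from the root down to a leaf (hypothetical type).
enum Step {
    /// Descend into the left child; nothing is skipped.
    Left,
    /// Descend into the right child, skipping a left sibling that commits
    /// to `left_count` leaves.
    Right { left_count: u64 },
}

/// Zero-based position of the leaf reached by following `path`.
fn leaf_index(path: &[Step]) -> u64 {
    path.iter()
        .map(|step| match step {
            Step::Left => 0,
            Step::Right { left_count } => *left_count,
        })
        .sum()
}
```

For example, in a full tree of eight leaves the path right (skipping 4), left, right (skipping 1) reaches the leaf at index 5. Later insertions only add leaves to the right, so the index stays the same.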
### Efficient storage of diffs

Storing complete blocks should be sufficient for this. Updates are obviously
as easy to undo as they are to do, and since blocks are always processed in
order, rewinding them during reorgs is as simple as removing a contiguous
set of outputs from the right of the tree. (This should be even faster than
repeated deletions in a tree designed to support deletions.)
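
A minimal sketch of applying and rewinding a block on an insertion-ordered leaf list is shown below. It assumes hypothetical `OutputSet` and `BlockDiff` types, uses integers in place of commitments, and ignores the hashes above the leaves, which would be recomputed after either operation.

```rust
/// Output-set leaves in insertion order. `None` is the NULL marker for a
/// spent output, `Some(c)` an unspent one (integer stand-in for a commitment).
struct OutputSet {
    leaves: Vec<Option<i64>>,
}

/// What one block did to the set (hypothetical record; the same information
/// is recoverable from the stored block).
struct BlockDiff {
    /// Indices and previous commitments of the outputs the block spent.
    spent: Vec<(usize, i64)>,
    /// Number of outputs the block appended.
    added: usize,
}

impl OutputSet {
    /// Spend the outputs at `spends`, then append `new_outputs` on the right.
    fn apply(&mut self, spends: &[usize], new_outputs: &[i64]) -> BlockDiff {
        let mut spent = Vec::new();
        for &index in spends {
            // `take` leaves the NULL marker behind and yields the old value.
            if let Some(c) = self.leaves.get_mut(index).and_then(|leaf| leaf.take()) {
                spent.push((index, c));
            }
        }
        self.leaves.extend(new_outputs.iter().map(|&c| Some(c)));
        BlockDiff { spent, added: new_outputs.len() }
    }

    /// Undo `diff`: drop the appended outputs from the right, then restore
    /// the outputs that the block had spent.
    fn rewind(&mut self, diff: &BlockDiff) {
        let new_len = self.leaves.len().saturating_sub(diff.added);
        self.leaves.truncate(new_len);
        for &(index, c) in &diff.spent {
            if let Some(leaf) = self.leaves.get_mut(index) {
                *leaf = Some(c);
            }
        }
    }
}
```

Rewinding several blocks would call `rewind` with their diffs in reverse order, most recent block first.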
### Efficient tree storage even with missing data

To update the root hash when random outputs are spent, we do not want to need
to store or compute the entire tree. Instead we can store only the hashes at
depth 20, say, of which there will be at most 2^20, or about a million. Then each update only
needs to recompute hashes above this depth (Bitcoin has fewer than 2^29 outputs
in its history, so this means computing a tree of size 2^9 = 512 for each update)
and after all updates are done, the root hash can be recomputed.

This depth is configurable and may be changed as the output set grows, or
depending on available disk space.

This is doable for any Merkle tree but may be complicated by PATRICIA trees or
other prefix trees, depending on how depth is computed.
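
The arithmetic behind the depth-20 example can be checked with a few lines of Rust. The function names are hypothetical, and the 32-byte hash size is an assumption made only to give a feel for the storage cost; the text above does not fix a hash size.

```rust
/// Assumed size of one stored hash in bytes (not specified by the design).
const HASH_BYTES: u64 = 32;

/// Maximum number of hashes kept when only the nodes at `depth` are stored.
fn stored_hashes(depth: u32) -> u64 {
    1u64 << depth
}

/// Number of leaves covered by one stored hash, for a tree holding up to
/// 2^`total_depth` leaves.
fn leaves_per_stored_hash(total_depth: u32, depth: u32) -> u64 {
    1u64 << total_depth.saturating_sub(depth)
}

/// Upper bound on the bytes needed for the stored hashes at `depth`.
fn stored_bytes(depth: u32) -> u64 {
    stored_hashes(depth) * HASH_BYTES
}
```

At depth 20 this gives 1,048,576 stored hashes, each covering 512 leaves of a 2^29-leaf tree, for 32 MiB of hashes under the assumed hash size. Raising or lowering the depth trades stored hashes against the size of the subtree handled per update.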
### Dropping spent coins

Since coins never go from spent to unspent, the data on spent coins is not needed
for any more updates or lookups.
### Efficient serialization of pruned trees

Since every node has a count of its children, validators can determine the
structure of the tree without needing all the hashes, and can determine which
nodes are siblings, and so on.

In the output set each node also commits to a sum of its unspent children, so
a validator knows if it is missing data on unspent coins, by checking whether
this sum on a pruned node is zero or not.
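
The zero-sum check on pruned nodes can be sketched as follows. The `Node` representation and function names are hypothetical, and an unsigned integer stands in for the committed sum so that values in the sketch cannot cancel each other out.

```rust
/// Node of a possibly pruned output-set tree (hypothetical representation).
enum Node {
    /// Subtree whose children were dropped; only its commitments remain.
    Pruned { count: u64, sum: u64 },
    /// Leaf that is present: `Some` is unspent, `None` is the NULL marker.
    Leaf(Option<u64>),
    /// Interior node with both children present.
    Branch(Box<Node>, Box<Node>),
}

/// Number of leaves under `node`, known even where the tree is pruned.
fn leaf_count(node: &Node) -> u64 {
    match node {
        Node::Pruned { count, .. } => *count,
        Node::Leaf(_) => 1,
        Node::Branch(left, right) => leaf_count(left) + leaf_count(right),
    }
}

/// True if some pruned subtree still commits to a non-zero sum, meaning
/// data on unspent coins is missing from this copy of the tree.
fn has_missing_unspent(node: &Node) -> bool {
    match node {
        Node::Pruned { sum, .. } => *sum != 0,
        Node::Leaf(_) => false,
        Node::Branch(left, right) => {
            has_missing_unspent(left) || has_missing_unspent(right)
        }
    }
}
```

A tree for which `has_missing_unspent` returns false contains every unspent output, even though `leaf_count` may report many more leaves than are actually stored.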
## Algorithms

(To appear alongside an implementation.)