# Pruning Blockchain Data

*Read this in other languages: [Korean](translations/pruning_KR.md), [简体中文](translations/pruning_ZH-CN.md).*

One of the principal attractions of Mimblewimble is its theoretical space
efficiency. Indeed, a trusted or pre-validated full blockchain state only
requires the unspent transaction outputs, which can be very small.

The Grin blockchain includes the following types of data (we assume prior
understanding of the Mimblewimble protocol):

1. Transaction outputs, which include for each output:
    1. A Pedersen commitment (33 bytes).
    1. A range proof (over 5 KB at this time).
1. Transaction inputs, which are just output references (32 bytes).
1. Transaction "proofs", which include for each transaction:
    1. The excess commitment sum for the transaction (33 bytes).
    1. A signature generated with the excess (71 bytes average).
1. Block headers, which include Merkle trees and proof of work (about 250 bytes).

Assuming a blockchain of a million blocks, 10 million transactions (2 inputs, 2.5
outputs average) and 100,000 unspent outputs, we get the following approximate
sizes with a full chain (no pruning, no cut-through):

* 128 GB of transaction data (inputs and outputs).
* 1 GB of transaction proof data.
* 250 MB of block headers.
* Total chain size of around 130 GB.
* Total chain size after cut-through (but including headers) of 1.8 GB.
* UTXO size of 520 MB.
* Total chain size without range proofs of 4 GB.
* UTXO size without range proofs of 3.3 MB.

We note that out of all that data, once the chain has been fully validated, only
the set of UTXO commitments is strictly required for a node to function.

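Most of the estimates above follow directly from the per-item sizes listed earlier. The following is a rough sketch of that arithmetic, not an official calculation: the function and constant names are hypothetical, "over 5 KB" is treated as exactly 5 KiB, and gigabytes are taken as decimal, so the results only approximate the rounded figures in the list.

```python
# Back-of-the-envelope sizes using only the figures quoted in this document.
# Assumptions: a range proof is exactly 5 KiB, all sizes are in bytes.
BLOCKS = 1_000_000
TXS = 10_000_000
INPUTS = TXS * 2                 # 2 inputs per transaction on average
OUTPUTS = int(TXS * 2.5)         # 2.5 outputs per transaction on average
UNSPENT_OUTPUTS = 100_000

COMMITMENT = 33                  # Pedersen commitment
RANGE_PROOF = 5 * 1024           # "over 5 KB", rounded down to 5 KiB here
INPUT_REF = 32                   # output reference
KERNEL = 33 + 71                 # excess commitment + average signature
HEADER = 250                     # approximate block header


def estimate():
    """Return approximate sizes in bytes for each category of chain data."""
    outputs = OUTPUTS * (COMMITMENT + RANGE_PROOF)
    inputs = INPUTS * INPUT_REF
    kernels = TXS * KERNEL
    headers = BLOCKS * HEADER
    utxo = UNSPENT_OUTPUTS * (COMMITMENT + RANGE_PROOF)
    return {
        "transaction_data": outputs + inputs,
        "kernels": kernels,
        "headers": headers,
        "full_chain": outputs + inputs + kernels + headers,
        # After cut-through only unspent outputs, kernels and headers remain.
        "after_cut_through": utxo + kernels + headers,
        "utxo": utxo,
        "utxo_without_range_proofs": UNSPENT_OUTPUTS * COMMITMENT,
    }


if __name__ == "__main__":
    for name, size in estimate().items():
        print(f"{name:28s} {size / 1e9:8.3f} GB")
```

The "total chain size without range proofs" figure is left out of the sketch because the text does not spell out exactly what it includes.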
There may be several contexts in which data can be pruned:

* A fully validating node may get rid of some data it has already validated to
  free up space.
* A partially validating node (similar to SPV) may not be interested in either
  receiving or keeping all the data.
* When a new node joins the network, it may temporarily behave as a partially
  validating node so that it becomes usable sooner, even if it ultimately
  becomes a fully validating node.

## Validation of Fully Pruned State

Pruning needs to remove as much data as possible while keeping all the
guarantees of a full Mimblewimble-style validation. This is necessary to keep
a pruning node's state sane, and also during the first fast sync, where only
the minimum amount of data is sent to a new node.

The full validation of the chain state requires that:

* All kernel signatures verify against their public keys.
* The sum of all UTXO commitments minus the supply is a valid public key (it
  can be used to sign the empty string).
* The sum of all kernel pubkeys equals the sum of all UTXO commitments minus
  the supply.
* The root hashes of the UTXO PMMR, the range proofs PMMR and the kernels MMR
  match a block header with a valid Proof of Work chain.
* All range proofs are valid.

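The sum check in the list above can be illustrated with a toy model. This is only a sketch under loud assumptions: it replaces the elliptic curve with integers modulo a prime (where discrete logarithms are trivial, so it offers no security), the generators `G` and `H` and all helper names are hypothetical, and fees and any other details of the real protocol are ignored. It shows only the homomorphic bookkeeping: once the supply is subtracted, the values cancel and what remains equals the sum of the kernel excesses.

```python
# Toy additive group: integers modulo a prime. NOT secure; it only mimics the
# additive structure of Pedersen commitments for illustration.
P = 2**61 - 1
G = 1234567891011   # hypothetical generator for blinding factors
H = 9876543210987   # hypothetical generator for values


def commit(value, blind):
    """Toy Pedersen commitment: blind*G + value*H."""
    return (blind * G + value * H) % P


def excess(blind_sum):
    """Toy kernel excess: a commitment to zero value."""
    return (blind_sum * G) % P


def sums_balance(utxo_commitments, kernel_excesses, supply):
    """Check sum(UTXO commitments) - supply*H == sum(kernel excesses)."""
    lhs = (sum(utxo_commitments) - supply * H) % P
    rhs = sum(kernel_excesses) % P
    return lhs == rhs


# Block 1: a coinbase creates 60 coins with blinding factor r1.
r1 = 1111
coinbase = commit(60, r1)
k1 = excess(r1)

# Block 2: the coinbase output is spent into two new outputs.
r2, r3 = 2222, 3333
out_a = commit(45, r2)
out_b = commit(15, r3)
k2 = excess(r2 + r3 - r1)

# After cut-through the spent coinbase output is gone, but both kernels stay.
utxo_set = [out_a, out_b]
kernels = [k1, k2]
print(sums_balance(utxo_set, kernels, supply=60))  # True
print(sums_balance(utxo_set, kernels, supply=61))  # False: supply mismatch
```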
In addition, while not necessary to validate the full chain state, additional
data is required to accept and validate new blocks:

* The output features, making the full output data necessary for all UTXOs.

At minimum, this requires the following data:

* The block headers chain.
* All kernels, in order of inclusion in the chain. This also allows the
  reconstruction of the kernel MMR.
* All unspent outputs.
* The UTXO MMR and the range proof MMR (to learn the hashes of pruned data).

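The last item in the list above relies on a general property of hash trees: once the data under a subtree has been pruned, keeping that subtree's hash is enough to recompute the root. The sketch below shows this with a plain binary hash tree over four leaves. It is not Grin's PMMR layout or hashing scheme; SHA-256 and all names are assumptions chosen for illustration.

```python
import hashlib


def h(data: bytes) -> bytes:
    """Hash a leaf (SHA-256 is an assumption made for this sketch)."""
    return hashlib.sha256(data).digest()


def node(left: bytes, right: bytes) -> bytes:
    """Hash of a parent node, computed from its two child hashes."""
    return h(left + right)


def root(hashes):
    """Fold a power-of-two list of leaf hashes into a single root hash."""
    level = list(hashes)
    while len(level) > 1:
        level = [node(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


outputs = [b"output-0", b"output-1", b"output-2", b"output-3"]
full_root = root([h(o) for o in outputs])

# Outputs 0 and 1 are spent: drop their data and keep only their parent hash.
pruned_parent = node(h(outputs[0]), h(outputs[1]))
del outputs[0:2]  # the pruned node no longer stores the spent outputs

# The root can still be recomputed from the kept hash and the unspent outputs.
pruned_root = node(pruned_parent, node(h(outputs[0]), h(outputs[1])))
print(pruned_root == full_root)  # True
```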
Note that further pruning could be achieved by validating only a subset of the
range proofs, chosen at random by the validating node.
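A minimal sketch of such random sampling follows. The function names, the `fraction` parameter and the `verify` callback are all hypothetical; the text does not specify how a node would pick the subset or how large it should be.

```python
import random


def sample_range_proofs(proofs, fraction, rng=None):
    """Pick a random subset of range proofs to validate.

    `proofs` is a sequence; `fraction` is the share of proofs to check.
    A system-backed random source is used unless `rng` is supplied.
    """
    rng = rng or random.SystemRandom()
    if not proofs:
        return []
    count = max(1, round(len(proofs) * fraction))
    return rng.sample(proofs, count)


def validate_subset(proofs, verify, fraction=0.1, rng=None):
    """Return True if every sampled proof passes the `verify` callback."""
    return all(verify(p) for p in sample_range_proofs(proofs, fraction, rng))
```

An invalid proof that falls outside the sample goes undetected, so this approach trades certainty for less data to download and verify.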