grin/doc/pruning.md
Yeastplume 5cb8025ddd
[1.1.0] Merge master into 1.1.0 (#2720)
* cleanup legacy "3 dot" check (#2625)

* Allow to peers behind NAT to get up to preferred_max connections (#2543)

Allow to peers behind NAT to get up to preffered_max connections

If peer has only outbound connections it's mot likely behind NAT and we should not stop it from getting more outbound connections

* Reduce usage of unwrap in p2p crate (#2627)

Also change store crate a bit

* Simplify (and fix) output_pos cleanup during chain compaction (#2609)

* expose leaf pos iterator
use it for various things in txhashset when iterating over outputs

* fix

* cleanup

* rebuild output_pos index (and clear it out first) when compacting the chain

* fixup tests

* refactor to match on (output, proof) tuple

* add comments to compact() to explain what is going on.

* get rid of some boxing around the leaf_set iterator

* cleanup

* [docs] Add switch commitment documentation (#2526)

* remove references to no-longer existing switch commitment hash

(as switch commitments were removed in ca8447f3bd
and moved into the blinding factor of the Pedersen Commitment)

* some rewording (points vs curves) and fix of small formatting issues

* Add switch commitment documentation

* [docs] Documents in grin repo had translated in Korean.  (#2604)

*  Start to M/W intro translate in Korean
*  translate in Korean
*  add korean translation  on intro
* table_of_content.md translate in Korean.
*  table_of_content_KR.md finish translate in Korean, start to translate State_KR.md
*  add state_KR.md & commit some translation in State_KR.md
*  WIP stat_KR.md translation
*  add build_KR.md && stratum_KR.md
*  finish translate stratum_KR.md & table_of_content_KR.md
*  rename intro.KR.md to intro_KR.md
*  add intro_KR.md file path each language's  intro.md
*  add Korean translation file path to stratum.md & table_of_contents.md
*  fix difference with grin/master

* Fix TxHashSet file filter for Windows. (#2641)

* Fix TxHashSet file filter for Windows.

* rustfmt

* Updating regexp

* Adding in test case

* Display the current download rate rather than the average when syncing the chain (#2633)

* When syncing the chain, calculate the displayed download speed using the current rate from the most recent iteration, rather than the average download speed from the entire syncing process.

* Replace the explicitly ignored variables in the pattern with an implicit ignore

* remove root = true from editorconfig (#2655)

* Add Medium post to intro (#2654)

Spoke to @yeastplume who agreed it makes sense to add the "Grin Transactions Explained, Step-by-Step" Medium post to intro.md

Open for suggestions on a better location.

* add a new configure item for log_max_files (#2601)

* add a new configure item for log_max_files

* rustfmt

* use a constant instead of multiple 32

* rustfmt

* Fix the build warning of deprecated trim_right_matches (#2662)

* [DOC] state.md, build.md and chain directory documents translate in Korean.  (#2649)

*  add md files for translation.

*  start to translation fast-sync, code_structure. add file build_KR.md, states_KR.md

* add dandelion_KR.md && simulation_KR.md for Korean translation.

*  add md files for translation.

*  start to translation fast-sync, code_structure. add file build_KR.md, states_KR.md

* add dandelion_KR.md && simulation_KR.md for Korean translation.

* remove some useless md files for translation. this is rearrange set up translation order.

*  add dot end of sentence & translate build.md in korean

*  remove fast-sync_KR.md

*  finish build_KR.md translation

*  finish build_KR.md translation

*  finish translation state_KR.md & add phrase in state.md to move other language md file

* translate blocks_and_headers.md && chain_sync.md in Korean

*  add . in chain_sync.md , translation finished in doc/chain dir.

* fix some miss typos

* Api documentation fixes (#2646)

* Fix the API documentation for Chain Validate (v1/chain/validate).  It was documented as a POST, but it is actually a GET request, which can be seen in its handler ChainValidationHandler
* Update the API V1 route list response to include the headers and merkleproof routes.  Also clarify that for the chain/outputs route you must specify either byids or byheight to select outputs.

* refactor(ci): reorganize CI related code (#2658)

Break-down the CI related code into smaller more maintainable pieces.

* Specify grin or nanogrins in API docs where applicable (#2642)

* Set Content-Type in API client (#2680)

* Reduce number of unwraps in chain crate (#2679)

* fix: the restart of state sync doesn't work sometimes (#2687)

* let check_txhashset_needed return true on abnormal case (#2684)

*  Reduce number of unwwaps in api crate  (#2681)

* Reduce number of unwwaps in api crate

* Format use section

* Small QoL improvements for wallet developers (#2651)

* Small changes for wallet devs

* Move create_nonce into Keychain trait

* Replace match by map_err

* Add flag to Slate to skip fee check

* Fix secp dependency

* Remove check_fee flag in Slate

* Add Japanese edition of build.md (#2697)

* catch the panic to avoid peer thread quit early (#2686)

* catch the panic to avoid peer thread quit before taking the chance to ban
* move catch wrapper logic down into the util crate
* log the panic info
* keep txhashset.rs untouched
* remove a warning

* [DOC] dandelion.md, simulation.md ,fast-sync.md and pruning.md documents translate in Korean. (#2678)

* Show response code in API client error message (#2683)

It's hard to investigate what happens when an API client error is
printed out

* Add some better logging for get_outputs_by_id failure states (#2705)

* Switch commitment doc fixes (#2645)

Fix some typos and remove the use of parentheses in a
couple of places to make the reading flow a bit better.

* docs: update/add new README.md badges (#2708)

Replace existing badges with SVG counterparts and add a bunch of new ones.

* Update intro.md (#2702)

Add mention of censoring attack prevented by range proofs

* use sandbox folder for txhashset validation on state sync (#2685)

* use sandbox folder for txhashset validation on state sync

* rustfmt

* use temp directory as the sandbox instead actual db_root txhashset dir

* rustfmt

* move txhashset overwrite to the end of full validation

* fix travis-ci test

* rustfmt

* fix: hashset have 2 folders including txhashset and header

* rustfmt

* 
(1)switch to rebuild_header_mmr instead of copy the sandbox header mmr 
(2)lock txhashset when overwriting and opening and rebuild

* minor improve on sandbox_dir

* add Japanese edition of state.md (#2703)

* Attempt to fix broken TUI locale (#2713)

Can confirm that on the same machine 1.0.2 TUI looks great and is broken on
the current master. Bump of `cursive` version fixed it for me.
Fixes #2676

* clean the header folder in sandbox (#2716)

* forgot to clean the header folder in sandbox in #2685

* Reduce number of unwraps in servers crate (#2707)

It doesn't include stratum server which is sufficiently changed in 1.1
branch and adapters, which is big enough for a separate PR.

* rustfmt

* change version to beta
2019-04-01 11:47:48 +01:00

79 lines
3.4 KiB
Markdown

# Pruning Blockchain Data
*Read this in other languages: [Korean](pruning_KR.md).*
One of the principal attractions of MimbleWimble is its theoretical space
efficiency. Indeed, a trusted or pre-validated full blockchain state only
requires unspent transaction outputs, which could be tiny.
The grin blockchain includes the following types of data (we assume prior
understanding of the MimbleWimble protocol):
1. Transaction outputs, which include for each output:
1. A Pedersen commitment (33 bytes).
2. A range proof (over 5KB at this time).
2. Transaction inputs, which are just output references (32 bytes).
3. Transaction "proofs", which include for each transaction:
1. The excess commitment sum for the transaction (33 bytes).
2. A signature generated with the excess (71 bytes average).
4. A block header includes Merkle trees and proof of work (about 250 bytes).
Assuming a blockchain of a million blocks, 10 million transactions (2 inputs, 2.5
outputs average) and 100,000 unspent outputs, we get the following approximate
sizes with a full chain (no pruning, no cut-through):
* 128GB of transaction data (inputs and outputs).
* 1 GB of transaction proof data.
* 250MB of block headers.
* Total chain size around 130GB.
* Total chain size, after cut-through (but incl. headers) of 1.8GB.
* UTXO size of 520MB.
* Total chain size, without range proofs of 4GB.
* UTXO size, without range proofs of 3.3MB.
We note that out of all that data, once the chain has been fully validated, only
the set of UTXO commitments is strictly required for a node to function.
There may be several contexts in which data can be pruned:
* A fully validating node may get rid of some data it has already validated to
free space.
* A partially validating node (similar to SPV) may not be interested in either
receiving or keeping all the data.
* When a new node joins the network, it may temporarily behave as a partially
validating node to make it available for use faster, even if it ultimately becomes
a fully validating node.
## Validation of Fully Pruned State
Pruning needs to remove as much data as possible while keeping all the
guarantees of a full MimbleWimble-style validation. This is necessary to keep
a pruning node state's sane, but also on first fast sync, where only the
minimum amount of data is sent to a new node.
The full validation of the chain state requires that:
* All kernel signatures verify against their public keys.
* The sum of all UTXO commitments, minus the supply is a valid public key (can
be used to sign the empty string).
* The sum of all kernel pubkeys equals the sum of all UTXO commitments, minus
the supply.
* The root hashes of the UTXO PMMR, the range proofs PMMR and the kernels MMR
match a block header with a valid Proof of Work chain.
* All range proofs are valid.
In addition, while not necessary to validate the full chain state, to be able
to accept and validate new blocks additional data is required:
* The output features, making the full output data necessary for all UTXOs.
At minimum, this requires the following data:
* The block headers chain.
* All kernels, in order of inclusion in the chain. This also allows the
reconstruction of the kernel MMR.
* All unspent outputs.
* The UTXO MMR and the range proof MMR (to learn the hashes of pruned data).
Note that further pruning could be obtained by requiring the validation of
only a subset of the range proofs, chosen randomly by the validating node.