Commit d3dbafa80b "Use blocking IO in P2P to reduce CPU load" (merged
into v2.1.0) introduced the constant IO_TIMEOUT, setting it to 1 second.
On nodes with high-latency connections, this short timeout causes the
txhashset archive download during step 2 of the IBD process to
invariably fail before it completes. Since there's no mechanism for
resuming a failed download, this means the node gets stuck at this stage
and never syncs.
Increasing IO_TIMEOUT to 10 seconds solves the issue on my node; others
may be able to suggest a better value for the constant.
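
For reference, a minimal sketch of how a timeout constant like this is usually applied to a blocking socket with the standard library. The helper name apply_io_timeout is hypothetical and this is not the actual p2p code; only the IO_TIMEOUT name and the 10 second value come from the description above.

```rust
use std::io;
use std::net::TcpStream;
use std::time::Duration;

/// Timeout for blocking reads and writes (the value proposed above).
const IO_TIMEOUT: Duration = Duration::from_secs(10);

/// Hypothetical helper: configure a blocking stream so that a read or write
/// stalled for longer than IO_TIMEOUT returns an error instead of hanging.
fn apply_io_timeout(stream: &TcpStream) -> io::Result<()> {
    stream.set_read_timeout(Some(IO_TIMEOUT))?;
    stream.set_write_timeout(Some(IO_TIMEOUT))?;
    Ok(())
}
```

With a 1 second value, any single read that stalls longer than that on a slow link errors out, which matches the failure mode described for the archive download.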
* introduce protocol version to deserialize and read
* thread protocol version through our reader
* cleanup
* cleanup
* streaming_reader cleanup
* Pass protocol version into BinWriter to allow for version specific serialization rules.
* rustfmt
* read and write now protocol version specific
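
A rough sketch of what "protocol version specific" read and write can look like. Everything here is illustrative: the toy Kernel, the BinReader type, and the rule that `features` is only serialized from version 2 onwards are assumptions made up for the example, not the real wire format.

```rust
/// Protocol version negotiated with a peer (hypothetical newtype for this sketch).
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct ProtocolVersion(u32);

/// Writer that carries the protocol version next to its output buffer.
struct BinWriter {
    buf: Vec<u8>,
    version: ProtocolVersion,
}

impl BinWriter {
    fn new(version: ProtocolVersion) -> BinWriter {
        BinWriter { buf: Vec::new(), version }
    }
    fn write_u8(&mut self, v: u8) {
        self.buf.push(v);
    }
    fn write_u64(&mut self, v: u64) {
        self.buf.extend_from_slice(&v.to_be_bytes());
    }
}

/// Reader over a byte slice that knows which version produced the bytes.
struct BinReader<'a> {
    data: &'a [u8],
    version: ProtocolVersion,
}

impl<'a> BinReader<'a> {
    fn new(data: &'a [u8], version: ProtocolVersion) -> BinReader<'a> {
        BinReader { data, version }
    }
    /// Split `n` bytes off the front, or return None if too few remain.
    fn take(&mut self, n: usize) -> Option<&'a [u8]> {
        if self.data.len() < n {
            return None;
        }
        let (head, tail) = self.data.split_at(n);
        self.data = tail;
        Some(head)
    }
    fn read_u8(&mut self) -> Option<u8> {
        self.take(1).map(|b| b[0])
    }
    fn read_u64(&mut self) -> Option<u64> {
        let bytes = self.take(8)?;
        let mut arr = [0u8; 8];
        arr.copy_from_slice(bytes);
        Some(u64::from_be_bytes(arr))
    }
}

/// Toy kernel. Made-up rule: `features` is only on the wire from version 2.
#[derive(Debug, PartialEq)]
struct Kernel {
    features: u8,
    fee: u64,
}

impl Kernel {
    fn write(&self, w: &mut BinWriter) {
        if w.version >= ProtocolVersion(2) {
            w.write_u8(self.features);
        }
        w.write_u64(self.fee);
    }
    fn read(r: &mut BinReader) -> Option<Kernel> {
        // Older peers never sent `features`, so fall back to a default.
        let features = if r.version >= ProtocolVersion(2) { r.read_u8()? } else { 0 };
        let fee = r.read_u64()?;
        Some(Kernel { features, fee })
    }
}
```

Threading the version through the reader and writer keeps the version check next to the field it affects, rather than in every caller.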
When we send a txhashset archive, the peer's thread is busy sending it
and can't send other messages, e.g. pings. If the network connection is
slow, a buffer capacity of 10 may not be enough, and the peer gets dropped.
Safer attempt to address #2929 in 2.0.0
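
To illustrate the buffer problem described above, here is a small sketch using a bounded std channel as a stand-in for the peer send buffer. The function name queue_while_busy is hypothetical, and the real connection code may be structured differently.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Queue `count` messages into a buffer of `capacity` while the receiving side
/// is busy (it never drains during this call). Returns (accepted, rejected).
fn queue_while_busy(capacity: usize, count: usize) -> (usize, usize) {
    // `_rx` is kept alive so the channel stays connected but is never read.
    let (tx, _rx) = sync_channel::<Vec<u8>>(capacity);
    let mut accepted = 0;
    let mut rejected = 0;
    for i in 0..count {
        match tx.try_send(vec![i as u8]) {
            Ok(()) => accepted += 1,
            // Buffer is full: a non-blocking sender has to give up here.
            Err(TrySendError::Full(_)) => rejected += 1,
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    (accepted, rejected)
}
```

With a capacity of 10, the eleventh message queued while the writer thread is still streaming the archive is rejected, so a larger buffer gives a slow connection more room before anything is rejected.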
* introduce protocol version to deserialize and read
* thread protocol version through our reader
* example protocol version access in kernel read
* fix our StreamingReader impl (WouldBlock woes)
* debug log progress of txhashset download
I made a suboptimal (aka stupid) decision to stop and wait for peers
one by one, which makes shutdown very slow - O(n). This PR decouples sending
the stop signal from waiting for a thread to exit. On top of that, in Peers we
first send the stop signal to all peers and only after that start waiting
for them to exit. This gives us constant-time shutdown in most
cases.
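
The signal-then-wait pattern can be sketched roughly as follows. The Peer struct here is a hypothetical stand-in (a thread polling an atomic flag), not the real peer type.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical stand-in for a peer connection thread.
struct Peer {
    stop_flag: Arc<AtomicBool>,
    handle: Option<JoinHandle<()>>,
}

impl Peer {
    /// Spawn a worker that checks the stop flag once per `poll` interval.
    fn spawn(poll: Duration) -> Peer {
        let stop_flag = Arc::new(AtomicBool::new(false));
        let flag = stop_flag.clone();
        let handle = thread::spawn(move || {
            while !flag.load(Ordering::Relaxed) {
                thread::sleep(poll);
            }
        });
        Peer { stop_flag, handle: Some(handle) }
    }

    /// Signal the thread to stop; returns immediately.
    fn stop(&self) {
        self.stop_flag.store(true, Ordering::Relaxed);
    }

    /// Block until the thread has exited.
    fn wait(&mut self) {
        if let Some(handle) = self.handle.take() {
            let _ = handle.join();
        }
    }
}

/// Signal every peer first, then wait for each of them.
fn stop_all(peers: &mut [Peer]) {
    for peer in peers.iter() {
        peer.stop();
    }
    for peer in peers.iter_mut() {
        peer.wait();
    }
}
```

Because every thread has already been told to stop before the first join, the threads wind down in parallel and the total wait is roughly that of the slowest peer rather than the sum over all peers.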
* connection no longer wrapped in an Option in peer
* introduce peer.send()
* remove some Arc indirection
* self.send() cleanup
* extract Peer:new() from connect and accept
* fixup
* cleanup
It turns out that we drop the connection if we fail to process a message
because of a chain/store/internal error, e.g. we already have a header, so
we refuse it and drop the peer.
This PR no longer forwards such errors to the peer error channel, so the
connection is not dropped.
* headers msg is now "streamed" off the tcp stream
* rustfmt
* cleanup
* move StreamingReader into ser.rs
extract read_exact out into util crate
* rustfmt
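
For context on the read_exact extraction and the WouldBlock handling that streaming reads need, here is a hedged sketch of such a helper. The name read_exact_with_timeout and its signature are assumptions for illustration; the helper in the util crate may differ.

```rust
use std::io::{self, Read};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper: fill `buf` completely from a reader that may return
/// WouldBlock, retrying until `timeout` has elapsed since the call started.
fn read_exact_with_timeout<R: Read>(
    reader: &mut R,
    buf: &mut [u8],
    timeout: Duration,
) -> io::Result<()> {
    let start = Instant::now();
    let mut filled = 0;
    while filled < buf.len() {
        match reader.read(&mut buf[filled..]) {
            // A zero-length read means the other side closed the stream.
            Ok(0) => return Err(io::ErrorKind::UnexpectedEof.into()),
            Ok(n) => filled += n,
            Err(ref e) if e.kind() == io::ErrorKind::WouldBlock => {
                if start.elapsed() > timeout {
                    return Err(io::ErrorKind::TimedOut.into());
                }
                // No data yet: back off briefly instead of spinning.
                thread::sleep(Duration::from_millis(1));
            }
            Err(ref e) if e.kind() == io::ErrorKind::Interrupted => {}
            Err(e) => return Err(e),
        }
    }
    Ok(())
}
```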
* do not treat txhashset.zip download as abusive behavior
* count times, not bytes so we exclude quiet increments
* use inc_quiet when tracking sent bytes via an attachment
* add comment
* fixup "quiet" counter entries
* rustfmt
* use FixedLength to define serialized size in bytes of various structs
replace usages of mem::size_of() with ::LEN so we correctly calculate serialized sizes in bytes
* rustfmt
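
A small sketch of why mem::size_of() is the wrong tool for serialized sizes. The Entry struct and in_memory_size() are made up for illustration; only the FixedLength/LEN naming follows the description above.

```rust
use std::mem;

/// Types whose serialized form always occupies exactly LEN bytes.
trait FixedLength {
    const LEN: usize;
}

/// Toy struct for illustration; not one of the real types.
struct Entry {
    flag: u8,
    value: u64,
}

impl FixedLength for Entry {
    // 1 byte for `flag` + 8 bytes for `value`, with no padding on the wire.
    const LEN: usize = 1 + 8;
}

impl Entry {
    /// Serialize field by field, big-endian.
    fn to_bytes(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(Self::LEN);
        out.push(self.flag);
        out.extend_from_slice(&self.value.to_be_bytes());
        out
    }
}

/// In-memory size, which includes alignment padding chosen by the compiler.
fn in_memory_size() -> usize {
    mem::size_of::<Entry>()
}
```

On common targets the in-memory size of Entry is larger than 9 bytes because of alignment padding, so using it to compute offsets into serialized data would give wrong results.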
* Replace logging backend with flexi-logger and add log rotation
* Changed flexi_logger to log4rs
* Disable logging level filtering in Root logger
* Support different logging levels for file and stdout
* Don't log messages from modules other than Grin-related
* Fix formatting
* Place backed up compressed log copies into log file directory
* Increase default log file size to 16 MiB
* Add comment to config file on log_max_size option
* Add peers used bandwidth calculation and display in TUI
* Fix formatting
* Change Mutex to RwLock for peer's used bandwidth statistics in Tracker
* Make used bandwidth column in TUI peers list sort by sum of bytes
* improve: HeaderSync optimization (#1372)
* remove get_locator() optimization, which should be an independent pr for security review
* refactoring: move 'headers_streaming_body()' from Message to Protocol
* move 2 headers utils functions out of Protocol, and remove 'pub'
* support reading variable size of BlockHeader, from Cuckoo30 to Cuckoo36
* fix: use global::min_sizeshift() instead of hardcoded 30, because Cuckoo10 will be used for AutomatedTesting chain
* fix: should use global::proofsize() instead of hardcoded 42 when calculating serialized_size_of_header
* replace another 42 with global::proofsize()
* Fix and cleanup of fast sync triggering logic
* New txhashset on fast sync has to be applied, not rolled back
* Do not block if peer send buffer is full, fixes #912
* move some debug! to trace!
* more informative debugs
* standardising on always showing chain tips as "cumulative difficulty @ height [hash]"
* make 2 debug outputs into a single
* "no peers" as warning (not info) to let it stand out more clearly
* move fn param (used only in this one debug line)
* clarify difficulty "units"
* Util to zip and unzip directories
* First pass at sumtree request/response. Add message types, implement the exchange in the protocol, zip up the sumtree directory and stream the file over, with necessary adapter hooks.
* Implement the sumtree archive receive logic. Gets the sumtree archive data stream from the network and writes it to a file. Unzips the file, places it at the right spot and reconstructs the sumtree data structure, rewinding to the right spot.
* Sumtree hash structure validation
* Simplify sumtree backend buffering logic. The backend for a sumtree has to implement some in-memory buffering logic to provide a commit/rollback interface. The backend itself is an aggregate of 3 underlying storages (an append only file, a remove log and a skip list). The buffering was previously implemented both by the backend and some of the underlying storages. Now pushing back all buffering logic to the storages to keep the backend simpler.
* Add kernel append only store file to sumtrees. The chain sumtrees structure now also saves all kernels to a dedicated file. As that storage is implemented by the append only file wrapper, it's also rewind-aware.
* Full state validation. Checks that:
- MMRs are sane (hash and sum each node)
- Tree roots match the corresponding header
- Kernel signatures are valid
- Sum of all kernel excesses equals the sum of UTXO commitments
minus the supply
* Fast sync handoff to body sync. Once the fast-sync state is fully set up, get back in body sync
mode to get the full bodies of the last blocks we're missing.
* First fully working fast sync
* Facility in p2p conn to deal with attachments (raw binary after message).
* Re-introduced sumtree send and receive message handling using the above.
* Fixed test and finished updating all required db state after sumtree validation.
* Massaged the pipeline orphan check a little so it still works after the new sumtrees have been set up.
* Various cleanup. Consolidated fast sync and full sync into a single function as they're very similar. Proper conditions to trigger a sumtree request and some checks on receiving it.
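
The last check listed under "Full state validation" above can be written as a single balance equation. The notation is introduced here for illustration: E_i are the kernel excesses, C_j the UTXO commitments, s the total supply at the validated height, and H the generator used for values. Treating the supply as a commitment s*H with a zero blinding factor is an assumption about how the check is expressed.

```latex
\sum_{i} E_i \;=\; \sum_{j} C_j \;-\; s \cdot H
```

If the two sides differ, coins were created or destroyed somewhere in the downloaded state, so the sumtree archive should be rejected.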