optimization: bulk reads(32%)/writes(198%) in [undo]block [de]serialization, ~6% faster IBD
Currently, obfuscation operations are performed byte-by-byte during serialization.
Buffering the reads allows batching these operations (implemented in https://github.com/bitcoin/bitcoin/pull/31144) and improves file access efficiency by reducing fread calls and the associated locking overhead. But even without that change, batching reads/writes (instead of just relying on the OS buffering them) gives a measurable speedup for block reading/writing.
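To illustrate why buffering reduces the number of fread calls, here is a minimal sketch of a buffered reader. The class name `BufferedReaderSketch`, its members, and the call counter are hypothetical and exist only for this illustration; this is not the implementation used in this PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <vector>

// Hypothetical sketch: serves many small reads from one large fread per refill.
class BufferedReaderSketch
{
    std::FILE* m_file;
    std::vector<std::uint8_t> m_buf;
    std::size_t m_pos{0};         // index of the next unread byte in m_buf
    std::size_t m_end{0};         // number of valid bytes currently in m_buf
    std::size_t m_fread_calls{0}; // only here to make the effect observable

public:
    BufferedReaderSketch(std::FILE* file, std::size_t buf_size) : m_file(file), m_buf(buf_size)
    {
        assert(file != nullptr && buf_size > 0);
    }

    // Copies `len` bytes into `dst`; returns false if the file ends early.
    bool Read(std::uint8_t* dst, std::size_t len)
    {
        while (len > 0) {
            if (m_pos == m_end) {
                // Refill the whole buffer with a single fread call.
                m_end = std::fread(m_buf.data(), 1, m_buf.size(), m_file);
                ++m_fread_calls;
                m_pos = 0;
                if (m_end == 0) return false;
            }
            const std::size_t n = std::min(len, m_end - m_pos);
            std::memcpy(dst, m_buf.data() + m_pos, n);
            m_pos += n;
            dst += n;
            len -= n;
        }
        return true;
    }

    std::size_t FreadCalls() const { return m_fread_calls; }
};
```

With a 256-byte buffer, reading a 1000-byte file in 4-byte pieces needs 4 fread calls in this sketch instead of 250. A real deserializer would pick a much larger buffer; the size here is only chosen to keep the example small.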
For writes, buffering enables batched obfuscation directly on the internal buffer (avoiding the need to copy input spans for obfuscation). Also, for writes the XOR key offsets are now calculated based on the file position, and the batched obfuscation is applied only right before writing to disk.
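The write-side idea can be sketched as follows. Everything here is an assumption made for illustration: the names `XorKey`, `XorAtFilePos` and `BufferedXorWriterSketch`, the 8-byte key length, and the use of a `std::vector` in place of a real file. The point is only that deriving the key offset from the file position makes the result independent of how the writes are chunked.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using XorKey = std::array<std::uint8_t, 8>; // key length is an assumption

// XORs `buf` in place; `file_pos` is the absolute file offset of buf[0],
// so the key alignment depends only on where the bytes land in the file.
inline void XorAtFilePos(std::vector<std::uint8_t>& buf, const XorKey& key, std::uint64_t file_pos)
{
    std::size_t k = static_cast<std::size_t>(file_pos % key.size());
    for (std::uint8_t& b : buf) {
        b ^= key[k];
        k = (k + 1) % key.size();
    }
}

// Hypothetical writer: collects plain bytes and obfuscates the whole
// internal buffer once per flush, right before it is "written to disk".
class BufferedXorWriterSketch
{
    std::vector<std::uint8_t>& m_file; // stands in for the on-disk file
    XorKey m_key;
    std::size_t m_capacity;
    std::vector<std::uint8_t> m_buf;

public:
    BufferedXorWriterSketch(std::vector<std::uint8_t>& file, const XorKey& key, std::size_t capacity)
        : m_file(file), m_key(key), m_capacity(capacity)
    {
        assert(capacity > 0);
        m_buf.reserve(capacity);
    }

    void Write(const std::uint8_t* src, std::size_t len)
    {
        for (std::size_t i = 0; i < len; ++i) {
            m_buf.push_back(src[i]);
            if (m_buf.size() == m_capacity) Flush();
        }
    }

    void Flush()
    {
        // m_file.size() is the current file position in this sketch.
        XorAtFilePos(m_buf, m_key, m_file.size());
        m_file.insert(m_file.end(), m_buf.begin(), m_buf.end());
        m_buf.clear();
    }
};
```

Because the offset is taken from the file position, byte `i` of the output is always XORed with `key[i % 8]`, no matter how many `Write` calls or flushes produced it.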
Microbenchmarks for [Read|Save]BlockBench show a ~32% read and ~198% write speedup (2.98x write throughput) with Clang, and ~24%/~31% with GCC (the followup XOR batching improves these further):
C++ compiler .......................... AppleClang 16.0.0.16000026
Before:
| ns/op | op/s | err% | total | benchmark |
|---|---|---|---|---|
| 2,289,743.62 | 436.73 | 0.3% | 11.03 | ReadBlockBench |
| 5,267,613.94 | 189.84 | 1.0% | 11.05 | SaveBlockBench |
After:
| ns/op | op/s | err% | total | benchmark |
|---|---|---|---|---|
| 1,724,703.14 | 579.81 | 0.4% | 11.06 | ReadBlockBench |
| 1,767,367.40 | 565.81 | 1.6% | 10.86 | SaveBlockBench |
C++ compiler .......................... GNU 13.3.0
Before:
| ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
|---|---|---|---|---|---|---|---|---|---|
| 7,786,309.20 | 128.43 | 0.0% | 70,832,812.80 | 23,803,523.16 | 2.976 | 5,073,002.56 | 0.4% | 10.72 | ReadBlockBench |
| 4,128,530.90 | 242.22 | 3.8% | 19,358,001.33 | 8,601,983.31 | 2.250 | 3,079,334.76 | 0.4% | 10.64 | SaveBlockBench |
After:
| ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
|---|---|---|---|---|---|---|---|---|---|
| 6,272,557.28 | 159.42 | 0.0% | 63,251,231.42 | 19,739,780.92 | 3.204 | 3,589,886.66 | 0.3% | 10.57 | ReadBlockBench |
| 3,130,556.05 | 319.43 | 4.7% | 17,305,378.56 | 6,457,946.37 | 2.680 | 2,579,854.87 | 0.3% | 10.83 | SaveBlockBench |
Two full IBD runs against master (compiled with GCC, where the gains seem more modest) for 870k blocks (seeded from real nodes) indicate a ~6% total speedup.
hyperfine --runs 2 --export-json /mnt/my_storage/ibd-xor-buffered.json --parameter-list COMMIT d73f37dda221835b5109ede1b84db2dc7c4b74a1,6853b2740851befffa3ca0b24d94212ed1e48d66 --prepare 'rm -rf /mnt/my_storage/BitcoinData/* && git checkout {COMMIT} && git clean -fxd && git reset --hard && cmake -B build -DCMAKE_BUILD_TYPE=Release -DBUILD_UTIL=OFF -DBUILD_TX=OFF -DBUILD_TESTS=OFF -DENABLE_WALLET=OFF -DINSTALL_MAN=OFF && cmake --build build -j$(nproc)' 'COMMIT={COMMIT} ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=870000 -printtoconsole=0'
Benchmark 1: COMMIT=d73f37dda221835b5109ede1b84db2dc7c4b74a1 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=870000 -printtoconsole=0
Time (mean ± σ): 40216.674 s ± 113.132 s [User: 51496.289 s, System: 3541.340 s]
Range (min … max): 40136.678 s … 40296.671 s 2 runs
Benchmark 2: COMMIT=6853b2740851befffa3ca0b24d94212ed1e48d66 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=870000 -printtoconsole=0
Time (mean ± σ): 37964.015 s ± 624.115 s [User: 49086.037 s, System: 3375.072 s]
Range (min … max): 37522.699 s … 38405.331 s 2 runs
Summary
COMMIT=6853b2740851befffa3ca0b24d94212ed1e48d66 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=870000 -printtoconsole=0 ran
1.06 ± 0.02 times faster than COMMIT=d73f37dda221835b5109ede1b84db2dc7c4b74a1 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=870000 -printtoconsole=0
I reran the whole benchmark after rebasing (with 3 runs for stability, since this is an IBD with real nodes):
COMMITS="5acf12bafeb126f2190b3f401f95199e0eea90c9 0e9fb2ed330960c0e4cd36b077c64ac7d0f84240"; \
STOP_HEIGHT=880000; DBCACHE=30000; \
hyperfine \
--runs 3 \
--parameter-list COMMIT ${COMMITS/ /,} \
--prepare "rm -rf /mnt/my_storage/BitcoinData/* && git checkout {COMMIT} && git clean -fxd && git reset --hard && cmake -B build -DCMAKE_BUILD_TYPE=Release -DENABLE_WALLET=OFF -DCMAKE_C_COMPILER=$C_COMPILER -DCMAKE_CXX_COMPILER=$CXX_COMPILER && cmake --build build -j$(nproc) --target bitcoind && ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=1 -printtoconsole=0 || true" \
--cleanup "mv /mnt/my_storage/BitcoinData/debug.log /mnt/my_storage/logs/debug-{COMMIT}-$(date +%s).log" \
"COMMIT={COMMIT} ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=$STOP_HEIGHT -dbcache=$DBCACHE -printtoconsole=0"
Benchmark 1: COMMIT=5acf12bafeb126f2190b3f401f95199e0eea90c9 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=880000 -dbcache=30000 -printtoconsole=0
Time (mean ± σ): 39757.551 s ± 258.042 s [User: 52930.007 s, System: 2104.410 s]
Range (min … max): 39469.858 s … 39968.556 s 3 runs
Benchmark 2: COMMIT=0e9fb2ed330960c0e4cd36b077c64ac7d0f84240 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=880000 -dbcache=30000 -printtoconsole=0
Time (mean ± σ): 37567.700 s ± 540.547 s [User: 50713.745 s, System: 1904.750 s]
Range (min … max): 37168.815 s … 38182.906 s 3 runs
Summary
COMMIT=0e9fb2ed330960c0e4cd36b077c64ac7d0f84240 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=880000 -dbcache=30000 -printtoconsole=0 ran
1.06 ± 0.02 times faster than COMMIT=5acf12bafeb126f2190b3f401f95199e0eea90c9 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=880000 -dbcache=30000 -printtoconsole=0
Doing only -reindex-chainstate runs (i.e. without writing [undo]blocks) until 880k with GCC results in a ~4% speedup:
COMMITS=5acf12bafeb126f2190b3f401f95199e0eea90c9,97b4a50c714c801e08bc02648a9c259f284069c2;
STOP_HEIGHT=880000; DBCACHE=30000; \
hyperfine --runs 2 --parameter-list COMMIT $COMMITS \
--prepare "rm -rf /mnt/my_storage/BitcoinData/debug.log && git checkout {COMMIT} && git clean -fxd && git reset --hard && cmake -B build -DCMAKE_BUILD_TYPE=Release -DENABLE_WALLET=OFF && cmake --build build -j$(nproc) --target bitcoind && ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=$STOP_HEIGHT -dbcache=10000 -printtoconsole=0 || true" \
--cleanup "mv /mnt/my_storage/BitcoinData/debug.log /mnt/my_storage/logs/debug-{COMMIT}-$(date +%s).log" \
"COMMIT={COMMIT} ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=$STOP_HEIGHT -dbcache=$DBCACHE -printtoconsole=0 -reindex-chainstate -connect=0"
Benchmark 1: COMMIT=5acf12bafeb126f2190b3f401f95199e0eea90c9 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=880000 -dbcache=30000 -printtoconsole=0 -reindex-chainstate -connect=0
Time (mean ± σ): 19907.833 s ± 44.135 s [User: 41042.962 s, System: 867.501 s]
Range (min … max): 19876.625 s … 19939.041 s 2 runs
Benchmark 2: COMMIT=97b4a50c714c801e08bc02648a9c259f284069c2 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=880000 -dbcache=30000 -printtoconsole=0 -reindex-chainstate -connect=0
Time (mean ± σ): 19097.030 s ± 26.665 s [User: 40017.301 s, System: 739.273 s]
Range (min … max): 19078.174 s … 19115.885 s 2 runs
Summary
COMMIT=97b4a50c714c801e08bc02648a9c259f284069c2 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=880000 -dbcache=30000 -printtoconsole=0 -reindex-chainstate -connect=0 ran
1.04 ± 0.00 times faster than COMMIT=5acf12bafeb126f2190b3f401f95199e0eea90c9 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=880000 -dbcache=30000 -printtoconsole=0 -reindex-chainstate -connect=0