RFC: optimize `CheckBlock` input checks (duplicate detection & nulls)
This PR is in draft mode to gather comments since it contains consensus code refactorings.
CheckBlock
's latency is critical for efficiently validating correct inputs during transaction validation, including mempool acceptance and new block creation.
This PR improves performance and maintainability by introducing the following changes:
- Simplified checks for the most common cases (1 or 2 inputs - 70-90% of transactions have a single input).
- Optimized the general case by replacing
std::set
with sortedstd::vector
for improved locality. - Simplified Null prevout checks from linear to constant time.
The goal of this change is to make block validation faster via a low-risk alternative.
C++ compiler .......................... AppleClang 16.0.0.16000026
Before:
ns/block | block/s | err% | total | benchmark |
---|---|---|---|---|
372,743.63 | 2,682.81 | 1.1% | 10.99 | CheckBlockBench |
ns/op | op/s | err% | total | benchmark |
---|---|---|---|---|
3,304,694.54 | 302.60 | 0.5% | 11.05 | DuplicateInputs |
9,585.10 | 104,328.55 | 0.1% | 11.03 | ProcessTransactionBench |
After:
ns/block | block/s | err% | total | benchmark |
---|---|---|---|---|
179,971.00 | 5,556.45 | 0.3% | 11.02 | CheckBlockBench |
ns/op | op/s | err% | total | benchmark |
---|---|---|---|---|
963,177.98 | 1,038.23 | 1.7% | 10.92 | DuplicateInputs |
9,410.90 | 106,259.75 | 0.3% | 11.01 | ProcessTransactionBench |
- 2.07x faster
CheckBlockBench
- 3.4x faster
DuplicateInputs
- 1.8% faster
ProcessTransactionBench
C++ compiler .......................... GNU 13.3.0
Before:
ns/block | block/s | err% | ins/block | cyc/block | IPC | bra/block | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
1,096,261.84 | 912.19 | 0.1% | 7,963,390.88 | 3,487,375.26 | 2.283 | 1,266,941.00 | 1.8% | 11.03 | CheckBlockBench |
ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
8,366,309.48 | 119.53 | 0.0% | 23,865,177.67 | 26,620,160.23 | 0.897 | 5,972,887.41 | 4.0% | 10.78 | DuplicateInputs |
56,199.57 | 17,793.73 | 0.1% | 229,263.01 | 178,766.31 | 1.282 | 15,509.97 | 0.5% | 10.91 | ProcessTransactionBench |
After:
ns/block | block/s | err% | ins/block | cyc/block | IPC | bra/block | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
834,855.94 | 1,197.81 | 0.0% | 6,518,548.86 | 2,656,039.78 | 2.454 | 919,160.84 | 1.5% | 10.78 | CheckBlockBench |
ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
---|---|---|---|---|---|---|---|---|---|
4,261,492.75 | 234.66 | 0.0% | 17,379,823.40 | 13,559,793.33 | 1.282 | 4,265,714.28 | 3.4% | 11.00 | DuplicateInputs |
55,819.53 | 17,914.88 | 0.1% | 227,828.15 | 177,520.09 | 1.283 | 15,184.36 | 0.4% | 10.91 | ProcessTransactionBench |
- 31% faster
CheckBlockBench
- 2x faster
DuplicateInputs
- 0.5% faster
ProcessTransactionBench
While the point of the change isn't necessarily to speed up IBD, but you can see the measurements at https://github.com/bitcoin/bitcoin/pull/31682#issuecomment-2606765689
Related PRs:
- https://github.com/bitcoin/bitcoin/pull/14397 - very similar solution, I only found it after pushing the PR (was closed for inactivity)
- https://github.com/bitcoin/bitcoin/pull/14837 - a faster, but a lot more complex alternative (was closed for complexity and inactivity).