Skip to content

tracing: lock contention analysis

Currently there is a macro DEBUG_LOCKCONTENTION (see developer-notes.md) to enable logging of contention. The disadvantage of this method is that bitcoind needs to be especially compiled with that enabled, and when enabled this quickly produces huge log files, and has a relatively large overhead.

This PR introduces some tracepoints and adds a tracing script lock_contention_analysis.bt that can be started at any time while bitcoind is running. This has low overhead and gives better analysis possibilities.

When the script is attached with sudo bpftrace contrib/tracing/lock_contention_analysis.bt, you'll get a markdown formatted output that looks like this:

timestamp Lock waiting duration waiting at blocked by
12:41:05.512089 m_mutex 4491µs ./checkqueue.h ( 175) ./checkqueue.h ( 78)
12:41:05.861322 cs_main 535058µs net_processing.cpp (1607) validation.cpp (2994)
12:41:06.655427 m_mutex 4012µs ./checkqueue.h ( 78) ./checkqueue.h ( 78)
12:41:06.675967 cs_main 650117µs net_processing.cpp (1607) validation.cpp (2994)
12:41:06.894090 m_mutex 2747µs ./checkqueue.h ( 78) ./checkqueue.h ( 78)
12:41:06.961872 cs_main 142277µs net_processing.cpp (1607) validation.cpp (2994)
12:41:07.363535 cs_main 395851µs net_processing.cpp (1607) validation.cpp (2994)
12:41:07.583226 cs_main 69281µs net_processing.cpp (1607) validation.cpp (2994)
12:41:07.746665 cs_main 157604µs net_processing.cpp (1607) validation.cpp (2335)
12:41:07.958787 m_mutex 3041µs ./checkqueue.h ( 175) ./checkqueue.h ( 78)
12:41:07.958801 m_mutex 2970µs ./checkqueue.h ( 78) ./checkqueue.h ( 175)
12:41:07.958842 m_mutex 2956µs ./checkqueue.h ( 78) ./checkqueue.h ( 175)
12:41:07.958853 m_mutex 2898µs ./checkqueue.h ( 78) ./checkqueue.h ( 78)
12:41:07.958862 m_mutex 2820µs ./checkqueue.h ( 78) ./checkqueue.h ( 78)
12:41:07.959201 m_mutex 3028µs ./checkqueue.h ( 78) ./checkqueue.h ( 78)
12:41:07.959218 m_mutex 2916µs ./checkqueue.h ( 78) ./checkqueue.h ( 78)

When monitoring is stopped (press Ctrl+C), the script prints the source locations where it has waited the longest while the script was running (unfortunately there seems to be no way to format this more nicely):

Monitored from 17:12:18.513400 to 17:42:08.565786. top 50 lock contention locations (file, lineNo): µs
[...]
@locked_at_source_line[./net.h, 1233]: 334654
@locked_at_source_line[txmempool.cpp, 823]: 401472
@locked_at_source_line[net_processing.cpp, 4504]: 661670
@locked_at_source_line[net_processing.cpp, 4872]: 668911
@locked_at_source_line[txmempool.cpp, 1096]: 681892
@locked_at_source_line[net_processing.cpp, 4857]: 825963
@locked_at_source_line[net_processing.cpp, 4724]: 826719
@locked_at_source_line[net_processing.cpp, 4677]: 827335
@locked_at_source_line[net_processing.cpp, 4154]: 827972
@locked_at_source_line[net_processing.cpp, 4222]: 828863
@locked_at_source_line[net.cpp, 2329]: 831334
@locked_at_source_line[net_processing.cpp, 4217]: 831451
@locked_at_source_line[net_processing.cpp, 4231]: 839408
@locked_at_source_line[net_processing.cpp, 4198]: 846353
@locked_at_source_line[util/system.cpp, 1035]: 883117
@locked_at_source_line[net.cpp, 1426]: 1398949
@locked_at_source_line[net.cpp, 1422]: 1416004
@locked_at_source_line[net.cpp, 1611]: 1564984
@locked_at_source_line[net_processing.cpp, 4205]: 1668173
@locked_at_source_line[net_processing.cpp, 1334]: 1704239
@locked_at_source_line[validation.cpp, 5173]: 2899013

For this to work I had to add traces to UniqueLock and AnnotatedMixin, and move the implementation into the cpp file (without moving, due to inlining the maximum number of traces of 512 is quickly exhausted)

Merge request reports

Loading