Debugging `clang` with `rr`
A couple of months ago, I set out to debug a tricky issue that caused crashes in `clang` when compiling mp-units.
The bug manifests as a non-deterministic stack overflow and, sometimes, false diagnostics.
The problem originated from an unexpected interaction between two components: `ASTContext::getAutoTypeInternal` and `llvm::FoldingSetBase::FindNodeOrInsertPos`.
The smallest reproducer that triggers the bug looks like this:
template <typename>
concept C1 = true;
template <typename, auto>
concept C2 = true;
template <C1 auto V, C2<V> auto>
struct S;
When defining the template `S`, we use two non-type template parameters: `V` and an unnamed parameter (let’s call it `X`). These parameters are stored in the `AutoTypes` member of `ASTContext`, which was originally an `llvm::ContextualFoldingSet`. During this storing process, a `FoldingSetID` is generated using various pieces of information, including the value of a pointer (to a type, IIRC). This pointer can vary between runs due to ASLR, leading to different hash values and potentially placing `X` and `V` in the same bucket.
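To make the ASLR effect concrete, here is a toy sketch of how a hash that mixes in an address can put the same two entries in different buckets on one run and in the same bucket on another. It is not LLVM's hashing: `bucket_for`, `same_bucket`, the shift by 4, and the bucket count are all hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a hash that mixes in a pointer value.
// It drops the low (alignment) bits and reduces modulo the bucket count.
inline std::size_t bucket_for(std::uintptr_t address, std::size_t num_buckets) {
    assert(num_buckets > 0);
    return static_cast<std::size_t>(address >> 4) % num_buckets;
}

// True when two address-derived hashes land in the same bucket.
inline bool same_bucket(std::uintptr_t a, std::uintptr_t b,
                        std::size_t num_buckets) {
    return bucket_for(a, num_buckets) == bucket_for(b, num_buckets);
}
```

With 64 buckets, the addresses 0x1000 and 0x1040 land in different buckets, while 0x1000 and 0x1400 share one. Whether a collision happens depends purely on where the object ended up in memory, which is what made the bug look random.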
This situation wouldn’t be problematic if `llvm::FoldingSetBase` stored the `FoldingSetID` of each entry. However, it doesn’t. Instead, it recalculates the `FoldingSetID` each time it needs to compare entries. When the calculation involves an auto type, it triggers a recursive call to `ASTContext::getAutoTypeInternal`, which in turn calls (several frames later) `llvm::FoldingSetBase::FindNodeOrInsertPos` again. This recursive loop continues until it causes a stack overflow, crashing `clang`.
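The re-entrancy can be modelled with a toy set. This is a sketch under stated assumptions, not LLVM's `FoldingSet`: `ToySet`, `lookup_recurses`, the depth limit that stands in for the stack overflow, and the choice to prepend new entries to their bucket are all hypothetical. The only property borrowed from the description above is that entries do not cache their ID, so a lookup recomputes the ID of every entry it compares against.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Toy model only. Each entry is a callback that recomputes its own ID.
class ToySet {
public:
    using Profile = std::function<unsigned(ToySet&)>;

    ToySet(std::size_t num_buckets, int depth_limit)
        : buckets_(num_buckets), depth_limit_(depth_limit) {
        assert(num_buckets > 0);
    }

    // Assumption of this model: new entries go to the front of their bucket.
    void insert(unsigned id, Profile profile) {
        auto& bucket = buckets_[id % buckets_.size()];
        bucket.insert(bucket.begin(), std::move(profile));
    }

    // Returns true if an entry with this ID exists. Gives up once the nesting
    // depth reaches the limit, standing in for a real stack overflow.
    bool find(unsigned id) {
        if (depth_ >= depth_limit_) {
            hit_limit_ = true;
            return false;
        }
        ++depth_;
        bool found = false;
        for (Profile& profile : buckets_[id % buckets_.size()]) {
            if (profile(*this) == id) {  // ID recomputed on every comparison
                found = true;
                break;
            }
            if (hit_limit_) break;  // unwind instead of crashing
        }
        --depth_;
        return found;
    }

    bool hit_limit() const { return hit_limit_; }

private:
    std::vector<std::vector<Profile>> buckets_;
    int depth_limit_;
    int depth_ = 0;
    bool hit_limit_ = false;
};

// Hypothetical scenario: V's ID is a plain value, while computing X's ID
// needs a lookup of V (loosely mirroring `C2<V> auto` depending on `V`).
bool lookup_recurses(unsigned v_id, unsigned x_id, std::size_t num_buckets) {
    ToySet set(num_buckets, /*depth_limit=*/64);
    set.insert(v_id, [v_id](ToySet&) { return v_id; });
    set.insert(x_id, [v_id, x_id](ToySet& s) {
        s.find(v_id);  // re-enters the set while a lookup is in progress
        return x_id;
    });
    set.find(x_id);
    return set.hit_limit();
}
```

With 8 buckets, `lookup_recurses(1, 2, 8)` finishes normally because the nested lookup of V scans a bucket that does not contain X. `lookup_recurses(1, 9, 8)` puts both entries in bucket 1, so the nested lookup has to recompute X's ID again, which starts another lookup, and so on until the depth limit is hit.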
The tricky part of debugging this issue was its random nature: it happened in only about 10% of the runs. Even with `gdb` attached, there was a chance of stepping too far and running into the crash, which meant multiple reruns to catch another failure.
This is where `rr` came in handy. By running `rr` in a loop until a crash happened, I could consistently capture the failure. The loop looked something like this:
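# Record runs until one of them fails.
while true; do
    rr record ./llvm/cmake-linux-debug/bin/clang -std=c++20 crash.cpp -c
    if [ $? -ne 0 ]; then
        break  # keep the trace of the failing run
    fi
    rr rm clang-0  # discard the trace of a successful run
done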
Once I captured a crash, the execution was recorded! I could use `rr replay clang-0` to replay the execution as many times as needed, with the same outcome each time.
Additionally, with commands like `reverse-continue`, even if I made a mistake and caused the crash, I could jump back in time to before the function call and continue debugging as if nothing had happened.
`rr` proved to be an invaluable tool, and I regret not discovering it sooner, especially considering it has been around for over a decade.