Bitten by Undefined Behaviour
When packaging Alexandria for
Fedora, starting with Fedora 35 I started having failures
only on the s390x
platform.
After pruning the failing test as much as I could, I reduced the problem to a few lines like:
double variable = 123.;
assert(Elements::isEqual(123., variable));
In fact, with a snippet like this it would start failing also
on x86_64
, but only when compiling with link-time optimizations (-flto
)!
Long story short, isEqual
has undefined behaviour
when it casts from double to UInt
:
using Bits = typename TypeWithSize<sizeof(RawType)>::UInt;
Bits x_bits = *reinterpret_cast<const Bits*>(&x);
Note that the UB behaviour is not the pointer cast from double*
to UInt*
,
but the indirection of the latter.
What is interesting is that this is an example of “nasal demons”. Depending on where and how the code is called, optimized and linked the results vary wildly.
- In some cases, when
isEqual
is visible on the same translation unit, the optimizer will be able to aggresively optimize away the call toElements::isEqual(123., variable)
, since it figures out it is a tautology and replaces it withtrue
. - When it is not, on one translation unit the compiler has no idea what goes on inside the call, so it will push the two values into the stack and call the function. On the other side, the code will execute as one would (but should not) expect.
- With link-time optimization, the compiler will be able to peek at what
isEqual
is doing. Due to strict aliasing rules, it will asume that the pointer toUInt
has nothing to do with the pointer todouble
. It will conclude that thedouble
is not used, and just skip pushing the values into the stack.
Why did it originally fail only on s390x
? The actual value being compared
was 0.
. Just our of sheer luck the stack happened to be 0 initialized on
other platforms and it didn’t matter the caller was not pushing the values
into the stack.
I wonder why UndefinedBehaviorSanitizer
didn’t see this, though…
My guess is that the pointer casting is defined and the pointer indirection is
also defined. What is undefined is what happens if two pointers with different
types point to the same address.
P.S The fix is to use memcpy
or std::bit_cast
starting from C++20.
#include <bit>
#include <cstdint>
#include <cstring>
uint64_t bitcast_undefined(const double& v) {
return *reinterpret_cast<const uint64_t*>(&v);
}
uint64_t bitcast_memcpy(const double& v) {
uint64_t dst;
memcpy(&dst, &v, sizeof(dst));
return dst;
}
uint64_t bistcast_cpp20(const double& v) {
return std::bit_cast<uint64_t>(v);
}
The assembler code generated by the three versions are identical,
(for >= -O1
) but the first version has undefined behavior.