C++20 Modules support in SonarQube

Since the end of October 2024, SonarQube Cloud (aka SonarCloud) supports C++20 modules!

Let’s see what Copilot has to say about C++20 modules (I’m lazy like that):

C++20 modules are a feature introduced in the C++20 standard to improve the modularity and compilation speed of C++ programs. They provide a way to organize and encapsulate code, reducing dependencies and improving build times by allowing the compiler to process modules independently. Modules replace the traditional header files and include guards, offering a more efficient and reliable mechanism for code reuse and distribution.

Pretty much.

When analyzing C++ code, textual inclusion is pretty handy. It makes each individual translation unit look like one big source file, where every declaration is available. But the catch is, we have to reparse everything every time we analyze the code, unless we use Precompiled Headers (PCHs).
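As a tiny illustration of what textual inclusion means (made-up file names): after preprocessing, a TU that includes a header literally contains the header’s text:

// foo.h
int foo();

// main.cpp
#include "foo.h"  // after preprocessing, the declaration of foo() is pasted right here
int main() { return foo(); }

Every TU that includes foo.h gets its own textual copy, and we parse that copy every single time.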

With modules, we can skip the repetitive parsing! Just import a binary version of the AST for the module and you’re good!

Except… it is not that easy, of course.

One does not simply deserialize the Binary Module Interface (BMI) created by the compiler. Each compiler has its own format for serializing the AST, one that is tightly coupled with its internal representation.

On top of that, the binary representation isn’t stable across compiler versions (even minor patches!). It can also change with the compilation options: defined macros, type sizes, feature flags, and so on.

So, we need to create our own BMI that matches the internal representation of our (patched) version of Clang, while also respecting the project’s compilation flags.

Ok, let’s do that… oh wait. Of course:

// a.cpp
import foo;

// foo.cppm
export module foo;
export import bar;

// bar.cppm
export module bar;

We need to respect the dependency order 🤦
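To make the order concrete: with a stock Clang, the BMIs for the snippet above would be produced roughly like this (--precompile emits a .pcm file, and -fmodule-file=&lt;name&gt;=&lt;path&gt; points the compiler at an existing BMI; our analyzer does the equivalent internally with its patched Clang):

# bar imports nothing, so its BMI comes first
clang++ -std=c++20 --precompile bar.cppm -o bar.pcm

# foo imports bar, so building foo's BMI already needs bar.pcm
clang++ -std=c++20 --precompile foo.cppm -fmodule-file=bar=bar.pcm -o foo.pcm

# only now can a.cpp be processed; since foo re-exports bar, it needs both BMIs
clang++ -std=c++20 -fmodule-file=foo=foo.pcm -fmodule-file=bar=bar.pcm -c a.cpp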

Without modules, the CFamily analyzer didn’t have to worry about dependencies when analyzing C++ code. Each Translation Unit (TU) was self-contained because of the “textual inclusion” we talked about earlier. This meant we could analyze everything in parallel, using as many cores as we wanted. But now, things have changed! We can’t analyze a.cpp until we’ve generated a BMI for foo.cppm, and we can’t do that until we’ve generated a BMI for bar.cppm.

So, to support C++20 modules, we need to create a dependency graph of the project we’re analyzing. Then, we traverse this graph and generate the BMIs in the right order.
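As a sketch of that traversal (hypothetical names, and no cycle detection, since circular imports between modules are ill-formed anyway), the build order falls out of a post-order DFS:

#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// One module interface unit and the names of the modules it imports.
struct ModuleNode {
  std::string sourcePath;            // e.g. "foo.cppm"
  std::vector<std::string> imports;  // e.g. {"bar"}
};

// Post-order DFS: a module is appended to the build order only after
// all of its imports, which is exactly the order in which its BMI
// can be generated.
void schedule(const std::string& name,
              const std::unordered_map<std::string, ModuleNode>& graph,
              std::unordered_set<std::string>& scheduled,
              std::vector<std::string>& buildOrder) {
  auto it = graph.find(name);
  if (it == graph.end() || !scheduled.insert(name).second)
    return;  // unknown module (e.g. a standard library one) or already scheduled
  for (const auto& dep : it->second.imports)
    schedule(dep, graph, scheduled, buildOrder);
  buildOrder.push_back(name);
}

Calling schedule for every module of our little example yields bar before foo, so a.cpp can be analyzed last.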

Ok, so first we need to scan every source file and see what it exports and imports, so we can add the edges between TUs.
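As a naive illustration (made-up names; a real scanner has to run the preprocessor first, because imports can hide behind macros and conditional compilation, and it should also handle header units and module partitions):

#include <fstream>
#include <regex>
#include <string>
#include <vector>

struct ModuleInfo {
  std::string exportedModule;        // from `export module <name>;`
  std::vector<std::string> imports;  // from `import <name>;` and `export import <name>;`
};

ModuleInfo scanFile(const std::string& path) {
  static const std::regex exportRe(R"(^\s*export\s+module\s+([\w.:]+)\s*;)");
  static const std::regex importRe(R"(^\s*(?:export\s+)?import\s+([\w.:]+)\s*;)");
  ModuleInfo info;
  std::ifstream in(path);
  std::string line;
  std::smatch match;
  while (std::getline(in, line)) {
    if (std::regex_search(line, match, exportRe))
      info.exportedModule = match[1];
    else if (std::regex_search(line, match, importRe))
      info.imports.push_back(match[1]);
  }
  return info;
}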

Another interesting challenge is handling disjoint sets of dependencies, because ideally we want to analyze the code incrementally. We flag source files as “changed” if they have been modified, as “needs rebuild” if they provide a BMI required by a changed source file, and as “unchanged” if no action is needed. By identifying and isolating independent sets, we only rebuild and reanalyze the parts of the code that have actually been affected, which avoids unnecessary recompilation and makes the whole analysis much faster.
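A minimal sketch of that classification (hypothetical names again): starting from the modified files, walk the import edges and flag every BMI they transitively depend on; anything in an untouched component of the graph is simply never visited:

#include <queue>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

enum class State { Changed, NeedsRebuild };  // absent from the map == unchanged

std::unordered_map<std::string, State> classify(
    // for each file, the module interface files it imports
    const std::unordered_map<std::string, std::vector<std::string>>& imports,
    const std::unordered_set<std::string>& modified) {
  std::unordered_map<std::string, State> state;
  std::queue<std::string> pending;
  for (const auto& file : modified) {
    state[file] = State::Changed;
    pending.push(file);
  }
  // BFS along the import edges: every BMI a changed file depends on,
  // directly or transitively, is flagged NeedsRebuild.
  while (!pending.empty()) {
    auto file = pending.front();
    pending.pop();
    if (auto it = imports.find(file); it != imports.end())
      for (const auto& dep : it->second)
        if (state.emplace(dep, State::NeedsRebuild).second)
          pending.push(dep);
  }
  return state;
}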

Once this is done, we can analyze the code.

In summary, our analyzer has basically turned into a mini build system. It now tracks dependencies and only builds what needs to be built. This means we scan all source files to figure out what they export and import, create a dependency graph, and generate BMIs in the correct order. By doing this, we can handle the complexities of C++20 modules, respect the dependency order, and even support incremental analysis to avoid unnecessary recompilation.


*Disclaimer: The content of this article was first written by hand, and then rephrased with GitHub Copilot as an experiment 🤖.