Integration of AST Hashing into the GCC compiler

Software written in a compiled language is normally build with the help of a build system. Initially, the build system invokes the compiler for every source file (.c file) in order to translate it to an object file. Later on, all object files are combined into a single binary. This partitioning of the source code into translation units allows the employment of incremental build: When a source file, or any of its included headers, are changed only the corresponding object file is rebuild.

However, no every textual change in the source file, or any of its included headers, leads to an altered binary. For example, adding a type declaration that is nowhere used in the translation unit leads do an redundant build operation; the resulting object file will not change.

In previous work, we have developed a semantic hashing algorithm on the abstract-syntax-tree level of the compiler to generate a fingerprint of the actual translation unit. In case of an unused declaration, the type definition is not included into the hash value. By comparing the hash value with the hash from the previous compilation, we can determine, if we can abort the compiler execution directly after the parsing step, before the costly compiler optimization phases are started.

Goal of this thesis is to integrate the hashing mechanism into the GCC compiler via its plugin interface. As an evaluation target, the plugin should be able to work on the Linux kernel.

Related Publications

USENIX Conference Best Paper Award
cHash: Detection of Redundant Compilations via AST Hashing.
Christian Dietrich, Valentin Rothberg, Ludwig Füracker, Andreas Ziegler, Daniel Lohmann; In: Proceedings of the 2017 USENIX Annual Technical Conference; USENIX Association, 2017. Best Paper Award.
[PDF] [URL] [Raw Data] [BibTex]