Clang vs Other Open Source Compilers
Building an entirely new compiler front-end is a big task, and it isn't always clear to people why we decided to do this. Here we compare clang and its goals to other open source compiler front-ends that are available. We restrict the discussion to very specific objective points to avoid controversy where possible. Also, software is infinitely mutable, so we don't talk about little details that can be fixed with a reasonable amount of effort: we'll talk about issues that are difficult to fix for architectural or political reasons.
The goal of this list is to describe how differences in goals lead to different strengths and weaknesses, not to make some compiler look bad. This will hopefully help you to evaluate whether using clang is a good idea for your personal goals. Because we don't know specifically what you want to do, we describe the features of these compilers in terms of our goals: if you are only interested in static analysis, you may not care that something lacks codegen support, for example.
Please email cfe-dev if you think we should add another compiler to this list or if you think some characterization is unfair here.
- Clang vs GCC (GNU Compiler Collection)
- Clang vs Elsa (Elkhound-based C++ Parser)
- Clang vs PCC (Portable C Compiler)
Pro's of GCC vs clang:
- GCC supports languages that clang does not aim to, such as Java, Ada, FORTRAN, etc.
- GCC supports more targets than LLVM.
- GCC supports many language extensions, some of which are not implemented by Clang. For instance, in C mode, GCC supports nested functions and has an undocumented extension allowing VLAs in structs.
Pro's of clang vs GCC:
- The Clang ASTs and design are intended to be easily understandable by anyone who is familiar with the languages involved and who has a basic understanding of how a compiler works. GCC has a very old codebase which presents a steep learning curve to new developers.
- Clang is designed as an API from its inception, allowing it to be reused by source analysis tools, refactoring, IDEs (etc) as well as for code generation. GCC is built as a monolithic static compiler, which makes it extremely difficult to use as an API and integrate into other tools. Further, its historic design and current policy makes it difficult to decouple the front-end from the rest of the compiler.
- Various GCC design decisions make it very difficult to reuse: its build system is difficult to modify, you can't link multiple targets into one binary, you can't link multiple front-ends into one binary, it uses a custom garbage collector, uses global variables extensively, is not reentrant or multi-threadable, etc. Clang has none of these problems.
- For every token, clang tracks information about where it was written and where it was ultimately expanded into if it was involved in a macro. GCC does not track information about macro instantiations when parsing source code. This makes it very difficult for source rewriting tools (e.g. for refactoring) to work in the presence of (even simple) macros. This appears to be partially or fully addressed in recent releases of GCC.
- Clang does not implicitly simplify code as it parses it like GCC does. Doing so causes many problems for source analysis tools: as one simple example, if you write "x-x" in your source code, the GCC AST will contain "0", with no mention of 'x'. This is extremely bad for a refactoring tool that wants to rename 'x'.
- Clang can serialize its AST out to disk and read it back into another program, which is useful for whole program analysis. GCC does not have this. GCC's PCH mechanism (which is just a dump of the compiler memory image) is related, but is architecturally only able to read the dump back into the exact same executable as the one that produced it (it is not a structured format).
- Clang is much faster and uses far less memory than GCC.
- Clang aims to provide extremely clear and concise diagnostics (error and warning messages), and includes support for expressive diagnostics. GCC's warnings are sometimes acceptable, but are often confusing and it does not support expressive diagnostics. Clang also preserves typedefs in diagnostics consistently, showing macro expansions and many other features.
- GCC is licensed under the GPL license. clang uses a BSD license, which allows it to be embedded in software that is not GPL-licensed.
- Clang inherits a number of features from its use of LLVM as a backend, including support for a bytecode representation for intermediate code, pluggable optimizers, link-time optimization support, Just-In-Time compilation, ability to link in multiple code generators, etc.
- Clang's support for C++ is more compliant than GCC's in many ways.
- Clang supports many language extensions, some of which are not implemented by GCC. For instance, Clang provides attributes for checking thread safety and extended vector types.
Pro's of Elsa vs clang:
- Elsa's parser and AST is designed to be easily extensible by adding grammar rules. Clang has a very simple and easily hackable parser, but requires you to write C++ code to do it.
Pro's of clang vs Elsa:
- Clang's C and C++ support is far more mature and practically useful than Elsa's, and includes many C++'11 features.
- The Elsa community is extremely small and major development work seems to have ceased in 2005. Work continued to be used by other small projects (e.g. Oink), but Oink is apparently dead now too. Clang has a vibrant community including developers that are paid to work on it full time. In practice this means that you can file bugs against Clang and they will often be fixed for you. If you use Elsa, you are (mostly) on your own for bug fixes and feature enhancements.
- Elsa is not built as a stack of reusable libraries like clang is. It is very difficult to use part of Elsa without the whole front-end. For example, you cannot use Elsa to parse C/ObjC code without building an AST. You can do this in Clang and it is much faster than building an AST.
- Elsa does not have an integrated preprocessor, which makes it extremely difficult to accurately map from a source location in the AST back to its original position before preprocessing. Like GCC, it does not keep track of macro expansions.
- Elsa is even slower and uses more memory than GCC, which itself requires far more space and time than clang.
- Elsa only does partial semantic analysis. It is intended to work on code that is already validated by GCC, so it does not do many semantic checks required by the languages it implements.
- Elsa does not support Objective-C.
- Elsa does not support native code generation.
Pro's of PCC vs clang:
- The PCC source base is very small and builds quickly with just a C compiler.
Pro's of clang vs PCC:
- PCC dates from the 1970's and has been dormant for most of that time. The clang + llvm communities are very active.
- PCC doesn't support Objective-C or C++ and doesn't aim to support C++.
- PCC's code generation is very limited compared to LLVM. It produces very inefficient code and does not support many important targets.
- Like Elsa, PCC's does not have an integrated preprocessor, making it extremely difficult to use it for source analysis tools.