Multilib

Introduction

This document describes how multilib is implemented in Clang.

What is multilib and why might you care? If you’re cross compiling then you can’t use native system headers and libraries. To address this, you can use a combination of --sysroot, -isystem and -L options to point Clang at suitable directories for your target. However, when there are many possible directories to choose from, it’s not necessarily obvious which one to pick.

Multilib allows a toolchain designer to give the toolchain the ability to pick a suitable directory automatically, based on the options the user provides to Clang. For example, if the user specifies --target=arm-none-eabi -mcpu=cortex-m4 the toolchain can choose a directory containing headers and libraries suitable for Armv7E-M, because it knows that’s a suitable architecture for Arm Cortex-M4.

Multilib can also choose between libraries for the same architecture based on other options. For example, if the user specifies -fno-exceptions then a toolchain could select libraries built without exception support, thereby reducing the size of the resulting binary.

Design

Clang supports GCC’s -print-multi-lib and -print-multi-directory options. These are described in GCC Developer Options.

There are two ways to configure multilib in Clang: hard-coded or via a configuration file.

Hard-coded Multilib

The available libraries can be hard-coded in Clang. Typically this is done using the MultilibBuilder interface in clang/include/clang/Driver/MultilibBuilder.h. There are many examples of this in clang/lib/Driver/ToolChains/Gnu.cpp. The remainder of this document does not cover this type of multilib.

EXPERIMENTAL Multilib via configuration file

Some Clang toolchains support loading multilib configuration from a multilib.yaml configuration file.

A multilib.yaml configuration file specifies which multilib variants are available, their relative location, what compilation options were used to build them, and the criteria by which they are selected.

Multilib processing

Clang goes through the following steps to use multilib from a configuration file:

  1. Normalize command line options. Clang can accept the same information via different options. For example, --target=arm-none-eabi -march=armv7-m and --target=armv7m-none-eabi are equivalent. Clang normalizes the command line options into flags before passing them to the multilib system. To see what flags are emitted for a given set of command line options, add the -print-multi-flags-experimental option to the rest of the options you want to use.

  2. Load multilib.yaml from sysroot.

  3. Generate additional flags. multilib.yaml contains a Mappings section, which specifies how to generate additional flags based on the flags derived from command line options. Flags are matched using regular expressions. These regular expressions shall use the POSIX extended regular expression syntax.

  4. Match flags against multilib variants. If the generated flags are a superset of the flags specified for a multilib variant then the variant is considered a match. If more than one variant matches then a toolchain may opt to either use only the last matching multilib variant, or may use all matching variants, thereby layering them.

  5. Generate -isystem and -L options. Iterate in reverse order over the matching multilib variants, and generate -isystem and -L options based on the multilib variant’s directory.
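Steps 3 to 5 above can be sketched in a few lines of Python. This is only an illustration of the rules as described, not Clang's implementation. The function names, the input flags, the sysroot path and the include and lib subdirectory names are all assumptions made for the example, and Python's re module stands in for POSIX extended regular expressions.

```python
import re

def add_mapped_flags(flags, mappings):
    """Step 3: add the Flags of each mapping whose Match covers a whole flag."""
    result = list(flags)
    for mapping in mappings:
        pattern = re.compile(mapping["Match"])
        # Only flags derived from the command line are tested against Match.
        if any(pattern.fullmatch(flag) for flag in flags):
            result.extend(f for f in mapping["Flags"] if f not in result)
    return result

def select_variants(flags, variants):
    """Step 4: a variant matches if the flags are a superset of its Flags."""
    flag_set = set(flags)
    return [v for v in variants if set(v["Flags"]) <= flag_set]

def search_options(matches, sysroot):
    """Step 5: walk the matches in reverse so the last match is searched first."""
    options = []
    for variant in reversed(matches):
        base = f"{sysroot}/{variant['Dir']}"
        # "include" and "lib" are assumed names; the ToolChain decides the layout.
        options += ["-isystem", f"{base}/include", "-L", f"{base}/lib"]
    return options

variants = [
    {"Dir": "thumb/v6-m", "Flags": ["--target=thumbv6m-unknown-none-eabi"]},
    {"Dir": "thumb/v7-m",
     "Flags": ["--target=thumbv7m-none-eabi", "-mfpu=fpv4-sp-d16"]},
]
mappings = [
    {"Match": r"--target=thumbv([7-9]|[1-9][0-9]+).*",
     "Flags": ["--target=thumbv7m-none-eabi"]},
]

# Hypothetical normalized flags, standing in for real driver output.
flags = add_mapped_flags(
    ["--target=thumbv8m.main-none-eabi", "-mfpu=fpv4-sp-d16"], mappings)
matches = select_variants(flags, variants)
print(search_options(matches, "/opt/toolchain/sysroot"))
```

In this sketch the mapping adds the thumbv7m flag, which lets the thumb/v7-m variant match even though the user asked for a newer architecture.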

Multilib layering

When Clang selects multilib variants, it may find that more than one variant matches.

It is up to the ToolChain subclass to decide what to do in this case. There are two options permitted:

  1. Use only the last matching multilib variant. This option exists primarily for compatibility with the previous multilib design.

  2. Use all matching variants, thereby layering them.

This decision is hard-coded per ToolChain subclass. The latter option is preferred for ToolChain subclasses without backwards compatibility requirements.

If the latter option is chosen then -isystem and -L options will be generated for each matching multilib variant, in reverse order.

As a result, the compiler or linker will find each file in the last matching multilib variant that contains it. This behaviour permits multilib variants with only a partial set of files. A toolchain can therefore be distributed with one base multilib variant containing all system headers and libraries, and more specialised multilib variants containing only the files that differ from those in the base variant.

For example, a multilib variant could be compiled with -fno-exceptions. This option doesn’t affect the content of header files, nor does it affect the C libraries. Therefore if multilib layering is supported by the ToolChain subclass and a suitable base multilib variant is present then the -fno-exceptions multilib variant need only contain C++ libraries.

It is the responsibility of layered multilib authors to ensure that headers and libraries in each layer are complete enough to mask any incompatibilities.
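The lookup behaviour that layering produces can be illustrated with a small Python sketch. It assumes a hypothetical layout with a complete base variant and a partial -fno-exceptions layer, and it uses an in-memory dictionary in place of real directories. The names find_in_layers, base and noexcept are invented for the example.

```python
def find_in_layers(filename, matching_dirs, contents):
    """Return the variant that supplies filename, searching the last match first."""
    for directory in reversed(matching_dirs):
        if filename in contents.get(directory, ()):
            return directory
    return None

# Hypothetical layout: a full base variant plus a layer holding only C++ libraries.
contents = {
    "base": {"stdio.h", "libc.a", "libc++.a"},
    "noexcept": {"libc++.a"},
}
layers = ["base", "noexcept"]  # order in which the variants matched

print(find_in_layers("libc++.a", layers, contents))  # supplied by the layer
print(find_in_layers("libc.a", layers, contents))    # falls back to the base
```

Because the specialised layer is searched first, it overrides only the files it actually contains, and everything else is taken from the base variant.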

Stability

Multilib via configuration file shall be considered an experimental feature until LLVM 18, at which point -print-multi-flags-experimental should be renamed to -print-multi-flags. A toolchain can opt in to using this feature by including a multilib.yaml file in its distribution, once support for it is added in relevant ToolChain subclasses. Once stability is reached, flags emitted by -print-multi-flags should not be removed or changed, although new flags may be added.

Restrictions

Despite the name, multilib is used to locate both include and lib directories. Therefore it is important that consistent options are passed to the Clang driver when both compiling and linking. Otherwise inconsistent include and lib directories may be used, and the results will be undefined.

EXPERIMENTAL multilib.yaml

The example below serves as a small example of a possible multilib configuration, and documents the available options.

For a more comprehensive example see clang/test/Driver/baremetal-multilib.yaml in the llvm-project sources.

# multilib.yaml

# This format is experimental and is likely to change!

# Syntax is YAML 1.2

# This required field defines the version of the multilib.yaml format.
# Clang will emit an error if this number is greater than its current multilib
# version or if its major version differs, but will accept lesser minor
# versions.
MultilibVersion: 1.0

# The rest of this file is in two parts:
# 1. A list of multilib variants.
# 2. A list of regular expressions that may match flags generated from
#    command line options, and further flags that shall be added if the
#    regular expression matches.
# It is acceptable for the file to contain properties not documented here,
# and these will be ignored by Clang.

# List of multilib variants. Required.
# The ordering of items in the variants list is important if more than one
# variant can match the same set of flags. See the docs on multilib layering
# for more info.
Variants:

# Example of a multilib variant targeting Arm v6-M.
# Dir is the relative location of the directory containing the headers
# and/or libraries.
# Exactly how Dir is used is left up to the ToolChain subclass to define, but
# typically it will be joined to the sysroot.
- Dir: thumb/v6-m
  # List of one or more normalized command line options, as generated by Clang
  # from the command line options or from Mappings below.
  # Here, if the flags are a superset of
  # {--target=thumbv6m-unknown-none-eabi} then this multilib variant will be
  # considered a match.
  Flags: [--target=thumbv6m-unknown-none-eabi]

# Similarly, a multilib variant targeting Arm v7-M with an FPU (floating
# point unit).
- Dir: thumb/v7-m
  # Here, the flags generated by Clang must be a superset of
  # {--target=thumbv7m-none-eabi, -mfpu=fpv4-sp-d16} for this multilib variant
  # to be a match.
  Flags: [--target=thumbv7m-none-eabi, -mfpu=fpv4-sp-d16]


# The second section of the file is a list of regular expressions that are
# used to map from flags generated from command line options to custom flags.
# This is optional.
# Each regular expression must match a whole flag string.
# Flags in the "Flags" list will be added if any flag generated from command
# line options matches the regular expression.
Mappings:

# Set a "--target=thumbv7m-none-eabi" flag if the regular expression matches
# any of the flags generated from the command line options.
# Match is a POSIX extended regular expression string.
- Match: --target=thumbv([7-9]|[1-9][0-9]+).*
  # Flags is a list of one or more strings.
  Flags: [--target=thumbv7m-none-eabi]

Design principles

Stable interface

multilib.yaml and -print-multi-flags-experimental are new interfaces to Clang. In order for them to be usable over time and across LLVM versions their interfaces should be stable. The new multilib system will be considered experimental in LLVM 17, but in LLVM 18 it will be stable. Stability is particularly important for the multilib selection flags that Clang generates from command line options. Once a flag is generated by a released version of Clang it may be used in multilib.yaml files that exist independently of the LLVM release cycle. Ceasing to generate the flag would therefore be a breaking change and should be avoided.

However, an exception is the normalization of -march. -march for Arm architectures contains a list of enabled and disabled extensions and this list is likely to grow. Therefore -march flags are unstable.

Incomplete interface

The new multilib system does multilib selection based on only a limited set of command line options, and limits which flags can be used for multilib selection. This is in order to avoid committing to too large an interface. Later LLVM versions can add support for multilib selection from more command line options as needed.

Extensible

It is likely that the configuration format will need to evolve in future to adapt to new requirements. Using a format like YAML that supports key-value pairs helps here as it’s trivial to add new keys alongside existing ones.

Backwards compatibility

New versions of Clang should be able to use configuration written for earlier Clang versions. To avoid behaving in a way that may be subtly incorrect, Clang should be able to detect if the configuration is too new and emit an error.
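The version rule stated in the multilib.yaml comments (reject a different major version or a greater minor version, accept lesser minor versions) could be checked along the following lines. This is a sketch under stated assumptions: the function name, the tuple representation and the error messages are hypothetical and do not reflect Clang's actual diagnostics.

```python
def check_multilib_version(file_version, supported=(1, 0)):
    """Return (major, minor) if the file is usable, otherwise raise ValueError.

    file_version is the MultilibVersion string from the file, such as "1.0".
    supported is the (major, minor) version the reader implements (assumed).
    """
    major_text, minor_text = file_version.split(".")
    major, minor = int(major_text), int(minor_text)
    if major != supported[0]:
        raise ValueError(f"unsupported major version {major}")
    if minor > supported[1]:
        raise ValueError(f"configuration is too new: minor version {minor}")
    return (major, minor)

# An older minor version is accepted by a newer reader.
print(check_multilib_version("1.0", supported=(1, 2)))
```

Rejecting a configuration that is too new, rather than guessing at its meaning, is what lets Clang avoid behaviour that is subtly incorrect.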

Forwards compatibility

As an author of a multilib configuration, it should be possible to design the configuration in such a way that it is likely to work well with future Clang versions. For example, if a future version of Clang is likely to add support for newer versions of an architecture and the architecture is known to be designed for backwards compatibility then it should be possible to express compatibility for such architecture versions in the multilib configuration.
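The Match pattern in the example configuration is one way to express this kind of compatibility: it accepts version 7 and every later version number, including versions that do not exist yet. The Python sketch below exercises that pattern. Python's re module is used as a stand-in for POSIX extended regular expressions, and the target strings, including thumbv10m, are made up for illustration.

```python
import re

# The Match pattern from the example configuration.
PATTERN = re.compile(r"--target=thumbv([7-9]|[1-9][0-9]+).*")

def maps_to_v7m(flag):
    """True if the whole flag matches, so the v7-M flag would be added."""
    return PATTERN.fullmatch(flag) is not None

for flag in ["--target=thumbv6m-none-eabi",
             "--target=thumbv7m-none-eabi",
             "--target=thumbv8.1m.main-none-eabi",
             "--target=thumbv10m-none-eabi"]:
    print(flag, maps_to_v7m(flag))
```

The two-digit alternative in the pattern is what keeps the configuration working if the architecture version number ever reaches 10 or beyond.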

Not GNU spec files

The GNU spec files standard is large and complex and there’s little desire to import that complexity to LLVM. It’s also heavily oriented towards processing command line argument strings which is hard to do correctly, hence the large amount of logic dedicated to that task in the Clang driver. While compatibility with GNU would bring benefits, the cost in this case is deemed too high.

Avoid re-inventing feature detection in the configuration

A large amount of logic in the Clang driver is dedicated to inferring which architectural features are available based on the given command line options. It is neither desirable nor practical to repeat such logic in each multilib configuration. Instead the configuration should be able to benefit from the heavy lifting Clang already does to detect features.

Low maintenance

Multilib is a relatively small feature in the scheme of things so supporting it should accordingly take little time. Where possible this should be achieved by implementing it in terms of existing features in the LLVM codebase.

Minimal additional API surface

The greater the API surface, the greater the difficulty of keeping it stable. Where possible the additional API surface should be kept small by defining it in relation to existing APIs. An example of this is keeping a simple relationship between flag names and command line options where possible. Since the command line options are part of a stable API they are unlikely to change, and therefore the flag names get the same stability.

Low compile-time overhead

If the process of selecting multilib directories must be done on every invocation of the Clang driver then it must have a negligible impact on overall compile time.