Clang Offload Wrapper

Introduction

This tool is used in the OpenMP offloading toolchain to embed device code objects (usually ELF) into a wrapper host LLVM IR (bitcode) file. The wrapper host IR is then assembled and linked with the host code objects to generate the executable binary. See Image Binary Embedding and Execution for OpenMP for more details.

Usage

This tool can be used as follows:

$ clang-offload-wrapper -help
OVERVIEW: A tool to create a wrapper bitcode for offload target binaries.
Takes offload target binaries as input and produces bitcode file containing
target binaries packaged as data and initialization code which registers
target binaries in offload runtime.
USAGE: clang-offload-wrapper [options] <input files>
OPTIONS:
Generic Options:
  --help                             - Display available options (--help-hidden for more)
  --help-list                        - Display list of available options (--help-list-hidden for more)
  --version                          - Display the version of this program
clang-offload-wrapper options:
  -o=<filename>                      - Output filename
  --target=<triple>                  - Target triple for the output module

Example

clang-offload-wrapper -target host-triple -o host-wrapper.bc gfx90a-binary.out

OpenMP Device Binary Embedding

Various structures and functions used in the wrapper host IR form the interface between the executable binary and the OpenMP runtime.

Enum Types

The Offloading Declare Target Flags Enum table lists the flags that can be set on offloading entries.

Offloading Declare Target Flags Enum
Name                     Value  Description
OMP_DECLARE_TARGET_LINK  0x01   Mark the entry as having a ‘link’ attribute (w.r.t. link clause)
OMP_DECLARE_TARGET_CTOR  0x02   Mark the entry as being a global constructor
OMP_DECLARE_TARGET_DTOR  0x04   Mark the entry as being a global destructor
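
As a minimal sketch, the flags could be written in C as shown here. It assumes the values are combined as a bitmask, which their power-of-two values suggest but the table does not state. The enum name OffloadEntryFlags and the helpers entry_has_flag() and entry_is_ctor_or_dtor() are hypothetical and are not part of the wrapper or the runtime.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values copied from the table above. The enum name is hypothetical. */
enum OffloadEntryFlags {
  OMP_DECLARE_TARGET_LINK = 0x01, /* entry has a 'link' attribute */
  OMP_DECLARE_TARGET_CTOR = 0x02, /* entry is a global constructor */
  OMP_DECLARE_TARGET_DTOR = 0x04  /* entry is a global destructor */
};

/* Hypothetical helper: true if every bit of `flag` is set in `flags`.
   Assumes the flags field is used as a bitmask. */
static bool entry_has_flag(int32_t flags, int32_t flag) {
  return (flags & flag) == flag;
}

/* Hypothetical helper: true if the entry is marked as a global
   constructor or a global destructor. */
static bool entry_is_ctor_or_dtor(int32_t flags) {
  return (flags & (OMP_DECLARE_TARGET_CTOR | OMP_DECLARE_TARGET_DTOR)) != 0;
}
```

For example, entry_has_flag(entry_flags, OMP_DECLARE_TARGET_LINK) tests a single flag on the flags field of an entry without disturbing the other bits.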

Structure Types

The __tgt_offload_entry, __tgt_device_image, and __tgt_bin_desc structures are the structures used in the wrapper host IR.

__tgt_offload_entry structure
Type     Identifier  Description
void*    addr        Address of global symbol within device image (function or global)
char*    name        Name of the symbol
size_t   size        Size of the entry info (0 if it is a function)
int32_t  flags       Flags associated with the entry (see Offloading Declare Target Flags Enum)
int32_t  reserved    Reserved, to be used by the runtime library.

__tgt_device_image structure
Type                  Identifier    Description
void*                 ImageStart    Pointer to the target code start
void*                 ImageEnd      Pointer to the target code end
__tgt_offload_entry*  EntriesBegin  Begin of table with all target entries
__tgt_offload_entry*  EntriesEnd    End of table (non inclusive)

__tgt_bin_desc structure
Type                  Identifier        Description
int32_t               NumDeviceImages   Number of device types supported
__tgt_device_image*   DeviceImages      Array of device images (1 per dev. type)
__tgt_offload_entry*  HostEntriesBegin  Begin of table with all host entries
__tgt_offload_entry*  HostEntriesEnd    End of table (non inclusive)
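
The three tables above translate directly into C declarations. The following is a sketch transcribed from the tables, not copied from the runtime headers. The helpers entry_count() and entry_is_function() are hypothetical and only illustrate the half-open [begin, end) table convention and the "size is 0 for a function" rule from the tables.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Field names, types, and order follow the tables above. */
typedef struct __tgt_offload_entry {
  void *addr;       /* address of the global symbol (function or global) */
  char *name;       /* name of the symbol */
  size_t size;      /* size of the entry info (0 if it is a function) */
  int32_t flags;    /* see Offloading Declare Target Flags Enum */
  int32_t reserved; /* reserved for the runtime library */
} __tgt_offload_entry;

typedef struct __tgt_device_image {
  void *ImageStart;                  /* target code start */
  void *ImageEnd;                    /* target code end */
  __tgt_offload_entry *EntriesBegin; /* first target entry */
  __tgt_offload_entry *EntriesEnd;   /* one past the last target entry */
} __tgt_device_image;

typedef struct __tgt_bin_desc {
  int32_t NumDeviceImages;               /* number of device images */
  __tgt_device_image *DeviceImages;      /* array of device images */
  __tgt_offload_entry *HostEntriesBegin; /* first host entry */
  __tgt_offload_entry *HostEntriesEnd;   /* one past the last host entry */
} __tgt_bin_desc;

/* Hypothetical helper: number of entries in a half-open [begin, end) table. */
static size_t entry_count(const __tgt_offload_entry *begin,
                          const __tgt_offload_entry *end) {
  return (size_t)(end - begin);
}

/* Hypothetical helper: per the table, a size of 0 denotes a function. */
static bool entry_is_function(const __tgt_offload_entry *entry) {
  return entry->size == 0;
}
```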

Global Variables

The Global Variables table lists the global variables used to store device images and related symbols, along with their types and explicit ELF sections.

Global Variables
Variable                        Type                 ELF Section              Description
__start_omp_offloading_entries  __tgt_offload_entry  .omp_offloading_entries  Begin symbol for the offload entries table.
__stop_omp_offloading_entries   __tgt_offload_entry  .omp_offloading_entries  End symbol for the offload entries table.
__dummy.omp_offloading.entry    __tgt_offload_entry  .omp_offloading_entries  Dummy zero-sized object in the offload entries section that forces the linker to define the begin/end symbols above.
.omp_offloading.device_image    __tgt_device_image   .omp_offloading_entries  ELF device code object of the first image.
.omp_offloading.device_image.N  __tgt_device_image   .omp_offloading_entries  ELF device code object of the (N+1)th image.
.omp_offloading.device_images   __tgt_device_image   .omp_offloading_entries  Array of images.
.omp_offloading.descriptor      __tgt_bin_desc       .omp_offloading_entries  Binary descriptor object (see details below).

Binary Descriptor for Device Images

This object is passed to the offloading runtime at program startup and it describes all device images available in the executable or shared library. It is defined as follows:

__attribute__((visibility("hidden")))
extern __tgt_offload_entry *__start_omp_offloading_entries;
__attribute__((visibility("hidden")))
extern __tgt_offload_entry *__stop_omp_offloading_entries;
static const char Image0[] = { <Bufs.front() contents> };
...
static const char ImageN[] = { <Bufs.back() contents> };
static const __tgt_device_image Images[] = {
  {
    Image0,                            /*ImageStart*/
    Image0 + sizeof(Image0),           /*ImageEnd*/
    __start_omp_offloading_entries,    /*EntriesBegin*/
    __stop_omp_offloading_entries      /*EntriesEnd*/
  },
  ...
  {
    ImageN,                            /*ImageStart*/
    ImageN + sizeof(ImageN),           /*ImageEnd*/
    __start_omp_offloading_entries,    /*EntriesBegin*/
    __stop_omp_offloading_entries      /*EntriesEnd*/
  }
};
static const __tgt_bin_desc BinDesc = {
  sizeof(Images) / sizeof(Images[0]),  /*NumDeviceImages*/
  Images,                              /*DeviceImages*/
  __start_omp_offloading_entries,      /*HostEntriesBegin*/
  __stop_omp_offloading_entries        /*HostEntriesEnd*/
};
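
To illustrate how a consumer might read such a descriptor, the sketch below walks the image array and the host entry table. It is not the runtime's implementation. The structures are re-declared from the tables in Structure Types so that the block stands alone, and the helpers total_image_bytes() and find_host_entry() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Re-declared from the Structure Types tables so the block stands alone. */
typedef struct __tgt_offload_entry {
  void *addr;
  char *name;
  size_t size;
  int32_t flags;
  int32_t reserved;
} __tgt_offload_entry;

typedef struct __tgt_device_image {
  void *ImageStart;
  void *ImageEnd;
  __tgt_offload_entry *EntriesBegin;
  __tgt_offload_entry *EntriesEnd;
} __tgt_device_image;

typedef struct __tgt_bin_desc {
  int32_t NumDeviceImages;
  __tgt_device_image *DeviceImages;
  __tgt_offload_entry *HostEntriesBegin;
  __tgt_offload_entry *HostEntriesEnd;
} __tgt_bin_desc;

/* Hypothetical helper: sum of (ImageEnd - ImageStart) over all images. */
static size_t total_image_bytes(const __tgt_bin_desc *desc) {
  size_t total = 0;
  for (int32_t i = 0; i < desc->NumDeviceImages; ++i) {
    const __tgt_device_image *img = &desc->DeviceImages[i];
    total += (size_t)((const char *)img->ImageEnd -
                      (const char *)img->ImageStart);
  }
  return total;
}

/* Hypothetical helper: linear search of the half-open host entry table
   for a symbol name. Returns NULL if no entry matches. */
static const __tgt_offload_entry *find_host_entry(const __tgt_bin_desc *desc,
                                                  const char *name) {
  for (const __tgt_offload_entry *e = desc->HostEntriesBegin;
       e != desc->HostEntriesEnd; ++e) {
    if (e->name != NULL && strcmp(e->name, name) == 0)
      return e;
  }
  return NULL;
}
```

Both helpers rely only on the conventions visible in the definition above: each image is delimited by a start and an end pointer, and every entry table is bounded by a begin pointer and a non-inclusive end pointer.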

Global Constructor and Destructor

The global constructor (.omp_offloading.descriptor_reg()) registers the library of images with the runtime by calling the __tgt_register_lib() function. The constructor is explicitly defined in the .text.startup section. Similarly, the global destructor (.omp_offloading.descriptor_unreg()) calls __tgt_unregister_lib() to unregister the library and is also defined in the .text.startup section.
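
The registration pattern can be approximated in C with the GCC/Clang constructor and destructor function attributes. This is only a sketch of the idea: the wrapper emits these functions directly in LLVM IR, and the real __tgt_register_lib() and __tgt_unregister_lib() are provided by the offloading runtime. Here demo_bin_desc, demo_register_lib(), demo_unregister_lib(), and is_registered() are hypothetical stand-ins so that the example builds without the runtime.

```c
#include <stddef.h>

/* Hypothetical stand-in for __tgt_bin_desc; its fields do not matter here. */
typedef struct demo_bin_desc {
  int placeholder;
} demo_bin_desc;

static demo_bin_desc BinDesc = { 0 };

/* Descriptor currently known to the stand-in "runtime", or NULL. */
static const demo_bin_desc *registered = NULL;

/* Stand-in for __tgt_register_lib(): remembers the descriptor. */
static void demo_register_lib(const demo_bin_desc *desc) {
  registered = desc;
}

/* Stand-in for __tgt_unregister_lib(): forgets the descriptor. */
static void demo_unregister_lib(const demo_bin_desc *desc) {
  if (registered == desc)
    registered = NULL;
}

/* Runs before main(), in the role of .omp_offloading.descriptor_reg(). */
__attribute__((constructor)) static void descriptor_reg(void) {
  demo_register_lib(&BinDesc);
}

/* Runs at normal program exit, in the role of
   .omp_offloading.descriptor_unreg(). */
__attribute__((destructor)) static void descriptor_unreg(void) {
  demo_unregister_lib(&BinDesc);
}

/* Hypothetical query used to observe the effect of the constructor. */
static int is_registered(void) {
  return registered == &BinDesc;
}
```

Because the constructor runs before main(), any code in main() that calls is_registered() already sees the descriptor as registered, which mirrors how device images are made available before user code executes.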

Image Binary Embedding and Execution for OpenMP

For each offloading target, device ELF code objects are generated by the clang, opt, llc, and lld pipeline. These code objects are then passed to clang-offload-wrapper.

  • At compile time, the clang-offload-wrapper tool takes the following actions:
    • It embeds the device ELF code objects into the wrapper host IR (see OpenMP Device Binary Embedding).
  • At execution time:
    • The global constructor runs and registers the device image.