OpenMP Support

Clang fully supports OpenMP 4.5. Clang supports offloading to X86_64, AArch64, PPC64[LE] and has basic support for Cuda devices.

In addition, the LLVM OpenMP runtime libomp supports the OpenMP Tools Interface (OMPT) on x86, x86_64, AArch64, and PPC64 on Linux, Windows, and macOS.

For the list of supported OpenMP 5.0 features, see OpenMP 5.0 Implementation Details.

General improvements

Cuda devices support

Directives execution modes

Clang code generation for target regions supports two modes: SPMD and non-SPMD. Clang chooses between them automatically, based on how the directives and their clauses are used. SPMD mode uses a simplified set of runtime functions, which increases performance at the cost of not supporting some OpenMP features. Non-SPMD mode is the most generic mode and supports all currently available OpenMP features. The compiler always attempts to use SPMD mode wherever possible. SPMD mode will not be used if:

  • The target region contains user code (other than OpenMP-specific directives) in between the target and the parallel directives.
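As an illustration of the condition above, the following sketch contrasts a combined construct, where nothing separates target from parallel, with a target region that runs sequential user code before its parallel loop. The function names (scale_combined, scale_with_setup, spmd_demo) are hypothetical, and which mode Clang actually picks for each region remains the compiler's decision; the comments only restate the rule described above.

```c
#include <assert.h>

/* Combined construct: no user code between `target` and `parallel`,
   so this region is a candidate for SPMD mode. */
void scale_combined(int *x, int n, int a) {
  assert(n >= 0);
#pragma omp target teams distribute parallel for map(tofrom: x[0:n])
  for (int i = 0; i < n; ++i)
    x[i] *= a;
}

/* User code sits between `target` and `parallel`, which is the condition
   listed above that prevents the use of SPMD mode. */
void scale_with_setup(int *x, int n, int a) {
  assert(n >= 0);
#pragma omp target map(tofrom: x[0:n])
  {
    int factor = a + 1; /* sequential user code inside the target region */
#pragma omp parallel for
    for (int i = 0; i < n; ++i)
      x[i] *= factor;
  }
}

/* Hypothetical driver: returns 1 when both variants give the expected values. */
int spmd_demo(void) {
  int u[8], v[8];
  for (int i = 0; i < 8; ++i) {
    u[i] = i;
    v[i] = i;
  }
  scale_combined(u, 8, 3);   /* u[i] becomes 3 * i */
  scale_with_setup(v, 8, 3); /* v[i] becomes 4 * i */
  for (int i = 0; i < 8; ++i)
    if (u[i] != 3 * i || v[i] != 4 * i)
      return 0;
  return 1;
}
```

Both functions compute the same kind of result; the difference is only in how the region is structured, so moving setup code out of the target region (or into the loop body) is one way to keep a region eligible for SPMD mode.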

Data-sharing modes

Clang supports two data-sharing models for Cuda devices: Generic and Cuda modes. The default mode is Generic. Cuda mode can give additional performance and can be activated using the -fopenmp-cuda-mode flag. In Generic mode, all local variables that can be shared in parallel regions are stored in global memory. In Cuda mode, local variables are not shared between threads, and it is the user's responsibility to share the required data between threads in parallel regions.
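To make the distinction concrete, here is a minimal sketch of the kind of local variable the two modes treat differently. In the first function, offset is declared inside the target region and then read by every thread of the nested parallel region, so it is a local variable that is shared in a parallel region. The second function gives each thread its own copy through firstprivate, so the loop no longer depends on sharing the local. The function names are hypothetical, and whether the second form is sufficient under -fopenmp-cuda-mode is an assumption, not something the text above states.

```c
#include <assert.h>

/* `offset` is local to the target region and shared by the threads of the
   nested parallel region. */
int sum_with_shared_local(const int *x, int n) {
  assert(n >= 0);
  int result = 0;
#pragma omp target map(to: x[0:n]) map(tofrom: result)
  {
    int offset = 10;
#pragma omp parallel for reduction(+ : result)
    for (int i = 0; i < n; ++i)
      result += x[i] + offset; /* every thread reads the same `offset` */
  }
  return result;
}

/* Same computation, but each thread receives a private copy of `offset`,
   so the loop does not rely on the local variable being shared. */
int sum_with_private_copy(const int *x, int n) {
  assert(n >= 0);
  int result = 0;
#pragma omp target map(to: x[0:n]) map(tofrom: result)
  {
    int offset = 10;
#pragma omp parallel for firstprivate(offset) reduction(+ : result)
    for (int i = 0; i < n; ++i)
      result += x[i] + offset;
  }
  return result;
}
```

Both functions return the sum of the array elements plus 10 per element.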

Features not supported or with limited support for Cuda devices

  • Cancellation constructs are not supported.
  • Doacross loop nest is not supported.
  • User-defined reductions are supported only for trivial types.
  • Nested parallelism: inner parallel regions are executed sequentially.
  • Static linking of libraries containing device code is not supported yet.
  • Automatic translation of math functions in target regions to device-specific math functions is not implemented yet.
  • Debug information for OpenMP target regions is supported, but it may sometimes be necessary to manually specify the address class of the inspected variables. In some cases local variables are actually allocated in global memory, but the debug info may not be aware of it.
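As an example of one item in the list above, the sketch below declares a user-defined reduction over a plain C struct and uses it in a target region. The names range_t, range_merge, range_merge_op and find_range are hypothetical, and reading "trivial types" as plain structs of built-in members is an assumption made for this illustration.

```c
#include <assert.h>

/* A plain struct with only built-in members. */
typedef struct {
  int lo;
  int hi;
} range_t;

#pragma omp declare target
/* Combines two ranges into the smallest range covering both. */
static range_t range_merge(range_t a, range_t b) {
  range_t r;
  r.lo = a.lo < b.lo ? a.lo : b.lo;
  r.hi = a.hi > b.hi ? a.hi : b.hi;
  return r;
}
#pragma omp end declare target

/* User-defined reduction: private copies start from the original value. */
#pragma omp declare reduction(range_merge_op : range_t : \
    omp_out = range_merge(omp_out, omp_in)) initializer(omp_priv = omp_orig)

/* Returns the minimum and maximum of x[0..n-1]; requires n >= 1. */
range_t find_range(const int *x, int n) {
  assert(n >= 1);
  range_t r = {x[0], x[0]};
#pragma omp target teams distribute parallel for map(to: x[0:n]) \
    map(tofrom: r) reduction(range_merge_op : r)
  for (int i = 1; i < n; ++i) {
    range_t one = {x[i], x[i]};
    r = range_merge(r, one);
  }
  return r;
}
```

Because the combiner is associative and commutative, the result does not depend on how iterations are distributed across teams and threads.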

OpenMP 5.0 Implementation Details

The following table provides a quick overview of various OpenMP 5.0 features and their implementation status. Please contact openmp-dev at lists.llvm.org for more information or if you want to help with the implementation.

Category | Feature | Status | Reviews
loop extension | support != in the canonical loop form | done | D54441
loop extension | #pragma omp loop (directive) | worked on |
loop extension | collapse imperfectly nested loop | done |
loop extension | collapse non-rectangular nested loop | done |
loop extension | C++ range-based for loop | done |
loop extension | clause: if for SIMD directives | done |
loop extension | inclusive scan extension (matching C++17 PSTL) | done |
memory management | memory allocators | done | r341687, r357929
memory management | allocate directive and allocate clause | done | r355614, r335952
OMPD | OMPD interfaces | not upstream | https://github.com/OpenMPToolsInterface/LLVM-openmp/tree/ompd-tests
OMPT | OMPT interfaces | mostly done |
thread affinity extension | thread affinity extension | done |
task extension | taskloop reduction | done |
task extension | task affinity | not upstream |
task extension | clause: depend on the taskwait construct | worked on |
task extension | depend objects and detachable tasks | done |
task extension | mutexinoutset dependence-type for tasks | done | D53380, D57576
task extension | combined taskloop constructs | done |
task extension | master taskloop | done |
task extension | parallel master taskloop | done |
task extension | master taskloop simd | done |
task extension | parallel master taskloop simd | done |
SIMD extension | atomic and simd constructs inside SIMD code | done |
SIMD extension | SIMD nontemporal | done |
device extension | infer target functions from initializers | worked on |
device extension | infer target variables from initializers | worked on |
device extension | OMP_TARGET_OFFLOAD environment variable | done | D50522
device extension | support full ‘defaultmap’ functionality | done | D69204
device extension | device specific functions | done |
device extension | clause: device_type | done |
device extension | clause: extended device | done |
device extension | clause: uses_allocators clause | done |
device extension | clause: in_reduction | worked on | r308768
device extension | omp_get_device_num() | worked on | D54342
device extension | structure mapping of references | unclaimed |
device extension | nested target declare | done | D51378
device extension | implicitly map ‘this’ (this[:1]) | done | D55982
device extension | allow access to the reference count (omp_target_is_present) | worked on |
device extension | requires directive | partial |
device extension | clause: unified_shared_memory | done | D52625, D52359
device extension | clause: unified_address | partial |
device extension | clause: reverse_offload | unclaimed parts | D52780
device extension | clause: atomic_default_mem_order | done | D53513
device extension | clause: dynamic_allocators | unclaimed parts | D53079
device extension | user-defined mappers | worked on | D56326, D58638, D58523, D58074, D60972, D59474
device extension | mapping lambda expression | done | D51107
device extension | clause: use_device_addr for target data | done |
device extension | support close modifier on map clause | done | D55719, D55892
device extension | teams construct on the host device | worked on | Clang part is done, r371553.
device extension | support non-contiguous array sections for target update | done |
device extension | pointer attachment | unclaimed |
device extension | map clause reordering based on map types | unclaimed |
atomic extension | hints for the atomic construct | done | D51233
base language | C11 support | done |
base language | C++11/14/17 support | done |
base language | lambda support | done |
misc extension | array shaping | done | D74144
misc extension | library shutdown (omp_pause_resource[_all]) | unclaimed parts | D55078
misc extension | metadirectives | worked on |
misc extension | conditional modifier for lastprivate clause | done |
misc extension | iterator and multidependences | done |
misc extension | depobj directive and depobj dependency kind | done |
misc extension | user-defined function variants | worked on | D67294, D64095, D71847, D71830
misc extension | pointer/reference to pointer based array reductions | unclaimed |
misc extension | prevent new type definitions in clauses | done |
memory model extension | memory model update (seq_cst, acq_rel, release, acquire,…) | done |
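As a small illustration of one entry marked done in the table, the sketch below uses != as the loop test of a worksharing loop, which OpenMP 5.0 admits into the canonical loop form when the loop variable changes by exactly one per iteration. The function name sum_down is hypothetical.

```c
#include <assert.h>

/* Sums x[0..n-1] by counting down; the loop test uses `!=`, which is
   accepted in the OpenMP 5.0 canonical loop form with a step of -1. */
int sum_down(const int *x, int n) {
  assert(n >= 0); /* a negative n would never reach 0 by decrementing */
  int sum = 0;
#pragma omp parallel for reduction(+ : sum)
  for (int i = n; i != 0; --i)
    sum += x[i - 1];
  return sum;
}
```

Compilers that implement only OpenMP 4.5 may reject this loop form when OpenMP is enabled, since earlier versions of the specification required a relational test such as < or >.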

OpenMP 5.1 Implementation Details

The following table provides a quick overview of various OpenMP 5.1 features and their implementation status, as defined in Technical Report 8 (TR8). Please contact openmp-dev at lists.llvm.org for more information or if you want to help with the implementation.

Category | Feature | Status | Reviews
atomic extension | ‘compare’ and ‘fail’ clauses on atomic construct | unclaimed |
base language | C++ attribute specifier syntax | unclaimed |
device extension | ‘present’ map type modifier | done | D83061, D83062, D84422
device extension | ‘present’ motion modifier | done | D84711, D84712
device extension | ‘present’ in defaultmap clause | worked on | D92427
device extension | map clause reordering based on ‘present’ modifier | unclaimed |
device extension | device-specific environment variables | unclaimed |
device extension | omp_target_is_accessible routine | unclaimed |
device extension | omp_get_mapped_ptr routine | unclaimed |
device extension | new async target memory copy routines | unclaimed |
device extension | thread_limit clause on target construct | unclaimed |
device extension | has_device_addr clause on target construct | unclaimed |
device extension | iterators in map clause or motion clauses | unclaimed |
device extension | indirect clause on declare target directive | unclaimed |
device extension | allow virtual functions calls for mapped object on device | unclaimed |
device extension | interop construct | unclaimed |
device extension | assorted routines for querying interoperable properties | unclaimed |
loop extension | Loop tiling transformation | worked on | D76342
loop extension | Loop unrolling transformation | unclaimed |
loop extension | ‘reproducible’/‘unconstrained’ modifiers in ‘order’ clause | unclaimed |
memory management | alignment extensions for allocate directive and clause | unclaimed |
memory management | new memory management routines | unclaimed |
memory management | changes to omp_alloctrait_key enum | unclaimed |
memory model extension | seq_cst clause on flush construct | unclaimed |
misc extension | ‘omp_all_memory’ keyword and use in ‘depend’ clause | unclaimed |
misc extension | error directive | unclaimed |
misc extension | scope construct | unclaimed |
misc extension | routines for controlling and querying team regions | unclaimed |
misc extension | changes to ompt_scope_endpoint_t enum | unclaimed |
misc extension | omp_display_env routine | unclaimed |
misc extension | extended OMP_PLACES syntax | unclaimed |
misc extension | OMP_NUM_TEAMS and OMP_TEAMS_THREAD_LIMIT env vars | unclaimed |
misc extension | ‘target_device’ selector in context specifier | unclaimed |
misc extension | begin/end declare variant | done | D71179
misc extension | dispatch construct and function variant argument adjustment | unclaimed |
misc extension | assume and assumes directives | worked on |
misc extension | nothing directive | unclaimed |
misc extension | masked construct and related combined constructs | unclaimed |
misc extension | default(firstprivate) & default(private) | partial | firstprivate done: D75591
other | deprecating master construct | unclaimed |
OMPT | new barrier types added to ompt_sync_region_t enum | unclaimed |
OMPT | async data transfers added to ompt_target_data_op_t enum | unclaimed |
OMPT | new barrier state values added to ompt_state_t enum | unclaimed |
OMPT | new ‘emi’ callbacks for external monitoring interfaces | unclaimed |
task extension | ‘strict’ modifier for taskloop construct | unclaimed |
task extension | inoutset in depend clause | unclaimed |
task extension | nowait clause on taskwait | unclaimed |
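To show how one of the entries marked done might be used, the sketch below applies the ‘present’ map type modifier to a target region nested in a target data region, so the array is already mapped when the modifier is evaluated. The function name increment_on_device is hypothetical. The preprocessor guard is an assumption of this sketch: it enables the modifier only when the compiler reports an OpenMP 5.1 version through _OPENMP (for Clang, this presumably requires selecting that version with -fopenmp-version=51) and falls back to a plain map otherwise.

```c
#include <assert.h>

/* Adds 1 to every element of data[0..n-1] inside a target region. */
void increment_on_device(int *data, int n) {
  assert(n >= 0);
  /* Map the array once for the whole enclosing region. */
#pragma omp target data map(tofrom: data[0:n])
  {
#if defined(_OPENMP) && _OPENMP >= 202011
    /* OpenMP 5.1: `present` states that the data must already be mapped. */
#pragma omp target teams distribute parallel for map(present, tofrom: data[0:n])
#else
    /* Fallback for compilers that do not report OpenMP 5.1. */
#pragma omp target teams distribute parallel for map(tofrom: data[0:n])
#endif
    for (int i = 0; i < n; ++i)
      data[i] += 1;
  }
}
```

Using ‘present’ on data that has not been mapped beforehand is treated as an error by the OpenMP 5.1 specification, which is why the enclosing target data region matters here.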