OpenMP Support

Clang fully supports OpenMP 4.5, almost all of OpenMP 5.0, and most of OpenMP 5.1 and 5.2. Clang supports offloading to X86_64, AArch64, PPC64[LE], NVIDIA GPUs (all models), and AMD GPUs (all models).

In addition, the LLVM OpenMP runtime libomp supports the OpenMP Tools Interface (OMPT) on x86, x86_64, AArch64, and PPC64 on Linux, Windows, and macOS. OMPT is also supported for NVIDIA and AMD GPUs.

For the list of supported features from OpenMP 5.0 and 5.1, see OpenMP 5.0 Implementation Details and OpenMP 5.1 Implementation Details.

General improvements

  • New collapse clause scheme that avoids expensive remainder operations. After a loop nest is collapsed via the collapse clause, the loop index variables are computed with multiplications and additions instead of remainder operations.

  • When the collapse clause is used on a loop nest, the default behavior is to automatically extend the representation of the loop counter to 64 bits if the sizes of the collapsed loops are not known at compile time. To prevent this conservative choice and use at most 32 bits, compile your program with the -fopenmp-optimistic-collapse flag.
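The two items above can be illustrated in plain C for a two-level nest. This is a minimal sketch of the idea, assuming row-major linearization of the iteration space; it is not the code Clang generates, and the helper names collapsed_trip_count, recover_indices, and sum_collapsed are hypothetical. It shows why the product of two 32-bit loop sizes may need 64 bits, and how the inner index can be recovered from the quotient with a multiplication and a subtraction instead of a remainder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: trip count of a collapsed 2-level nest.
   Computed in 64 bits because ni * nj may not fit in 32 bits when the
   loop sizes are only known at run time. */
static uint64_t collapsed_trip_count(uint32_t ni, uint32_t nj) {
  return (uint64_t)ni * (uint64_t)nj;
}

/* Hypothetical helper: map a linear iteration number k back to (i, j).
   Instead of computing j = k % nj, reuse the quotient:
   j = k - i * nj needs only a multiplication and a subtraction. */
static void recover_indices(uint64_t k, uint32_t nj, uint32_t *i, uint32_t *j) {
  assert(nj > 0);
  uint64_t q = k / nj;
  *i = (uint32_t)q;
  *j = (uint32_t)(k - q * nj);
}

/* Sums an ni x nj row-major matrix with a collapsed loop nest.
   Built with e.g. `clang -fopenmp`; adding -fopenmp-optimistic-collapse
   lets Clang use a 32-bit counter, which is only safe if ni * nj fits.
   Without -fopenmp the pragma is ignored and the loop runs serially. */
static double sum_collapsed(const double *m, int ni, int nj) {
  assert(ni >= 0 && nj >= 0);
  double total = 0.0;
  #pragma omp parallel for collapse(2) reduction(+: total)
  for (int i = 0; i < ni; ++i)
    for (int j = 0; j < nj; ++j)
      total += m[(size_t)i * (size_t)nj + (size_t)j];
  return total;
}
```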

GPU device support

Data-sharing modes

Clang supports two data-sharing models for CUDA devices: Generic and Cuda modes. The default mode is Generic. Cuda mode can provide additional performance and is activated with the -fopenmp-cuda-mode flag. In Generic mode, all local variables that can be shared in parallel regions are stored in global memory. In Cuda mode, local variables are not shared between threads, and it is the user's responsibility to share the required data between threads in parallel regions. Often the optimizer is able to reduce the cost of Generic mode to the level of Cuda mode, but the flag, as well as other assumption flags, can be used for tuning.
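The kind of variable this affects can be sketched as follows. This is an illustrative example with hypothetical function names, not taken from the Clang test suite. In sum_local_shared, partial is declared inside the target region and then shared by the nested parallel region, which is the case Generic mode handles by placing the variable in global memory. sum_mapped shows one way to avoid relying on a shared region-local variable: it reduces directly into a mapped variable. Both are meant to be built with clang -fopenmp, optionally adding --offload-arch=<arch> for a GPU and -fopenmp-cuda-mode to switch modes; without an offload target the regions fall back to the host.

```c
#include <assert.h>

/* `partial` is local to the target region but shared by the threads of
   the inner parallel region. Per the text above, Generic mode stores such
   variables in global memory. */
double sum_local_shared(const double *a, int n) {
  assert(n >= 0);
  double total = 0.0;
  #pragma omp target map(to: a[0:n]) map(tofrom: total)
  {
    double partial = 0.0; /* shared by the parallel region below */
    #pragma omp parallel for reduction(+: partial)
    for (int i = 0; i < n; ++i)
      partial += a[i];
    total = partial;
  }
  return total;
}

/* Variant that does not depend on sharing a region-local variable: the
   reduction variable `total` appears on a combined target construct, so it
   is treated as mapped tofrom (OpenMP 5.0 semantics assumed). */
double sum_mapped(const double *a, int n) {
  assert(n >= 0);
  double total = 0.0;
  #pragma omp target teams distribute parallel for map(to: a[0:n]) reduction(+: total)
  for (int i = 0; i < n; ++i)
    total += a[i];
  return total;
}
```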

Features not supported or with limited support for CUDA devices

  • Cancellation constructs are not supported.

  • Doacross loop nest is not supported.

  • User-defined reductions are supported only for trivial types.

  • Nested parallelism: inner parallel regions are executed sequentially.

  • Debug information for OpenMP target regions is supported, but it may sometimes be necessary to manually specify the address class of the inspected variables. In some cases local variables are actually allocated in global memory, but the debug info may not be aware of it.
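As an illustration of the user-defined reduction item above, here is a sketch of a reduction over a trivially copyable struct inside a target region. The type struct range, the helpers range_merge and range_identity, and the reduction identifier merge are hypothetical; the helpers are wrapped in declare target on the assumption that they must be callable on the device. Treat it as a sketch under these assumptions rather than a tested recipe for a specific Clang release or GPU.

```c
#include <assert.h>
#include <float.h>

/* A trivially copyable type, the kind of type user-defined reductions
   are limited to on CUDA devices. */
struct range {
  double lo;
  double hi;
};

#pragma omp declare target
/* Combine two ranges into the smallest range covering both. */
struct range range_merge(struct range a, struct range b) {
  struct range r;
  r.lo = a.lo < b.lo ? a.lo : b.lo;
  r.hi = a.hi > b.hi ? a.hi : b.hi;
  return r;
}

/* Identity element: merging with it leaves any range unchanged. */
struct range range_identity(void) {
  struct range r = {DBL_MAX, -DBL_MAX};
  return r;
}
#pragma omp end declare target

#pragma omp declare reduction(merge : struct range : omp_out = range_merge(omp_out, omp_in)) \
    initializer(omp_priv = range_identity())

/* Smallest and largest element of a[0..n), computed in a target region. */
struct range find_range(const double *a, int n) {
  assert(n >= 0);
  struct range r = range_identity();
  #pragma omp target teams distribute parallel for map(to: a[0:n]) reduction(merge: r)
  for (int i = 0; i < n; ++i) {
    struct range x = {a[i], a[i]};
    r = range_merge(r, x);
  }
  return r;
}
```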

OpenMP 5.0 Implementation Details

The following table provides a quick overview of various OpenMP 5.0 features and their implementation status. Please post on the Discourse forums (Runtimes - OpenMP category) for more information or if you want to help with the implementation.

Category | Feature | Status | Reviews
loop | support != in the canonical loop form | done | D54441
loop | #pragma omp loop (directive) | partial | D145823 (combined forms)
loop | #pragma omp loop bind | worked on | D144634 (needs review)
loop | collapse imperfectly nested loop | done | -
loop | collapse non-rectangular nested loop | done | -
loop | C++ range-based for loop | done | -
loop | clause: if for SIMD directives | done | -
loop | inclusive scan (matching C++17 PSTL) | done | -
memory management | memory allocators | done | r341687, r357929
memory management | allocate directive and allocate clause | done | r355614, r335952
OMPD | OMPD interfaces | done | https://reviews.llvm.org/D99914 (supports only host (CPU) and Linux)
OMPT | OMPT interfaces (callback support) | done | -
thread affinity | thread affinity | done | -
task | taskloop reduction | done | -
task | task affinity | not upstream | https://github.com/jklinkenberg/openmp/tree/task-affinity
task | clause: depend on the taskwait construct | done | D113540 (regular codegen only)
task | depend objects and detachable tasks | done | -
task | mutexinoutset dependence-type for tasks | done | D53380, D57576
task | combined taskloop constructs | done | -
task | master taskloop | done | -
task | parallel master taskloop | done | -
task | master taskloop simd | done | -
task | parallel master taskloop simd | done | -
SIMD | atomic and simd constructs inside SIMD code | done | -
SIMD | SIMD nontemporal | done | -
device | infer target functions from initializers | worked on | -
device | infer target variables from initializers | done | D146418
device | OMP_TARGET_OFFLOAD environment variable | done | D50522
device | support full ‘defaultmap’ functionality | done | D69204
device | device-specific functions | done | -
device | clause: device_type | done | -
device | clause: extended device | done | -
device | clause: uses_allocators | done | -
device | clause: in_reduction | worked on | r308768
device | omp_get_device_num() | done | D54342, D128347
device | structure mapping of references | unclaimed | -
device | nested target declare | done | D51378
device | implicitly map ‘this’ (this[:1]) | done | D55982
device | allow access to the reference count (omp_target_is_present) | done | -
device | requires directive | done | -
device | clause: unified_shared_memory | done | D52625, D52359
device | clause: unified_address | partial | -
device | clause: reverse_offload | partial | D52780, D155003
device | clause: atomic_default_mem_order | done | D53513
device | clause: dynamic_allocators | unclaimed parts | D53079
device | user-defined mappers | done | D56326, D58638, D58523, D58074, D60972, D59474
device | map array-section with implicit mapper | done | https://github.com/llvm/llvm-project/pull/101101
device | mapping lambda expression | done | D51107
device | clause: use_device_addr for target data | done | -
device | support close modifier on map clause | done | D55719, D55892
device | teams construct on the host device | done | r371553
device | support non-contiguous array sections for target update | done | -
device | pointer attachment | done | -
atomic | hints for the atomic construct | done | D51233
base language | C11 support | done | -
base language | C++11/14/17 support | done | -
base language | lambda support | done | -
misc | array shaping | done | D74144
misc | library shutdown (omp_pause_resource[_all]) | done | D55078
misc | metadirectives | mostly done | D91944
misc | conditional modifier for lastprivate clause | done | -
misc | iterator and multidependences | done | -
misc | depobj directive and depobj dependency kind | done | -
misc | user-defined function variants | done | D67294, D64095, D71847, D71830, D109635
misc | pointer/reference to pointer based array reductions | done | -
misc | prevent new type definitions in clauses | done | -
memory model | memory model update (seq_cst, acq_rel, release, acquire,…) | done | -
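One of the features marked done in this table, inclusive scan, can be illustrated with a short prefix-sum sketch. The function name inclusive_prefix_sum is hypothetical; the reduction(inscan, +: ...) modifier and the scan inclusive(...) directive are the OpenMP 5.0 syntax. Without -fopenmp the pragmas are ignored and the loop computes the same result serially.

```c
#include <assert.h>

/* Inclusive prefix sum: out[i] = a[0] + ... + a[i].
   The `inscan` modifier marks `sum` as a scan variable. Statements before
   the scan directive form the input phase; statements after it observe the
   scanned (inclusive) value for iteration i. */
void inclusive_prefix_sum(const int *a, int *out, int n) {
  assert(n >= 0);
  int sum = 0;
  #pragma omp parallel for reduction(inscan, +: sum)
  for (int i = 0; i < n; ++i) {
    sum += a[i];                 /* input phase */
    #pragma omp scan inclusive(sum)
    out[i] = sum;                /* scan phase */
  }
}
```

Replacing inclusive(sum) with exclusive(sum), and moving the input statement after the directive, gives the exclusive variant, where out[i] excludes a[i].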

OpenMP 5.1 Implementation Details

The following table provides a quick overview of various OpenMP 5.1 features and their implementation status. Please post on the Discourse forums (Runtimes - OpenMP category) for more information or if you want to help with the implementation.

Category | Feature | Status | Reviews
atomic | ‘compare’ clause on atomic construct | done | D120290, D120007, D118632, D120200, D116261, D118547, D116637
atomic | ‘fail’ clause on atomic construct | worked on | D123235 (in progress)
base language | C++ attribute specifier syntax | done | D105648
device | ‘present’ map type modifier | done | D83061, D83062, D84422
device | ‘present’ motion modifier | done | D84711, D84712
device | ‘present’ in defaultmap clause | done | D92427
device | map clause reordering based on ‘present’ modifier | unclaimed | -
device | device-specific environment variables | unclaimed | -
device | omp_target_is_accessible routine | unclaimed | -
device | omp_get_mapped_ptr routine | done | D141545
device | new async target memory copy routines | done | D136103
device | thread_limit clause on target construct | partial | D141540 (offload), D152054 (host, in progress)
device | has_device_addr clause on target construct | unclaimed | -
device | iterators in map clause or motion clauses | unclaimed | -
device | indirect clause on declare target directive | unclaimed | -
device | allow virtual function calls for mapped object on device | partial | -
device | interop construct | partial | parsing/sema done: D98558, D98834, D98815
device | assorted routines for querying interoperable properties | partial | D106674
loop | Loop tiling transformation | done | D76342
loop | Loop unrolling transformation | done | D99459
loop | ‘reproducible’/‘unconstrained’ modifiers in ‘order’ clause | partial | D127855
memory management | alignment for allocate directive and clause | done | D115683
memory management | ‘allocator’ modifier for allocate clause | done | https://github.com/llvm/llvm-project/pull/114883
memory management | new memory management routines | unclaimed | -
memory management | changes to omp_alloctrait_key enum | unclaimed | -
memory model | seq_cst clause on flush construct | done | https://github.com/llvm/llvm-project/pull/114072
misc | ‘omp_all_memory’ keyword and use in ‘depend’ clause | done | D125828, D126321
misc | error directive | done | D139166
misc | scope construct | done | D157933, https://github.com/llvm/llvm-project/pull/109197
misc | routines for controlling and querying team regions | partial | D95003 (libomp only)
misc | changes to ompt_scope_endpoint_t enum | unclaimed | -
misc | omp_display_env routine | done | D74956
misc | extended OMP_PLACES syntax | unclaimed | -
misc | OMP_NUM_TEAMS and OMP_TEAMS_THREAD_LIMIT env vars | done | D138769
misc | ‘target_device’ selector in context specifier | worked on | -
misc | begin/end declare variant | done | D71179
misc | dispatch construct and function variant argument adjustment | worked on | D99537, D99679
misc | assumes directives | worked on | -
misc | assume directive | done | -
misc | nothing directive | done | D123286
misc | masked construct and related combined constructs | worked on | D99995, D100514
misc | default(firstprivate) & default(private) | done | D75591 (firstprivate), D125912 (private)
other | deprecating master construct | unclaimed | -
OMPT | new barrier types added to ompt_sync_region_t enum | unclaimed | -
OMPT | async data transfers added to ompt_target_data_op_t enum | unclaimed | -
OMPT | new barrier state values added to ompt_state_t enum | unclaimed | -
OMPT | new ‘emi’ callbacks for external monitoring interfaces | done | -
OMPT | device tracing interface | unclaimed | -
task | ‘strict’ modifier for taskloop construct | unclaimed | -
task | inoutset in depend clause | done | D97085, D118383
task | nowait clause on taskwait | partial | parsing/sema done: D131830, D141531
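The ‘compare’ clause on the atomic construct, listed as done in this table, can be sketched with a parallel maximum. The function name parallel_max is hypothetical, and older Clang releases may need -fopenmp-version=51 to accept the clause. The conditional statement uses the if (x < expr) { x = expr; } form that OpenMP 5.1 allows under atomic compare.

```c
#include <assert.h>
#include <limits.h>

/* Maximum of a[0..n), or INT_MIN when n == 0.
   The comparison and the conditional store on `m` execute as a single
   atomic operation, so concurrent threads cannot lose an update. */
int parallel_max(const int *a, int n) {
  assert(n >= 0);
  int m = INT_MIN;
  #pragma omp parallel for shared(m)
  for (int i = 0; i < n; ++i) {
    int v = a[i];
    #pragma omp atomic compare
    if (m < v) { m = v; }
  }
  return m;
}
```

For a plain maximum, a reduction(max: m) clause is usually simpler and faster; the atomic form is shown only to illustrate the clause.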

OpenMP Extensions

The following table provides a quick overview of various OpenMP extensions and their implementation status. These extensions are not currently defined by any standard, so links to associated LLVM documentation are provided. As these extensions mature, they will be considered for standardization. Please post on the Discourse forums (Runtimes - OpenMP category) to provide feedback.

Category | Feature | Status | Reviews
atomic extension | ‘atomic’ strictly nested within ‘teams’ | prototyped | D126323
device extension | ‘ompx_hold’ map type modifier | prototyped | D106509, D106510
device extension | ‘ompx_bare’ clause on ‘target teams’ construct | prototyped | #66844, #70612
device extension | Multi-dimensional ‘num_teams’ and ‘thread_limit’ clauses on ‘target teams ompx_bare’ construct | partial | #99732, #101407, #102715