LLVM IR Generation for EH and Cleanups

Overview

This document describes how Clang’s LLVM IR generation represents exception handling (EH) and C++ cleanups. It focuses on the data structures and control flow patterns used to model normal and exceptional exits, and it outlines how the generated IR differs across common ABI models.

For details on the LLVM IR representation of exception handling, see LLVM Exception Handling.

Core Model

EH and cleanup handling is centered around an EHScopeStack that records nested scopes for:

  • Cleanups, which run on normal control flow, exceptional control flow, or both. These are used for destructors, full-expression cleanups, and other scope-exit actions.

  • Catch scopes, which represent try/catch handlers.

  • Filter scopes, used to model dynamic exception specifications and some platform-specific filters.

  • Terminate scopes, used for noexcept and similar termination paths.

Each cleanup is a small object with an Emit method. When a cleanup scope is popped, the IR generator decides whether it must materialize a normal cleanup block (for fallthrough, branch-through, or unresolved goto fixups), an EH cleanup entry (when exceptional control flow can reach the cleanup), or both. The result is a flattened CFG in which a cleanup’s lifetime is represented by its blocks and the edges that flow into them.
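The stack discipline can be pictured with a small, self-contained model. The sketch below is not Clang's implementation: ToyCleanup, ToyScopeStack, and demoScopeStack are hypothetical names, and "emitting" is reduced to appending a string to a log. It only illustrates that cleanups are pushed in source order, popped innermost-first, and emitted separately for the normal and EH paths they cover.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a cleanup object: it knows how to "emit" itself.
struct ToyCleanup {
  std::string Name;
  bool RunsOnNormal; // reachable by normal control flow
  bool RunsOnEH;     // reachable by exceptional control flow

  void Emit(std::vector<std::string> &Log, bool ForEH) const {
    Log.push_back((ForEH ? "eh:" : "normal:") + Name);
  }
};

// Hypothetical stand-in for the scope stack: the innermost scope is at the back.
class ToyScopeStack {
  std::vector<ToyCleanup> Scopes;

public:
  void push(ToyCleanup C) { Scopes.push_back(std::move(C)); }
  bool empty() const { return Scopes.empty(); }

  // Pop the innermost cleanup and emit it once per kind of exit it covers.
  void pop(std::vector<std::string> &Log) {
    ToyCleanup C = std::move(Scopes.back());
    Scopes.pop_back();
    if (C.RunsOnNormal)
      C.Emit(Log, /*ForEH=*/false);
    if (C.RunsOnEH)
      C.Emit(Log, /*ForEH=*/true);
  }
};

// Push two cleanups, pop them all, and return the emission order.
std::vector<std::string> demoScopeStack() {
  ToyScopeStack Stack;
  std::vector<std::string> Log;
  Stack.push({"~A", true, true});  // outer: normal and EH
  Stack.push({"~B", true, false}); // inner: normal only
  while (!Stack.empty())
    Stack.pop(Log);
  return Log;
}
```

Calling demoScopeStack() yields "normal:~B", "normal:~A", "eh:~A": the inner cleanup is emitted first, and only the cleanup marked as EH-reachable gets an EH entry.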

Key Components

The LLVM IR generation for EH and cleanups is spread across several core components:

  • CodeGenModule owns module-wide state such as the LLVM module, target information, and the selected EH personality function. It provides access to ABI helpers via CGCXXABI and target-specific hooks.

  • CodeGenFunction manages per-function state and IR building. It owns the EHScopeStack, tracks the current insertion point, and emits blocks, calls, and branches. Most cleanup and EH control flow is built here.

  • EHScopeStack is the central stack of scopes used to model EH and cleanup semantics. It stores EHCleanupScope entries for cleanups, along with EHCatchScope, EHFilterScope, and EHTerminateScope for handlers and termination logic.

  • EHCleanupScope stores the cleanup object plus state data (active flags, fixup depth, and enclosing scope links). When a cleanup scope is popped, CodeGenFunction decides whether to emit a normal cleanup block, an EH cleanup entry, or both.

  • Cleanup emission helpers implement the mechanics of branching through cleanups, threading fixups, and emitting cleanup blocks.

  • Exception emission helpers implement landing pads, dispatch blocks, personality selection, and helper routines for try/catch, filters, and terminate handling.

  • CGCXXABI (and its ABI-specific implementations such as ItaniumCXXABI and MicrosoftCXXABI) provide ABI-specific lowering for throws, catch handling, and destructor emission details.

  • AST traversal drives all of the above: CodeGenFunction and its helper classes walk the AST and emit IR for C++ expressions, classes, and statements.

In summary: AST traversal in CodeGenFunction emits code and pushes cleanups or EH scopes; EHScopeStack records the scope nesting; the cleanup and exception helpers materialize the CFG as scopes are popped; and CGCXXABI supplies the ABI-specific details for landing pads or funclets.

Cleanup Destination Routing

When multiple control flow exits (return, break, continue, fallthrough) pass through the same cleanup, the generated IR shares a single cleanup block among them. Before entering the cleanup, each exit path stores a unique index into a “cleanup destination” slot. After the cleanup code runs, a switch instruction loads this index and dispatches to the appropriate final destination. This avoids duplicating cleanup code for each exit while preserving correct control flow.

For example, if a function has both a return and a break that exit through the same destructor cleanup, both paths branch to the shared cleanup block after storing their respective destination indices. The cleanup epilogue then switches on the stored index to reach either the return block or the loop-exit block.

When only a single exit passes through a cleanup (the common case), the switch is unnecessary and the cleanup block branches directly to its sole destination.
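The routing scheme can be imitated in plain C++ with goto and a switch. This is a hand-written analogy, not compiler output: runWithSharedCleanup, the CleanupDest variable, and the label names are all hypothetical, and the "cleanup" is just a trace entry. Each exit stores its index, branches to the one shared cleanup block, and the switch after the cleanup picks the final destination.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hand-written imitation of a shared cleanup block with a destination slot.
// Exit: 0 = fall through, 1 = break out of a loop, 2 = return.
// Returns a trace of the "blocks" that were visited.
std::vector<std::string> runWithSharedCleanup(int Exit) {
  std::vector<std::string> Trace;
  int CleanupDest = 0; // hypothetical "cleanup destination" slot

  Trace.push_back("body");
  if (Exit == 2) {
    CleanupDest = 2; // return path stores its index
    goto cleanup;
  }
  if (Exit == 1) {
    CleanupDest = 1; // break path stores its index
    goto cleanup;
  }
  CleanupDest = 0;   // fallthrough path
  goto cleanup;

cleanup:
  // Shared cleanup code runs exactly once, whichever exit was taken.
  Trace.push_back("cleanup");
  // Cleanup epilogue: dispatch on the stored index.
  switch (CleanupDest) {
  case 1:
    goto loop_exit;
  case 2:
    goto ret;
  default:
    goto after_scope;
  }

after_scope:
  Trace.push_back("after-scope");
  return Trace;
loop_exit:
  Trace.push_back("loop-exit");
  return Trace;
ret:
  Trace.push_back("return");
  return Trace;
}
```

For example, runWithSharedCleanup(1) visits "body", "cleanup", "loop-exit", while runWithSharedCleanup(2) visits "body", "cleanup", "return"; the cleanup code exists only once in both cases.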

Branch Fixups for Forward Gotos

A goto statement that jumps forward to a label not yet seen poses a special problem: at the point the goto is emitted, the destination’s enclosing cleanup scope is unknown. The IR generator handles this by emitting an optimistic branch and recording a “fixup.” When the cleanup scope is later popped, any recorded fixups are resolved by rewriting the branch to thread through the cleanup block and adding the destination to the cleanup’s switch.
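A source-level case that would need such a fixup is sketched below. Tracer and forwardGoto are hypothetical names chosen for illustration. When the goto is reached, the label done has not been seen yet, so a compiler cannot know at that point which cleanups lie between the branch and its target; the observable requirement is simply that the destructor runs before control arrives at the label.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records its own destruction so the ordering can be observed.
struct Tracer {
  std::vector<std::string> &Log;
  ~Tracer() { Log.push_back("~Tracer"); }
};

std::vector<std::string> forwardGoto(bool Jump) {
  std::vector<std::string> Log;
  {
    Tracer T{Log};
    if (Jump)
      goto done; // forward jump: the label has not been seen yet
    Log.push_back("rest-of-scope");
  } // T's destructor runs here on the fallthrough path
  Log.push_back("after-scope");
done:
  Log.push_back("done");
  return Log;
}
```

With Jump set, the trace is "~Tracer", "done": the branch leaves the scope through the destructor. Without it, the trace is "rest-of-scope", "~Tracer", "after-scope", "done".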

Exceptional Cleanups and EH Dispatch

Exceptional exits (throw, invoke unwinds) are routed through EH cleanup entries, which are reached via a landing pad or a funclet dispatch block, depending on the target ABI.

For Itanium-style EH (as used on x86-64 Linux), the IR uses invoke to call potentially-throwing operations and a landingpad instruction to capture the exception and selector values. The landing pad aggregates any catch and cleanup clauses for the current scope and branches to a dispatch block, which compares the selector to type IDs and jumps to the appropriate handler.

For MSVC-compatible Windows targets, LLVM IR uses funclet-style EH: catchswitch and catchpad for handlers, and cleanuppad for cleanups, with catchret and cleanupret edges to resume normal flow. The personality function determines how these pads are interpreted by the backend.
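The following sketch shows a source pattern that exercises both an EH cleanup and catch dispatch. Guard, mayThrow, and dispatchDemo are hypothetical names. Under an Itanium-style ABI, the call to mayThrow would typically become an invoke whose unwind edge reaches a landing pad covering the cleanup and both catch clauses; under a funclet-style ABI, the same source would map onto cleanuppad and catchswitch/catchpad. The code itself only demonstrates the observable ordering.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// A local object whose destructor acts as the cleanup.
struct Guard {
  std::vector<std::string> &Log;
  ~Guard() { Log.push_back("cleanup"); }
};

// Kind: 0 = no throw, 1 = throw std::runtime_error, 2 = throw int.
void mayThrow(int Kind) {
  if (Kind == 1)
    throw std::runtime_error("boom");
  if (Kind == 2)
    throw 42;
}

std::vector<std::string> dispatchDemo(int Kind) {
  std::vector<std::string> Log;
  try {
    Guard G{Log};
    mayThrow(Kind); // potentially-throwing call inside a cleanup scope
    Log.push_back("normal");
  } catch (const std::runtime_error &) {
    Log.push_back("catch runtime_error"); // selected by exception type
  } catch (int) {
    Log.push_back("catch int");
  }
  return Log;
}
```

dispatchDemo(1) produces "cleanup", "catch runtime_error": the EH cleanup runs first, then dispatch selects the handler whose type matches. dispatchDemo(0) produces "normal", "cleanup" through the normal cleanup path.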

Personality and ABI Selection

Each function with exception handling constructs is associated with a personality function (e.g., __gxx_personality_v0 for C++ on Linux). The personality function determines the ABI-specific EH behavior of the function. IR generation selects a personality function based on language options and the target ABI (e.g., Itanium, MSVC SEH, SJLJ, Wasm EH). This decision affects:

  • Whether the IR uses landing pads or funclet pads.

  • The shape of dispatch logic for catch and filter scopes.

  • How termination or rethrow paths are modeled.

  • Whether certain helper functions such as exception filters must be outlined.

Because the personality choice is made during IR generation, the CFG shape directly reflects ABI-specific details.
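As a rough illustration of how one decision fixes the IR shape, the sketch below maps a toy EH model to a personality symbol and a flag for funclet pads. ToyEHModel, ToyPersonality, and selectToyPersonality are hypothetical; the real selection logic considers many more inputs (language, runtime, target triple) and is considerably more involved. The symbol names are the ones commonly associated with each C++ runtime and are included here as an assumption, not as a statement of what any particular configuration selects.

```cpp
#include <cassert>
#include <string>

// Hypothetical, heavily simplified set of C++ EH models.
enum class ToyEHModel { ItaniumDwarf, ItaniumSjLj, MSVCCxx };

// Hypothetical result: which personality symbol to reference and whether
// the IR would use funclet pads instead of landing pads.
struct ToyPersonality {
  std::string Name;
  bool UsesFunclets;
};

ToyPersonality selectToyPersonality(ToyEHModel M) {
  switch (M) {
  case ToyEHModel::ItaniumDwarf:
    return {"__gxx_personality_v0", false}; // landingpad-based
  case ToyEHModel::ItaniumSjLj:
    return {"__gxx_personality_sj0", false}; // landingpad-based
  case ToyEHModel::MSVCCxx:
    return {"__CxxFrameHandler3", true}; // catchpad/cleanuppad-based
  }
  return {"", false}; // unreachable for valid enum values
}
```

A caller that needs to pick between landing pads and funclet pads would only have to consult the UsesFunclets flag, which mirrors how the personality choice determines the CFG shape.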

Example: Array of Objects with Throwing Constructor

Consider:

class MyClass {
public:
  MyClass(); // may throw
  ~MyClass();
};
void doSomething(); // may throw
void f() {
  MyClass arr[4];
  doSomething();
}

High-level behavior

  • Construction of arr proceeds element-by-element. If an element constructor throws, destructors must run, in reverse order of construction, for every element that was successfully constructed before the throw.

  • After full construction, the call to doSomething may throw, in which case the destructors for all constructed elements must run, in reverse order.

  • On normal exit, destructors for all elements run in reverse order.
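The behavior listed above can be observed with a small instrumented program. Elem, buildArray, and the global trace variables are hypothetical names used only for this sketch; FailAt selects which element's constructor throws (a negative value means none does).

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

static std::vector<std::string> Events; // trace of constructor/destructor calls
static int ThrowOnIndex = -1;           // index whose construction should throw
static int NextIndex = 0;               // index assigned to the next element

struct Elem {
  int Index;
  Elem() : Index(NextIndex++) {
    if (Index == ThrowOnIndex)
      throw std::runtime_error("ctor failed"); // element is never constructed
    Events.push_back("ctor " + std::to_string(Index));
  }
  ~Elem() { Events.push_back("dtor " + std::to_string(Index)); }
};

std::vector<std::string> buildArray(int FailAt) {
  Events.clear();
  NextIndex = 0;
  ThrowOnIndex = FailAt;
  try {
    Elem Arr[4]; // constructed element-by-element
    (void)Arr;
    Events.push_back("body");
  } catch (const std::runtime_error &) {
    Events.push_back("caught");
  }
  return Events;
}
```

buildArray(2) records "ctor 0", "ctor 1", "dtor 1", "dtor 0", "caught": only the two fully constructed elements are destroyed, last first. buildArray(-1) constructs all four elements, runs the body, and then destroys elements 3 down to 0.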

Codegen flow and key components

  • The surrounding compound statement enters a CodeGenFunction::LexicalScope, which is a RunCleanupsScope and is responsible for popping local cleanups at the end of the block.

  • CodeGenFunction::EmitDecl routes the local variable to CodeGenFunction::EmitVarDecl and then CodeGenFunction::EmitAutoVarDecl, which in turn calls EmitAutoVarAlloca, EmitAutoVarInit, and EmitAutoVarCleanups.

  • CodeGenFunction::EmitCXXAggrConstructorCall emits the array constructor loop. While emitting the loop body, it enters a RunCleanupsScope and uses CodeGenFunction::pushRegularPartialArrayCleanup to register a cleanup before calling CodeGenFunction::EmitCXXConstructorCall for the element constructed in that iteration. If that constructor throws, the cleanup destroys the previously constructed elements in reverse order.

  • CodeGenFunction::EmitAutoVarCleanups calls emitAutoVarTypeCleanup, which ultimately registers a DestroyObject cleanup via CodeGenFunction::pushDestroy / pushFullExprCleanup for the full-array destructor path.

  • DestroyObject uses CodeGenFunction::destroyCXXObject, which emits the actual destructor call via CodeGenFunction::EmitCXXDestructorCall.

  • Cleanup emission helpers (e.g., CodeGenFunction::PopCleanupBlock and CodeGenFunction::EmitBranchThroughCleanup) thread both normal and EH exits through the cleanup blocks as scopes are popped.

  • The cleanup is represented as an EHCleanupScope on EHScopeStack, and its Emit method generates a loop that calls the destructor on the initialized range in reverse order.

The above function names and flow are accurate as of LLVM 22.0, but this is subject to change as the code evolves, and this document might not be updated to reflect the exact functions used.
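The partial-array cleanup in this example can also be written out by hand, which makes the "destroy the initialized range in reverse" loop explicit. This is an analogy in ordinary C++, not what Clang emits: Item and constructAll are hypothetical, and std::allocator is used only to obtain raw storage so that construction and destruction can be done one element at a time.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

struct Item {
  int Id;
  std::vector<std::string> &Log;
  Item(int TheId, int FailAt, std::vector<std::string> &TheLog)
      : Id(TheId), Log(TheLog) {
    if (Id == FailAt)
      throw std::runtime_error("ctor failed");
  }
  ~Item() { Log.push_back("dtor " + std::to_string(Id)); }
};

// Construct four items one by one. Whether construction completes or stops
// early, destroy exactly the initialized range [Begin, Cur) in reverse.
// Returns true if every element was constructed.
bool constructAll(int FailAt, std::vector<std::string> &Log) {
  constexpr int N = 4;
  using Traits = std::allocator_traits<std::allocator<Item>>;
  std::allocator<Item> Alloc;
  Item *Begin = Alloc.allocate(N);
  Item *Cur = Begin; // one past the last constructed element
  bool Ok = true;
  try {
    for (int I = 0; I < N; ++I, ++Cur)
      Traits::construct(Alloc, Cur, I, FailAt, Log);
  } catch (const std::runtime_error &) {
    Ok = false; // Cur still points at the element that failed
  }
  while (Cur != Begin)
    Traits::destroy(Alloc, --Cur);
  Alloc.deallocate(Begin, N);
  return Ok;
}
```

With FailAt equal to 2, the function returns false and the log holds "dtor 1", "dtor 0"; with no failure, it returns true and the log holds "dtor 3" through "dtor 0".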

Example: Temporary object materialization

Consider:

class MyClass {
public:
  MyClass();
  ~MyClass();
};
void useMyClass(const MyClass &); // const: a temporary cannot bind to a non-const lvalue reference
void f() {
  useMyClass(MyClass());
}

High-level behavior

  • The temporary MyClass is materialized for the call argument.

  • The temporary must be destroyed at the end of the full-expression, both on the normal path and on the exceptional path if useMyClass throws.

  • If the constructor throws, the temporary is not considered constructed and no destructor runs.
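The full-expression lifetime can be checked with an instrumented variant of the example. Temp, useTemp, callWithTemporary, and TempLog are hypothetical names; the extra bool parameter exists only so the sketch can trigger the exceptional path on demand.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

static std::vector<std::string> TempLog; // trace of observable events

struct Temp {
  Temp() { TempLog.push_back("ctor"); }
  ~Temp() { TempLog.push_back("dtor"); }
};

// Takes the temporary by const reference and optionally throws.
void useTemp(const Temp &, bool Throw) {
  TempLog.push_back("use");
  if (Throw)
    throw std::runtime_error("use failed");
}

std::vector<std::string> callWithTemporary(bool Throw) {
  TempLog.clear();
  try {
    useTemp(Temp(), Throw); // temporary lives until the end of this full-expression
    TempLog.push_back("next statement");
  } catch (const std::runtime_error &) {
    TempLog.push_back("caught");
  }
  return TempLog;
}
```

On the normal path the trace is "ctor", "use", "dtor", "next statement"; when useTemp throws it is "ctor", "use", "dtor", "caught". In both cases the destructor runs before anything after the full-expression.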

Codegen flow and key functions

  • CodeGenFunction::EmitExprWithCleanups wraps the full-expression in a RunCleanupsScope so that full-expression cleanups are run after the call.

  • CodeGenFunction::EmitMaterializeTemporaryExpr creates storage for the temporary via createReferenceTemporary and initializes it. For record temporaries this flows through EmitAnyExprToMem and CodeGenFunction::EmitCXXConstructExpr, which calls CodeGenFunction::EmitCXXConstructorCall.

  • pushTemporaryCleanup registers the destructor as a full-expression cleanup by calling CodeGenFunction::pushDestroy for SD_FullExpression temporaries.

  • The cleanup ultimately uses DestroyObject and CodeGenFunction::destroyCXXObject, which emits CodeGenFunction::EmitCXXDestructorCall.

The above function names and flow are accurate as of LLVM 22.0, but this is subject to change as the code evolves, and this document might not be updated to reflect the exact functions used.