Compile-phase subprocess isolation to survive Triton C++ aborts #2531

@choijon5

Problem

A C++ abort() during Triton compilation (e.g. an assertion failure in LinearLayout::reshapeOuts, or any future internal Triton invariant failure) kills the parent Python process and takes down the entire session. The autotuner has no chance to discard the offending config and try another. The user sees an unrecoverable crash that is reported as a "Shape Failure" for each affected shape.

Concrete recent example: rope-bwd autotune on H100/B200/MI350X aborts on the tt.reshape <1xAxBxbf16> -> <AxBxbf16> pattern emitted from emit_tl_dot_with_padding's squeeze path. Triton's assertion fires, parent dies, all 8 rope-bwd shapes are reported as Shape Failures. PR #2494 worked around the specific reshape, but the underlying robustness gap remains: any other compile-time C++ crash will reproduce the same outcome.
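To make the failure mode concrete, here is a minimal sketch (not Helion code) of why isolation helps: when abort() fires in a child process, the parent only observes an exit status and keeps running. The helper name run_in_child is hypothetical, os.abort() stands in for a Triton C++ assertion, and the negative-status convention assumes a POSIX platform.

```python
import signal
import subprocess
import sys


def run_in_child(snippet: str) -> int:
    """Run a Python snippet in a fresh interpreter and return its exit status."""
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return proc.returncode


# os.abort() raises SIGABRT in the child, like a failed C++ assertion would.
status = run_in_child("import os; os.abort()")

# On POSIX, subprocess reports death-by-signal as a negative return code.
crashed = status < 0
print("child crashed:", crashed, "| parent still alive")
```

The same call made directly in the parent would end the session, which is the behavior described above.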

Proposal

Precompile in the compile phase should be isolated from the parent process the same way the benchmark phase already is. The benchmark side (PR #2111, extended in #2487) uses a long-lived spawn worker so a hung or crashing benchmark kernel can be killed without losing autotune progress. The same pattern can be extended to precompile.
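A rough sketch of the long-lived worker idea, under stated assumptions: this is not the implementation from PR #2111 or #2487, and CompileWorker, WORKER_SOURCE, and the "abort" config key are all hypothetical names invented for illustration. The worker is a plain Python child that answers one JSON line per request; a real worker would import the compiler once at startup and reuse it across requests.

```python
import json
import subprocess
import sys

# Hypothetical worker body: a stand-in for "precompile one config".
# A config with {"abort": true} simulates a C++ abort() in the compiler.
WORKER_SOURCE = r"""
import json, os, sys
for line in sys.stdin:
    config = json.loads(line)
    if config.get("abort"):
        os.abort()
    sys.stdout.write(json.dumps({"ok": True, "config": config}) + "\n")
    sys.stdout.flush()
"""


class CompileWorker:
    """Long-lived child process, restarted only after it dies."""

    def __init__(self) -> None:
        self.starts = 0
        self._start()

    def _start(self) -> None:
        self._proc = subprocess.Popen(
            [sys.executable, "-c", WORKER_SOURCE],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            text=True,
        )
        self.starts += 1

    def _reap(self) -> None:
        """Close pipes and collect the exit status of the current child."""
        for pipe in (self._proc.stdin, self._proc.stdout):
            try:
                pipe.close()
            except OSError:
                pass  # the child may already be gone
        self._proc.wait()

    def compile(self, config: dict):
        """Return the worker's reply, or None if the worker crashed."""
        try:
            self._proc.stdin.write(json.dumps(config) + "\n")
            self._proc.stdin.flush()
            line = self._proc.stdout.readline()
        except OSError:
            line = ""
        if not line:  # EOF: the child died while handling this config
            self._reap()
            self._start()
            return None
        return json.loads(line)

    def close(self) -> None:
        self._reap()


configs = [{"block": 32}, {"block": 64, "abort": True}, {"block": 128}]
worker = CompileWorker()
survivors = [c for c in configs if worker.compile(c) is not None]
worker.close()
print(survivors, "worker starts:", worker.starts)
```

In this sketch the process startup cost is paid once up front and once more per crash, rather than once per config.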

Some prior attempts:

Constraint

Compile time must not regress versus the current fork default. Past experiments (see #2128) showed plain spawn is about 2x slower than fork on total compile time. Any new isolation mechanism needs to preserve fork-level speed, e.g. via a long-lived worker pool, reuse of imported modules across compile calls, or amortizing the spawn cost across many configs.
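The amortization argument can be illustrated with a back-of-envelope cost model. Everything here is an assumption for illustration: the function name, the serial-time simplification, treating fork startup as zero, and the numbers, which are chosen only so that per-config spawn comes out at about 2x the baseline. They are not measurements from #2128.

```python
def total_compile_time(n_configs, per_config_s, startup_s,
                       n_workers=None, n_crashes=0):
    """Serial-time model of total compile cost in seconds.

    n_workers=None models one fresh process per config; otherwise the
    startup cost is paid once per worker plus once per crash restart.
    """
    if n_workers is None:
        launches = n_configs
    else:
        launches = n_workers + n_crashes
    return launches * startup_s + n_configs * per_config_s


# Illustrative inputs: 100 configs, 1 s compile each, 1 s process startup.
baseline = total_compile_time(100, 1.0, 0.0, n_workers=1)   # idealized fork
per_spawn = total_compile_time(100, 1.0, 1.0)               # spawn per config
pooled = total_compile_time(100, 1.0, 1.0, n_workers=4, n_crashes=2)

print(per_spawn / baseline)  # 2.0
print(pooled / baseline)     # 1.06
```

Under these assumptions the overhead of a reused pool shrinks as the number of configs per worker grows, which is the property the constraint asks for.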

Acceptance criteria

  • A compile-time C++ abort in any one config skips that config and lets the process continue with the rest.
  • Geomean compile time on the dashboard (https://helionlang.com/dashboard/) does not regress relative to the current fork default.

Related
