Description
Proposal
The nvptx64-nvidia-cuda target is a virtual target for NVIDIA GPUs. In addition to manufacturing the hardware, NVIDIA provides both the ISA specification for the virtual compilation target (PTX) and the CUDA Driver, which further lowers PTX to assembly suitable for a specific GPU. The hardware, ISA, and driver are separately versioned, and NVIDIA provides compatibility guidelines for combinations of these.
Rust currently supports emitting code for a wide range of hardware architectures and PTX ISA versions. This proposal is based on the premise that the benefits of this breadth of compatibility are not justified by the engineering effort required to maintain it while holding the nvptx64-nvidia-cuda target to a reasonable quality standard.
The concrete proposal is to drop support for:
- PTX ISA versions older than 7.0
- GPU architectures older than SM 7.0
This means that even though SM 7.0, 7.2, and 7.5 are supported by PTX ISA 6.5, the new behavior would be to default to PTX ISA version 7.0 for these architectures.
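As a rough illustration of the defaulting rule described above, the following sketch clamps the PTX ISA version to the proposed minimum. The helper `default_ptx_for` and its parameters are hypothetical and are not part of rustc; the caller is assumed to know the oldest PTX ISA version that supports a given architecture.

```rust
/// PTX ISA version as (major, minor); tuples compare lexicographically.
type PtxVersion = (u32, u32);

/// Minimum PTX ISA version under this proposal.
const MIN_PTX: PtxVersion = (7, 0);
/// Minimum GPU architecture under this proposal, encoded as 70 for SM 7.0.
const MIN_SM: u32 = 70;

/// Hypothetical helper (not a rustc API): pick the default PTX ISA version
/// for an architecture, given the oldest PTX ISA version that supports it.
fn default_ptx_for(sm: u32, oldest_ptx_for_sm: PtxVersion) -> Result<PtxVersion, String> {
    if sm < MIN_SM {
        return Err(format!(
            "sm_{} is older than the minimum supported sm_{}",
            sm, MIN_SM
        ));
    }
    // Never go below the proposed floor, even if the architecture
    // could be targeted with an older PTX ISA version.
    Ok(oldest_ptx_for_sm.max(MIN_PTX))
}
```

Under these assumptions, an SM 7.5 device that an older ISA such as PTX 6.5 already supports would still default to PTX 7.0, while an architecture older than SM 7.0 would be rejected outright.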
The motivation for selecting PTX ISA 7.0 is an LLVM limitation in debug symbol generation for older ISAs (rust-lang/rust#147672). As a result, code generated for PTX ISA versions older than 7.0 is severely limited in terms of debugging capabilities. The justification for accepting this cutoff is that CUDA versions older than 12 are end-of-life, and PTX ISA 7.0 is supported starting with CUDA 11.0. One could argue for dropping support for anything older than PTX ISA 8.0, as PTX 8.0 is supported starting with CUDA 12.0. However, CUDA 11 remains in use despite being end-of-life, and there is little immediate benefit in dropping support for it. As such, this should instead be revisited in the future.
When it comes to hardware architectures, the proposal is more aggressive, as it suggests dropping architectures that are not yet fully dropped by the CUDA Driver. CUDA 13 supports SM 7.5–12.1, while CUDA 12.9 supports SM 5.0–12.1. By selecting SM 7.0 as the oldest supported architecture, this proposal places Rust support between these two toolchain baselines. The reason this is necessary is that atomic ordering is not sufficiently supported on architectures older than SM 7.0 (rust-lang/rust#150515). For SM 7.0 and newer, LLVM follows the guarantees described in this formal analysis when emitting atomic instructions, while on older architectures LLVM emits atomic instructions on a best-effort basis. Dropping support for older architectures earlier is considered preferable to dropping support for atomics for the target as a whole.
The "Platform Support" section for nvptx64-nvidia-cuda will clearly document which ISAs and architectures are not supported starting with which Rust versions. This section will serve as the authoritative reference for what can be expected to work, even if unstable features exist that can be leveraged to bypass these limitations and control LLVM features directly.
For the implementation, two things are needed: rustc --print target-cpus must respect unsupported CPUs, and attempts to use an unsupported CPU must be handled. Both will be achieved either by adding an unsupported_cpus list to TargetOptions or, if introducing this new field is undesirable, by special-casing nvptx64. Because the PTX ISA version is set explicitly through features on the LLVM side, this unfortunately requires a special case for nvptx64 in target_spec_to_backend_features or a similar location. Expressing PTX ISA versions as target features has proven to be incompatible with Rust target features, and when a better mechanism is added on the Rust side, the minimum supported PTX ISA version should be reconsidered in the implementation of this change.
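A minimal sketch of how an unsupported_cpus list could be consulted is shown below. The `TargetOptions` struct here is a trimmed-down stand-in rather than rustc's real type, the helper functions are hypothetical, and the CPU names used with it are illustrative only.

```rust
/// Hypothetical stand-in for rustc's `TargetOptions`, modeling only the
/// proposed `unsupported_cpus` field.
struct TargetOptions {
    unsupported_cpus: &'static [&'static str],
}

/// Returns true if `cpu` appears in the target's unsupported list.
fn is_unsupported(opts: &TargetOptions, cpu: &str) -> bool {
    opts.unsupported_cpus.iter().any(|u| *u == cpu)
}

/// Filter the CPU names reported by the backend down to the ones that
/// something like `rustc --print target-cpus` would list.
fn printable_cpus<'a>(opts: &TargetOptions, backend_cpus: &[&'a str]) -> Vec<&'a str> {
    backend_cpus
        .iter()
        .copied()
        .filter(|cpu| !is_unsupported(opts, cpu))
        .collect()
}

/// Reject an explicitly requested CPU that the target no longer supports.
fn check_target_cpu(opts: &TargetOptions, requested: &str) -> Result<(), String> {
    if is_unsupported(opts, requested) {
        Err(format!(
            "target CPU `{}` is not supported on this target",
            requested
        ))
    } else {
        Ok(())
    }
}
```

Keeping the list as data on the target, instead of hard-coding nvptx64 checks, would let the listing and the validation path share one source of truth. Whether that justifies a new field is the open question raised above.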
Mentors or Reviewers
@ZuseZ4, who was in favor of this MCP, and @workingjubilee have traditionally been more than helpful when it comes to "shaping up" GPU targets. I hope that one of them has the capacity to review this proposal. I implemented a prototype to verify my assumptions while writing this MCP.
Process
The main points of the Major Change Process are as follows:
- File an issue describing the proposal.
- A compiler team member who is knowledgeable in the area can second by writing @rustbot second or kickoff a team FCP with @rfcbot fcp $RESOLUTION.
  - Refer to Proposals, Approvals and Stabilization docs for when a second is sufficient, or when a full team FCP is required.
- Once an MCP is seconded, the Final Comment Period begins.
- Final Comment Period lasts for 10 days after all outstanding concerns are solved.
- Outstanding concerns will block the Final Comment Period from finishing. Once all concerns are resolved, the 10 day countdown is restarted.
- If no concerns are raised after 10 days since the resolution of the last outstanding concern, the MCP is considered approved.
You can read more about Major Change Proposals on forge.