
Triton

This is the development repository of Triton, a language and compiler for writing highly efficient custom Deep-Learning primitives. The aim of Triton is to provide an open-source environment to write fast code at higher productivity than CUDA, but also with higher flexibility than other existing DSLs.

The foundations of this project are described in the following MAPL2019 publication: Triton: An Intermediate Language and Compiler for Tiled Neural Network Computations. Please consider citing this work if you use Triton!

The official documentation contains installation instructions and tutorials.

Install from source

python -m venv .venv --prompt triton
source .venv/bin/activate

pip install ninja cmake wheel  # build-time dependencies
pip install -e python          # build and install Triton from the repo's python/ directory (editable)
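To sanity-check the install (Triton exposes a standard __version__ attribute):

python -c "import triton; print(triton.__version__)"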


pip install pyelftools

pip install tensorboard


Reference environment:

CUDA driver version: 545.23.08
ptxas --version: 12.2
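To check the corresponding versions on your own machine (nvidia-smi ships with the NVIDIA driver, ptxas with the CUDA toolkit):

nvidia-smi --query-gpu=driver_version --format=csv,noheader
ptxas --version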


git clone https://github.com/hgl71964/CuAssembler.git

export PYTHONPATH={path-to-CuAssembler}:{path-to-CuAssembler}/bin:{path-to-CuAssembler}/CuAsm

pip install -e cuasmrl   # install the cuasmrl package in editable mode
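For example, assuming CuAssembler was cloned into the current directory (the path below is hypothetical):

CUASM_DIR=$(pwd)/CuAssembler   # hypothetical clone location
export PYTHONPATH=$CUASM_DIR:$CUASM_DIR/bin:$CUASM_DIR/CuAsm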

Building

  1. Build Triton; in setup, ensure that the bundled ptxas, cuobjdump, etc. match the host versions.

  2. Build PyTorch; note that PyTorch also needs to match the same CUDA version.

  3. pip uninstall triton (the copy installed by PyTorch), then re-build this repository's Triton (see the commands below).
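A minimal sketch of step 3, assuming the PyTorch-provided wheel shadows the local build:

pip uninstall -y triton   # remove the copy pulled in as a PyTorch dependency
pip install -e python     # re-build and install this repository's Triton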

Tips for building

  • Set TRITON_BUILD_WITH_CLANG_LLD=true as an environment variable to use clang and lld. lld in particular results in faster builds.

  • Set TRITON_BUILD_WITH_CCACHE=true to build with ccache.

  • Pass --no-build-isolation to pip install to make no-op builds faster. Without this, every invocation of pip install uses a different symlink to cmake, which forces ninja to rebuild most of the .a files. (A combined example follows this list.)

  • vscode intellisense has some difficulty figuring out how to build Triton's C++ (probably because, in our build, users don't invoke cmake directly, but instead use setup.py). Teach vscode how to compile Triton as follows.

    • Do a local build.
    • Get the full path to the compile_commands.json file produced by the build: find python/build -name 'compile_commands.json' | xargs readlink -f
    • In vscode, install the C/C++ extension, then open the command palette (Shift + Command + P on Mac, or Shift + Ctrl + P on Windows/Linux) and open C/C++: Edit Configurations (UI).
    • Open "Advanced Settings" and paste the full path to compile_commands.json into the "Compile Commands" textbox.
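For example, a build that applies all three tips above:

export TRITON_BUILD_WITH_CLANG_LLD=true
export TRITON_BUILD_WITH_CCACHE=true
pip install -e python --no-build-isolation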

Compatibility

Supported Platforms:

  • Linux

Supported Hardware:

  • NVIDIA GPUs (Compute Capability 7.0+; see the check below)
  • Under development: AMD GPUs, CPUs
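To check a GPU's compute capability (the compute_cap query field is available on recent NVIDIA drivers, including the 545 series listed above):

nvidia-smi --query-gpu=compute_cap --format=csv,noheader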
