llvm 19 support #227


Draft · wants to merge 2 commits into main
Conversation

@brandonros commented Jun 8, 2025

potentially addresses all of:

@brandonros (Author)

i'm on the fence about renaming llvm to llvm7 and llvm19

i think it might not actually be needed and i might put it back

@brandonros (Author)

@LegNeato my measuring stick here is does vecadd example build with LLVM v19

could you tell me if I'm close or if I'm actually missing something huge like a mountain of work I'm not seeing?

@brandonros (Author)

 DEBUG: About to call LLVMRunPasses - THIS IS THE CRITICAL POINT
  DEBUG: Parameters:
  DEBUG:   llmod: 0x770f3d33a580
  DEBUG:   pipeline: "default<O0>"
  DEBUG:   tm: 0x770f27dafb00
  DEBUG:   pass_options: 0x770f27e00060
  DEBUG: LLVMRunPasses returned: 0
  DEBUG: LLVMRunPasses completed successfully
  DEBUG: Cleaning up pass builder options
  DEBUG: Pass builder options disposed
  DEBUG: optimize function completed successfully
  DEBUG: About to verify module before prepare_thin
  DEBUG: Module verification result: 0
  DEBUG: LLVMRustThinLTOBufferCreate called with is_thin=1, emit_summary=1
  DEBUG: Taking ThinLTO path
  DEBUG: About to run ThinLTO pass
  error: rustc interrupted by SIGSEGV, printing backtrace
(gdb) bt
#0  0x0000772679b89e10 in llvm::ValueEnumerator::EnumerateType(llvm::Type*) ()
   from /workspaces/Rust-CUDA/target/debug/deps/librustc_codegen_nvvm.so
#1  0x0000772679b8d240 in llvm::ValueEnumerator::incorporateFunction(llvm::Function const&) ()
   from /workspaces/Rust-CUDA/target/debug/deps/librustc_codegen_nvvm.so
#2  0x0000772679b615b2 in (anonymous namespace)::ModuleBitcodeWriter::write() ()
   from /workspaces/Rust-CUDA/target/debug/deps/librustc_codegen_nvvm.so
#3  0x0000772679b5b341 in llvm::BitcodeWriter::writeModule(llvm::Module const&, bool, llvm::ModuleSummaryIndex const*, bool, std::array<unsigned int, 5ul>*) () from /workspaces/Rust-CUDA/target/debug/deps/librustc_codegen_nvvm.so
#4  0x0000772679b66802 in llvm::WriteBitcodeToFile(llvm::Module const&, llvm::raw_ostream&, bool, llvm::ModuleSummaryIndex const*, bool, std::array<unsigned int, 5ul>*) () from /workspaces/Rust-CUDA/target/debug/deps/librustc_codegen_nvvm.so
#5  0x000077267944549c in llvm::ThinLTOBitcodeWriterPass::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) ()
   from /workspaces/Rust-CUDA/target/debug/deps/librustc_codegen_nvvm.so
#6  0x00007726786b34a7 in llvm::detail::PassModel<llvm::Module, llvm::ThinLTOBitcodeWriterPass, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) ()
   from /workspaces/Rust-CUDA/target/debug/deps/librustc_codegen_nvvm.so
#7  0x000077267a2e0dc8 in llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) () from /workspaces/Rust-CUDA/target/debug/deps/librustc_codegen_nvvm.so
#8  0x000077267867e178 in LLVMRustThinLTOBufferCreate ()
   from /workspaces/Rust-CUDA/target/debug/deps/librustc_codegen_nvvm.so
#9  0x000077267859d7be in rustc_codegen_nvvm::lto::ThinBuffer::new ()
   from /workspaces/Rust-CUDA/target/debug/deps/librustc_codegen_nvvm.so
#10 0x000077267858512a in <rustc_codegen_nvvm::NvvmCodegenBackend as rustc_codegen_ssa::traits::write::WriteBackendMethods>::prepare_thin () from /workspaces/Rust-CUDA/target/debug/deps/librustc_codegen_nvvm.so
#11 0x00007726785ec605 in rustc_codegen_ssa::back::write::execute_optimize_work_item ()
   from /workspaces/Rust-CUDA/target/debug/deps/librustc_codegen_nvvm.so
#12 0x00007726785e3cd5 in rustc_codegen_ssa::back::write::spawn_work::{{closure}} ()
   from /workspaces/Rust-CUDA/target/debug/deps/librustc_codegen_nvvm.so
#13 0x0000772678404d97 in std::sys::backtrace::__rust_begin_short_backtrace ()
   from /workspaces/Rust-CUDA/target/debug/deps/librustc_codegen_nvvm.so
#14 0x000077267847357f in std::thread::Builder::spawn_unchecked_::{{closure}}::{{closure}} ()
   from /workspaces/Rust-CUDA/target/debug/deps/librustc_codegen_nvvm.so
#15 0x0000772678520144 in <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
    () from /workspaces/Rust-CUDA/target/debug/deps/librustc_codegen_nvvm.so
#16 0x0000772678466e24 in std::panicking::try::do_call ()
   from /workspaces/Rust-CUDA/target/debug/deps/librustc_codegen_nvvm.so
#17 0x000077267847381b in __rust_try () from /workspaces/Rust-CUDA/target/debug/deps/librustc_codegen_nvvm.so
#18 0x00007726784730f1 in std::thread::Builder::spawn_unchecked_::{{closure}} ()
   from /workspaces/Rust-CUDA/target/debug/deps/librustc_codegen_nvvm.so
#19 0x000077267848f987 in core::ops::function::FnOnce::call_once{{vtable.shim}} ()
   from /workspaces/Rust-CUDA/target/debug/deps/librustc_codegen_nvvm.so
#20 0x000077268c55b8ab in std::sys::pal::unix::thread::Thread::new::thread_start ()
   from /root/.rustup/toolchains/nightly-2025-03-02-x86_64-unknown-linux-gnu/bin/../lib/librustc_driver-e3b06f91230294e6.so
#21 0x000077268668aaa4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
#22 0x0000772686717a34 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100

pain

@LegNeato (Contributor)

I don't know, llvm is an area of the project I have not touched.

@LegNeato (Contributor) left a comment:

This should stay the same, right? The newer support should be optional / only enabled when

static PREBUILT_LLVM_URL: &str =
    "https://github.com/rust-gpu/rustc_codegen_nvvm-llvm/releases/download/LLVM-7.1.0/";

-static REQUIRED_MAJOR_LLVM_VERSION: u8 = 7;
+static REQUIRED_MAJOR_LLVM_VERSION: u8 = 19;
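One way to reconcile these two, keeping 7 as the default and making the newer requirement opt-in, could look like the sketch below; the `llvm19` cargo feature name is hypothetical and the real gating may end up different:

```rust
// Sketch only: `llvm19` is a made-up cargo feature name, not one that
// exists in the codegen crate. Default builds keep the existing LLVM 7
// prebuilt URL and version check.
#[allow(dead_code)]
#[cfg(not(feature = "llvm19"))]
static PREBUILT_LLVM_URL: &str =
    "https://github.com/rust-gpu/rustc_codegen_nvvm-llvm/releases/download/LLVM-7.1.0/";

#[cfg(not(feature = "llvm19"))]
static REQUIRED_MAJOR_LLVM_VERSION: u8 = 7;

// Opting in to the newer toolchain bumps the requirement.
#[cfg(feature = "llvm19")]
static REQUIRED_MAJOR_LLVM_VERSION: u8 = 19;

fn main() {
    // Without the feature enabled, the old default still holds.
    assert_eq!(REQUIRED_MAJOR_LLVM_VERSION, 7);
    println!("ok");
}
```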
Contributor:

The requirement doesn't bump unless targeting a higher arch? So the logic is:

  • 7 if targeting arch supported by 7 and 19
  • 19 if targeting arch not supported by 7
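That selection rule could be sketched like this; the compute-capability cutoff of 9.0 is an assumed placeholder for illustration, not a value confirmed anywhere in the codegen:

```rust
/// Hypothetical helper: pick the required LLVM major version from the
/// CUDA compute capability being targeted. The (9, 0) cutoff is an
/// assumption for this sketch; the real boundary is whatever the
/// bundled LLVM 7 / old libNVVM actually supports.
fn required_llvm_major(compute_capability: (u32, u32)) -> u8 {
    if compute_capability >= (9, 0) {
        // Newer arches (e.g. Blackwell) are only known to the newer
        // NVPTX backend / libNVVM, so they force the bump.
        19
    } else {
        // Everything the old toolchain can already target keeps 7.
        7
    }
}

fn main() {
    assert_eq!(required_llvm_major((7, 0)), 7);   // e.g. sm_70
    assert_eq!(required_llvm_major((12, 0)), 19); // e.g. sm_120
    println!("ok");
}
```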

@LegNeato (Contributor)

We should probably break this down, seeing as neither of us understand this space.

First, we should probably add various values for arch and stuff to enums on the rust side. Some of these might need to be gated if on 7 vs 19.

Then, we should get switching working between 7 and a stubbed-out / non-working 19 via target arch.

Then, we should systematically fix each issue and refactor common code on the way.
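The first step above (arch values as Rust enums, gated per LLVM version) might look roughly like the following; all the names and the `llvm19` feature are made up for illustration:

```rust
/// Hypothetical target-arch enum; the variant names and the `llvm19`
/// cargo feature are illustrative placeholders, not the real crate's API.
#[derive(Debug, Clone, Copy, PartialEq)]
enum NvvmArch {
    Compute60,
    Compute70,
    // Newer arches would only exist when the LLVM 19 path is compiled in:
    #[cfg(feature = "llvm19")]
    Compute120,
}

impl NvvmArch {
    /// Which LLVM major version this arch requires (assumed mapping).
    fn required_llvm_major(self) -> u8 {
        match self {
            NvvmArch::Compute60 | NvvmArch::Compute70 => 7,
            #[cfg(feature = "llvm19")]
            NvvmArch::Compute120 => 19,
        }
    }
}

fn main() {
    assert_eq!(NvvmArch::Compute60.required_llvm_major(), 7);
    assert_eq!(NvvmArch::Compute70.required_llvm_major(), 7);
    println!("ok");
}
```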

@brandonros (Author)

seeing as neither of us understand this space.

ok... kind of a strange remark...

i'm just going to work on my local branch and make kernels compile with 19.1

then i was going to work backwards and "make it upstream worthy"...

@brandonros brandonros closed this Jun 10, 2025
@brandonros (Author) commented Jun 14, 2025

@LegNeato

i rented a big aws spot VM in the cloud and built an llvm-19 debug build with assertions enabled, and am using it in the devcontainer (it's a 15gb .tar.xz and 67gb uncompressed)

it's finding issues that a typical llvm release build wouldn't, fyi, just a little tip i'd share

i have the "shaved yaks" opinionated way to get that instance if you want, cost me about $3 total

https://github.com/brandonros/cloud-llvm-build

@brandonros brandonros reopened this Jun 15, 2025
@brandonros brandonros force-pushed the llvm-19 branch 4 times, most recently from 3cac1eb to f50708b Compare June 15, 2025 01:14
@brandonros (Author) commented Jun 15, 2025

@LegNeato i'm on the fence about having two rustc_codegen_nvvm_v{{version}} crates 85% copy and pasted but.... i got this to work. vecadd is working at least, going to see if ed25519_vanity_rs compiles later

proof:
vecadd_kernel.ptx.txt

@brandonros (Author) commented Jun 20, 2025

@LegNeato i've read multiple conflicting things from multiple "official nvidia sources/documentation" that the new cuda 12.9 toolkit is based on/adds support for either llvm 18, llvm 19, or llvm 20.

i can see references to llvm 20 in their cicc binary, so....

i also question if we need this. this might sound dumb but... what if we used a simple official rust target like riscv64gc-unknown-none-elf with the official rust compiler (no custom nightly, no custom codegen llvm integration) to spit out llvm ir, and then patched it to work with cuda...

i totally agree with you/the project's view on "use nvvm to compile llvm ir to ptx"

https://github.com/brandonros/vanity-miner-rs/pull/8/files i haven't had a chance to test it yet (working on it) but in terms of thinking outside the box, i was for sure able to get nvvm to accept llvm ir and make what seems to be a "valid ptx"

@brandonros (Author)

i got this working for blackwell a different way. this branch/pr/LLVM v19 integration might work just fine but it's kind of a lot to maintain if there's an "easier" (albeit hackier) way to solve this

https://github.com/brandonros/vanity-miner-rs/actions/runs/15809309968/job/44558442212

Build Pipeline

  1. compile the no_std Rust logic + kernel libraries (specifically Rust 1.86.0, because it was built against LLVM 19) targeting riscv64gc-unknown-none-elf due to the simplicity of its instruction set
  2. make it emit LLVM IR instead of an actual binary
  3. adapt the RISC-V LLVM IR to NVPTX64 LLVM IR
  4. assemble the NVPTX64 LLVM IR to NVPTX64 LLVM bitcode
  5. feed the NVPTX64 LLVM bitcode to the new CUDA toolkit 12.9 libNVVM, which adds LLVM 19 support for Blackwell (previous architectures only support the very old LLVM 7), to get Nvidia's PTX (Parallel Thread Execution)
  6. feed the PTX to ptxas to get a CUBIN containing SASS (Streaming ASSembler)
  7. run the CUBIN on device with gpu_runner
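The steps above could be driven by a small Rust harness along these lines; every tool name, flag, and file name here is an assumption sketching the data flow, not a tested pipeline (steps 3 and 5 in particular are custom tooling and the libNVVM C API in reality, shown as placeholder CLIs):

```rust
/// Build (but don't run) the hypothetical rust -> cubin pipeline from the
/// numbered steps above. All file names, flags, and the `patch-ir` /
/// `nvvm-compile` tools are illustrative placeholders.
fn pipeline_commands(kernel_crate: &str) -> Vec<Vec<String>> {
    let cmd = |parts: &[&str]| -> Vec<String> {
        parts.iter().map(|s| s.to_string()).collect()
    };
    vec![
        // 1+2: emit LLVM IR for the RISC-V target instead of a binary
        cmd(&["cargo", "rustc", "-p", kernel_crate,
              "--target", "riscv64gc-unknown-none-elf",
              "--", "--emit=llvm-ir"]),
        // 3: rewrite the IR's target triple/datalayout for NVPTX64
        //    (custom patching in reality; placeholder tool name here)
        cmd(&["patch-ir", "kernel.ll", "--target", "nvptx64-nvidia-cuda"]),
        // 4: assemble textual IR to bitcode
        cmd(&["llvm-as", "kernel.ll", "-o", "kernel.bc"]),
        // 5: libNVVM turns the bitcode into PTX (its C API in reality;
        //    a CLI is used here only to sketch the data flow)
        cmd(&["nvvm-compile", "kernel.bc", "-o", "kernel.ptx"]),
        // 6: ptxas lowers PTX to SASS inside a cubin
        cmd(&["ptxas", "-arch=sm_120", "kernel.ptx", "-o", "kernel.cubin"]),
    ]
}

fn main() {
    let cmds = pipeline_commands("kernels");
    assert_eq!(cmds.len(), 5);
    assert_eq!(cmds[0][0], "cargo");
    assert_eq!(cmds[4][0], "ptxas");
    println!("ok");
}
```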

let me know if you actually want this / want to put time into it. otherwise blackwell+ might be able to avoid cuda_builder, or i'd need to make a cuda_builder variant that drives this super opinionated rust -> cubin pipeline i made

@brandonros brandonros closed this Jun 22, 2025
@LegNeato LegNeato reopened this Aug 1, 2025
@LegNeato (Contributor) commented Aug 1, 2025

I'm re-opening because I think we want to go this route. Totally understand if you go a different route for your project!

@LegNeato (Contributor) commented Aug 1, 2025

I plan to poke at it this week. Apologies for the previous response saying we don't know the space, I don't really know the LLVM side of the house and I misspoke.

@brandonros (Author)

Totally understand if you go a different route for your project!

Nope! Here to help, let's land this!

#216

Let's land that and then I'll rebase?

@LegNeato (Contributor) commented Aug 3, 2025

I don't have time to jam on this with you until later in the week, but I think this is a great start! Ideally both are compiled in statically or as dylibs and runtime chooses based on arch selected. But distribution with dylibs is annoying, and compiling 2 llvm versions in the same process will be annoying. So I think the first step is what this is doing, manually switching, but we should be aware where we would like it to go.

@tyler274

Im willing to put some time in here if I can get some pointers.

@devillove084

@tyler274 I'm also spending time on this and have followed the same approach as @brandonros. Writing extensive conditional compilation is truly a pain. Perhaps we should focus our efforts on this branch and explore how to improve it together?

@devillove084

@tyler274 #229 (comment) Here are some previous hints from @LegNeato.

@devillove084

@LegNeato Additionally, based on my previous open-source contributions and discussions, I've learned that both Graphite and Turso are exploring GPU acceleration. I believe this represents a significant opportunity for us. I'm highly motivated to drive this initiative forward and position our project as the leading GPU-accelerated solution within the Rust ecosystem.

@LegNeato (Contributor)

@Firestar99 is working on Graphite's support via rust-gpu!

@devillove084

@Firestar99 is working on Graphite's support via rust-gpu!

@LegNeato That's awesome! The good news is, after a month of learning and hands-on practice, I've basically figured out the workflow of LLVM backend generation. The bad news is debugging conditional compilation remains quite painful. What do you think – would it be better to write conditional compilation, or maintain two separate branches?

@LegNeato (Contributor)

I think conditional is better, as I think upgrading drops support for a bunch of devices? Or is that not the case?

@brandonros (Author)

what would it take to land this?

@devillove084

@LegNeato Based on my research, here are the facts: Taking PassManager as an example, this component orchestrates LLVM optimizations and analyses during backend code generation. Typically, multiple passes need to collaborate (e.g., constant propagation followed by dead code elimination). The PassManager schedules passes in a predefined order or based on dependencies to ensure logical execution sequence.

Prior to LLVM 9/10, implementations required inheriting from specific PassManager virtual base classes. However, starting from LLVM 14, everything has been unified into a CRTP (Curiously Recurring Template Pattern) code structure. Additionally, header files may have been relocated or modified across different LLVM versions.

Therefore, if we choose conditional compilation, I would need to create CI workflows for every single version from LLVM 7 to LLVM 19 to ensure compatibility.

@brandonros (Author)

I would need to create CI workflows for every single version from LLVM 7 to LLVM 19 to ensure compatibility.

I think this is a misunderstanding. NVIDIA cards either run LLVM 7 or LLVM 19, nothing in between. Please correct me if I am wrong.

@devillove084 commented Aug 22, 2025

I would need to create CI workflows for every single version from LLVM 7 to LLVM 19 to ensure compatibility.

I think this is a misunderstanding. NVIDIA cards either run LLVM 7 or LLVM 19, nothing in between. Please correct me if I am wrong.

@brandonros I'm referring specifically to modifications in the LLVM backend generation logic within wrapper files like rustc_llvm_wrapper. This is not targeting NVVM specifically.

@devillove084

@brandonros @LegNeato https://gist.github.com/ax3l/9489132 This source perfectly highlights that while NVIDIA officially certifies specific LLVM versions per CUDA release (as shown in the version matrix), the reality is more nuanced:

  • The crt/host_config.h hack (as noted) allows unofficial flexibility for newer LLVM versions
  • Production environments often use LLVM 11-15 (especially with CUDA 11.x/12.x)
  • NVIDIA's Enhanced Compatibility (since CUDA 11.1) intentionally supports cross-version compatibility

@devillove084

@brandonros I think we could start by refining the branches for LLVM 7 and LLVM 19 based on your existing work, then progressively extend support to other components like NVVM (CUDA) and additional LLVM versions.

@devillove084

@LegNeato I strongly agree that prioritizing support for the newer Rust toolchain, CUDA, and LLVM versions is critical. Our project's future hinges on optimizing for cutting-edge hardware like the H100, A100, and even the GH200 (which I currently have access to). This strategic focus will enable major enterprise customers to integrate our solution into their infrastructure, which is the key to maximizing our long-term growth and impact.

@brandonros (Author)

I think we could start by refining the branches for LLVM 7 and LLVM 19 based on your existing work,

I will have rebased this massive thing twice now and it continues to go stale at almost 2+ months old. Are we serious about upstreaming this? Otherwise I'm hesitant to keep doing this same song and dance of "get it ready for merge, put it on the shelf".

@devillove084

@LegNeato Based on the current modifications, what are the primary remaining challenges? Let's explore what additional efforts and adjustments we can make.

@devillove084 commented Aug 22, 2025

@LegNeato After reviewing his code changes, I can roughly see that you don't want separate code for different LLVM versions in multiple folders; instead, you'd prefer to balance NVVM and LLVM support through conditional compilation.

I sincerely request that you take the time to delve into this part.
