LLMs on any GPU: True Open Source to run LLMs #670
-
The reason nobody is answering is that everyone is too busy working on the details... and that "everybody" is not as many people as you think. I think it's common knowledge that everyone who knows about this agrees with you, but far fewer people are able to make it happen. This project is just one of many that contribute to making it easier to support other hardware.

The smart thing Nvidia did here was to create tooling that makes their hardware much easier to use, and many if not most people who have specialized in data science are not going to spend time becoming experts in compilers and hardware if there is already a way for them to continue with the work they are most interested in. The "invisible hand" of the market worked here: it drove the cost of Nvidia hardware up enough that it started making economic sense for more people to build support for alternative hardware, and there are a number of projects doing this. The HIP fork of llm.c, PyTorch, and tinygrad are just some examples that already work with some AMD GPUs.

You might not realize or believe it, but within maybe two or three years you could skill yourself up to the point of contributing to things like this too, maybe even sooner. Just start coding: try to get a computer to do something that it can't do, however silly or small; just make it do something that will amuse you. At some point that might make mathematics more interesting to you, which might in turn make coding more interesting again. That is just one of the many things likely to happen along the way. If nothing else, it will be almost impossible to do this without it helping you to understand and appreciate more about the world we live in!
-
Very timely... I'm preparing to announce (probably early next week) a project I've been working on to address this. It's called gpu.cpp: a minimalist library that makes portable GPU compute with C++ simple, using the WebGPU API specification as a portable low-level GPU interface: https://github.com/AnswerDotAI/gpu.cpp/tree/main

I think the scope of llm.c is probably already settled on CUDA, but one of my short-term goals is to port the llm.c CUDA kernels to WebGPU and make them available as part of the library. I was thinking of submitting a PR to add a link to the related projects section of the README, if Andrej is okay with including it there.
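For a sense of what this looks like, here is a minimal sketch of a kernel dispatch with gpu.cpp, adapted from the project's hello-world GELU example. Names like `createContext`, `createTensor`, `createKernel`, `dispatchKernel`, `toCPU`, and the `gpu.hpp` header reflect my reading of the library at the time of writing and may differ from the current API:

```cpp
#include <array>
#include <cstdio>
#include <future>

#include "gpu.hpp" // gpu.cpp header; name may differ across versions

using namespace gpu;

// WGSL compute shader: elementwise tanh-approximation GELU over an array.
// WGSL is the shader language of the WebGPU spec, so the same kernel source
// runs on Vulkan, Metal, and DirectX backends via the WebGPU implementation.
static const char *kGelu = R"(
const GELU_SCALING_FACTOR: f32 = 0.7978845608028654; // sqrt(2.0 / PI)
@group(0) @binding(0) var<storage, read_write> inp: array<f32>;
@group(0) @binding(1) var<storage, read_write> out: array<f32>;
@compute @workgroup_size(256)
fn main(@builtin(global_invocation_id) gid: vec3<u32>) {
    let i: u32 = gid.x;
    if (i < arrayLength(&inp)) {
        let x: f32 = inp[i];
        out[i] = 0.5 * x * (1.0 + tanh(GELU_SCALING_FACTOR
                 * (x + 0.044715 * x * x * x)));
    }
}
)";

int main() {
  constexpr size_t N = 3072;
  Context ctx = createContext();

  std::array<float, N> inputArr, outputArr;
  for (size_t i = 0; i < N; ++i) {
    inputArr[i] = static_cast<float>(i) / 10.0f; // dummy input data
  }
  // Allocate GPU buffers; the input tensor is initialized from host memory.
  Tensor input  = createTensor(ctx, Shape{N}, kf32, inputArr.data());
  Tensor output = createTensor(ctx, Shape{N}, kf32);

  // Compile the WGSL source and bind the two buffers; launch enough
  // workgroups of 256 threads to cover all N elements.
  Kernel op = createKernel(ctx, {kGelu, 256, kf32},
                           Bindings{input, output},
                           {cdiv(N, 256), 1, 1});

  // Dispatch is asynchronous; a promise/future pair signals completion.
  std::promise<void> promise;
  std::future<void> future = promise.get_future();
  dispatchKernel(ctx, op, promise);
  wait(ctx, future);

  // Copy the result back to host memory.
  toCPU(ctx, output, outputArr.data(), sizeof(outputArr));
  printf("gelu(%f) = %f\n", inputArr[1], outputArr[1]);
  return 0;
}
```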
-
If you read the README, you'll see that there are forks for Gaudi 2, AMD, Metal, and WebGPU. Support has not been added to the main repo in order to maintain simplicity, by keeping it in C/CUDA (per #112).
-
While the open-source models themselves are open, in practice one can only run them through CUDA, and therefore only on NVIDIA GPUs, which few can afford. Why not replace CUDA so that LLMs can run on all GPUs?