Thanks for the contribution! Left a couple of comments. This parameter also needs to be listed in the docs.
@jeremyfowers is about to overhaul the config system and this will get caught in the crosshairs. Can you hold off before doing more work and revisit after that's done? I don't want to see your work thrown away.
jeremyfowers
left a comment
@genrtul can you help me understand the need for this feature better? If you want CPU, you can do --llamacpp cpu already. If you want GPU, you can do --llamacpp vulkan/rocm.
Why do we need a new flag?
This is for the case where someone has multiple GPUs available on the system, but only wants to use a subset of them, or a different set than is used by default. My personal case is I have a Strix Halo with an eGPU, and I use |
Thanks, that makes sense! So this is the GPU device ID number? Would you kindly update your PR description to include a couple examples of usage, such as
I've made the changes suggested by bitgamma and added the option in the docs where it seemed relevant. Taking into account superm1's comment, I won't work on this further for now. I've also added some examples to the OP.
First off, I'm inexperienced with contributing code, so sorry if there are any mistakes, and I appreciate tips!
The purpose of this simple PR is to allow passing llama-cpp's --device option directly to lemonade-server. This option selects which accelerator devices should be used by lemonade. The option is renamed --llamacpp-device.
The motivation for this change is that currently, when using JSON recipes to pass particular parameters to models, the --llamacpp_args option is used, which overrides this option if it gets passed on the command line. Now, choosing which particular devices to use for acceleration strikes me as a runtime option and not appropriate for including in a JSON recipe. This is a quick fix in lieu of somehow allowing --llamacpp_args to merge arguments from both the command line and recipes, which might be the better fix.
I have only implemented this for the llamacpp backend for now, as that's the only one I'm familiar with. However, in principle it seems like it could be generic over backends and useful so long as a backend allows choosing accelerator devices. I don't know if the option would be relevant for the other ones. SD.cpp at least doesn't seem to expose such an option currently (https://github.com/leejet/stable-diffusion.cpp/blob/master/examples/cli/README.md). I'm certainly willing to try and add it for other backends, though I won't be able to test it.
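To make the "merge arguments from both the command line and recipes" alternative a bit more concrete, here is a rough Python sketch of what such a merge could look like. This is only an illustration, not lemonade's code: the function name merge_llamacpp_args is hypothetical, and the rule that command-line values win over recipe values is an assumption about the desired behavior.

```python
import shlex
from typing import Dict, List, Optional


def _parse_flags(arg_string: str) -> Dict[str, Optional[str]]:
    """Split an argument string into {flag: value}; value is None for bare flags.

    Simplification: a token counts as a flag's value only if it does not
    start with "-", so negative numeric values are not handled, and short
    and long aliases of the same option are treated as different flags.
    """
    tokens = shlex.split(arg_string)
    parsed: Dict[str, Optional[str]] = {}
    i = 0
    while i < len(tokens):
        flag = tokens[i]
        if i + 1 < len(tokens) and not tokens[i + 1].startswith("-"):
            parsed[flag] = tokens[i + 1]
            i += 2
        else:
            parsed[flag] = None
            i += 1
    return parsed


def merge_llamacpp_args(recipe_args: str, cli_args: str) -> List[str]:
    """Hypothetical helper: combine recipe and command-line argument strings.

    Assumption: on conflict, the command-line value replaces the recipe value.
    """
    merged = _parse_flags(recipe_args)
    merged.update(_parse_flags(cli_args))  # command line takes precedence
    result: List[str] = []
    for flag, value in merged.items():
        result.append(flag)
        if value is not None:
            result.append(value)
    return result
```

With a merge like this, a recipe could keep model-specific parameters while the device selection is supplied at launch time, which is the split between recipe and runtime options argued for above.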
The new option --llamacpp-device is relevant for systems with two or more GPUs (for example an integrated GPU and a discrete one, or one connected over Thunderbolt/USB4).
Example usage for a system with three devices:
(no flag) - Default behavior, which seemingly attempts to use all devices. This hasn't been changed in this PR.
lemonade-server serve --llamacpp rocm --llamacpp-device Rocm0 - Only the first rocm device (usually a GPU) will be used. Which particular device this is is system-dependent.
lemonade-server serve --llamacpp vulkan --llamacpp-device Vulkan0,Vulkan2 - The two vulkan devices Vulkan0 and Vulkan2 will be used.
Note that on my system, attempting to use two devices causes llama-server to repeatedly crash, but this is a separate bug.
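For readers who want to see how the examples above might map onto the backend invocation, here is a minimal Python sketch of turning a --llamacpp-device value into arguments for llama-server. It is not taken from the PR: the helper name build_device_args is hypothetical, and it assumes the value is forwarded unchanged (apart from whitespace cleanup) as a comma-separated list to llama.cpp's --device option.

```python
from typing import List, Optional


def build_device_args(device_spec: Optional[str]) -> List[str]:
    """Hypothetical helper: map a --llamacpp-device value to llama-server args.

    None means the flag was not given, so nothing is added and the backend
    keeps its default device selection.
    """
    if device_spec is None:
        return []
    # Accept "Vulkan0,Vulkan2" as well as "Vulkan0, Vulkan2".
    devices = [name.strip() for name in device_spec.split(",") if name.strip()]
    if not devices:
        raise ValueError("device list is empty")
    # Assumption: the names are passed through as-is; whether a given name
    # exists on the system is left for llama-server to check.
    return ["--device", ",".join(devices)]
```

The sketch deliberately does not validate device names, since which names exist (Rocm0, Vulkan0, and so on) is system-dependent.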