
Device option sep #1397

Open
genrtul wants to merge 7 commits into lemonade-sdk:main from genrtul:device-option-sep


@genrtul

@genrtul genrtul commented Mar 18, 2026

First off, I'm inexperienced with contributing code, so sorry if there are any mistakes, and I appreciate tips!

The purpose of this simple PR is to allow passing llama-cpp's --device option directly to lemonade-server. This option selects which accelerator devices lemonade should use. In lemonade-server the option is named --llamacpp-device.

The motivation for this change is that currently when using JSON recipes to pass particular parameters to models, the --llamacpp_args option is used, which overrides this option if it gets passed on the command line. Now, choosing which particular devices to use for acceleration strikes me as a runtime option and not appropriate for including in a JSON recipe. This is a quick fix in lieu of somehow allowing --llamacpp_args to merge arguments from both the command line and recipes, which might be the better fix.
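To make the alternative mentioned above more concrete, merging the two argument sources could look roughly like the following Python sketch. This is not code from this PR or from lemonade: `group_args` and `merge_llamacpp_args` are hypothetical helpers, and letting the command line win on conflicts is an assumption about the desired behavior.

```python
import shlex
from typing import Dict, List


def _is_flag(token: str) -> bool:
    """Return True for tokens like --device, but not for values like -1."""
    if not token.startswith("-") or token == "-":
        return False
    try:
        float(token)  # negative numbers are values, not flags
        return False
    except ValueError:
        return True


def group_args(arg_string: str) -> Dict[str, List[str]]:
    """Group a llama-server style argument string into {flag: [values]}."""
    groups: Dict[str, List[str]] = {}
    current = None
    for token in shlex.split(arg_string):
        if _is_flag(token):
            current = token
            groups[current] = []
        elif current is not None:
            groups[current].append(token)
        else:
            raise ValueError(f"value {token!r} appears before any flag")
    return groups


def merge_llamacpp_args(recipe_args: str, cli_args: str) -> List[str]:
    """Merge recipe and command-line arguments.

    Assumption: a flag given on the command line replaces the same flag
    from the recipe; all other recipe flags are kept.
    """
    merged = group_args(recipe_args)
    merged.update(group_args(cli_args))
    result: List[str] = []
    for flag, values in merged.items():
        result.append(flag)
        result.extend(values)
    return result


print(merge_llamacpp_args("--ctx-size 8192 --device Vulkan0", "--device Vulkan1"))
```

With this approach a recipe could keep model-specific settings such as --ctx-size while the device choice stays a runtime decision. The sketch does not unify short and long aliases of the same flag, which a real implementation would need to handle.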

I have only implemented this for the llamacpp backend for now, as that's the only one I'm familiar with. In principle, though, it seems like it could be generic over backends and useful so long as a backend allows choosing accelerator devices. I don't know if the option would be relevant for the other ones. SD.cpp at least doesn't seem to expose such an option currently (https://github.com/leejet/stable-diffusion.cpp/blob/master/examples/cli/README.md). I'm certainly willing to try and add it for other backends, though I won't be able to test it.

The new option --llamacpp-device is relevant for systems with two or more GPUs (for example an integrated GPU and a discrete one, or one connected over Thunderbolt/USB4).

Example usage for a system with three devices:

(no flag) - default behavior which is seemingly to attempt to utilize all devices. This hasn't been changed in this PR.

lemonade-server serve --llamacpp rocm --llamacpp-device Rocm0 - Only the first rocm device (usually a GPU) will be used. Which physical device this name refers to is system-dependent.

lemonade-server serve --llamacpp vulkan --llamacpp-device Vulkan0,Vulkan2 - The two vulkan devices Vulkan0 and Vulkan2 will be used.

Note that on my system, attempting to use two devices causes llama-server to repeatedly crash, but this is a separate bug.
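As a rough illustration of the examples above, the value given to --llamacpp-device could be turned into llama-server's --device argument along these lines. This is a Python sketch under stated assumptions, not the implementation in this PR: `parse_device_list` and `llama_server_device_args` are hypothetical names, and the only validation performed is rejecting empty entries.

```python
from typing import List, Optional


def parse_device_list(value: str) -> List[str]:
    """Split a comma-separated device list such as "Vulkan0,Vulkan2"."""
    names = [name.strip() for name in value.split(",")]
    if any(not name for name in names):
        raise ValueError(f"empty device name in {value!r}")
    return names


def llama_server_device_args(device_option: Optional[str]) -> List[str]:
    """Build the extra llama-server arguments for a device selection.

    With no option set, nothing is added, so llama-server keeps its
    default device selection.
    """
    if not device_option:
        return []
    return ["--device", ",".join(parse_device_list(device_option))]


print(llama_server_device_args("Vulkan0,Vulkan2"))
print(llama_server_device_args(None))
```

Device names are passed through unchanged here because they are system-dependent; checking them against the devices llama-server actually reports would be a separate step.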

@bitgamma
Member

Thanks for the contribution! Left a couple of comments. This parameter also needs to be listed in the docs.

@superm1
Member

superm1 commented Mar 18, 2026

@jeremyfowers is about to overhaul the config system and this will get caught in the crosshairs. Can you hold off before doing more work and revisit after that's done? I don't want to see your work thrown away.

Member

@jeremyfowers jeremyfowers left a comment


@genrtul can you help me understand the need for this feature better? if you want CPU, you can do --llamacpp cpu already. If you want GPU, you can do --llamacpp vulkan/rocm.

Why do we need a new flag?

@genrtul
Author

genrtul commented Mar 19, 2026

> @genrtul can you help me understand the need for this feature better? if you want CPU, you can do --llamacpp cpu already. If you want GPU, you can do --llamacpp vulkan/rocm.
>
> Why do we need a new flag?

This is for the case where someone has multiple GPUs available on the system, but only wants to use a subset of them, or a different set than is used by default. My personal case is I have a Strix Halo with an eGPU, and I use --device to prevent the eGPU from being used.

@jeremyfowers
Member

> > @genrtul can you help me understand the need for this feature better? if you want CPU, you can do --llamacpp cpu already. If you want GPU, you can do --llamacpp vulkan/rocm.
> > Why do we need a new flag?
>
> This is for the case where someone has multiple GPUs available on the system, but only wants to use a subset of them, or a different set than is used by default. My personal case is I have a Strix Halo with an eGPU, and I use --device to prevent the eGPU from being used.

Thanks, that makes sense! So this is the GPU device ID number?

Would you kindly update your PR description to include a couple examples of usage, such as

  • default: llamacpp does X
  • --device Y: now you see llamacpp running on the iGPU
  • --device Z: now you see llamacpp running on the eGPU

@genrtul
Author

genrtul commented Mar 25, 2026

I've made the changes suggested by bitgamma and added the option in the docs where it seemed relevant. Taking into account superm1's comment I won't work on this further for now.

I've also added some examples to the OP.

Member

@bitgamma bitgamma left a comment


looks good to me!
