Add a user guide for Claude Code integration #1334
Signed-off-by: Sawan Srivastava <sawan1210@gmail.com>
Co-authored-by: Jeremy Fowers <80718789+jeremyfowers@users.noreply.github.com>
jeremyfowers left a comment
This guide writeup is a really useful tool for thinking about the current workflow and how it could be simplified. Let's take our time and really nail this. Sound good?
| ### Step 1: Start Lemonade Server | ||
| In one terminal window, ensure the server is running: | ||
| ```bash | ||
| lemonade-server serve --ctx-size 32768 | ||
| ``` | ||
|
|
||
| We recommend starting the server with a context window of at least 32768 tokens to accommodate Claude Code's system prompt (20k+ tokens). Note that you might need to change this value depending on your hardware and project size. | ||
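A rough budget makes the recommendation concrete. The sketch below assumes exactly 20000 tokens for the system prompt, which is only the lower bound quoted above; the real figure may be higher.

```shell
#!/usr/bin/env bash
ctx_size=32768        # value passed to --ctx-size in Step 1
system_prompt=20000   # assumption: lower bound of the "20k+" estimate
# Whatever the system prompt does not consume is left for files and chat.
remaining=$((ctx_size - system_prompt))
echo "tokens left for project context and conversation: $remaining"
```

Under that assumption roughly 12768 tokens remain, which is why larger projects may need a bigger `--ctx-size`.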
|
|
||
| ### Step 2: Launch the Agent | ||
| Navigate to your project directory in another terminal and run: | ||
| ```bash | ||
| lemonade-server launch claude -m Qwen3.5-35B-A3B-GGUF | ||
| ``` | ||
|
|
||
| **What happens under the hood?** | ||
| When you execute the `launch` command, Lemonade Server initiates a **concurrent load** of the specified model on the backend. This means the server starts loading the model into memory in a background thread, allowing the Claude Code CLI interface to start instantly without blocking. |
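The non-blocking pattern described here can be illustrated with a minimal sketch. It uses a shell background job in place of the server's background thread, and `load_model` and `start_cli` are hypothetical stand-ins, not Lemonade Server functions.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the slow model load on the backend.
load_model() {
  sleep 1
  echo "model loaded"
}

# Hypothetical stand-in for the agent's CLI interface starting up.
start_cli() {
  echo "cli ready"
}

load_model &          # start the load in the background
loader_pid=$!
start_cli             # runs immediately, without waiting for the load
wait "$loader_pid"    # block only at the point the model is needed
```

The interface line prints first because the load is still sleeping in the background when `start_cli` runs.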
Brainstorming here...
Option 1
Would this workflow be simpler if we used `lemonade-server run Qwen3.5-35B-A3B-GGUF --ctx-size 32768`?
That way:
- No need for a separate pull command
- No need to explain "Lemonade Server initiates a concurrent load" since `run` makes the load intentional.
- Downside: right now `run` always pops open the Lemonade app, which is not desirable here.
Option 2
Should `lemonade-server launch claude -m Qwen3.5-35B-A3B-GGUF`:
- Start the lemonade server, pull, and load the model (just like `run` does)
- Accept any argument that `run` does: that way `--ctx-size`, `--llamacpp-args`, etc. all come from the same place. (note: there is helper code already for generating the CLI for `run`-like commands)

Then the whole thing is one command.
I'm leaning more towards option 2 here, using the API gives us more flexibility.
|
|
||
| ```bash | ||
| # Recommended for most coding tasks | ||
| lemonade-server pull Qwen3.5-35B-A3B-GGUF |
Does this model work well on its own, or is it best to use the ThinkingCoder.json recipe? That will impact whether we need to introduce the concept of custom model recipes in this guide, and how they should interact with the launch command.
So the ThinkingCoder recipe does have better performance for our agentic coding use case (the options are recommended by unsloth: https://unsloth.ai/docs/models/qwen3.5#recommended-settings). I do reference the wiki as a source of model recipes, so if we update that location, I think we should be ok.
I also elaborate on how launch should interact with recipes in my comment below, let me know what you think.
| ```bash | ||
| lemonade-server launch claude -m Qwen3.5-35B-A3B-GGUF | ||
| ``` | ||
|
|
This would be a good spot for a screenshot. We can upload screenshots to https://github.com/lemonade-sdk/assets to avoid bloating this repo (see how that is used in the open webui guide in this same folder).
| While you can manually pass arguments with `--llamacpp-args`, a more scalable approach is to use the model's saved configuration by passing `--use-recipe`. | ||
|
|
||
| ```bash | ||
| lemonade-server launch claude -m Qwen3.5-35B-A3B-GGUF --use-recipe |
Here's my quandary with the workflow: Qwen3.5-35B-A3B-ThinkingCoder.json uses some different settings than what you recommend above.
I need a way to combine your recommendations with the general Qwen3.5 Thinking suggestions from that recipe.
Option 1
The simplest way would be to populate the Lemonade Wiki with some specific recipe.json files for Claude Code. That way it is easy to use those as a starting point with --use-recipe. In that case, this guide should reference those recipe files and not the base models like Qwen3.5-35B-A3B-GGUF throughout.
Option 2
lemonade-server launch itself needs a way to programmatically do graceful blends of Claude Code specific args with generic args from recipes like Qwen3.5-35B-A3B-ThinkingCoder.json.
But Option 1 seems way easier.
wdyt?
| **Importing via Web UI** | ||
| Instead of manually editing the JSON file, you can also easily add recipes using the Lemonade Web Interface: | ||
| 1. Open the Lemonade Web Interface (usually `http://localhost:8000`). | ||
| 2. Navigate to the model management section. | ||
| 3. Click on **"Import a model"**. | ||
| 4. Upload the recipe configuration. |
@bitgamma is there a way to import a recipe.json file using the CLI? That would blend a lot smoother into this guide.
| **Settings Priority** | ||
| When loading a model for a launched agent, Lemonade Server resolves settings in this order (highest priority first): | ||
| 1. Explicit values passed in the load request (e.g., using `--llamacpp-args` via CLI). | ||
| 2. Per-model values defined in `recipe_options.json` (used when `--use-recipe` is active). | ||
| 3. Global environment variables (e.g., `LEMONADE_MAX_LOADED_MODELS`). | ||
| 4. Hardcoded system defaults. |
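The resolution order above amounts to "first non-empty layer wins". A minimal sketch of that rule follows; `resolve_setting` is a hypothetical helper and the numeric values are placeholders, not Lemonade Server's actual implementation or defaults.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first non-empty value among four layers,
# given from highest to lowest priority.
resolve_setting() {
  local request_value="$1"   # 1. explicit value in the load request
  local recipe_value="$2"    # 2. per-model value from recipe_options.json
  local env_value="$3"       # 3. global environment variable
  local default_value="$4"   # 4. hardcoded system default
  # ${a:-b} expands to b when a is unset or empty, so the nested form
  # falls through the layers in priority order.
  printf '%s\n' "${request_value:-${recipe_value:-${env_value:-$default_value}}}"
}

# Placeholder values: nothing passed explicitly, the recipe supplies 32768.
resolve_setting "" "32768" "" "4096"
```

With no explicit request value, the recipe's value is printed; passing a non-empty first argument would override it.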
I hope that by streamlining the workflow above we can avoid the need for documentation like this. Let's simplify!
|
Thanks for the review! I guess I'll explain my vision for what the The user starts out with
|
Nice, I like this interactivity. I wouldn't mind if I always end up calling
Programmatically downloading recipes probably requires some kind of recipe repo, right? I doubt the wiki will scale to this. We could start such a repo now, for use specifically with Claude code, and see how it goes? If you think it would add value. If the base Qwen3.5-35B model is sufficiently strong, then we can skip this for now and just set reasonable default |
|
Setting to draft until we merge Spring Cleaning 1 and come up with a final plan. |
|
I was considering adding two or three more features/flags to
There are two routes we could go with, the first being only enabling an The second approach is to explicitly define One thing to consider is that this would be a
|
Resolves #1330, follow up PR to #1307