Add a user guide for Claude Code integration by sawansri · Pull Request #1334 · lemonade-sdk/lemonade

sawansri · 2026-03-10T07:26:49Z

Resolves #1330; follow-up PR to #1307.


jeremyfowers

This guide writeup is a really useful tool for thinking about the current workflow and how it could be simplified. Let's take our time and really nail this. Sound good?

jeremyfowers · 2026-03-10T17:42:29Z

docs/server/apps/claude-code.md

+### Step 1: Start Lemonade Server
+In one terminal window, ensure the server is running:
+```bash
+lemonade-server serve --ctx-size 32768
+```
+
+We recommend starting the server with a context window of at least 32768 tokens to accommodate Claude Code's system prompt (20k+ tokens). Note that you might need to change this value depending on your hardware and project size.
+
+### Step 2: Launch the Agent
+Navigate to your project directory in another terminal and run:
+```bash
+lemonade-server launch claude -m Qwen3.5-35B-A3B-GGUF
+```
+
+**What happens under the hood?**
+When you execute the `launch` command, Lemonade Server initiates a **concurrent load** of the specified model on the backend. This means the server starts loading the model into memory in a background thread, allowing the Claude Code CLI interface to start instantly without blocking.
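As a rough sanity check of the context-size recommendation in this hunk: with a 32768-token window and a system prompt of roughly 20k tokens, only about 12k tokens remain for the conversation, file contents, and tool output, which is why larger projects may need a bigger window. The sketch below does that arithmetic; `context_headroom` is a hypothetical helper (not part of Lemonade), and 20000 is only an approximation of the "20k+" quoted in the guide.

```bash
# Hypothetical helper: tokens left after the system prompt is loaded.
# Usage: context_headroom CTX_SIZE SYSTEM_PROMPT_TOKENS
context_headroom() {
  local ctx_size=$1 prompt_tokens=$2
  echo $(( ctx_size - prompt_tokens ))
}

# 20000 approximates Claude Code's "20k+" system prompt; the real size varies.
context_headroom 32768 20000   # prints 12768
context_headroom 65536 20000   # prints 45536
```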


Brainstorming here...

Option 1

Would this workflow be simpler if we used `lemonade-server run Qwen3.5-35B-A3B-GGUF --ctx-size 32768`?

That way:

- No need for a separate `pull` command.
- No need to explain "Lemonade Server initiates a concurrent load", since `run` makes the load intentional.

Downside: right now `run` always pops open the Lemonade app, which is not desirable here.

Option 2

Should `lemonade-server launch claude -m Qwen3.5-35B-A3B-GGUF`:

- Start the Lemonade server, pull, and load the model (just like `run` does)?
- Accept any argument that `run` does? That way `--ctx-size`, `--llamacpp-args`, etc. all come from the same place. (Note: there is already helper code for generating the CLI for `run`-like commands.)

Then the whole thing is one command.

I'm leaning more toward Option 2 here, since using the API gives us more flexibility.

Sounds good!

jeremyfowers · 2026-03-10T17:46:59Z

docs/server/apps/claude-code.md

+
+```bash
+# Recommended for most coding tasks
+lemonade-server pull Qwen3.5-35B-A3B-GGUF


Does this model work well on its own, or is it best to use the ThinkingCoder.json recipe? That will impact whether we need to introduce the concept of custom model recipes in this guide, and how they should interact with the launch command.

So the ThinkingCoder recipe does have better performance for our agentic coding use case (its options are the ones recommended by Unsloth: https://unsloth.ai/docs/models/qwen3.5#recommended-settings). I do reference the wiki as a source of model recipes, so if we update that location, I think we should be OK.

I also elaborate on how launch should interact with recipes in my comment below, let me know what you think.

jeremyfowers · 2026-03-10T17:48:24Z

docs/server/apps/claude-code.md

+```bash
+lemonade-server launch claude -m Qwen3.5-35B-A3B-GGUF
+```
+


This would be a good spot for a screenshot. We can upload screenshots to https://github.com/lemonade-sdk/assets to avoid bloating this repo (see how that is used in the open webui guide in this same folder).

jeremyfowers · 2026-03-10T20:01:23Z

docs/server/apps/claude-code.md

+While you can manually pass arguments with `--llamacpp-args`, a more scalable approach is to use the model's saved configuration by passing `--use-recipe`.
+
+```bash
+lemonade-server launch claude -m Qwen3.5-35B-A3B-GGUF --use-recipe


Here's my quandary with the workflow: `Qwen3.5-35B-A3B-ThinkingCoder.json` uses some different settings than what you recommend above.

I need a way to combine your recommendations with the general Qwen3.5 Thinking suggestions from that recipe.

Option 1

The simplest way would be to populate the Lemonade Wiki with some specific recipe.json files for Claude Code. That way it is easy to use those as a starting point with --use-recipe. In that case, this guide should reference those recipe files and not the base models like Qwen3.5-35B-A3B-GGUF throughout.

Option 2

lemonade-server launch itself needs a way to programmatically do graceful blends of Claude Code specific args with generic args from recipes like Qwen3.5-35B-A3B-ThinkingCoder.json.

But Option 1 seems way easier.

wdyt?

jeremyfowers · 2026-03-10T20:02:39Z

docs/server/apps/claude-code.md

+**Importing via Web UI**
+Instead of manually editing the JSON file, you can also easily add recipes using the Lemonade Web Interface:
+1. Open the Lemonade Web Interface (usually `http://localhost:8000`).
+2. Navigate to the model management section.
+3. Click on **"Import a model"**.
+4. Upload the recipe configuration.


@bitgamma is there a way to import a recipe.json file using the CLI? That would blend a lot smoother into this guide.

jeremyfowers · 2026-03-10T20:03:13Z

docs/server/apps/claude-code.md

+**Settings Priority**
+When loading a model for a launched agent, Lemonade Server resolves settings in this order (highest priority first):
+1. Explicit values passed in the load request (e.g., using `--llamacpp-args` via CLI).
+2. Per-model values defined in `recipe_options.json` (used when `--use-recipe` is active).
+3. Global environment variables (e.g., `LEMONADE_MAX_LOADED_MODELS`).
+4. Hardcoded system defaults.
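The priority list in this hunk amounts to "first non-empty value wins". A minimal shell sketch of that rule follows; `resolve_setting` and the `4096` default are hypothetical and only illustrate the ordering, not how Lemonade Server actually resolves settings internally.

```bash
# Hypothetical sketch: pick the first non-empty value, in priority order
# explicit > recipe > environment > default. Not Lemonade's implementation.
# Usage: resolve_setting EXPLICIT RECIPE ENV DEFAULT
resolve_setting() {
  local explicit=$1 recipe=$2 env_value=$3 default=$4
  # ${a:-b} falls back to b when a is unset or empty, so nesting forms a priority chain.
  printf '%s\n' "${explicit:-${recipe:-${env_value:-$default}}}"
}

resolve_setting "" "" "" 4096          # prints 4096 (system default)
resolve_setting "" 32768 "" 4096       # prints 32768 (recipe value)
resolve_setting 65536 32768 "" 4096    # prints 65536 (explicit value)
```

With `:-`, an empty string counts as "not set"; using `-` instead would let an explicitly empty argument win, which is usually not what a CLI user intends.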


I hope that by streamlining the workflow above we can avoid the need for documentation like this. Let's simplify!

sawansri · 2026-03-10T23:41:33Z

Thanks for the review! I guess I'll explain my vision for what the launch subcommand should look like.

The user starts out with `lemonade-server launch claude`.

1. This queries the `models` endpoint, which returns a list of valid models for the user to choose from. (If they have no models installed, possibly query with show all and let them pull any model as well?)
2. After the user selects the model they want to load, they are presented with the option of using a model recipe or not (this prompt is skipped altogether if they pass the `--use-recipe` flag). The recipe is automatically downloaded and imported (if the CLI supports this).
3. Claude Code is launched.
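Step 1 of this flow (list models, let the user pick one) could be prototyped in a few lines of shell. This is a hypothetical sketch, not the `launch` implementation: it assumes the server's OpenAI-compatible model list is served at `http://localhost:8000/api/v1/models` and returns `{"data": [{"id": ...}]}`, and that `curl` and `jq` are installed. `list_models` and `pick_model` are made-up names.

```bash
# Hypothetical: print one model id per line from the (assumed) model-list endpoint.
list_models() {
  local base_url=${1:-http://localhost:8000}
  curl -fsS "${base_url}/api/v1/models" | jq -r '.data[].id'
}

# Show a numbered menu of the given model ids and print the chosen one.
# `select` writes the menu to stderr and reads the answer from stdin.
pick_model() {
  local PS3="Select a model: "
  local choice
  select choice in "$@"; do
    if [[ -n "$choice" ]]; then
      printf '%s\n' "$choice"
      return 0
    fi
    echo "Invalid selection, try again." >&2
  done
  return 1  # stdin closed before a valid choice was made
}
```

Typical use would be `mapfile -t models < <(list_models)` followed by `model=$(pick_model "${models[@]}")`; the recipe yes/no prompt from step 2 could reuse the same `select` pattern.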

jeremyfowers · 2026-03-11T20:25:29Z

> Thanks for the review! I guess I'll explain my vision for what the launch subcommand should look like.

> 1. This queries the `models` endpoint which returns a list of valid models for the user to choose from. (If they have no models installed, possibly query with show all and let them pull any model as well?)

Nice, I like this interactivity. I wouldn't mind if `run` inherited it as well (since `launch` may already be borrowing from `run`'s code path, perhaps this can all be shared?).

I always end up calling `list` before `run`, so this will be great!

> 2. After the user selects the model that they want to load, they are presented with the option of using model recipe or not (this option is skipped altogether if they use the `--use-recipe` flag). The recipe is automatically downloaded and imported (if this option exists through CLI).

Programmatically downloading recipes probably requires some kind of recipe repo, right? I doubt the wiki will scale to this.

We could start such a repo now, specifically for use with Claude Code, and see how it goes, if you think it would add value. If the base Qwen3.5-35B model is sufficiently strong, then we can skip this for now and just set reasonable default `--llamacpp-args`. This seems like the pivotal decision here.

jeremyfowers · 2026-03-23T14:03:53Z

Setting to draft until we merge Spring Cleaning 1 and come up with a final plan.

sawansri · 2026-03-26T23:52:16Z

I was considering adding two or three more features/flags to `launch` on top of #1454 and would love to get some thoughts. They are:

- **`--resume session-id` flag, and maybe a general `--args`:** Claude Code offers session resuming as a feature, but we currently don't have a way to leverage it.

  There are two routes we could take. The first is to enable only an `--arg` flag, through which the user can pass in `--resume session-id` plus additional arguments (similar to the `--llamacpp-args` arg we already have, but for agents).

  The second approach is to explicitly define `--resume session-id` as a flag while also enabling `--arg`. IMO this makes the command a lot cleaner and easier to type, and it still exposes `--arg` for any additional arguments the user might want to pass to Claude.

  One thing to consider is that this would be a `lemonade launch claude`-only flag, since agents like Codex implement resume as an explicit subcommand rather than a CLI flag.

- **`--env` flag:** give the user an option to pass in environment variables as key-value pairs. (This one is less crucial, because users can prefix the launch command with their env variable or just set it through their shell.)
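To make the two routes concrete, here is a hypothetical sketch of how a wrapper could turn the proposed `--resume`, `--arg`, and `--env` flags into a Claude Code command line. None of these flags exist on `launch` today (they are the proposals above), and `build_claude_cmd` is an illustrative name; it prints the command instead of running it.

```bash
# Hypothetical sketch of the proposed `launch claude` flags; not implemented in Lemonade.
# Prints the command it would run (for inspection only; output is not shell-quoted).
# Usage: build_claude_cmd [--env KEY=VALUE]... [--resume SESSION_ID] [--arg CLAUDE_ARG]...
build_claude_cmd() {
  local -a env_pairs=() claude_args=()
  while (( $# > 0 )); do
    case "$1" in
      --env|--resume|--arg) ;;
      *) echo "unknown option: $1" >&2; return 1 ;;
    esac
    if (( $# < 2 )); then
      echo "missing value for $1" >&2
      return 1
    fi
    case "$1" in
      --env)
        # Same effect as prefixing KEY=VALUE to the command in the shell.
        if [[ "$2" != *=* ]]; then
          echo "expected KEY=VALUE after --env, got: $2" >&2
          return 1
        fi
        env_pairs+=("$2") ;;
      --resume) claude_args+=(--resume "$2") ;;   # route 2: dedicated flag
      --arg)    claude_args+=("$2") ;;            # route 1: raw passthrough
    esac
    shift 2
  done
  echo env "${env_pairs[@]}" claude "${claude_args[@]}"
}
```

Both routes yield the same `claude` invocation: `build_claude_cmd --resume abc123` and `build_claude_cmd --arg --resume --arg abc123` print identical commands, so the dedicated flag is purely a typing convenience, while `--env` only replaces what shell prefixing already does.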

sawansri added 2 commits March 10, 2026 00:13

update default llama-server args for launch

aa7057f

Signed-off-by: Sawan Srivastava <sawan1210@gmail.com>

add preliminary user guide

640bbbd

Signed-off-by: Sawan Srivastava <sawan1210@gmail.com>

sawansri changed the title ~~Adds a user guide for Claude Code integration~~ Add a user guide for Claude Code integration Mar 10, 2026

sawansri added 2 commits March 10, 2026 09:33

more edits

1533fb9

Signed-off-by: Sawan Srivastava <sawan1210@gmail.com>

add context window explanation

44d3f6b

Signed-off-by: Sawan Srivastava <sawan1210@gmail.com>

sawansri marked this pull request as ready for review March 10, 2026 16:45

sawansri requested a review from jeremyfowers March 10, 2026 16:45

jeremyfowers reviewed Mar 10, 2026


Update docs/server/apps/claude-code.md

c24f0a8

Co-authored-by: Jeremy Fowers <80718789+jeremyfowers@users.noreply.github.com>

jeremyfowers reviewed Mar 10, 2026


add min ram warning

5d2738d

Signed-off-by: Sawan Srivastava <sawan1210@gmail.com>

jeremyfowers requested changes Mar 10, 2026


jeremyfowers assigned sawansri Mar 18, 2026

jeremyfowers marked this pull request as draft March 23, 2026 14:03

sawansri mentioned this pull request Mar 25, 2026

Implement improved launch subcommand #1454

Open

jeremyfowers added this to the Release v10.1.0 milestone Mar 26, 2026
