Add a user guide for Claude Code integration#1334

Draft
sawansri wants to merge 6 commits into main from sawansri/cc-user-guide

Conversation

@sawansri
Collaborator

Resolves #1330, follow up PR to #1307

Signed-off-by: Sawan Srivastava <sawan1210@gmail.com>
Signed-off-by: Sawan Srivastava <sawan1210@gmail.com>
@sawansri changed the title from "Adds a user guide for Claude Code integration" to "Add a user guide for Claude Code integration" Mar 10, 2026
Signed-off-by: Sawan Srivastava <sawan1210@gmail.com>
Signed-off-by: Sawan Srivastava <sawan1210@gmail.com>
@sawansri sawansri marked this pull request as ready for review March 10, 2026 16:45
@sawansri sawansri requested a review from jeremyfowers March 10, 2026 16:45
Co-authored-by: Jeremy Fowers <80718789+jeremyfowers@users.noreply.github.com>
Signed-off-by: Sawan Srivastava <sawan1210@gmail.com>
Member

@jeremyfowers jeremyfowers left a comment

This guide writeup is a really useful tool for thinking about the current workflow and how it could be simplified. Let's take our time and really nail this. Sound good?

Comment on lines +40 to +55
### Step 1: Start Lemonade Server
In one terminal window, ensure the server is running:
```bash
lemonade-server serve --ctx-size 32768
```

We recommend starting the server with a context window of at least 32768 tokens to accommodate Claude Code's system prompt (20k+ tokens). You may need to adjust this value depending on your hardware and project size.
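
As a rough check on that recommendation, the arithmetic below shows how much of a 32768-token window remains once the system prompt is accounted for. The 20000 figure is only the lower bound implied by "20k+", so the real headroom is at most this number; the variable names are illustrative.

```shell
ctx_size=32768
system_prompt_tokens=20000   # assumption: lower bound of the "20k+" quoted above

# Tokens left for the conversation and project files, at best.
echo $((ctx_size - system_prompt_tokens))   # prints 12768
```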

### Step 2: Launch the Agent
Navigate to your project directory in another terminal and run:
```bash
lemonade-server launch claude -m Qwen3.5-35B-A3B-GGUF
```

**What happens under the hood?**
When you execute the `launch` command, Lemonade Server initiates a **concurrent load** of the specified model on the backend. This means the server starts loading the model into memory in a background thread, allowing the Claude Code CLI interface to start instantly without blocking.
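
The non-blocking behavior described above can be pictured with a shell background job. This is only an analogy and assumes nothing about Lemonade Server's internals: `load_model` is a hypothetical stand-in that sleeps instead of loading weights.

```shell
# Hypothetical stand-in for a slow model load (not a Lemonade command).
load_model() {
  sleep 1
  echo "model loaded"
}

demo_concurrent_load() {
  load_model &          # start the load in the background
  loader_pid=$!
  echo "CLI ready"      # the foreground continues immediately
  wait "$loader_pid"    # block only at the point the model is actually needed
}

demo_concurrent_load
```

The foreground message appears first, and the load result arrives about a second later.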
Member

Brainstorming here...

Option 1

Would this workflow be simpler if we used `lemonade-server run Qwen3.5-35B-A3B-GGUF --ctx-size 32768`?

That way:

  • No need for a separate `pull` command
  • No need to explain "Lemonade Server initiates a concurrent load" since `run` makes the load intentional.
  • Downside: right now `run` always pops open the Lemonade app, which is not desirable here.

Option 2

Should `lemonade-server launch claude -m Qwen3.5-35B-A3B-GGUF`:

  1. Start the Lemonade server, pull, and load the model (just like `run` does)
  2. Accept any argument that `run` does, so that `--ctx-size`, `--llamacpp-args`, etc. all come from the same place. (Note: there is already helper code for generating the CLI for `run`-like commands.)

Then the whole thing is one command.

Collaborator Author

I'm leaning more towards Option 2 here; using the API gives us more flexibility.

Member

Sounds good!


```bash
# Recommended for most coding tasks
lemonade-server pull Qwen3.5-35B-A3B-GGUF
```

Member

Does this model work well on its own, or is it best to use the ThinkingCoder.json recipe? That will impact whether we need to introduce the concept of custom model recipes in this guide, and how they should interact with the launch command.

Collaborator Author

So the ThinkingCoder recipe does have better performance for our agentic coding use case (the options are recommended by unsloth: https://unsloth.ai/docs/models/qwen3.5#recommended-settings). I do reference the wiki as a source of model recipes, so if we update that location, I think we should be ok.

I also elaborate on how launch should interact with recipes in my comment below, let me know what you think.

```bash
lemonade-server launch claude -m Qwen3.5-35B-A3B-GGUF
```

Member

This would be a good spot for a screenshot. We can upload screenshots to https://github.com/lemonade-sdk/assets to avoid bloating this repo (see how that is used in the open webui guide in this same folder).

While you can manually pass arguments with `--llamacpp-args`, a more scalable approach is to use the model's saved configuration by passing `--use-recipe`.

```bash
lemonade-server launch claude -m Qwen3.5-35B-A3B-GGUF --use-recipe
```

Member

Here's my quandary with the workflow: `Qwen3.5-35B-A3B-ThinkingCoder.json` uses some different settings than what you recommend above.

I need a way to combine your recommendations with the general Qwen3.5 Thinking suggestions from that recipe.

Option 1

The simplest way would be to populate the Lemonade Wiki with some specific recipe.json files for Claude Code. That way it is easy to use those as a starting point with --use-recipe. In that case, this guide should reference those recipe files and not the base models like Qwen3.5-35B-A3B-GGUF throughout.

Option 2

lemonade-server launch itself needs a way to programmatically do graceful blends of Claude Code specific args with generic args from recipes like Qwen3.5-35B-A3B-ThinkingCoder.json.

But Option 1 seems way easier.

wdyt?

Comment on lines +107 to +112
**Importing via Web UI**
Instead of manually editing the JSON file, you can also easily add recipes using the Lemonade Web Interface:
1. Open the Lemonade Web Interface (usually `http://localhost:8000`).
2. Navigate to the model management section.
3. Click on **"Import a model"**.
4. Upload the recipe configuration.
Member

@bitgamma is there a way to import a recipe.json file using the CLI? That would blend a lot smoother into this guide.

Comment on lines +114 to +119
**Settings Priority**
When loading a model for a launched agent, Lemonade Server resolves settings in this order (highest priority first):
1. Explicit values passed in the load request (e.g., using `--llamacpp-args` via CLI).
2. Per-model values defined in `recipe_options.json` (used when `--use-recipe` is active).
3. Global environment variables (e.g., `LEMONADE_MAX_LOADED_MODELS`).
4. Hardcoded system defaults.
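
The priority order above amounts to "first non-empty value wins". A minimal sketch of that rule follows; it is not Lemonade Server's implementation. `resolve_setting` is a hypothetical helper, and treating an empty string as "not set" is an assumption made for illustration.

```shell
# resolve_setting REQUEST_VALUE RECIPE_VALUE ENV_VALUE DEFAULT_VALUE
# Prints the first non-empty argument, falling back to the default.
resolve_setting() {
  for candidate in "$1" "$2" "$3"; do
    if [ -n "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  printf '%s\n' "$4"
}

# Explicit request value beats the recipe value.
resolve_setting 65536 32768 "" 4096
# No request value: the recipe value beats the environment value.
resolve_setting "" 32768 16384 4096
# Nothing set anywhere: the hardcoded default applies.
resolve_setting "" "" "" 4096
```

The numeric values are placeholders chosen to make each tier visible, not real defaults.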
Member

I hope that by streamlining the workflow above we can avoid the need for documentation like this. Let's simplify!

@sawansri
Collaborator Author

Thanks for the review! I guess I'll explain my vision for what the launch subcommand should look like.

The user starts out with lemonade-server launch claude.

  1. This queries the models endpoint which returns a list of valid models for the user to choose from. (If they have no models installed, possibly query with show all and let them pull any model as well?)
  2. After the user selects the model that they want to load, they are presented with the option of using model recipe or not (this option is skipped altogether if they use the --use-recipe flag). The recipe is automatically downloaded and imported (if this option exists through CLI).
  3. Launch claude
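
The three steps above could be sketched roughly as follows. This illustrates the proposed flow and is not existing `launch` behavior: `choose_model` and `build_launch_command` are hypothetical helpers, the model list is hard-coded rather than fetched from the models endpoint, and `Some-Other-Model` is a placeholder name. The printed commands are the two `launch` invocations already shown in the guide.

```shell
# Step 1 (selection part): print the Nth entry (1-based) of a
# newline-separated model list read from stdin.
choose_model() {
  sed -n "${1}p"
}

# Steps 2 and 3: print the launch command for the chosen model,
# with or without the recipe.
build_launch_command() {
  if [ "$2" = "yes" ]; then
    echo "lemonade-server launch claude -m $1 --use-recipe"
  else
    echo "lemonade-server launch claude -m $1"
  fi
}

model=$(printf '%s\n' 'Qwen3.5-35B-A3B-GGUF' 'Some-Other-Model' | choose_model 1)
build_launch_command "$model" yes
```

Printing the command instead of running it keeps the sketch safe to try without a server installed.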

@jeremyfowers
Member

> Thanks for the review! I guess I'll explain my vision for what the launch subcommand should look like.
>
> 1. This queries the `models` endpoint which returns a list of valid models for the user to choose from. (If they have no models installed, possibly query with show all and let them pull any model as well?)

Nice, I like this interactivity. I wouldn't mind if run inherited it as well (since launch may be borrowing from run's code path as it is, perhaps this can all be shared?)

I always end up calling list before run so this will be great!

> 2. After the user selects the model that they want to load, they are presented with the option of using model recipe or not (this option is skipped altogether if they use the `--use-recipe` flag). The recipe is automatically downloaded and imported (if this option exists through CLI).

Programmatically downloading recipes probably requires some kind of recipe repo, right? I doubt the wiki will scale to this.

We could start such a repo now, for use specifically with Claude Code, and see how it goes, if you think it would add value. If the base Qwen3.5-35B model is sufficiently strong, then we can skip this for now and just set reasonable default `--llamacpp-args`. This seems like the pivotal decision here.

@jeremyfowers jeremyfowers marked this pull request as draft March 23, 2026 14:03
@jeremyfowers
Member

Setting to draft until we merge Spring Cleaning 1 and come up with a final plan.

@sawansri
Collaborator Author

sawansri commented Mar 26, 2026

I was considering adding two or three more features/flags to launch on top of #1454 and would love to get some thoughts. They are:

  1. Support for a --resume session-id flag and maybe a general --args: Claude Code offers session resuming as a feature, but we currently don't have a way to leverage it.

There are two routes we could go with. The first is to enable only an --arg flag through which the user can pass --resume session-id plus additional arguments (similar to the --llamacpp-args arg we already have, but for agents).

The second approach is to explicitly define --resume session-id as a flag while also enabling --arg. IMO this makes the command a lot cleaner and easier to type out, and it still exposes --arg for any additional arguments to claude the user might have.

One thing to consider is that this would be a lemonade launch claude only flag, since agents like codex implement resume as an explicit subcommand and not a CLI flag.

  2. --env flag: gives the user an option to pass in env variables as key-value pairs (this one's less crucial because users can prepend the launch command with their env variable or just set it through their shell).
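
For reference, the "prepend the launch command" workaround relies on standard shell behavior: an assignment placed before a command applies only to that command's environment. The sketch below uses `sh -c` in place of the launch command, and `DEMO_VAR` is a made-up variable name.

```shell
unset DEMO_VAR

# The child process sees the variable...
DEMO_VAR=hello sh -c 'echo "child sees: $DEMO_VAR"'

# ...but the calling shell does not keep it afterwards.
echo "parent sees: ${DEMO_VAR:-<unset>}"
```

This prints `child sees: hello` followed by `parent sees: <unset>`, so an --env flag would mainly be a convenience over the same prefix form applied to the launch command.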

Development

Successfully merging this pull request may close these issues.

Launch command followup: user guide

2 participants