
Conversation

qimcis
Contributor

@qimcis qimcis commented Sep 11, 2025

Overview:

Add chat_template_kwargs to v1/chat/completions to pass per‑request kwargs into the chat template, allowing request‑level behavior control without separate replicas.

Details:

  • Extended NvCreateChatCompletionRequest with optional chat_template_kwargs.
  • Works for streaming and non‑streaming; backward compatible (defaults to None).

Tested with Qwen3
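
For illustration, a request using the new field might look like the following. The keys under chat_template_kwargs are template-specific; enable_thinking is just an example of a kwarg a model's chat template could consume, and the model name is a placeholder:

```json
{
  "model": "Qwen/Qwen3-8B",
  "messages": [{ "role": "user", "content": "Hello" }],
  "stream": false,
  "max_tokens": 50,
  "chat_template_kwargs": { "enable_thinking": false }
}
```

Omitting chat_template_kwargs leaves rendering behavior unchanged.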

Where should the reviewer start?

  • lib/llm/src/protocols/openai/chat_completions.rs
  • lib/llm/src/preprocessor/prompt.rs
  • lib/llm/src/preprocessor/prompt/template/oai.rs

Related Issues:

Summary by CodeRabbit

  • New Features
    • Added support for optional chat template keyword arguments in chat completion requests.
    • Allows passing extra context to influence template rendering on a per-request basis.
    • Fully backward-compatible: if not provided, behavior remains unchanged.
    • Enables finer control over output formatting without affecting streaming or error handling.

@qimcis qimcis requested a review from a team as a code owner September 11, 2025 16:06

copy-pr-bot bot commented Sep 11, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


👋 Hi qimcis! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test Github Validation CI runs an essential subset of the testing framework to quickly catch errors. Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

@github-actions github-actions bot added the external-contribution Pull request is from an external contributor label Sep 11, 2025
@qimcis qimcis changed the title add chat_template_kwargs param feat: add chat_template_kwargs param to v1/chat/completion Sep 11, 2025
@github-actions github-actions bot added the feat label Sep 11, 2025
Contributor

coderabbitai bot commented Sep 11, 2025

Walkthrough

Adds an optional chat_template_kwargs field to NvCreateChatCompletionRequest across construction sites, tests, and protocol types, and wires it into template rendering by exposing it via OAIChatLikeRequest and merging it into the Jinja context. Defaults to None and is skipped in serialization when absent.

Changes

Cohort / File(s) — Summary

  • Entrypoints: add chat_template_kwargs to request construction
    lib/llm/src/entrypoint/input/batch.rs, lib/llm/src/entrypoint/input/text.rs
    Include chat_template_kwargs: None when building NvCreateChatCompletionRequest; no control-flow changes.
  • Protocol type: extend OpenAI request struct
    lib/llm/src/protocols/openai/chat_completions.rs
    Add pub chat_template_kwargs: Option<HashMap<String, serde_json::Value>> with serde defaults/skip. Public API change.
  • Protocol conversion: populate new field
    lib/llm/src/protocols/openai/responses.rs
    Set chat_template_kwargs: None in TryFrom<NvCreateResponse> for NvCreateChatCompletionRequest.
  • Preprocessor trait and template rendering
    lib/llm/src/preprocessor/prompt.rs, lib/llm/src/preprocessor/prompt/template/oai.rs
    Add OAIChatLikeRequest::chat_template_kwargs() (default None). Implement for NvCreateChatCompletionRequest. Merge provided kwargs into the Jinja context in HfTokenizerConfigJsonFormatter::render (kwargs override existing keys).
  • HTTP service tests updated
    lib/llm/src/http/service/openai.rs
    Update test initializations to include chat_template_kwargs: None.
  • Integration and unit tests updated
    lib/llm/tests/http-service.rs, lib/llm/tests/preprocessor.rs, lib/llm/tests/test_common_ext.rs
    Adjust constructors/fixtures to include chat_template_kwargs: None; no assertion logic changes.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    actor Client
    participant HTTP as HTTP Service
    participant Proto as OpenAI Protocol
    participant Pre as Preprocessor
    participant Tpl as Template Renderer

    Client->>HTTP: NvCreateChatCompletionRequest { ..., chat_template_kwargs? }
    HTTP->>Proto: Forward request (struct includes chat_template_kwargs)
    Proto->>Pre: Build OAIChatLikeRequest view
    Pre->>Tpl: render(messages, system, context)
    note over Pre,Tpl: New: pass optional chat_template_kwargs
    Tpl->>Tpl: Merge base context + chat_template_kwargs (kwargs override)
    Tpl-->>Pre: Rendered prompt
    Pre-->>Proto: Prepared inputs
    Proto-->>HTTP: Completion response
    HTTP-->>Client: Stream/response

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

Possibly related PRs

Poem

A whisk of keys, a sprinkle of args,
I hop through templates, dodging lags.
New kwargs bloom in Jinja light,
Merged at dusk, they guide the night.
Thump-thump! The prompts now sing—
A rabbit’s tweak, a subtle spring. 🐇✨

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Up to 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.

Pre-merge checks (2 passed, 1 warning)

⚠️ Warnings (1)
  • Docstring Coverage — Docstring coverage is 53.33%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (2)
  • Description Check — The PR description follows the repository template and includes the required sections (Overview, Details, Where should the reviewer start, Related Issues); it describes the change, lists files to review, notes testing with Qwen3, and references issue #3013, so it is largely complete and actionable for reviewers.
  • Title Check — The title clearly and concisely describes the primary change: adding a chat_template_kwargs parameter to the v1 chat completion endpoint, and it matches the PR objectives and file-level changes that add the optional field and plumbing for per-request template kwargs. It is specific, not vague, and suitable for scan/readability.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (7)
lib/llm/src/protocols/openai/chat_completions.rs (1)

57-60: Field addition looks good; consider a small type alias for readability and future reuse.

Def/new serde config is correct. To reduce repetition of the long type across modules, consider introducing a ChatTemplateKwargs alias and using it here.

+// Near other use statements or in a common module:
+type ChatTemplateKwargs = std::collections::HashMap<String, serde_json::Value>;

 // ...
-    pub chat_template_kwargs: Option<std::collections::HashMap<String, serde_json::Value>>,
+    pub chat_template_kwargs: Option<ChatTemplateKwargs>,
lib/llm/src/protocols/openai/responses.rs (1)

176-192: Defaulting chat_template_kwargs to None is fine; flag for future propagation.

If NvCreateResponse ever accepts per-request template kwargs, remember to propagate them here rather than forcing None.

lib/llm/src/preprocessor/prompt/template/oai.rs (1)

226-232: Be explicit about reserved-key overrides (messages/tools/bos/eos/add_generation_prompt).

Merging extra kwargs last means request-level data can override built-ins. If this is intentional, add a brief comment. If not, filter reserved keys or namespace the kwargs (e.g., request.*) to avoid foot-guns.

-        let ctx = if let Some(kwargs) = req.chat_template_kwargs() {
-            let extra = Value::from_serialize(kwargs);
-            context! { ..ctx, ..extra }
+        let ctx = if let Some(kwargs) = req.chat_template_kwargs() {
+            // Optional: prevent overriding reserved context names
+            // let reserved = ["messages","tools","bos_token","eos_token","unk_token","add_generation_prompt"];
+            // let filtered = kwargs.iter().filter(|(k,_)| !reserved.contains(&k.as_str()))
+            //     .collect::<std::collections::HashMap<_,_>>();
+            // let extra = Value::from_serialize(&filtered);
+            let extra = Value::from_serialize(kwargs);
+            context! { ..ctx, ..extra } // request-level values take precedence
         } else {
             ctx
         };
lib/llm/tests/preprocessor.rs (1)

269-275: Consider adding a precedence test for chat_template_kwargs.

Add a small test asserting that a kwarg (e.g., add_generation_prompt=false) or a custom key is visible in the rendered template and that precedence works as expected.

Example (new test function):

#[tokio::test]
async fn test_chat_template_kwargs_precedence() {
    // Build an MDC/formatter as in other tests
    let mdc = make_mdc_from_repo(
        "tests/data/sample-models",
        "meta-llama/Llama-3.1-70B-Instruct",
        "1605565",
        Some(vec![PromptContextMixin::Llama3DateTime]),
    ).await;
    let formatter = match PromptFormatter::from_mdc(&mdc).unwrap() {
        PromptFormatter::OAI(f) => f,
    };
    let mut req = Request::from(SINGLE_CHAT_MESSAGE, None, None, mdc.slug().to_string());
    // Inject a kwarg and ensure it's respected
    let mut map = std::collections::HashMap::new();
    map.insert("custom_var".to_string(), serde_json::json!("custom"));
    req.chat_template_kwargs = Some(map);
    let rendered = formatter.render(&req).unwrap();
    assert!(rendered.contains("custom")); // adapt to your template
}
lib/llm/tests/http-service.rs (1)

767-773: Add minimal coverage for non-None and BYOT JSON paths

To ensure deserialization and wiring don’t regress, add one positive test that sends chat_template_kwargs via the BYOT JSON client and succeeds (no need to validate templating here).

Example addition (outside these ranges) to the generic BYOT test:

// After the first successful BYOT streaming case
let request_with_kwargs = serde_json::json!({
    "model": "foo",
    "messages": [{ "role": "user", "content": "Hi" }],
    "stream": true,
    "max_tokens": 50,
    "chat_template_kwargs": { "echo": "on", "n": 3, "flag": true }
});
let result = generic_byot_client.chat_stream(request_with_kwargs).await;
assert!(result.is_ok(), "chat_template_kwargs should deserialize and be accepted");

Also applies to: 807-813, 848-854

lib/llm/tests/test_common_ext.rs (2)

331-332: Assert it skips serialization when None

Strengthen serialization test to ensure the field is omitted when None.

Add (outside these lines) in test_serialization_preserves_structure after computing json:

assert!(json.get("chat_template_kwargs").is_none(), "Field should be skipped when None");

63-72: Add roundtrip test for Some(...)

Cover positive case to catch schema regressions (types and serde attributes).

New test (outside these lines):

#[test]
fn test_chat_template_kwargs_roundtrip() {
    use dynamo_llm::protocols::openai::chat_completions::NvCreateChatCompletionRequest;

    let req_json = serde_json::json!({
        "model": "test-model",
        "messages": [{ "role": "user", "content": "Hello" }],
        "chat_template_kwargs": { "echo": "on", "n": 3, "flag": true }
    });

    let request: NvCreateChatCompletionRequest = serde_json::from_value(req_json).unwrap();

    // Serialize back and verify presence and contents
    let out = serde_json::to_value(&request).unwrap();
    assert!(out.get("chat_template_kwargs").is_some());
    assert_eq!(out["chat_template_kwargs"]["echo"], "on");
    assert_eq!(out["chat_template_kwargs"]["n"], 3);
    assert_eq!(out["chat_template_kwargs"]["flag"], true);
}
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1b2826a and 4a8afd0.

📒 Files selected for processing (10)
  • lib/llm/src/entrypoint/input/batch.rs (1 hunks)
  • lib/llm/src/entrypoint/input/text.rs (1 hunks)
  • lib/llm/src/http/service/openai.rs (2 hunks)
  • lib/llm/src/preprocessor/prompt.rs (2 hunks)
  • lib/llm/src/preprocessor/prompt/template/oai.rs (2 hunks)
  • lib/llm/src/protocols/openai/chat_completions.rs (1 hunks)
  • lib/llm/src/protocols/openai/responses.rs (1 hunks)
  • lib/llm/tests/http-service.rs (3 hunks)
  • lib/llm/tests/preprocessor.rs (1 hunks)
  • lib/llm/tests/test_common_ext.rs (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
lib/llm/src/entrypoint/input/batch.rs (1)
lib/llm/tests/http_metrics.rs (1)
  • None (17-17)
lib/llm/src/preprocessor/prompt.rs (1)
lib/llm/src/preprocessor/prompt/template/oai.rs (1)
  • chat_template_kwargs (130-132)
lib/llm/src/entrypoint/input/text.rs (1)
lib/llm/tests/http_metrics.rs (1)
  • None (17-17)
lib/llm/src/preprocessor/prompt/template/oai.rs (1)
lib/llm/src/preprocessor/prompt.rs (1)
  • chat_template_kwargs (74-76)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: pre-merge-rust (lib/bindings/python)
  • GitHub Check: pre-merge-rust (lib/runtime/examples)
  • GitHub Check: pre-merge-rust (.)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (6)
lib/llm/src/entrypoint/input/text.rs (1)

121-126: LGTM.

Initializing chat_template_kwargs to None keeps behavior unchanged for the CLI path.

lib/llm/src/http/service/openai.rs (2)

1318-1329: LGTM (tests).

Adding chat_template_kwargs: None in the empty-messages test maintains parity with the struct definition.


1340-1356: LGTM (tests).

Same here; request construction reflects the new optional field.

lib/llm/src/entrypoint/input/batch.rs (1)

226-233: LGTM.

Batch path initializes the new field explicitly; no behavior change.

lib/llm/tests/http-service.rs (1)

767-773: LGTM: explicit None preserves backward compatibility

Setting chat_template_kwargs: None in all constructions is correct and keeps existing behavior unchanged.

Also applies to: 807-813, 848-854

lib/llm/tests/test_common_ext.rs (1)

63-72: LGTM: explicit None in struct initializers

The additions align with the new optional field and keep tests stable.

Also applies to: 327-333, 372-383

@@ -32,6 +32,7 @@

use anyhow::Result;
use minijinja::value::Value;
use std::collections::HashMap;
Contributor


🛠️ Refactor suggestion

Avoid cloning: return a reference from the trait.

Returning an owned HashMap forces a clone in implementers. Prefer borrowing to reduce allocations while keeping the trait object-safe.

-use std::collections::HashMap;
+use std::collections::HashMap;

 // ...
-    /// Optional additional kwargs to merge into the chat template context
-    fn chat_template_kwargs(&self) -> Option<HashMap<String, serde_json::Value>> {
-        None
-    }
+    /// Optional additional kwargs to merge into the chat template context
+    fn chat_template_kwargs(&self) -> Option<&HashMap<String, serde_json::Value>> {
+        None
+    }

Also applies to: 73-76

🤖 Prompt for AI Agents
In lib/llm/src/preprocessor/prompt.rs around lines 35 and also applying to lines
73-76, the trait methods currently return an owned HashMap which forces
implementers to clone; change the trait signatures to return a reference tied to
self (e.g. fn some_map(&self) -> &HashMap<...>) so implementers can return a
reference without cloning, update all implementations to return references to
their internal maps, and then update call sites to use the borrowed &HashMap
instead of expecting ownership (or clone there if ownership is truly required).

Comment on lines +130 to +132
fn chat_template_kwargs(&self) -> Option<std::collections::HashMap<String, serde_json::Value>> {
self.chat_template_kwargs.clone()
}
Contributor


🛠️ Refactor suggestion

Align with borrowed return to avoid cloning.

If the trait is changed to return a reference, update this impl accordingly.

-    fn chat_template_kwargs(&self) -> Option<std::collections::HashMap<String, serde_json::Value>> {
-        self.chat_template_kwargs.clone()
-    }
+    fn chat_template_kwargs(&self) -> Option<&std::collections::HashMap<String, serde_json::Value>> {
+        self.chat_template_kwargs.as_ref()
+    }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
fn chat_template_kwargs(&self) -> Option<std::collections::HashMap<String, serde_json::Value>> {
self.chat_template_kwargs.clone()
}
fn chat_template_kwargs(&self) -> Option<&std::collections::HashMap<String, serde_json::Value>> {
self.chat_template_kwargs.as_ref()
}
🤖 Prompt for AI Agents
In lib/llm/src/preprocessor/prompt/template/oai.rs around lines 130 to 132, the
impl currently clones self.chat_template_kwargs but the trait now expects a
borrowed return; change the method to return an Option reference by returning
self.chat_template_kwargs.as_ref() (or as_deref() as appropriate) so you pass a
reference instead of cloning the HashMap and match the trait signature.

@grahamking
Copy link
Contributor

I don't think kwargs is the correct term. That's a Python thing. More discussion on the issue.

@grahamking
Copy link
Contributor

@qimcis Thanks for sending this. Could you:

  • Rename it to chat_template_args (no kw).
  • Address the Code Rabbit review comments.

Labels
external-contribution Pull request is from an external contributor feat size/M