
Conversation

qimcis
Contributor

@qimcis qimcis commented Sep 11, 2025

Overview:

Add chat_template_kwargs to v1/chat/completions to pass per‑request kwargs into the chat template, allowing request‑level behavior control without separate replicas.

Details:

  • Extended NvCreateChatCompletionRequest with optional chat_template_kwargs.
  • Works for streaming and non‑streaming; backward compatible (defaults to None).

Tested with Qwen3
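
For illustration, a request using the new field might look like the following. The keys under chat_template_kwargs are template-specific; enable_thinking is just an example of a kwarg a model's chat template could consume, and the model name is a placeholder:

```json
{
  "model": "Qwen/Qwen3-8B",
  "messages": [{ "role": "user", "content": "Hello" }],
  "stream": false,
  "max_tokens": 50,
  "chat_template_kwargs": { "enable_thinking": false }
}
```

Omitting chat_template_kwargs leaves rendering behavior unchanged.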

Where should the reviewer start?

  • lib/llm/src/protocols/openai/chat_completions.rs
  • lib/llm/src/preprocessor/prompt.rs
  • lib/llm/src/preprocessor/prompt/template/oai.rs

Related Issues:

Summary by CodeRabbit

  • New Features
    • Added support for optional chat template keyword arguments in chat completion requests.
    • Allows passing extra context to influence template rendering on a per-request basis.
    • Fully backward-compatible: if not provided, behavior remains unchanged.
    • Enables finer control over output formatting without affecting streaming or error handling.

@qimcis qimcis requested a review from a team as a code owner September 11, 2025 16:06

copy-pr-bot bot commented Sep 11, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


👋 Hi qimcis! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test Github Validation CI runs an essential subset of the testing framework to quickly catch errors. Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

@github-actions github-actions bot added the external-contribution Pull request is from an external contributor label Sep 11, 2025
@qimcis qimcis changed the title add chat_template_kwargs param feat: add chat_template_kwargs param to v1/chat/completion Sep 11, 2025
@github-actions github-actions bot added the feat label Sep 11, 2025
Contributor

coderabbitai bot commented Sep 11, 2025

Walkthrough

Adds an optional chat_template_kwargs field to NvCreateChatCompletionRequest across construction sites, tests, and protocol types, and wires it into template rendering by exposing it via OAIChatLikeRequest and merging it into the Jinja context. Defaults to None and is skipped in serialization when absent.

Changes

Cohort / File(s) — Summary

  • Entrypoints: add chat_template_kwargs to request construction
    lib/llm/src/entrypoint/input/batch.rs, lib/llm/src/entrypoint/input/text.rs
    Include chat_template_kwargs: None when building NvCreateChatCompletionRequest; no control-flow changes.
  • Protocol type: extend OpenAI request struct
    lib/llm/src/protocols/openai/chat_completions.rs
    Add pub chat_template_kwargs: Option<HashMap<String, serde_json::Value>> with serde defaults/skip. Public API change.
  • Protocol conversion: populate new field
    lib/llm/src/protocols/openai/responses.rs
    Set chat_template_kwargs: None in TryFrom<NvCreateResponse> for NvCreateChatCompletionRequest.
  • Preprocessor trait and template rendering
    lib/llm/src/preprocessor/prompt.rs, lib/llm/src/preprocessor/prompt/template/oai.rs
    Add OAIChatLikeRequest::chat_template_kwargs() (default None). Implement for NvCreateChatCompletionRequest. Merge provided kwargs into the Jinja context in HfTokenizerConfigJsonFormatter::render (kwargs override existing keys).
  • HTTP service tests updated
    lib/llm/src/http/service/openai.rs
    Update test initializations to include chat_template_kwargs: None.
  • Integration and unit tests updated
    lib/llm/tests/http-service.rs, lib/llm/tests/preprocessor.rs, lib/llm/tests/test_common_ext.rs
    Adjust constructors/fixtures to include chat_template_kwargs: None; no assertion logic changes.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    actor Client
    participant HTTP as HTTP Service
    participant Proto as OpenAI Protocol
    participant Pre as Preprocessor
    participant Tpl as Template Renderer

    Client->>HTTP: NvCreateChatCompletionRequest { ..., chat_template_kwargs? }
    HTTP->>Proto: Forward request (struct includes chat_template_kwargs)
    Proto->>Pre: Build OAIChatLikeRequest view
    Pre->>Tpl: render(messages, system, context)
    note over Pre,Tpl: New: pass optional chat_template_kwargs
    Tpl->>Tpl: Merge base context + chat_template_kwargs (kwargs override)
    Tpl-->>Pre: Rendered prompt
    Pre-->>Proto: Prepared inputs
    Proto-->>HTTP: Completion response
    HTTP-->>Client: Stream/response

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

Possibly related PRs

Poem

A whisk of keys, a sprinkle of args,
I hop through templates, dodging lags.
New kwargs bloom in Jinja light,
Merged at dusk, they guide the night.
Thump-thump! The prompts now sing—
A rabbit’s tweak, a subtle spring. 🐇✨

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Up to 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.

Pre-merge checks (2 passed, 1 warning)

⚠️ Warnings (1)
  • Docstring Coverage — Docstring coverage is 53.33%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (2)
  • Description Check — The PR description follows the repository template and includes the required sections (Overview, Details, Where should the reviewer start, Related Issues); it describes the change, lists files to review, notes testing with Qwen3, and references issue #3013, so it is largely complete and actionable for reviewers.
  • Title Check — The title clearly and concisely describes the primary change: adding a chat_template_kwargs parameter to the v1 chat completion endpoint, and it matches the PR objectives and file-level changes that add the optional field and plumbing for per-request template kwargs. It is specific, not vague, and suitable for scan/readability.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (7)
lib/llm/src/protocols/openai/chat_completions.rs (1)

57-60: Field addition looks good; consider a small type alias for readability and future reuse.

Def/new serde config is correct. To reduce repetition of the long type across modules, consider introducing a ChatTemplateKwargs alias and using it here.

+// Near other use statements or in a common module:
+type ChatTemplateKwargs = std::collections::HashMap<String, serde_json::Value>;

 // ...
-    pub chat_template_kwargs: Option<std::collections::HashMap<String, serde_json::Value>>,
+    pub chat_template_kwargs: Option<ChatTemplateKwargs>,
lib/llm/src/protocols/openai/responses.rs (1)

176-192: Defaulting chat_template_kwargs to None is fine; flag for future propagation.

If NvCreateResponse ever accepts per-request template kwargs, remember to propagate them here rather than forcing None.

lib/llm/src/preprocessor/prompt/template/oai.rs (1)

226-232: Be explicit about reserved-key overrides (messages/tools/bos/eos/add_generation_prompt).

Merging extra kwargs last means request-level data can override built-ins. If this is intentional, add a brief comment. If not, filter reserved keys or namespace the kwargs (e.g., request.*) to avoid foot-guns.

-        let ctx = if let Some(kwargs) = req.chat_template_kwargs() {
-            let extra = Value::from_serialize(kwargs);
-            context! { ..ctx, ..extra }
+        let ctx = if let Some(kwargs) = req.chat_template_kwargs() {
+            // Optional: prevent overriding reserved context names
+            // let reserved = ["messages","tools","bos_token","eos_token","unk_token","add_generation_prompt"];
+            // let filtered = kwargs.iter().filter(|(k,_)| !reserved.contains(&k.as_str()))
+            //     .collect::<std::collections::HashMap<_,_>>();
+            // let extra = Value::from_serialize(&filtered);
+            let extra = Value::from_serialize(kwargs);
+            context! { ..ctx, ..extra } // request-level values take precedence
         } else {
             ctx
         };
lib/llm/tests/preprocessor.rs (1)

269-275: Consider adding a precedence test for chat_template_kwargs.

Add a small test asserting that a kwarg (e.g., add_generation_prompt=false) or a custom key is visible in the rendered template and that precedence works as expected.

Example (new test function):

#[tokio::test]
async fn test_chat_template_kwargs_precedence() {
    // Build an MDC/formatter as in other tests
    let mdc = make_mdc_from_repo(
        "tests/data/sample-models",
        "meta-llama/Llama-3.1-70B-Instruct",
        "1605565",
        Some(vec![PromptContextMixin::Llama3DateTime]),
    ).await;
    let formatter = match PromptFormatter::from_mdc(&mdc).unwrap() {
        PromptFormatter::OAI(f) => f,
    };
    let mut req = Request::from(SINGLE_CHAT_MESSAGE, None, None, mdc.slug().to_string());
    // Inject a kwarg and ensure it's respected
    let mut map = std::collections::HashMap::new();
    map.insert("custom_var".to_string(), serde_json::json!("custom"));
    req.chat_template_kwargs = Some(map);
    let rendered = formatter.render(&req).unwrap();
    assert!(rendered.contains("custom")); // adapt to your template
}
lib/llm/tests/http-service.rs (1)

767-773: Add minimal coverage for non-None and BYOT JSON paths

To ensure deserialization and wiring don’t regress, add one positive test that sends chat_template_kwargs via the BYOT JSON client and succeeds (no need to validate templating here).

Example addition (outside these ranges) to the generic BYOT test:

// After the first successful BYOT streaming case
let request_with_kwargs = serde_json::json!({
    "model": "foo",
    "messages": [{ "role": "user", "content": "Hi" }],
    "stream": true,
    "max_tokens": 50,
    "chat_template_kwargs": { "echo": "on", "n": 3, "flag": true }
});
let result = generic_byot_client.chat_stream(request_with_kwargs).await;
assert!(result.is_ok(), "chat_template_kwargs should deserialize and be accepted");

Also applies to: 807-813, 848-854

lib/llm/tests/test_common_ext.rs (2)

331-332: Assert it skips serialization when None

Strengthen serialization test to ensure the field is omitted when None.

Add (outside these lines) in test_serialization_preserves_structure after computing json:

assert!(json.get("chat_template_kwargs").is_none(), "Field should be skipped when None");

63-72: Add roundtrip test for Some(...)

Cover positive case to catch schema regressions (types and serde attributes).

New test (outside these lines):

#[test]
fn test_chat_template_kwargs_roundtrip() {
    use dynamo_llm::protocols::openai::chat_completions::NvCreateChatCompletionRequest;

    let req_json = serde_json::json!({
        "model": "test-model",
        "messages": [{ "role": "user", "content": "Hello" }],
        "chat_template_kwargs": { "echo": "on", "n": 3, "flag": true }
    });

    let request: NvCreateChatCompletionRequest = serde_json::from_value(req_json).unwrap();

    // Serialize back and verify presence and contents
    let out = serde_json::to_value(&request).unwrap();
    assert!(out.get("chat_template_kwargs").is_some());
    assert_eq!(out["chat_template_kwargs"]["echo"], "on");
    assert_eq!(out["chat_template_kwargs"]["n"], 3);
    assert_eq!(out["chat_template_kwargs"]["flag"], true);
}
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1b2826a and 4a8afd0.

📒 Files selected for processing (10)
  • lib/llm/src/entrypoint/input/batch.rs (1 hunks)
  • lib/llm/src/entrypoint/input/text.rs (1 hunks)
  • lib/llm/src/http/service/openai.rs (2 hunks)
  • lib/llm/src/preprocessor/prompt.rs (2 hunks)
  • lib/llm/src/preprocessor/prompt/template/oai.rs (2 hunks)
  • lib/llm/src/protocols/openai/chat_completions.rs (1 hunks)
  • lib/llm/src/protocols/openai/responses.rs (1 hunks)
  • lib/llm/tests/http-service.rs (3 hunks)
  • lib/llm/tests/preprocessor.rs (1 hunks)
  • lib/llm/tests/test_common_ext.rs (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
lib/llm/src/entrypoint/input/batch.rs (1)
lib/llm/tests/http_metrics.rs (1)
  • None (17-17)
lib/llm/src/preprocessor/prompt.rs (1)
lib/llm/src/preprocessor/prompt/template/oai.rs (1)
  • chat_template_kwargs (130-132)
lib/llm/src/entrypoint/input/text.rs (1)
lib/llm/tests/http_metrics.rs (1)
  • None (17-17)
lib/llm/src/preprocessor/prompt/template/oai.rs (1)
lib/llm/src/preprocessor/prompt.rs (1)
  • chat_template_kwargs (74-76)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: pre-merge-rust (lib/bindings/python)
  • GitHub Check: pre-merge-rust (lib/runtime/examples)
  • GitHub Check: pre-merge-rust (.)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (6)
lib/llm/src/entrypoint/input/text.rs (1)

121-126: LGTM.

Initializing chat_template_kwargs to None keeps behavior unchanged for the CLI path.

lib/llm/src/http/service/openai.rs (2)

1318-1329: LGTM (tests).

Adding chat_template_kwargs: None in the empty-messages test maintains parity with the struct definition.


1340-1356: LGTM (tests).

Same here; request construction reflects the new optional field.

lib/llm/src/entrypoint/input/batch.rs (1)

226-233: LGTM.

Batch path initializes the new field explicitly; no behavior change.

lib/llm/tests/http-service.rs (1)

767-773: LGTM: explicit None preserves backward compatibility

Setting chat_template_kwargs: None in all constructions is correct and keeps existing behavior unchanged.

Also applies to: 807-813, 848-854

lib/llm/tests/test_common_ext.rs (1)

63-72: LGTM: explicit None in struct initializers

The additions align with the new optional field and keep tests stable.

Also applies to: 327-333, 372-383

@@ -32,6 +32,7 @@

use anyhow::Result;
use minijinja::value::Value;
use std::collections::HashMap;
Contributor


🛠️ Refactor suggestion

Avoid cloning: return a reference from the trait.

Returning an owned HashMap forces a clone in implementers. Prefer borrowing to reduce allocations while keeping the trait object-safe.

-use std::collections::HashMap;
+use std::collections::HashMap;

 // ...
-    /// Optional additional kwargs to merge into the chat template context
-    fn chat_template_kwargs(&self) -> Option<HashMap<String, serde_json::Value>> {
-        None
-    }
+    /// Optional additional kwargs to merge into the chat template context
+    fn chat_template_kwargs(&self) -> Option<&HashMap<String, serde_json::Value>> {
+        None
+    }

Also applies to: 73-76

🤖 Prompt for AI Agents
In lib/llm/src/preprocessor/prompt.rs around lines 35 and also applying to lines
73-76, the trait methods currently return an owned HashMap which forces
implementers to clone; change the trait signatures to return a reference tied to
self (e.g. fn some_map(&self) -> &HashMap<...>) so implementers can return a
reference without cloning, update all implementations to return references to
their internal maps, and then update call sites to use the borrowed &HashMap
instead of expecting ownership (or clone there if ownership is truly required).

Comment on lines +130 to +132
fn chat_template_kwargs(&self) -> Option<std::collections::HashMap<String, serde_json::Value>> {
self.chat_template_kwargs.clone()
}
Contributor


🛠️ Refactor suggestion

Align with borrowed return to avoid cloning.

If the trait is changed to return a reference, update this impl accordingly.

-    fn chat_template_kwargs(&self) -> Option<std::collections::HashMap<String, serde_json::Value>> {
-        self.chat_template_kwargs.clone()
-    }
+    fn chat_template_kwargs(&self) -> Option<&std::collections::HashMap<String, serde_json::Value>> {
+        self.chat_template_kwargs.as_ref()
+    }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
fn chat_template_kwargs(&self) -> Option<std::collections::HashMap<String, serde_json::Value>> {
self.chat_template_kwargs.clone()
}
fn chat_template_kwargs(&self) -> Option<&std::collections::HashMap<String, serde_json::Value>> {
self.chat_template_kwargs.as_ref()
}
🤖 Prompt for AI Agents
In lib/llm/src/preprocessor/prompt/template/oai.rs around lines 130 to 132, the
impl currently clones self.chat_template_kwargs but the trait now expects a
borrowed return; change the method to return an Option reference by returning
self.chat_template_kwargs.as_ref() (or as_deref() as appropriate) so you pass a
reference instead of cloning the HashMap and match the trait signature.

@grahamking
Copy link
Contributor

I don't think kwargs is the correct term. That's a Python thing. More discussion on the issue.

@grahamking
Copy link
Contributor

@qimcis Thanks for sending this. Could you:

  • Rename it to chat_template_args (no kw).
  • Address the Code Rabbit review comments.

Labels
external-contribution Pull request is from an external contributor feat size/M