feat: add chat_template_kwargs param to v1/chat/completion #3016
base: main
Conversation
Signed-off-by: Chi McIsaac <[email protected]>
👋 Hi qimcis! Thank you for contributing to ai-dynamo/dynamo. Just a reminder: The 🚀 …
Walkthrough
Adds an optional chat_template_kwargs field to NvCreateChatCompletionRequest across construction sites, tests, and protocol types, and wires it into template rendering by exposing it via OAIChatLikeRequest and merging it into the Jinja context. Defaults to None and is skipped in serialization when absent.
Sequence Diagram(s)

sequenceDiagram
autonumber
actor Client
participant HTTP as HTTP Service
participant Proto as OpenAI Protocol
participant Pre as Preprocessor
participant Tpl as Template Renderer
Client->>HTTP: NvCreateChatCompletionRequest { ..., chat_template_kwargs? }
HTTP->>Proto: Forward request (struct includes chat_template_kwargs)
Proto->>Pre: Build OAIChatLikeRequest view
Pre->>Tpl: render(messages, system, context)
note over Pre,Tpl: New: pass optional chat_template_kwargs
Tpl->>Tpl: Merge base context + chat_template_kwargs (kwargs override)
Tpl-->>Pre: Rendered prompt
Pre-->>Proto: Prepared inputs
Proto-->>HTTP: Completion response
HTTP-->>Client: Stream/response
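
To make the merge step (7) concrete, here is a toy, self-contained sketch of merging request-level kwargs into a minijinja context. The template text and the keys (prompt, enable_thinking) are invented for illustration and are not taken from this PR; the demo deliberately uses disjoint keys so it says nothing about override precedence.

```rust
use minijinja::{context, Environment};

fn main() {
    let mut env = Environment::new();
    // Invented template: a per-request flag toggles extra output.
    env.add_template("demo", "{% if enable_thinking %}<think>{% endif %}{{ prompt }}")
        .unwrap();

    // Base context built by the preprocessor, plus request-level kwargs.
    let base = context! { prompt => "Hello" };
    let kwargs = context! { enable_thinking => true };
    let merged = context! { ..base, ..kwargs };

    let out = env.get_template("demo").unwrap().render(merged).unwrap();
    assert_eq!(out, "<think>Hello");
}
```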
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Possibly related issues
Possibly related PRs
Pre-merge checks (2 passed, 1 warning)
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 2
🧹 Nitpick comments (7)
lib/llm/src/protocols/openai/chat_completions.rs (1)
57-60: Field addition looks good; consider a small type alias for readability and future reuse.
The serde default/skip-serialization config is correct. To reduce repetition of the long type across modules, consider introducing a ChatTemplateKwargs alias and using it here.
```diff
+// Near other use statements or in a common module:
+type ChatTemplateKwargs = std::collections::HashMap<String, serde_json::Value>;
 // ...
-pub chat_template_kwargs: Option<std::collections::HashMap<String, serde_json::Value>>,
+pub chat_template_kwargs: Option<ChatTemplateKwargs>,
```
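
As a reference for the serde behavior the walkthrough describes (defaults to None, omitted when absent), here is a minimal, self-contained sketch; the struct name is invented and the attribute spelling is an assumption, not copied from the PR.

```rust
use serde::{Deserialize, Serialize};
use std::collections::HashMap;

// Minimal stand-in for the real request type (name invented for the sketch).
#[derive(Serialize, Deserialize)]
struct RequestSketch {
    model: String,
    // Assumed attributes: absent in JSON -> None; None -> omitted on output.
    #[serde(default, skip_serializing_if = "Option::is_none")]
    chat_template_kwargs: Option<HashMap<String, serde_json::Value>>,
}
```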
lib/llm/src/protocols/openai/responses.rs (1)
176-192: Defaulting chat_template_kwargs to None is fine; flag for future propagation.
If NvCreateResponse ever accepts per-request template kwargs, remember to propagate them here rather than forcing None.
lib/llm/src/preprocessor/prompt/template/oai.rs (1)
226-232: Be explicit about reserved-key overrides (messages/tools/bos/eos/add_generation_prompt).
Merging extra kwargs last means request-level data can override built-ins. If this is intentional, add a brief comment. If not, filter reserved keys or namespace the kwargs (e.g., request.*) to avoid foot-guns.
```diff
-let ctx = if let Some(kwargs) = req.chat_template_kwargs() {
-    let extra = Value::from_serialize(kwargs);
-    context! { ..ctx, ..extra }
+let ctx = if let Some(kwargs) = req.chat_template_kwargs() {
+    // Optional: prevent overriding reserved context names
+    // let reserved = ["messages","tools","bos_token","eos_token","unk_token","add_generation_prompt"];
+    // let filtered = kwargs.iter().filter(|(k,_)| !reserved.contains(&k.as_str()))
+    //     .collect::<std::collections::HashMap<_,_>>();
+    // let extra = Value::from_serialize(&filtered);
+    let extra = Value::from_serialize(kwargs);
+    context! { ..ctx, ..extra } // request-level values take precedence
 } else {
     ctx
 };
```
lib/llm/tests/preprocessor.rs (1)
269-275: Consider adding a precedence test for chat_template_kwargs.
Add a small test asserting that a kwarg (e.g., add_generation_prompt=false) or a custom key is visible in the rendered template and that precedence works as expected.
Example (new test function):
```rust
#[tokio::test]
async fn test_chat_template_kwargs_precedence() {
    // Build an MDC/formatter as in other tests
    let mdc = make_mdc_from_repo(
        "tests/data/sample-models",
        "meta-llama/Llama-3.1-70B-Instruct",
        "1605565",
        Some(vec![PromptContextMixin::Llama3DateTime]),
    )
    .await;
    let formatter = match PromptFormatter::from_mdc(&mdc).unwrap() {
        PromptFormatter::OAI(f) => f,
    };
    let mut req = Request::from(SINGLE_CHAT_MESSAGE, None, None, mdc.slug().to_string());

    // Inject a kwarg and ensure it's respected
    let mut map = std::collections::HashMap::new();
    map.insert("custom_var".to_string(), serde_json::json!("custom"));
    req.chat_template_kwargs = Some(map);

    let rendered = formatter.render(&req).unwrap();
    assert!(rendered.contains("custom")); // adapt to your template
}
```
lib/llm/tests/http-service.rs (1)
767-773: Add minimal coverage for non-None and BYOT JSON paths.
To ensure deserialization and wiring don’t regress, add one positive test that sends chat_template_kwargs via the BYOT JSON client and succeeds (no need to validate templating here).
Example addition (outside these ranges) to the generic BYOT test:
```rust
// After the first successful BYOT streaming case
let request_with_kwargs = serde_json::json!({
    "model": "foo",
    "messages": [{ "role": "user", "content": "Hi" }],
    "stream": true,
    "max_tokens": 50,
    "chat_template_kwargs": { "echo": "on", "n": 3, "flag": true }
});
let result = generic_byot_client.chat_stream(request_with_kwargs).await;
assert!(result.is_ok(), "chat_template_kwargs should deserialize and be accepted");
```
Also applies to: 807-813, 848-854
lib/llm/tests/test_common_ext.rs (2)
331-332: Assert it skips serialization when None.
Strengthen the serialization test to ensure the field is omitted when None.
Add (outside these lines) in test_serialization_preserves_structure after computing json:
```rust
assert!(json.get("chat_template_kwargs").is_none(), "Field should be skipped when None");
```
63-72: Add a roundtrip test for Some(...).
Cover the positive case to catch schema regressions (types and serde attributes).
New test (outside these lines):
```rust
#[test]
fn test_chat_template_kwargs_roundtrip() {
    use dynamo_llm::protocols::openai::chat_completions::NvCreateChatCompletionRequest;

    let req_json = serde_json::json!({
        "model": "test-model",
        "messages": [{ "role": "user", "content": "Hello" }],
        "chat_template_kwargs": { "echo": "on", "n": 3, "flag": true }
    });
    let request: NvCreateChatCompletionRequest = serde_json::from_value(req_json).unwrap();

    // Serialize back and verify presence and contents
    let out = serde_json::to_value(&request).unwrap();
    assert!(out.get("chat_template_kwargs").is_some());
    assert_eq!(out["chat_template_kwargs"]["echo"], "on");
    assert_eq!(out["chat_template_kwargs"]["n"], 3);
    assert_eq!(out["chat_template_kwargs"]["flag"], true);
}
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
lib/llm/src/entrypoint/input/batch.rs (1 hunks)
lib/llm/src/entrypoint/input/text.rs (1 hunks)
lib/llm/src/http/service/openai.rs (2 hunks)
lib/llm/src/preprocessor/prompt.rs (2 hunks)
lib/llm/src/preprocessor/prompt/template/oai.rs (2 hunks)
lib/llm/src/protocols/openai/chat_completions.rs (1 hunks)
lib/llm/src/protocols/openai/responses.rs (1 hunks)
lib/llm/tests/http-service.rs (3 hunks)
lib/llm/tests/preprocessor.rs (1 hunks)
lib/llm/tests/test_common_ext.rs (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
lib/llm/src/entrypoint/input/batch.rs (1)
  lib/llm/tests/http_metrics.rs (1): None (17-17)
lib/llm/src/preprocessor/prompt.rs (1)
  lib/llm/src/preprocessor/prompt/template/oai.rs (1): chat_template_kwargs (130-132)
lib/llm/src/entrypoint/input/text.rs (1)
  lib/llm/tests/http_metrics.rs (1): None (17-17)
lib/llm/src/preprocessor/prompt/template/oai.rs (1)
  lib/llm/src/preprocessor/prompt.rs (1): chat_template_kwargs (74-76)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: pre-merge-rust (lib/bindings/python)
- GitHub Check: pre-merge-rust (lib/runtime/examples)
- GitHub Check: pre-merge-rust (.)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (6)
lib/llm/src/entrypoint/input/text.rs (1)
121-126: LGTM. Initializing chat_template_kwargs to None keeps behavior unchanged for the CLI path.
lib/llm/src/http/service/openai.rs (2)
1318-1329: LGTM (tests). Adding chat_template_kwargs: None in the empty-messages test maintains parity with the struct definition.
1340-1356: LGTM (tests). Same here; request construction reflects the new optional field.
lib/llm/src/entrypoint/input/batch.rs (1)
226-233: LGTM. Batch path initializes the new field explicitly; no behavior change.
lib/llm/tests/http-service.rs (1)
767-773: LGTM: explicit None preserves backward compatibility. Setting chat_template_kwargs: None in all constructions is correct and keeps existing behavior unchanged.
Also applies to: 807-813, 848-854
lib/llm/tests/test_common_ext.rs (1)
63-72: LGTM: explicit None in struct initializers. The additions align with the new optional field and keep tests stable.
Also applies to: 327-333, 372-383
@@ -32,6 +32,7 @@
use anyhow::Result;
use minijinja::value::Value;
use std::collections::HashMap;
🛠️ Refactor suggestion
Avoid cloning: return a reference from the trait.
Returning an owned HashMap forces a clone in implementers. Prefer borrowing to reduce allocations while keeping the trait object-safe.
-use std::collections::HashMap;
+use std::collections::HashMap;
// ...
- /// Optional additional kwargs to merge into the chat template context
- fn chat_template_kwargs(&self) -> Option<HashMap<String, serde_json::Value>> {
- None
- }
+ /// Optional additional kwargs to merge into the chat template context
+ fn chat_template_kwargs(&self) -> Option<&HashMap<String, serde_json::Value>> {
+ None
+ }
Also applies to: 73-76
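
For illustration, a compilable toy version of the borrowed-return pattern suggested above; the trait and struct names are invented for the sketch and are not the crate's real OAIChatLikeRequest API.

```rust
use std::collections::HashMap;

// Toy trait mirroring the suggestion (names invented for the sketch).
trait ChatLikeSketch {
    /// Borrowed return: implementers hand out a reference, no clone needed.
    fn chat_template_kwargs(&self) -> Option<&HashMap<String, serde_json::Value>> {
        None
    }
}

struct RequestSketch {
    chat_template_kwargs: Option<HashMap<String, serde_json::Value>>,
}

impl ChatLikeSketch for RequestSketch {
    fn chat_template_kwargs(&self) -> Option<&HashMap<String, serde_json::Value>> {
        self.chat_template_kwargs.as_ref() // borrow instead of cloning the map
    }
}
```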
🤖 Prompt for AI Agents
In lib/llm/src/preprocessor/prompt.rs around lines 35 and also applying to lines
73-76, the trait methods currently return an owned HashMap which forces
implementers to clone; change the trait signatures to return a reference tied to
self (e.g. fn some_map(&self) -> &HashMap<...>) so implementers can return a
reference without cloning, update all implementations to return references to
their internal maps, and then update call sites to use the borrowed &HashMap
instead of expecting ownership (or clone there if ownership is truly required).
fn chat_template_kwargs(&self) -> Option<std::collections::HashMap<String, serde_json::Value>> {
    self.chat_template_kwargs.clone()
}
🛠️ Refactor suggestion
Align with borrowed return to avoid cloning.
If the trait is changed to return a reference, update this impl accordingly.
- fn chat_template_kwargs(&self) -> Option<std::collections::HashMap<String, serde_json::Value>> {
- self.chat_template_kwargs.clone()
- }
+ fn chat_template_kwargs(&self) -> Option<&std::collections::HashMap<String, serde_json::Value>> {
+ self.chat_template_kwargs.as_ref()
+ }
🤖 Prompt for AI Agents
In lib/llm/src/preprocessor/prompt/template/oai.rs around lines 130 to 132, the
impl currently clones self.chat_template_kwargs but the trait now expects a
borrowed return; change the method to return an Option reference by returning
self.chat_template_kwargs.as_ref() (or as_deref() as appropriate) so you pass a
reference instead of cloning the HashMap and match the trait signature.
I don't think …
@qimcis Thanks for sending this. Could you:
Overview:
Add chat_template_kwargs to v1/chat/completions to pass per-request kwargs into the chat template, allowing for request-level behavior control without separate replicas.

Details:
Tested with Qwen3
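
For example, a hypothetical request body; "enable_thinking" is the switch exposed by Qwen3-style chat templates, shown here as an assumption rather than something verified in this PR:

```rust
fn main() {
    // Hypothetical client-side request body; "enable_thinking" is a Qwen3
    // chat-template kwarg used for illustration only.
    let body = serde_json::json!({
        "model": "Qwen/Qwen3-8B",
        "messages": [{ "role": "user", "content": "Hello" }],
        "chat_template_kwargs": { "enable_thinking": false }
    });
    println!("{}", serde_json::to_string_pretty(&body).unwrap());
}
```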
Where should the reviewer start?
lib/llm/src/protocols/openai/chat_completions.rs
lib/llm/src/preprocessor/prompt.rs
lib/llm/src/preprocessor/prompt/template/oai.rs
Related Issues:
Summary by CodeRabbit