fix(backend): Handle missing content in streaming delta (#316)
## Summary
This PR fixes a `KeyError: 'content'` that occurs when processing
streaming chat completions.
## Details
When using the `chat_completions` endpoint with `stream=True`, the final
`delta` chunk sent by the server may not contain a `content` key; this is
standard API behavior for signaling the end of the stream.
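For example, a terminal chunk from an OpenAI-compatible server typically carries an empty `delta` and a `finish_reason` (shown here as a Python literal; the field values are illustrative):

```python
# Illustrative final streaming chunk: the "delta" object has no "content" key.
final_chunk = {
    "id": "chatcmpl-123",
    "object": "chat.completion.chunk",
    "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}],
}
```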
The existing code in `_extract_completions_delta_content` did not
account for this possibility and tried to access `delta['content']`
directly, leading to a `KeyError` and causing the benchmark process to
crash when the stream ended.
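A minimal sketch of the kind of guard this change applies, assuming the helper receives the parsed chunk as a dict (the exact signature in guidellm may differ):

```python
from typing import Any, Optional


def _extract_completions_delta_content(data: dict[str, Any]) -> Optional[str]:
    """Return the text content of a streaming delta, or None if absent."""
    choices = data.get("choices") or []
    if not choices:
        return None
    delta = choices[0].get("delta") or {}
    # The terminal chunk omits 'content', so use .get() instead of
    # delta['content'] to avoid a KeyError when the stream ends.
    return delta.get("content")
```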
## Test Plan
This was discovered while running `guidellm benchmark` against an
OpenAI-compatible API endpoint (via `litellm`) that correctly implements
the streaming protocol.
```bash
guidellm benchmark \
  --target "http://10.64.1.62:4000/v1" \
  --model "qwen3-06b-2" \
  --processor "Qwen/Qwen3-0.6B" \
  --rate-type "synchronous" \
  --max-requests 1 \
  --data "prompt_tokens=32,output_tokens=32,samples=1"
```
## Related Issues
- Resolves #315
---
- [x] "I certify that all code in this PR is my own, except as noted
below."
## Use of AI
- [ ] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a
docstring that includes `## WRITTEN BY AI ##`)
Signed-off-by: xinjun.jiang <[email protected]>
Co-authored-by: Samuel Monson <[email protected]>