
Conversation


@thesynapses commented Nov 23, 2025

Fixes #3665

Streaming responses from LiteLLM models (Claude, GPT, etc.) were not setting finish_reason on aggregated LlmResponse objects, so agent runners could not reliably recognize completion states.

This fix mirrors the finish_reason mapping logic from the non-streaming path (lines 776-784) and applies it to both streaming code paths:

  • Tool call responses (lines 1340-1368)
  • Text-only responses (lines 1369-1390)

Without this fix, agents using Claude or GPT via LiteLLM would encounter stop conditions that couldn't be properly handled, leading to incomplete responses or unexpected agent behavior.
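
A minimal sketch of the change, in Python, is shown below. The variable names and the mapping entries are assumptions that approximate the existing _FINISH_REASON_MAPPING dictionary and aggregation code in lite_llm.py; they are illustrative, not the exact source.

# Sketch only: how the fix maps LiteLLM's string finish reasons onto ADK's
# enum before setting them on the aggregated streaming response.
from google.genai import types

# Stand-in for the existing _FINISH_REASON_MAPPING in lite_llm.py;
# the real entries may differ.
_FINISH_REASON_MAPPING = {
    "stop": types.FinishReason.STOP,
    "tool_calls": types.FinishReason.STOP,
    "length": types.FinishReason.MAX_TOKENS,
    "content_filter": types.FinishReason.SAFETY,
}

# String captured from the final streamed chunk, e.g. "stop" or "tool_calls".
finish_reason = "stop"

mapped_finish_reason = None
if finish_reason:
    mapped_finish_reason = _FINISH_REASON_MAPPING.get(
        finish_reason.lower(), types.FinishReason.FINISH_REASON_UNSPECIFIED
    )

# Both streaming branches (tool-call and text-only) now set this on the
# aggregated LlmResponse, where it was previously left as None:
# aggregated_response.finish_reason = mapped_finish_reason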

Tested with Claude Sonnet 4.5 and GPT-5 via Azure OpenAI in a production multi-agent system with MCP tools.


Link to Issue or Description of Change

1. Link to an existing issue: #3665

Testing Plan

Problem:
When using LiteLLM models in streaming mode, the finish_reason field was never set on aggregated LlmResponse objects. This prevented ADK agent runners from reliably detecting when a response had completed, leading to incomplete responses, unrecognized stop conditions, and unpredictable behavior with Claude/GPT models.

Solution:
Added finish_reason mapping in both streaming code paths (tool calls and text-only), mirroring the existing non-streaming implementation at lines 776-784. Maps LiteLLM's string finish reasons ("stop", "tool_calls", etc.) to ADK's types.FinishReason enum values using the existing _FINISH_REASON_MAPPING dictionary.

Unit Tests:

  • I have added or updated unit tests for my change (a hedged sketch of the general shape of such a test appears after this list).
  • All unit tests pass locally.
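
The sketch below shows the general shape of such a test. The fixture names and the mocked streaming setup are assumptions for illustration, not the exact fixtures used in the ADK test suite.

# Illustrative only: `lite_llm_instance` and `llm_request` are assumed
# fixtures; the instance is wired to a mocked LiteLLM client whose final
# streamed chunk carries finish_reason="stop".
import pytest
from google.genai import types


@pytest.mark.asyncio
async def test_streaming_response_sets_finish_reason(
    lite_llm_instance, llm_request
):
    responses = [
        resp
        async for resp in lite_llm_instance.generate_content_async(
            llm_request, stream=True
        )
    ]
    # Before the fix the aggregated response left finish_reason as None;
    # after the fix it carries the mapped enum value.
    assert responses[-1].finish_reason == types.FinishReason.STOP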

Manual End-to-End (E2E) Tests:

Setup:

  • Multi-agent system with ADK 1.19.0 + LiteLLM wrapper
  • Claude Sonnet 4.5 via Vertex AI (vertex_ai/claude-sonnet-4-5@20250929)
  • GPT-5 via Azure OpenAI (azure/gpt-5-openai-latest)
  • Streaming SSE mode with progressive chunk delivery
  • MCP tools connected via Gluon Link: Google Drive agent, HubSpot CRM agent

Test Cases:

  1. Google Drive: File listing with formatted markdown table output
  2. HubSpot CRM: Company queries with structured data
  3. Multi-turn conversations with tool calls

Before Fix:

  • finish_reason field was None on streaming responses
  • Agents showed incomplete/truncated responses
  • Claude: 291 tokens delivered, response cut off mid-table
  • GPT-5: 682 tokens but inconsistent completion detection

After Fix:

  • finish_reason correctly set to types.FinishReason.STOP
  • Complete responses delivered to users
  • Claude: Full markdown tables rendered properly (759 chars)
  • GPT-5: Consistent completion with proper finish_reason
  • Both models reliably signal completion states

Log Evidence:

# After fix - GPT-5 example:
INFO | GPT-5 streaming completed. Events processed: 15
INFO | Usage: prompt=123 tokens, candidates=682 tokens, finish_reason=STOP

Checklist

  • I have read the CONTRIBUTING.md document.
  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have added tests that prove my fix is effective or that my feature works.
  • New and existing unit tests pass locally with my changes.
  • I have manually tested my changes end-to-end.
  • Any dependent changes have been merged and published in downstream modules.

Additional context

This fix is critical for production systems using any LiteLLM-supported models (Claude, GPT, Mistral, etc.) in streaming mode. The bug affects all streaming scenarios where the ADK agent runner needs to detect proper completion. The fix ensures consistent behavior between streaming and non-streaming modes, making LiteLLM a viable production option for multi-agent systems.

Related to issue #3676 (double serialization) - both bugs prevented proper Claude/GPT operation with ADK.

@gemini-code-assist (Contributor) commented

Summary of Changes

Hello @thesynapses, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a critical bug where LiteLLM streaming responses were failing to populate the finish_reason field in aggregated LlmResponse objects. This omission caused agent runners to misinterpret completion states, leading to functional issues in multi-agent systems. The solution integrates the established finish_reason mapping logic into the streaming pathways so that agents can reliably detect when a model's response has concluded, improving the stability and correctness of agent interactions with streaming LLMs.

Highlights

  • Fix for LiteLLM Streaming Responses: Addressed an issue where streaming responses from LiteLLM models (e.g., Claude, GPT) were not correctly setting the finish_reason on aggregated LlmResponse objects.
  • Agent Runner Completion Recognition: The absence of finish_reason prevented agent runners from properly recognizing completion states, leading to incomplete responses or unexpected agent behavior.
  • Consistent Finish Reason Mapping: Implemented the finish_reason mapping logic, mirroring the existing non-streaming path, to ensure consistent behavior across both streaming and non-streaming modes.
  • Scope of Fix: The fix has been applied to both streaming code paths: for tool call responses and for text-only responses.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist bot left a comment


Code Review

This pull request correctly addresses a bug where finish_reason was not being mapped for streaming responses from LiteLLM, which could lead to incorrect agent behavior. The fix applies the existing mapping logic from the non-streaming path to both tool-calling and text-only streaming responses.

My review includes one suggestion to refactor the duplicated code into a helper function. This will improve the code's maintainability by adhering to the DRY principle. Overall, this is a good fix that improves the robustness of the LiteLLM integration.
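
For reference, the suggested extraction might look roughly like the sketch below; the helper name and call sites are hypothetical illustrations, not the reviewer's exact proposal or the PR's final code.

# Hypothetical shape of the DRY refactor: one module-level helper shared by
# the non-streaming path and both streaming branches in lite_llm.py.
from typing import Optional

from google.genai import types

# Stand-in for the existing mapping dictionary; real entries may differ.
_FINISH_REASON_MAPPING = {"stop": types.FinishReason.STOP}


def _map_finish_reason(raw: Optional[str]) -> Optional[types.FinishReason]:
    """Translates a LiteLLM finish_reason string into an ADK enum value."""
    if not raw:
        return None
    return _FINISH_REASON_MAPPING.get(
        raw.lower(), types.FinishReason.FINISH_REASON_UNSPECIFIED
    )


# At each aggregation site (illustrative):
# llm_response.finish_reason = _map_finish_reason(finish_reason)

Centralizing the lookup this way would keep the three call sites from drifting apart if the mapping ever changes.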

@adk-bot (Collaborator) commented Nov 23, 2025

Response from ADK Triaging Agent

Hello @thesynapses, thank you for creating this PR!

Could you please fill out the Testing Plan section in your PR description? This information will help reviewers to review your PR more efficiently. Thanks!

@adk-bot added the models [Component] Issues related to model support label on Nov 23, 2025
@ryanaiagent self-assigned this on Nov 25, 2025
