Typed Output Architecture
This document specifies how the dartantic_ai compatibility layer handles typed output (structured JSON responses) across different providers.
- Overview
- Provider Capabilities
- Implementation Approaches
- Agent-Level Handling
- Provider-Specific Details
- Testing and Validation
Typed output allows constraining LLM responses to specific JSON schemas. The system handles this through a clean separation of concerns across the six-layer architecture:
- API Layer (Agent): Selects appropriate orchestrator and adds return_result tool universally
- Orchestration Layer: TypedOutputStreamingOrchestrator handles typed output workflows
- Provider Abstraction Layer: ChatModel interface passes outputSchema to implementations
- Provider Implementation Layer: Provider-specific handling (native vs tool-based)
- Infrastructure Layer: JSON validation and parsing utilities
- Protocol Layer: Raw API communication with schema parameters
```mermaid
flowchart TD
A[Agent.sendFor/send with outputSchema] --> B{Add return_result tool}
B --> C[Select TypedOutputStreamingOrchestrator]
C --> D[Create ChatModel with tools]
D --> E{Provider Type?}
E -->|Native Support| F[Provider uses response_format/responseSchema]
E -->|Tool-based| G[Provider calls return_result tool]
F --> H[Stream native JSON text]
G --> I[Execute return_result tool]
H --> J[Return JSON in output]
I --> K[Create synthetic message with JSON]
K --> J
style A fill:#f9f,stroke:#333,stroke-width:2px
style J fill:#9f9,stroke:#333,stroke-width:2px
```

```mermaid
flowchart LR
A[Provider] --> B{Has typedOutputWithTools capability?}
B -->|Yes| C[Can use native format AND tools]
B -->|No| D{Has typedOutput capability?}
D -->|Yes| E[Native format OR tools<br/>but not both]
D -->|No| F[No typed output support]
C --> G[Examples: OpenAI, Anthropic]
E --> H[Examples: Google, Ollama]
F --> I[Example: Mistral]
```
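The capability branch above can be expressed as a small dispatch function. This is a minimal sketch: the enum, the string-set capability model, and the function name are illustrative assumptions, not the library's actual API.

```dart
// Illustrative capability dispatch mirroring the flowchart's three outcomes.
// The capability names match the flowchart; everything else is assumed.
enum TypedOutputSupport { nativeWithTools, nativeOrTools, none }

TypedOutputSupport typedOutputSupportFor(Set<String> capabilities) {
  if (capabilities.contains('typedOutputWithTools')) {
    return TypedOutputSupport.nativeWithTools; // e.g. OpenAI, Anthropic
  }
  if (capabilities.contains('typedOutput')) {
    return TypedOutputSupport.nativeOrTools; // e.g. Google, Ollama
  }
  return TypedOutputSupport.none; // e.g. Mistral
}
```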
| Provider | Typed Output | Method | Simultaneous Tools+Output |
|---|---|---|---|
| OpenAI | ✅ | Native response_format | ✅ |
| OpenAI Responses | ✅ | Native text_format (stateful) | ✅ |
| OpenRouter | ✅ | Native (OpenAI-compatible) | ✅ |
| Anthropic | ✅ | return_result tool | ✅ |
| Google | ✅ | Native responseSchema + Double Agent | ✅ |
| Ollama | ✅ | Native format param | ❌ (TODO: add double agent) |
| Together | ✅ | Native (OpenAI-compatible) | ✅ |
| Cohere | ✅ | Native (OpenAI-compatible) | ❌ (API limitation) |
| Mistral | ❌ | Not supported | ❌ |
Providers with direct API support for structured output handle typed output cleanly without any special handling at the Agent level.
OpenAI supports both tools and typed output simultaneously with no conflicts:

```dart
// OpenAI uses response_format.json_schema
ResponseFormat.jsonSchema(
jsonSchema: JsonSchemaObject(
name: 'response',
description: 'Generated response following the provided schema',
schema: outputSchema.schemaMap,
strict: true,
),
)
```

The OpenAI Responses provider uses the stateful Responses API with session management:

```dart
// OpenAI Responses uses text_format with session continuations
TextFormatJsonSchema(
name: 'dartantic_output',
schema: outputSchema.schemaMap,
strict: true,
)
```

Key differences from regular OpenAI:

- Stateful Sessions: Maintains conversation state across requests with `previousResponseId`
- Message Validation: Enforces strict user/model message alternation
- Session Metadata: Stores session IDs in message metadata for continuations
- Native JSON Support: Like OpenAI, it has native typed output support without `return_result`
The Agent always adds the `return_result` tool when `outputSchema` is provided, regardless of provider.

Empirically verified behavior:
- OpenAI: Uses native `response_format` and returns JSON directly (ignores the `return_result` tool)
- Anthropic: Calls the `return_result` tool (no native support)

The Agent's logic handles both cases identically:
- If `return_result` was called: use that output (the Anthropic path)
- If not: use the model's direct output (OpenAI and other native providers)
This unified approach allows the Agent to support both native typed output (OpenAI, Google, etc.) and tool-based typed output (Anthropic) transparently.
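As a sketch, that selection reduces to preferring the tool payload when present; the function and parameter names below are illustrative, not the Agent's internals.

```dart
// Prefer the return_result payload when the tool was called (Anthropic path);
// otherwise fall back to the model's direct JSON text (native providers).
String resolveTypedOutput({String? returnResultJson, required String nativeJson}) =>
    returnResultJson ?? nativeJson;
```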
- Progressive streaming: Providers that emit schema-constrained JSON incrementally may stream partial chunks. Orchestrators forward those deltas so clients can render progressive JSON (see `example/bin/typed_output.dart`).
- Message contents: During streaming, the assistant message attached to each chunk remains empty (other than metadata/tool parts). The JSON payload exists only in the streamed text chunks, so callers that care about the text must accumulate `chunk.output` themselves.
- Final decoding: APIs such as `Agent.send()` and `Agent.sendFor()` buffer the streamed chunks internally and decode once streaming completes. External consumers that want the final JSON document must follow the same pattern: concatenate the streamed chunks and parse once, because the terminal assistant message does not repeat the streamed text (see the sketch after this list).
- Provider responsibility: Provider implementations should avoid emitting conflicting chunks; once a fragment is streamed it cannot be "taken back." If a provider cannot supply coherent streaming deltas, it should suppress progressive JSON and emit only the final payload.
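A minimal sketch of the accumulate-then-decode pattern, assuming a streaming entry point like the `Agent.runStream` referenced elsewhere in this document and chunks whose `output` carries the text deltas:

```dart
import 'dart:convert';

// Concatenate streamed JSON fragments and parse once at the end. The
// terminal assistant message does not repeat the streamed text, so the
// buffered chunks are the only copy of the JSON document.
Future<Map<String, dynamic>> collectTypedOutput(
  Agent agent,
  String prompt,
  JsonSchema schema,
) async {
  final buffer = StringBuffer();
  await for (final chunk in agent.runStream(prompt, outputSchema: schema)) {
    buffer.write(chunk.output); // partial JSON text deltas
  }
  return jsonDecode(buffer.toString()) as Map<String, dynamic>;
}
```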
Still TODO: Make Ollama work with the double agent pattern when both tools and typed output are needed.
```dart
// Google uses GenerationConfig.responseSchema
GenerationConfig(
responseMimeType: 'application/json',
responseSchema: convertToGeminiSchema(outputSchema),
)
```

Double Agent Pattern: Google's API does not support using tools and typed output (`responseSchema`) simultaneously in a single API call. To work around this limitation, Google uses the `GoogleDoubleAgentOrchestrator`, which implements a two-phase approach:
Phase 1 - Tool Execution:
- Sends messages with tools (no outputSchema)
- Suppresses text output (we only care about tool calls)
- Executes all tool calls
- Accumulates tool results
Phase 2 - Structured Output:
- Sends tool results with outputSchema (no tools)
- Returns the structured JSON output
- Attaches metadata about suppressed content from Phase 1
This pattern allows Google to support the same capability as Anthropic and OpenAI, just with a different implementation strategy. The orchestrator is selected automatically by the Agent when both `outputSchema` and tools are present.
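In outline, the two phases can be sketched as below. The phase callbacks are illustrative stand-ins for the provider calls; the real `GoogleDoubleAgentOrchestrator` streams results and carries more state.

```dart
// Skeleton of the double agent flow; the callbacks stand in for
// "send with tools" and "send with outputSchema".
Future<String> doubleAgentSketch({
  required List<ChatMessage> history,
  required JsonSchema outputSchema,
  // Phase 1: tools enabled, no outputSchema; returns tool-result messages.
  required Future<List<ChatMessage>> Function(List<ChatMessage> history)
      runToolsPhase,
  // Phase 2: outputSchema enabled, no tools; returns the JSON payload.
  required Future<String> Function(
          List<ChatMessage> history, JsonSchema schema)
      runStructuredPhase,
}) async {
  // Phase 1: execute all tool calls; any model text is suppressed here
  // and later attached as metadata rather than surfaced to the caller.
  final toolResults = await runToolsPhase(history);

  // Phase 2: send the accumulated tool results with the schema.
  return runStructuredPhase([...history, ...toolResults], outputSchema);
}
```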
```dart
// Ollama uses format parameter in HTTP request
{
"format": outputSchema.schemaMap,
"model": "...",
"messages": [...],
}
```

For providers without native typed output support, the Agent automatically adds a special tool:

```dart
// In Agent.runStream when outputSchema is provided
if (outputSchema != null) {
final returnResultTool = Tool<Map<String, dynamic>>(
name: kReturnResultToolName,
description: 'Return the final result in the required structured format',
inputSchema: outputSchema,
inputFromJson: (json) => json,
onCall: (args) async => json.encode(args),
);
tools = [...?_tools, returnResultTool];
}
```

The system uses a specialized orchestrator for typed output requests that extends the default orchestrator:

```dart
class TypedOutputStreamingOrchestrator extends DefaultStreamingOrchestrator {
const TypedOutputStreamingOrchestrator({
required this.provider,
required this.hasReturnResultTool,
});
final Provider provider;
final bool hasReturnResultTool;
@override
String get providerHint => 'typed-output';
@override
Stream<StreamingIterationResult> processIteration(
ChatModel<ChatModelOptions> model,
StreamingState state, {
JsonSchema? outputSchema,
}) async* {
state.resetForNewMessage();
// Stream the model response
await for (final result in model.sendStream(
state.conversationHistory,
outputSchema: outputSchema,
)) {
// Stream native JSON text for providers without return_result tool
if (!hasReturnResultTool) {
final textOutput = result.output.parts
.whereType<TextPart>()
.map((p) => p.text)
.join();
if (textOutput.isNotEmpty) {
yield StreamingIterationResult(
output: textOutput,
messages: const [],
shouldContinue: true,
finishReason: result.finishReason,
metadata: result.metadata,
usage: result.usage,
);
}
}
// Accumulate the message
state.accumulatedMessage = state.accumulator.accumulate(
state.accumulatedMessage,
result.output,
);
state.lastResult = result;
}
// Handle return_result tool calls
final consolidatedMessage = state.accumulator.consolidate(
state.accumulatedMessage,
);
// Check if this message has return_result tool call
final hasReturnResultCall = consolidatedMessage.parts
.whereType<ToolPart>()
.any((p) => p.kind == ToolPartKind.call && p.name == kReturnResultToolName);
if (hasReturnResultCall) {
// Execute tools and create synthetic message with JSON
final toolCalls = consolidatedMessage.parts
.whereType<ToolPart>()
.where((p) => p.kind == ToolPartKind.call)
.toList();
final executionResults = await state.executor.executeBatch(
toolCalls,
state.toolMap,
);
// Extract return_result JSON
for (final result in executionResults) {
if (result.toolPart.name == kReturnResultToolName && result.isSuccess) {
final returnResultJson = result.resultPart.result ?? '';
// Create synthetic message
final syntheticMessage = ChatMessage(
role: ChatMessageRole.model,
parts: [TextPart(returnResultJson)],
metadata: {'toolId': result.toolPart.id},
);
yield StreamingIterationResult(
output: returnResultJson,
messages: [syntheticMessage],
shouldContinue: false,
finishReason: state.lastResult.finishReason,
metadata: state.lastResult.metadata,
usage: state.lastResult.usage,
);
}
}
}
}
}
```

```mermaid
sequenceDiagram
participant A as Agent
participant O as TypedOutputOrchestrator
participant M as ChatModel
participant P as Provider API
A->>O: processIteration(model, state, outputSchema)
O->>M: sendStream(history, outputSchema)
alt Native Typed Output (OpenAI, Google)
M->>P: Request with response_format
P-->>M: Stream JSON text
M-->>O: Stream results with JSON text
O-->>A: Yield JSON text chunks
O-->>A: Yield final message with JSON
else Tool-based (Anthropic)
M->>P: Request (no native format)
P-->>M: Stream with return_result tool call
M-->>O: Stream results with tool call
O->>O: Suppress return_result message
O->>O: Execute return_result tool
O->>O: Create synthetic message
O-->>A: Yield synthetic message with JSON
end
```

The Agent automatically selects the `TypedOutputStreamingOrchestrator` when `outputSchema` is provided:

```dart
// In Agent._selectOrchestrator()
StreamingOrchestrator _selectOrchestrator({
JsonSchema? outputSchema,
List<Tool>? tools,
}) {
if (outputSchema != null) {
final hasReturnResultTool =
tools?.any((t) => t.name == kReturnResultToolName) ?? false;
return TypedOutputStreamingOrchestrator(
provider: _provider,
hasReturnResultTool: hasReturnResultTool,
);
}
return const DefaultStreamingOrchestrator();
}
```

The Agent always adds the `return_result` tool when `outputSchema` is provided, regardless of provider:

```dart
// In Agent.runStream when outputSchema is provided
if (outputSchema != null) {
final returnResultTool = Tool<Map<String, dynamic>>(
name: kReturnResultToolName,
description: 'REQUIRED: You MUST call this tool to return the final result. '
'Use this tool to format and return your response according to '
'the specified JSON schema.',
inputSchema: outputSchema,
inputFromJson: (json) => json,
onCall: (args) async => json.encode(args),
);
tools = [...?_tools, returnResultTool];
}
```

The Agent then creates the model directly from the provider:

```dart
// Agent creates model directly from provider
final model = _provider.createModel(
name: _modelName,
tools: tools, // Includes return_result if outputSchema provided
temperature: _temperature,
);
```

The Agent delegates typed output processing to the orchestrator:

```dart
// Agent delegates typed output processing to orchestrator
final orchestrator = _selectOrchestrator(outputSchema: outputSchema, tools: model.tools);
final state = StreamingState(
conversationHistory: conversationHistory,
toolMap: {for (final tool in model.tools ?? <Tool>[]) tool.name: tool},
);
try {
await for (final result in orchestrator.processIteration(model, state, outputSchema: outputSchema)) {
// Orchestrator handles return_result vs native output detection
yield ChatResult<String>(
id: result.id,
output: result.output, // Already processed as JSON
messages: result.messages,
finishReason: result.finishReason,
metadata: result.metadata,
usage: result.usage,
);
}
} finally {
await _lifecycleManager.disposeModel(model);
}
```

OpenAI:
- Method: Native `response_format.json_schema` parameter
- Behavior: Uses native format and returns JSON directly (ignores the `return_result` tool)
- Tools: Can use tools and typed output simultaneously
- Verified: Testing shows OpenAI uses native `response_format` even when the `return_result` tool is present
OpenAI Responses:
- Method: Native `text_format` parameter with stateful session management
- Behavior: Uses native format and returns JSON directly (filters out the `return_result` tool)
- Tools: Can use tools and typed output simultaneously
- Session Management:
  - Maintains conversation state across requests using `responseId`
  - Stores session metadata in `message.metadata` for conversation continuations
  - Only sends new messages after the session anchor point to reduce token usage
- Message Validation (see the sketch after this list):
  - Enforces strict user/model message alternation synchronously
  - Validates messages before sending to the API (unlike other providers)
  - Exposed race conditions in the Agent that other providers missed
- Key Differences from Regular OpenAI:
  - Uses the stateful Responses API instead of stateless Chat Completions
  - Requires session ID tracking for multi-turn conversations
  - Validates message structure more strictly
  - Filters out the `return_result` tool since it has native JSON support
- Implementation Notes:
  - `OpenAIResponsesChatModel` overrides the `tools` getter to filter `return_result`
  - `OpenAIResponsesMessageMapper` handles session metadata and message mapping
  - `OpenAIResponsesEventMapper` processes streaming events and builds results
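The alternation rule can be checked with a single pass over the history. This validator is illustrative, not the provider's actual implementation; `ChatMessage` and `ChatMessageRole` are the library types used elsewhere in this document.

```dart
// Illustrative strict-alternation check for user/model messages.
void validateAlternation(List<ChatMessage> messages) {
  for (var i = 1; i < messages.length; i++) {
    final prev = messages[i - 1].role;
    final curr = messages[i].role;
    if (prev == curr &&
        (curr == ChatMessageRole.user || curr == ChatMessageRole.model)) {
      throw StateError(
        'Responses API requires alternating user/model messages; '
        'found consecutive $curr messages at index $i.',
      );
    }
  }
}
```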
Anthropic:
- Method: `return_result` tool pattern
- Behavior: Model calls the `return_result` tool with JSON
- Tools: Works naturally, since `return_result` is just another tool
- Note: The Agent handles this transparently; the Anthropic mapper has no special logic
- Edge case: Sometimes returns an empty final message after the `return_result` call (the Agent replaces it with the JSON)
Google:
- Method: Native `responseSchema` in `GenerationConfig` + Double Agent orchestrator
- Behavior: Phase 1 executes tools, Phase 2 returns JSON with suppressed metadata
- Tools: Supports tools and typed output together via the double agent pattern
- Implementation: `GoogleDoubleAgentOrchestrator` handles the two-phase workflow
  - Phase 1: Sends the request with tools (no `outputSchema`) and executes tool calls
  - Phase 2: Sends tool results with `outputSchema` (no tools) and gets structured output
  - Suppresses text from Phase 1 and attaches it as metadata to the Phase 2 output
- Metadata: Attaches `suppressedText` metadata when the model attempts to output text in Phase 1
Ollama:
- Method: Native `format` parameter
- Behavior: Directly returns JSON in the response
- Tools: Cannot use tools and typed output together
- Implementation: Uses a direct HTTP client to access the `format` parameter
- TODO: Add the `return_result` pattern for simultaneous tools+output
- Basic Structured Output: Simple JSON object generation
- Complex Schemas: Nested objects, arrays, enums
- Edge Cases: Required fields, null handling, type validation
- Tool Integration: Simultaneous tools and typed output (where supported)
```dart
// Define schema
final schema = JsonSchema.create({
'type': 'object',
'properties': {
'name': {'type': 'string'},
'age': {'type': 'integer'},
},
'required': ['name', 'age'],
});
// Use with any provider
final agent = Agent('anthropic'); // or 'openai', 'google', etc.
final result = await agent.runFor<Person>(
'Generate a person named John who is 30 years old',
outputSchema: schema,
outputFromJson: Person.fromJson,
);
```

Currently, Ollama doesn't support simultaneous tools and typed output. The plan is to enhance it to use the same double agent pattern as Google:

```dart
// Future: This will work just like Google
final agent = Agent('ollama', tools: [weatherTool]);
final result = await agent.runFor<Report>(
'Get weather and format as report',
outputSchema: reportSchema,
);
// Ollama will use double agent orchestrator
// Phase 1: Execute tools, Phase 2: Get structured output
```

- Provider Transparency: The Agent handles typed output uniformly
- Clean Separation: Mappers don't contain typed output logic
- Automatic Handling: The `return_result` tool is added automatically when needed
- Flexible Architecture: Models are created on the fly with appropriate tools
- Error Transparency: JSON parsing errors bubble up for debugging
- Semantic Preservation: Schema mappers must preserve JSON Schema semantics
- Explicit Limitations: Throw clear errors for unsupported features
Schema mappers MUST NOT make semantic changes to accommodate provider limitations. Instead:
- Throw on Unmappable Features: Only throw when we cannot create a semantically equivalent mapping
- No Silent Conversions: Don't convert unsupported types to supported ones (e.g., `['string', 'number']` → `'string'`)
- Let Providers Validate: Pass through valid mappings and let providers enforce their own limitations
- Preserve Original Intent: Don't add or remove constraints from the original schema
- Throw on Known Silent Failures: Also throw when we know a provider will silently fail (e.g., Google Gemini with tools + output schema combination)
Examples of when to throw (see the guard sketch after this list):
- Type arrays with multiple non-null types: `['string', 'number']` (no way to map union types)
- anyOf/oneOf/allOf constructs (no equivalent in the provider's schema model)
- Arrays without items specification (ambiguous intent)
- Unknown type values (can't map what we don't understand)
- Known provider limitations that cause silent failures (e.g., Google Gemini skips tool calls when outputSchema is provided)
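Following these rules, a pre-mapping guard might look like the sketch below; the function name and error messages are illustrative, anchored to the throw cases above.

```dart
// Illustrative guard implementing the "when to throw" cases above.
void ensureMappable(Map<String, dynamic> schema) {
  final type = schema['type'];
  if (type is List && type.where((t) => t != 'null').length > 1) {
    throw ArgumentError(
      'Type arrays with multiple non-null types ($type) cannot be mapped; '
      'split the schema into separate single-typed fields.',
    );
  }
  for (final key in const ['anyOf', 'oneOf', 'allOf']) {
    if (schema.containsKey(key)) {
      throw ArgumentError(
        '"$key" has no equivalent in this provider\'s schema model; '
        'flatten the schema before requesting typed output.',
      );
    }
  }
  if (type == 'array' && !schema.containsKey('items')) {
    throw ArgumentError(
      'Array schemas must specify "items"; the intent is otherwise ambiguous.',
    );
  }
}
```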
Examples of when to pass through:
- Empty objects (valid JSON Schema, let provider decide)
- Complex nested structures (map what we can)
- Format fields (remove if truly unsupported by API, but document why)
- Provider responses that differ from expected types (e.g., returning strings for anyOf schemas)
OpenAI:
- Requires all properties in the `required` array for strict mode (API limitation)
- Does not support the `format` field in schemas
- Type arrays must be handled carefully

Google:
- Does not support anyOf/oneOf/allOf
- Does not support type arrays with multiple types
- Requires array schemas to have an `items` property
- Requires object schemas to have at least one property
- Only supports basic types: string, number, integer, boolean, array, object
- Cannot use tools and `outputSchema` in the same API call (handled by the double agent orchestrator)
When throwing errors for unsupported features, provide actionable guidance:
```dart
throw ArgumentError(
'Provider X does not support feature Y; '
'consider alternative approach Z.'
);
```

- Mistral Support: Investigate adding typed output support
- Cohere Support: Consider OpenAI-compatible endpoint
- Performance: Optimize JSON parsing and validation
- Schema Evolution: Support for schema versioning