Server Side Tools Tech Design
This document describes the architecture and implementation patterns for server-side tools in Dartantic AI providers.
Server-side tools are capabilities provided by AI providers that execute on the provider's infrastructure rather than requiring client-side implementation. Unlike client-side tools (user-defined functions), server-side tools:
- Execute on the provider's infrastructure
- Are configured via provider-specific options
- Stream progress events during execution
- Require standardized metadata handling to expose their operation to applications
This document establishes generic patterns that apply across providers, with provider-specific details documented separately.
The following patterns apply to all providers with server-side tools.
Understanding when server-side tool data appears in metadata versus message output (parts) is critical:
- Metadata: Progress information, intermediate states, tool execution details
- Message Output: Final deliverables that are part of the conversation content
Server-side tools often produce both:
- Process information: How the tool executed, what steps it took
- Content deliverables: Actual results that should be part of the message
The distinction ensures:
- Clean separation between "how it happened" (metadata) and "what was produced" (message content)
- Metadata is optional to consume - developers can ignore it if they only care about results
- Message content is always accessible through standard part iteration
- Conversation history remains clean and focused on actual content
Critical: Metadata is never sent to the model. It exists purely for application/developer use. This means:
- ✅ Safe to keep in message history for debugging/transparency
- ✅ Safe to strip from messages before sending to reduce token usage
- ✅ Does not affect model behavior or responses
- ✅ Can contain verbose debugging information without cost
Developers can choose to:
- Keep metadata for full transparency and debugging
- Strip metadata to reduce memory/storage footprint
- Selectively preserve certain metadata fields
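As an illustration of the stripping option above, the following sketch drops transparency metadata from a message history while preserving session data. It uses a simplified message shape, not the actual dartantic_ai ChatMessage API, so treat the types and helper name as assumptions.

```dart
/// Illustrative only: a simplified message shape, not the real dartantic_ai API.
class SimpleMessage {
  SimpleMessage({required this.parts, Map<String, Object?>? metadata})
      : metadata = {...?metadata};

  final List<Object> parts;
  final Map<String, Object?> metadata;
}

/// Keeps only the metadata keys needed for processing (e.g. session info)
/// and drops transparency/debugging entries before storing or re-sending.
List<SimpleMessage> stripTransparencyMetadata(
  List<SimpleMessage> history, {
  Set<String> keep = const {'_responses_session'},
}) =>
    [
      for (final msg in history)
        SimpleMessage(
          parts: msg.parts,
          metadata: {
            for (final entry in msg.metadata.entries)
              if (keep.contains(entry.key)) entry.key: entry.value,
          },
        ),
    ];
```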
Metadata contains information about tool execution and intermediate states. Typical metadata includes:
- Progress events: in_progress, processing, completed
- Intermediate states: Partial results, status updates
- Execution details: What was searched, code executed, queries run
Key characteristic: Metadata is about the journey - it shows what the tool is doing or did.
Message parts contain final content that is part of the conversation. Content belongs in message parts when:
- The content is a primary deliverable (images, files, documents)
- Users will want to see/save/use it directly
- It should appear in conversation history naturally
- It's standalone content that makes sense without context
Key characteristic: Message parts are deliverables - they are the actual content being communicated.
Image Generation:
- Metadata: Progress events, partial preview images
- Message Parts: Final generated image as `DataPart`
- Rationale: The image IS the response content
Code Execution:
- Metadata: Execution events, code, logs, results
- Message Parts: Text synthesis only
- Rationale: Code output is contextual, model synthesizes into natural language
Search (Web/File):
- Metadata: Search events, queries, results
- Message Parts: Text synthesis only
- Rationale: Search informs the text response
Use message parts when:
- ✅ The content is a primary deliverable (images, files)
- ✅ Users will want to see/save/use it directly
- ✅ It should appear in conversation history naturally
- ✅ It's standalone content that makes sense without context
Use metadata when:
- ✅ Showing tool execution progress/steps
- ✅ Providing debugging/transparency information
- ✅ Offering intermediate states (previews, partial results)
- ✅ Documenting what was searched/executed
- ✅ Content needs context from text to be meaningful
- Both during streaming and final: Structure should be the same
- Metadata accumulates: Each streaming chunk adds to the event list
- Parts replace: Final message parts replace any streaming parts
- No duplication: Don't put the same content in both metadata and parts
Server-side tool metadata follows a streaming-only pattern:
```mermaid
flowchart LR
    A[Streaming Event] --> B[ChatResult.metadata]
    B -.NOT.-> C[ChatMessage.metadata]
    B -.NOT.-> D[Final ChatResult.metadata]
```
Key Principle: Tool events are ONLY available during streaming, NOT in message metadata.
- During streaming: Individual events are emitted in `ChatResult.metadata`
- Message metadata: Only contains `_responses_session` (for session continuation)
- Final result metadata: Only contains response-level info (response_id, model, status)
- No duplication: Tool events already streamed in real-time, no need to duplicate in message history
- Clean message history: Messages only contain data needed for processing (session info)
- Clear separation: Streaming metadata = transparency/debugging, Message metadata = processing data
- Reduced memory: Large tool event logs don't bloat conversation history
When a server-side tool event arrives, it's immediately emitted in the ChatResult.metadata as a single-item list.
Algorithm:
- Identify the event type and determine which tool it belongs to
- Convert the event to JSON
- Wrap the JSON in a single-item list
- Create a ChatResult with empty output and the list in metadata under the tool's key
- Yield this chunk to the streaming response
Critical: Metadata is ALWAYS a list, even during streaming with single events. This ensures:
- Consistent structure between streaming and final results
- Developer code works the same for both cases
- Easy to iterate over events without type checking
```dart
// Developer code works the same for streaming and final:
await for (final chunk in agent.sendStream(prompt)) {
  final events = chunk.metadata['web_search'] as List?;
  if (events != null) {
    for (final event in events) {
      final stage = event['type'];
      print('Stage: $stage');
    }
  }
}
```

Events are accumulated in an internal event log map during streaming for use in final result metadata.
Algorithm:
- Maintain a map with keys for each tool type: 'web_search', 'file_search', 'image_generation', 'local_shell', 'mcp', 'code_interpreter'
- Each key maps to a list of event objects (JSON maps)
- When a server-side tool event arrives during streaming:
  - Convert the event to JSON
  - Append to the appropriate tool's list in the map
  - Yield the event in `ChatResult.metadata` for streaming consumers
- This accumulated log is used ONLY to populate the final `ChatResult.metadata` (via `Agent.send()`)
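A minimal sketch of this accumulation step, assuming events arrive already converted to JSON maps; the helper name is illustrative.

```dart
/// Internal event log: one list of JSON events per server-side tool key.
final Map<String, List<Map<String, Object?>>> _toolEventLog = {
  'web_search': [],
  'file_search': [],
  'image_generation': [],
  'local_shell': [],
  'mcp': [],
  'code_interpreter': [],
};

/// Records a streamed event in the log and returns the single-item
/// metadata map to yield with the streaming ChatResult chunk.
Map<String, Object?> recordToolEvent(
    String toolKey, Map<String, Object?> eventJson) {
  _toolEventLog.putIfAbsent(toolKey, () => []).add(eventJson);
  // Always a list, even for a single streamed event.
  return {
    toolKey: [eventJson],
  };
}
```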
Message metadata contains ONLY session information needed for processing.
Algorithm:
- Create a message metadata map
- Add only `_responses_session` data (response_id for session continuation)
- DO NOT add tool events, thinking, or any other transparency metadata
- Attach this minimal metadata map to the final ChatMessage
Rationale: Tool events were already streamed via ChatResult.metadata. Message metadata should only contain data needed for message processing (like session continuation), not transparency/debugging data.
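A sketch of the minimal message metadata construction, assuming the session payload is available as a JSON map; the function name is illustrative.

```dart
/// Builds the minimal metadata attached to the final ChatMessage.
/// Tool events and thinking are intentionally omitted: they were already
/// streamed via ChatResult.metadata.
Map<String, Object?> buildMessageMetadata(
        Map<String, Object?> responsesSession) =>
    {'_responses_session': responsesSession};
```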
Streaming chunks: Each ChatResult.metadata contains tool events as they arrive
Final ChatResult.metadata (from Agent.send()): Contains accumulated tool events from all chunks
- Thinking (accumulated from all deltas)
- Tool events (accumulated from all streaming events)
- Response-level info (response_id, model, status)
Difference from message metadata: Result metadata provides transparency for non-streaming consumers. Message metadata only contains processing data.
Status: NOT used in current implementation
Previous Design: The original design added synthetic summary events from response.output to message metadata for tools like FileSearch and CodeInterpreter that have additional data not available during streaming.
Current Implementation: Tool events are NOT added to message metadata at all. They are only available:
- During streaming via `ChatResult.metadata`
- In the final `ChatResult.metadata` (accumulated by `Agent.send()`)
Rationale:
- Tool events were already streamed in real-time
- Duplicating them in message metadata creates unnecessary bloat
- Message metadata should only contain data needed for processing (like `_responses_session`)
- Developers who need tool events can access them from `ChatResult.metadata` during streaming or from the final result
This section is retained for historical context and may be reconsidered in future iterations if use cases emerge that require tool event data in conversation history.
Some server-side tools generate content that should appear as message parts, not just metadata.
Content should be added as a DataPart when:
- It's the primary deliverable of the tool (images, documents, files)
- It should persist in conversation history
- Users expect to access it as message content
For tools that support progressive rendering/generation:
Algorithm:
- During streaming, track state:
  - Store partial content indexed by a unique identifier (e.g., outputIndex)
  - Each new partial updates the corresponding entry
  - Mark the entry as completed when the completion event arrives
- When building the final result:
  - Iterate through all completed entries
  - Decode/process each content item into the appropriate format
  - Create a DataPart for each completed item with the appropriate MIME type
  - Add all DataParts to the message parts list
- Wait for completion events - only add DataParts for completed items
- Semantic correctness: Generated content is primary response material, not metadata
- Consistency: Content from all sources should appear as parts
- Developer ergonomics: Standard message part handling works uniformly
- History persistence: Content naturally persists in conversation history
Algorithm:
- When a streaming event arrives, check its type
- If it's a server-side tool event (web search, image generation, etc.):
  - Convert the event to JSON
  - Append to the appropriate tool's list in the internal event log
  - Yield a metadata chunk (see next section)
Algorithm:
- Create a ChatResult with:
  - Empty output message (no parts)
  - Empty messages list
  - Metadata containing the event as a single-item list under the tool key
  - Empty usage stats
- Yield this chunk to the stream
The single-item list format is critical for consistency with final metadata structure.
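A sketch of emitting such a chunk; ChunkResult below is a simplified, illustrative stand-in for the real ChatResult type.

```dart
/// Simplified stand-in for the real ChatResult type (illustrative only).
class ChunkResult {
  ChunkResult({
    this.output = '',
    this.messages = const [],
    this.metadata = const {},
  });

  final String output; // empty for metadata-only chunks
  final List<Object> messages; // empty for metadata-only chunks
  final Map<String, Object?> metadata;
}

/// Wraps one server-side tool event in a metadata-only streaming chunk.
/// The event is always a single-item list under the tool key, matching
/// the structure of the accumulated final metadata.
ChunkResult toolEventChunk(String toolKey, Map<String, Object?> eventJson) =>
    ChunkResult(metadata: {
      toolKey: [eventJson],
    });
```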
Algorithm for Message Metadata:
- Create message metadata map
- Add ONLY `_responses_session` data (for session continuation)
- DO NOT add tool events or thinking
- Return this minimal map as part of the final ChatMessage
Algorithm for Result Metadata (Agent.send only):
- `Agent.send()` accumulates metadata from all streaming chunks
- Accumulate thinking from streaming chunks into a buffer
- Preserve tool event logs from streaming chunks
- Include response-level info (response_id, model, status)
- Return complete metadata in the final `ChatResult.metadata`
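A sketch of the merge logic Agent.send() could use for this, assuming each streamed chunk exposes a plain metadata map: list values (tool event logs) are concatenated, string values (thinking deltas) are appended, and other values are overwritten. This is an assumption about shape, not the actual implementation.

```dart
/// Merges streamed chunk metadata into final result metadata:
/// list values (tool event logs) are concatenated, string values
/// (e.g. thinking deltas) are appended, and other values are overwritten.
Map<String, Object?> accumulateResultMetadata(
    Iterable<Map<String, Object?>> chunkMetadatas) {
  final merged = <String, Object?>{};
  for (final metadata in chunkMetadatas) {
    for (final entry in metadata.entries) {
      final existing = merged[entry.key];
      final incoming = entry.value;
      if (existing is List && incoming is List) {
        merged[entry.key] = [...existing, ...incoming];
      } else if (existing is String && incoming is String) {
        merged[entry.key] = existing + incoming;
      } else {
        merged[entry.key] = incoming;
      }
    }
  }
  return merged;
}
```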
Algorithm:
- During streaming: track partial content and completion status
- When building final result: check completion flag
- If completed: process content and add DataPart to message parts
See Content Deliverables section for details.
Algorithm:
- Check if text was streamed during the response
- If yes:
  - Filter message parts to separate text from non-text parts
  - Create the output message with an empty parts list (text was already streamed)
  - Add output message metadata
  - If non-text parts exist: create a separate message with only the non-text parts
  - Return a result with a metadata-only output and the non-text parts in messages
- If no text was streamed:
  - Include all parts (text and non-text) in the output message
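A sketch of this split, using illustrative part types rather than the real dartantic_ai classes; the helper name and record shape are assumptions.

```dart
/// Illustrative part types standing in for the real message part classes.
abstract class Part {}

class TextPart implements Part {
  TextPart(this.text);
  final String text;
}

/// Splits the final parts depending on whether text was already streamed.
/// If it was, the output message stays empty (text went out chunk-by-chunk)
/// and any non-text parts ride along in a separate message.
({List<Part> outputParts, List<Part> extraParts}) splitFinalParts(
  List<Part> parts, {
  required bool textWasStreamed,
}) {
  if (!textWasStreamed) return (outputParts: parts, extraParts: const []);
  final nonText = parts.where((p) => p is! TextPart).toList();
  return (outputParts: const [], extraParts: nonText);
}
```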
```dart
await for (final chunk in agent.sendStream(prompt)) {
  if (chunk.output.isNotEmpty) stdout.write(chunk.output);

  // Access tool events (always a list)
  final toolEvents = chunk.metadata['tool_name'] as List?;
  if (toolEvents != null) {
    for (final event in toolEvents) {
      // Process event based on its structure
      final eventType = event['type'] as String?;
      // Handle event...
    }
  }
}
```

```dart
final result = await agent.send(prompt);

// Tool events are in result.metadata (accumulated by Agent.send)
final toolEvents = result.metadata['tool_name'] as List?;
if (toolEvents != null) {
  for (final event in toolEvents) {
    // Process complete event history
  }
}

// Messages only contain session info, not tool events
final session = result.messages.last.metadata['_responses_session'];
// Use session data for stateful continuation...
```

```dart
await for (final chunk in agent.sendStream(prompt)) {
  // Check for content parts in messages
  for (final msg in chunk.messages) {
    for (final part in msg.parts) {
      if (part is DataPart) {
        // Handle generated content (images, files, etc.)
        processContent(part.bytes, part.mimeType);
      }
    }
  }
}
```

This section contains OpenAI Responses API specific implementation details. The generic patterns defined above apply here, but this section documents the provider-specific configuration, event types, and behaviors.
Future providers with server-side tools should follow the same generic patterns while documenting their own provider-specific details in separate sections.
Server-side tools are configured when creating an Agent via OpenAIResponsesChatModelOptions:
```dart
final agent = Agent(
  'openai-responses:gpt-4o',
  chatModelOptions: OpenAIResponsesChatModelOptions(
    serverSideTools: {
      OpenAIServerSideTool.webSearch,
      OpenAIServerSideTool.imageGeneration,
    },
    webSearchConfig: WebSearchConfig(
      contextSize: WebSearchContextSize.medium,
      location: WebSearchLocation(city: 'Seattle', country: 'US'),
    ),
    imageGenerationConfig: ImageGenerationConfig(
      partialImages: 2, // Request 2 progressive previews
      quality: ImageGenerationQuality.high,
      size: ImageGenerationSize.square1024,
    ),
  ),
);
```

- Web Search: Search the web for current information
- Image Generation: Generate images using gpt-image-1
- File Search: Search through uploaded files/vector stores
- Code Interpreter: Execute Python code with file handling
- MCP (Model Context Protocol): Connect to MCP servers
- Local Shell: Execute shell commands server-side
Note: OpenAI's Responses API also provides a Computer Use tool for remote desktop/browser control, but this is currently out of scope for Dartantic and not implemented.
WebSearchConfig:
- contextSize: Search context size (small, medium, large)
- location: User location metadata (city, region, country, timezone)
ImageGenerationConfig:
- partialImages: Number of progressive previews (0-3, default: 0)
- quality: Image quality (low, medium, high, auto - default: auto)
- size: Image dimensions (square1024, portrait, landscape, etc. - default: auto)
File Search configuration:
- vectorStoreIds: List of vector store IDs to search
- maxResults: Maximum number of results to return
- ranker: Ranking algorithm to use
- scoreThreshold: Minimum relevance score
Code Interpreter configuration:
- shouldReuseContainer: Whether to reuse the previous container
- containerId: Specific container ID to reuse
- fileIds: Files to make available in the container
OpenAI Responses uses event types like:
- `response.web_search_call.in_progress`
- `response.web_search_call.searching`
- `response.web_search_call.completed`
- `response.image_generation_call.partial_image`
- `response.code_interpreter_call.interpreting`
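When routing these events into metadata, the tool key can be derived from the event type prefix. The mapping below is an illustrative sketch under that assumption, not the exact implementation.

```dart
/// Maps an OpenAI Responses streaming event type to the metadata key used
/// for that tool, or null if the event does not belong to a server-side tool.
String? toolKeyForEventType(String eventType) {
  const prefixes = {
    'response.web_search_call': 'web_search',
    'response.file_search_call': 'file_search',
    'response.image_generation_call': 'image_generation',
    'response.code_interpreter_call': 'code_interpreter',
    'response.mcp_call': 'mcp',
    'response.local_shell_call': 'local_shell',
  };
  for (final entry in prefixes.entries) {
    if (eventType.startsWith(entry.key)) return entry.value;
  }
  return null;
}
```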
Tools requiring synthetic events:
- ✅ FileSearch: Append `FileSearchCall` (has queries + results)
- ✅ CodeInterpreter: Append `CodeInterpreterCall` (has code + results + containerId)
- ❌ WebSearch: Ignore `WebSearchCall` (no additional data)
- ❌ ImageGeneration: Ignore `ImageGenerationCall` (resultBase64 redundant)
- ❌ MCP: Ignore `McpCall` (no additional data)
- ❌ LocalShell: Ignore `LocalShellCall` (no additional data)
- 🚫 ComputerUse: Not supported (out of scope for Dartantic)
When partialImages > 0, the API streams intermediate render stages:
- Track each `ResponseImageGenerationCallPartialImage` event
- Store base64 data indexed by outputIndex (supports multiple concurrent images)
- Set a completion flag on `ResponseImageGenerationCallCompleted` for each outputIndex
- Add all completed images as DataParts after completion
Implementation: The AttachmentCollector class manages both image generation and code interpreter file attachments:
- Images: Tracked via `Map<int, String> _imagesByIndex` and `Set<int> _completedImageIndices`
  - Maps output index → base64 data
  - Supports multiple concurrent image generation calls
  - Each image is tracked independently by its position in the response output array
- Container Files: Tracked via `Set<({String containerId, String fileId})> _containerFiles`
  - Uses a Set to automatically handle multiple files without duplication
  - Files are discovered via `ContainerFileCitation` annotations in text content
  - Downloaded asynchronously and converted to DataParts
Both attachment types are resolved in resolveAttachments(), which returns all completed images and downloaded container files as a unified list of DataParts.
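A condensed sketch of that collection state follows; the real AttachmentCollector additionally downloads container files, detects MIME types, and builds DataParts, so this is illustrative rather than the actual class.

```dart
import 'dart:convert';
import 'dart:typed_data';

/// Condensed sketch of the collection state described above (illustrative).
class AttachmentCollectorSketch {
  final Map<int, String> _imagesByIndex = {}; // output index -> base64 data
  final Set<int> _completedImageIndices = {};
  final Set<({String containerId, String fileId})> _containerFiles = {};

  void onPartialImage(int outputIndex, String base64Data) =>
      _imagesByIndex[outputIndex] = base64Data;

  void onImageCompleted(int outputIndex) =>
      _completedImageIndices.add(outputIndex);

  void onContainerFileCitation(String containerId, String fileId) =>
      _containerFiles.add((containerId: containerId, fileId: fileId));

  /// Decodes every completed image; the real implementation would also
  /// download the collected container files and emit DataParts.
  List<Uint8List> completedImages() => [
        for (final index in _completedImageIndices)
          if (_imagesByIndex.containsKey(index))
            base64Decode(_imagesByIndex[index]!),
      ];
}
```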
```dart
await for (final chunk in agent.sendStream('What are the latest Dart news?')) {
  final webSearchEvents = chunk.metadata['web_search'] as List?;
  if (webSearchEvents != null) {
    for (final event in webSearchEvents) {
      print('Stage: ${event['type']}');
    }
  }
}
```

```dart
await for (final chunk in agent.sendStream('Generate a logo')) {
  final imageEvents = chunk.metadata['image_generation'] as List?;
  if (imageEvents != null) {
    for (final event in imageEvents) {
      // Save partial images
      if (event['partial_image_b64'] != null) {
        final bytes = base64Decode(event['partial_image_b64']);
        savePreview(bytes, event['partial_image_index']);
      }
    }
  }

  // Final image as DataPart
  for (final msg in chunk.messages) {
    for (final part in msg.parts) {
      if (part is DataPart && part.mimeType.startsWith('image/')) {
        saveFinal(part.bytes);
      }
    }
  }
}
```

Code execution follows the same pattern as thinking/reasoning:
- During streaming: Individual code delta events stream in chunk metadata
- After streaming: The accumulated code deltas appear in the final result metadata (via Agent.send()), not in message metadata
```dart
// Stream code as it's generated
await for (final chunk in agent.sendStream('Calculate fibonacci(100)')) {
  final codeEvents = chunk.metadata['code_interpreter'] as List?;
  if (codeEvents != null) {
    for (final event in codeEvents) {
      // Stream individual code deltas character-by-character if desired
      if (event['type'] == 'response.code_interpreter_call_code.delta') {
        stdout.write(event['delta']);
      }
    }
  }
}

// Access complete code in result metadata (not message metadata)
final result = await agent.send('Calculate fibonacci(100)');
final codeEvents = result.metadata['code_interpreter'] as List?;
if (codeEvents != null) {
  // Find code deltas in the event stream
  for (final event in codeEvents) {
    if (event['type'] == 'response.code_interpreter_call_code.delta') {
      print('Code chunk: ${event['delta']}');
    }
    // Find completion events with execution details
    if (event['type'] == 'response.output_item.done') {
      final item = event['item'];
      if (item['container_id'] != null) {
        print('Container: ${item['container_id']}');
        print('Status: ${item['status']}');
      }
    }
  }
}
```

```dart
// Generated images are automatically attached as DataParts
final result = await agent.send('Create a plot and save it as plot.png');
for (final part in agent.messages.last.parts) {
  if (part is DataPart && part.mimeType.startsWith('image/')) {
    print('Image generated: ${part.bytes.length} bytes');
    // The image bytes are directly available in the message
  }
}
```

File Generation: When code interpreter generates files (e.g., via plt.savefig()), they are:
- Referenced via `container_file_citation` events in streaming metadata
- Automatically downloaded and attached as `DataPart`s in the model's response
- Citations with zero-length text ranges (start_index == end_index) are filtered out
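A sketch of the citation-filtering step, assuming a citation record carrying containerId, fileId, and text-range indices (the record shape and helper name are illustrative; the download step is omitted).

```dart
/// Illustrative citation shape; field names mirror the streamed annotation.
typedef ContainerFileCitation = ({
  String containerId,
  String fileId,
  int startIndex,
  int endIndex,
});

/// Drops citations with zero-length text ranges (start_index == end_index),
/// mirroring the filtering described above; the remaining citations identify
/// container files to download and attach as DataParts.
List<ContainerFileCitation> usefulCitations(
        Iterable<ContainerFileCitation> citations) =>
    citations.where((c) => c.startIndex != c.endIndex).toList();
```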
- Event recording: Verify events are added to internal event log
- Metadata emission: Check streaming chunks have events as single-item lists
- Message metadata: Verify messages only contain `_responses_session`, NOT tool events
- Result metadata accumulation: Verify `Agent.send()` accumulates tool events in result.metadata
- Image DataPart: Confirm the final image appears as a DataPart when completed
- Web search flow: Test streaming events appear in `ChatResult.metadata`
- Image generation: Test partial images in streaming metadata + final DataPart
- File search: Test streaming events contain search results
- Code interpreter: Test streaming code deltas + container reuse
- Multiple tools: Test multiple server-side tools in one response
Tests should verify:
- Streaming chunks contain single-item event lists in `ChatResult.metadata`
- Final `ChatResult.metadata` (from `Agent.send()`) contains accumulated events
- Message metadata contains ONLY `_responses_session`, NOT tool events
- Event types match the expected progression for the provider
- Content deliverables appear as DataPart in message parts
- Session continuation works via `_responses_session` in message metadata
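A hedged sketch of what the first two checks could look like using package:test, with chunk and message metadata modeled as plain maps per the design above; the event and session values are made up for illustration.

```dart
import 'package:test/test.dart';

void main() {
  test('streaming chunk wraps a tool event in a single-item list', () {
    // Assumed shape of a streamed chunk's metadata, per the design above.
    final chunkMetadata = <String, Object?>{
      'web_search': [
        {'type': 'response.web_search_call.in_progress'},
      ],
    };
    final events = chunkMetadata['web_search'];
    expect(events, isA<List>());
    expect(events, hasLength(1));
  });

  test('message metadata carries only the session key', () {
    // Assumed shape of the final message metadata ('resp_123' is made up).
    final messageMetadata = <String, Object?>{
      '_responses_session': {'response_id': 'resp_123'},
    };
    expect(messageMetadata.keys, ['_responses_session']);
    expect(messageMetadata.containsKey('web_search'), isFalse);
  });
}
```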
- Message Handling Architecture - Core message patterns
- OpenAI Responses Provider Requirements - Feature requirements
- OpenAI Responses Provider Technical Design - Implementation details
- State Management Architecture - Session persistence patterns