Support returning multi-modal content from tools #1517

Open: wants to merge 10 commits from multimodal-tool-output into main

Conversation

DouweM
Contributor

@DouweM DouweM commented Apr 17, 2025

Fixes #1497
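Conceptually, since most chat APIs accept only text in the tool-result slot, returning a file from a tool amounts to sending a text placeholder as the tool result and the actual content as a follow-up user part. A stdlib-only sketch of that message sequence (shapes loosely modeled on OpenAI's chat format; the helper name is illustrative, not PydanticAI's actual API):

```python
# Sketch of the message sequence produced when a tool returns
# multi-modal content. Shapes loosely follow OpenAI's chat format;
# the helper name is illustrative, not PydanticAI's actual API.

def tool_result_messages(tool_call_id: str, image_url: str) -> list[dict]:
    """Represent a tool that returned an image as two messages:
    a plain-text tool result, then a user part carrying the file."""
    return [
        {
            "role": "tool",
            "tool_call_id": tool_call_id,
            # The tool-result slot itself stays text-only.
            "content": "See file attached in the next message.",
        },
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        },
    ]

messages = tool_result_messages("call_123", "https://example.com/chart.png")
```

Provider-specific serialization differs, but the ordering is the portable part of the trick.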

@DouweM DouweM marked this pull request as draft April 17, 2025 22:02
@DouweM DouweM force-pushed the multimodal-tool-output branch from 747131e to bffbe4f Compare April 17, 2025 23:05
Contributor

hyperlint-ai bot commented Apr 17, 2025

PR Change Summary

Enhanced support for returning multi-modal content from tools, ensuring compatibility with various models.

  • Implemented multi-modal content return for OpenAI, Gemini, and Anthropic tools.
  • Updated documentation to include examples of function tool outputs.
  • Clarified the registration process for function tools via decorators and agent arguments.

Modified Files

  • docs/tools.md

How can I customize these reviews?

Check out the Hyperlint AI Reviewer docs for more information on how to customize the review.

If you just want to ignore it on this PR, you can add the hyperlint-ignore label to the PR. Future changes won't trigger a Hyperlint review.

Note: for link checks specifically, we only check the first 30 links in a file and cache the results for several hours (so if you just added a page, you may see stale results). Our recommendation is to add the hyperlint-ignore label to skip the link check for this PR.

@DouweM DouweM marked this pull request as ready for review April 17, 2025 23:06
@DouweM
Contributor Author

DouweM commented Apr 17, 2025

@Kludex Please review!

@DouweM DouweM force-pushed the multimodal-tool-output branch 2 times, most recently from d9b7fb1 to aca752f Compare April 17, 2025 23:37
Member

@Kludex Kludex left a comment


What happens with the models that don't support this? I don't see mistral, groq, etc.

@DouweM
Contributor Author

DouweM commented Apr 18, 2025

What happens with the models that don't support this?

@Kludex The same thing that would happen if a developer tries to pass unsupported multi-modal content to the LLM as a user part: PydanticAI raises an error. I think that's better than silently serializing the content part as JSON instead, as just passing along the URL would be unlikely to have the desired result.
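As a sketch of that fail-fast behavior (the capability table and function name here are hypothetical, not PydanticAI's internals):

```python
# Hypothetical sketch of failing fast instead of silently serializing
# unsupported multi-modal content as JSON. Names are illustrative only.

SUPPORTS_IMAGE_INPUT = {"openai", "gemini", "anthropic", "groq", "mistral"}

def check_tool_return(model_name: str, content_kind: str) -> None:
    """Raise rather than degrade to a JSON-serialized URL that the
    model is unlikely to interpret as intended."""
    if content_kind == "image" and model_name not in SUPPORTS_IMAGE_INPUT:
        raise RuntimeError(
            f"{model_name!r} does not support image input; "
            "refusing to serialize the content part as JSON."
        )

check_tool_return("openai", "image")  # supported: no error
try:
    check_tool_return("some-text-only-model", "image")
except RuntimeError as e:
    error_message = str(e)
```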

I don't see mistral, groq, etc.

I've added a test for Groq. I can add one for Mistral as well; is there a particular reason we're not actually calling the API in test_mistral.py (and using VCR)? Writing a test that fakes the entire API interaction seems to defeat the point, as we specifically want to test that the LLM interprets this correctly :)

@DouweM
Contributor Author

DouweM commented Apr 18, 2025

Hmm, looks like Mistral, unlike the other 4 I tested, doesn't like this trick:

Unexpected role 'user' after role 'tool'

Edit: Got it to work!
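The error is a role-ordering constraint on Mistral's side; a toy validator makes the rejected sequence concrete (the constraint is Mistral's, the code is illustrative):

```python
# Toy reproduction of Mistral's complaint: a 'user' message is not
# allowed to directly follow a 'tool' message.

def validate_roles(roles: list[str]) -> None:
    for prev, cur in zip(roles, roles[1:]):
        if prev == "tool" and cur == "user":
            raise ValueError(f"Unexpected role {cur!r} after role {prev!r}")

# The naive "file as a follow-up user part" ordering is rejected:
try:
    validate_roles(["user", "assistant", "tool", "user"])
except ValueError as e:
    rejected = str(e)
```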

@DouweM DouweM force-pushed the multimodal-tool-output branch from e7f3e04 to cf49b0c Compare April 18, 2025 15:23
@DouweM DouweM force-pushed the multimodal-tool-output branch from cf49b0c to c1c7392 Compare April 18, 2025 19:27
@DouweM
Contributor Author

DouweM commented Apr 18, 2025

@Kludex Ready for review again!

@TheFirstMe

@DouweM Does this also cover Bedrock (Claude)?

@DouweM
Contributor Author

DouweM commented Apr 21, 2025

@DouweM Does this also cover Bedrock (Claude)?

@TheFirstMe The implementation is model-agnostic -- if the model supports multi-modal input, PydanticAI will try to send the file after the tool output. I haven't specifically tested Bedrock though; if you have an API token and could give it a try, it'd be much appreciated!
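To illustrate what "model-agnostic" means here: only the final serialization of the content part differs per provider. The payload shapes below are rough approximations of provider schemas, for illustration only:

```python
# Illustrative sketch: one generic content part, mapped to a
# provider-specific payload only at serialization time. The shapes
# are approximations of provider schemas, not exact.

def to_provider_part(provider: str, image_url: str) -> dict:
    if provider in ("openai", "groq", "mistral"):
        return {"type": "image_url", "image_url": {"url": image_url}}
    if provider == "anthropic":
        return {"type": "image", "source": {"type": "url", "url": image_url}}
    if provider == "gemini":
        return {"file_data": {"file_uri": image_url}}
    raise ValueError(f"unsupported provider: {provider!r}")

part = to_provider_part("anthropic", "https://example.com/a.png")
```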

@TheFirstMe

@TheFirstMe The implementation is model-agnostic -- if the model supports multi-modal input, PydanticAI will try to send the file after the tool output. I haven't specifically tested Bedrock though; if you have an API token and could give it a try, it'd be much appreciated!


@DouweM I have tested and can verify that it works with Bedrock. Thanks!

When can we expect this to be released?

@DouweM DouweM self-assigned this Apr 25, 2025
@DouweM DouweM force-pushed the multimodal-tool-output branch from 2d3dec7 to 3ee3ca1 Compare April 25, 2025 23:38
@DouweM DouweM assigned Kludex and unassigned DouweM Apr 25, 2025
@Wh1isper
Contributor

LGTM. The one thing I'm concerned about is that Claude actually supports returning images from tool results natively, and I'm not sure whether there will be a difference between that and this approach. I'm guessing the impact is minimal; I'll test it in my case later.

Development

Successfully merging this pull request may close this issue: Multimodal tool return type

4 participants