Support returning multi-modal content from tools #1517

Open: wants to merge 10 commits from multimodal-tool-output into main

Conversation

DouweM
Contributor

@DouweM DouweM commented Apr 17, 2025

Fixes #1497
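Conceptually, since most chat APIs accept only text in the tool-result slot, returning a file from a tool amounts to sending a text placeholder as the tool result and the actual content as a follow-up user part. A stdlib-only sketch of that message sequence (shapes loosely modeled on OpenAI's chat format; the helper name is illustrative, not PydanticAI's actual API):

```python
# Sketch of the message sequence produced when a tool returns
# multi-modal content. Shapes loosely follow OpenAI's chat format;
# the helper name is illustrative, not PydanticAI's actual API.

def tool_result_messages(tool_call_id: str, image_url: str) -> list[dict]:
    """Represent a tool that returned an image as two messages:
    a plain-text tool result, then a user part carrying the file."""
    return [
        {
            "role": "tool",
            "tool_call_id": tool_call_id,
            # The tool-result slot itself stays text-only.
            "content": "See file attached in the next message.",
        },
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        },
    ]

messages = tool_result_messages("call_123", "https://example.com/chart.png")
```

Provider-specific serialization differs, but the ordering is the portable part of the trick.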

@DouweM DouweM marked this pull request as draft April 17, 2025 22:02
@DouweM DouweM force-pushed the multimodal-tool-output branch from 747131e to bffbe4f Compare April 17, 2025 23:05
Contributor

hyperlint-ai bot commented Apr 17, 2025

PR Change Summary

Enhanced support for returning multi-modal content from tools, ensuring compatibility with various models.

  • Implemented multi-modal content return for OpenAI, Gemini, and Anthropic tools.
  • Updated documentation to include examples of function tool outputs.
  • Clarified the registration process for function tools via decorators and agent arguments.

Modified Files

  • docs/tools.md

How can I customize these reviews?

Check out the Hyperlint AI Reviewer docs for more information on how to customize the review.

If you just want to ignore it on this PR, you can add the hyperlint-ignore label to the PR. Future changes won't trigger a Hyperlint review.

Note: for link checks specifically, we only check the first 30 links in a file and cache the results for several hours (so if you just added a page, you may see stale results). Our recommendation is to add the hyperlint-ignore label to skip the link check for this PR.

@DouweM DouweM marked this pull request as ready for review April 17, 2025 23:06
@DouweM
Contributor Author

DouweM commented Apr 17, 2025

@Kludex Please review!

@DouweM DouweM force-pushed the multimodal-tool-output branch 2 times, most recently from d9b7fb1 to aca752f Compare April 17, 2025 23:37
Member

@Kludex Kludex left a comment


What happens with the models that don't support this? I don't see mistral, groq, etc.

@DouweM
Contributor Author

DouweM commented Apr 18, 2025

What happens with the models that don't support this?

@Kludex The same thing that would happen if a developer tries to pass unsupported multi-modal content to the LLM as a user part: PydanticAI raises an error. I think that's better than silently serializing the content part as JSON instead, as just passing along the URL would be unlikely to have the desired result.
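As a sketch of that fail-fast behavior (the capability table and function name here are hypothetical, not PydanticAI's internals):

```python
# Hypothetical sketch of failing fast instead of silently serializing
# unsupported multi-modal content as JSON. Names are illustrative only.

SUPPORTS_IMAGE_INPUT = {"openai", "gemini", "anthropic", "groq", "mistral"}

def check_tool_return(model_name: str, content_kind: str) -> None:
    """Raise rather than degrade to a JSON-serialized URL that the
    model is unlikely to interpret as intended."""
    if content_kind == "image" and model_name not in SUPPORTS_IMAGE_INPUT:
        raise RuntimeError(
            f"{model_name!r} does not support image input; "
            "refusing to serialize the content part as JSON."
        )

check_tool_return("openai", "image")  # supported: no error
try:
    check_tool_return("some-text-only-model", "image")
except RuntimeError as e:
    error_message = str(e)
```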

I don't see mistral, groq, etc.

I've added a test for Groq. I can add one for Mistral as well; is there a particular reason we're not actually calling the API in test_mistral.py (and using VCR)? Writing a test that fakes the entire API interaction seems to defeat the point, as we specifically want to test that the LLM interprets this correctly :)

@DouweM
Contributor Author

DouweM commented Apr 18, 2025

Hmm, looks like Mistral, unlike the other 4 I tested, doesn't like this trick:

Unexpected role 'user' after role 'tool'

Edit: Got it to work!
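The error is a role-ordering constraint on Mistral's side; a toy validator makes the rejected sequence concrete (the constraint is Mistral's, the code is illustrative):

```python
# Toy reproduction of Mistral's complaint: a 'user' message is not
# allowed to directly follow a 'tool' message.

def validate_roles(roles: list[str]) -> None:
    for prev, cur in zip(roles, roles[1:]):
        if prev == "tool" and cur == "user":
            raise ValueError(f"Unexpected role {cur!r} after role {prev!r}")

# The naive "file as a follow-up user part" ordering is rejected:
try:
    validate_roles(["user", "assistant", "tool", "user"])
except ValueError as e:
    rejected = str(e)
```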

@DouweM DouweM force-pushed the multimodal-tool-output branch from e7f3e04 to cf49b0c Compare April 18, 2025 15:23
@DouweM DouweM force-pushed the multimodal-tool-output branch from cf49b0c to c1c7392 Compare April 18, 2025 19:27
@DouweM
Contributor Author

DouweM commented Apr 18, 2025

@Kludex Ready for review again!

@TheFirstMe

@DouweM Does this also cover Bedrock (Claude)?

@DouweM
Contributor Author

DouweM commented Apr 21, 2025

@DouweM Does this also cover Bedrock (Claude)?

@TheFirstMe The implementation is model-agnostic -- if the model supports multi-modal input, PydanticAI will try to send the file after the tool output. I haven't specifically tested Bedrock though; if you have an API token and could give it a try, it'd be much appreciated!
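To illustrate what "model-agnostic" means here: only the final serialization of the content part differs per provider. The payload shapes below are rough approximations of provider schemas, for illustration only:

```python
# Illustrative sketch: one generic content part, mapped to a
# provider-specific payload only at serialization time. The shapes
# are approximations of provider schemas, not exact.

def to_provider_part(provider: str, image_url: str) -> dict:
    if provider in ("openai", "groq", "mistral"):
        return {"type": "image_url", "image_url": {"url": image_url}}
    if provider == "anthropic":
        return {"type": "image", "source": {"type": "url", "url": image_url}}
    if provider == "gemini":
        return {"file_data": {"file_uri": image_url}}
    raise ValueError(f"unsupported provider: {provider!r}")

part = to_provider_part("anthropic", "https://example.com/a.png")
```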

@TheFirstMe

@TheFirstMe The implementation is model-agnostic -- if the model supports multi-modal input, PydanticAI will try to send the file after the tool output. I haven't specifically tested Bedrock though; if you have an API token and could give it a try, it'd be much appreciated!


@DouweM I have tested and can verify that it works with Bedrock. Thanks!

When can we expect this to be released?

@DouweM DouweM self-assigned this Apr 25, 2025
@DouweM DouweM force-pushed the multimodal-tool-output branch from 2d3dec7 to 3ee3ca1 Compare April 25, 2025 23:38
@DouweM DouweM assigned Kludex and unassigned DouweM Apr 25, 2025
@Wh1isper
Contributor

LGTM. The one thing I'm concerned about is that Claude actually supports returning images from tool results natively, and I'm not sure whether there will be a difference between that and this approach. I'm guessing the impact is minimal; I'll test it in my case later.

Development

Successfully merging this pull request may close this issue: Multimodal tool return type

4 participants