Multimodal tool return type #1497

Closed
kawaijoe opened this issue Apr 16, 2025 · 5 comments · Fixed by #1517
Labels: Feature request

kawaijoe commented Apr 16, 2025

Description

I'm not entirely sure whether this is a bug or a missing feature, but I think it would make sense for tools to be able to return multimodal types. I believe that a type such as DocumentUrl is currently serialized as JSON when returned from a tool.

For example:

@agent.tool_plain
def special_document() -> DocumentUrl:
  '''Retrieve a research paper for analysis.'''
  return DocumentUrl(url='https://arxiv.org/pdf/2504.07136')

Full example (DocumentUrl returned as tool)

import httpx
from google.colab import userdata
from pydantic_ai import Agent, BinaryContent, DocumentUrl
from pydantic_ai.models.gemini import GeminiModel
from pydantic_ai.providers.google_gla import GoogleGLAProvider

model = GeminiModel(
  'gemini-2.5-pro-preview-03-25',
  provider=GoogleGLAProvider(api_key=userdata.get('GEMINI_API_KEY'))
)

agent = Agent(model)

documentUrl = DocumentUrl(url='https://arxiv.org/pdf/2504.07136')

@agent.tool_plain
def special_document() -> DocumentUrl:
  '''Retrieve a research paper for analysis.'''
  return documentUrl

result = await agent.run(
  [
    'I need to read a research paper. Please use the special_document tool to get the paper and tell me its title.'
  ]
)

print('Agent response:')
print(result.output)

Agent response:
Okay, I have retrieved the research paper using the special_document tool.

However, the tool only provided a URL to the paper's PDF file: https://arxiv.org/pdf/2504.07136

It did not return the content or the title of the paper itself. Therefore, I cannot tell you the title based on the information provided by the tool. You can access the paper at the URL above to read it and find its title.
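One way to picture why the model reports only a URL: if the tool's return value is JSON-encoded into the function-call response, the model receives text containing the link, not the PDF itself. The sketch below is an illustration only. `FakeDocumentUrl` and `serialize_tool_return` are hypothetical stand-ins, not pydantic_ai classes or functions, and the exact serialization the library performs is an assumption.

```python
import json
from dataclasses import asdict, dataclass, is_dataclass


@dataclass
class FakeDocumentUrl:
    """Hypothetical stand-in for a document reference (not the pydantic_ai class)."""
    url: str
    kind: str = 'document-url'


def serialize_tool_return(value):
    """Encode a tool's return value as text, the way a JSON tool response would carry it."""
    if isinstance(value, str):
        return value
    if is_dataclass(value):
        # The document reference is flattened to plain fields; no file content is attached.
        return json.dumps(asdict(value))
    return json.dumps(value)


payload = serialize_tool_return(FakeDocumentUrl(url='https://arxiv.org/pdf/2504.07136'))
print(payload)
```

Under this assumption the model sees only a JSON string with a `url` field, which matches the response above: it can quote the link but cannot read the paper.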

Full example (DocumentUrl passed in agent.run())

import httpx
from google.colab import userdata
from pydantic_ai import Agent, BinaryContent, DocumentUrl
from pydantic_ai.models.gemini import GeminiModel
from pydantic_ai.providers.google_gla import GoogleGLAProvider

model = GeminiModel(
  'gemini-2.5-pro-preview-03-25',
  provider=GoogleGLAProvider(api_key=userdata.get('GEMINI_API_KEY'))
)

agent = Agent(model)

documentUrl = DocumentUrl(url='https://arxiv.org/pdf/2504.07136')

result = await agent.run(
  [
    'I need to read a research paper. Please use the special_document tool to get the paper and tell me its title.',
    documentUrl # Directly pass in documentUrl.
  ]
)

print('Agent response:')
print(result.output)

Agent response:
Okay, I have accessed the research paper using the special_document tool.

The title of the paper is: The spectrum of magnetized turbulence in the interstellar medium

tianshangwuyun commented

Must the return value of a tool be of type str or UserContent?

DouweM (Contributor) commented Apr 16, 2025

@kawaijoe Great idea. This isn't currently implemented, but it should be relatively straightforward, since we already have the logic to turn documents, images, etc. into the format each LLM provider expects.

Function call responses are expected to be JSON, so this would require following the tool response message with a new file/binary message. That message will look like it came directly from the user rather than from the function call, but at least Gemini interprets these back-to-back messages correctly and uses the document to answer the question.
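The two-message approach described here can be sketched roughly as follows. This is a minimal illustration, not the implementation in #1517: `FileRef`, `ToolReturnPart`, `UserPromptPart`, and `split_tool_return` are hypothetical names, and the placeholder strings are assumptions.

```python
from dataclasses import dataclass


@dataclass
class FileRef:
    """Hypothetical multimodal value returned by a tool (e.g. a document URL)."""
    url: str


@dataclass
class ToolReturnPart:
    """Hypothetical function-call response; its content must be JSON-friendly text."""
    tool_name: str
    content: str


@dataclass
class UserPromptPart:
    """Hypothetical user-message part; may carry text and files."""
    content: list


def split_tool_return(tool_name, value):
    """Return the message parts to send back to the model for one tool result."""
    if not isinstance(value, FileRef):
        # Ordinary results fit in the function-call response as text.
        return [ToolReturnPart(tool_name, str(value))]
    return [
        # The function-call response stays plain text and points at the follow-up.
        ToolReturnPart(tool_name, 'See file in the next message.'),
        # The file travels in a separate message that looks like it came from the user.
        UserPromptPart(['This is the file returned by ' + tool_name + ':', value]),
    ]


parts = split_tool_return('special_document', FileRef('https://arxiv.org/pdf/2504.07136'))
for part in parts:
    print(type(part).__name__, part.content)
```

The trade-off is that the file is no longer formally attached to the function call, so this relies on the model connecting the two back-to-back messages.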

I plan to implement this today or later this week.

DouweM (Contributor) commented Apr 17, 2025

@kawaijoe I've implemented this in #1517, can you please see if it works for you?

kawaijoe (Author) commented

@DouweM Thank you so much for the fix! The PR looks great 🎉

DouweM self-assigned this Apr 23, 2025
Wh1isper (Contributor) commented

Nice one. Bedrock and Anthropic support images in tool results, so I guess both can return images directly: #1339
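For illustration, a tool result that carries an image directly might be built like this for Anthropic. The dictionary shape follows my reading of the Anthropic Messages API (`tool_result` blocks with nested image content) and should be checked against the current documentation. The helper name `anthropic_image_tool_result` and the sample ID and bytes are hypothetical, and this is not how pydantic_ai constructs the request.

```python
import base64


def anthropic_image_tool_result(tool_use_id, image_bytes, media_type='image/png'):
    """Build a tool_result content block that embeds an image as base64."""
    return {
        'type': 'tool_result',
        'tool_use_id': tool_use_id,
        'content': [
            {
                'type': 'image',
                'source': {
                    'type': 'base64',
                    'media_type': media_type,
                    'data': base64.b64encode(image_bytes).decode('ascii'),
                },
            }
        ],
    }


# Placeholder ID and bytes for demonstration only.
block = anthropic_image_tool_result('toolu_example', b'\x89PNG')
print(block['content'][0]['source']['media_type'])
```

With providers that accept this shape, the image stays attached to the function call, so no follow-up user message is needed.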

DouweM added the Feature request label Apr 25, 2025