Multimodal tool return type #1497
Comments
Does the return value of the tool have to be of type str or UserContent?
@kawaijoe Great idea. This isn't currently implemented, but it should be relatively straightforward since we already have the logic to turn documents, images, etc. into the format different LLM providers expect. Function call responses are expected to be JSON, so this would require following the response message up with a new file/binary message that will look like it came directly from the user rather than from the function call. At least Gemini interprets these back-to-back messages correctly and uses the document to answer the question. I plan to implement this today or later this week.
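The approach described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the library's implementation: `DocumentUrl` here is a stand-in dataclass, and `ToolReturnMessage`, `UserFileMessage`, and `split_tool_return` are hypothetical names invented for the example.

```python
import json
from dataclasses import dataclass
from typing import Any, List, Union


@dataclass
class DocumentUrl:
    """Stand-in for a multimodal content type (hypothetical, not the library class)."""
    url: str


@dataclass
class ToolReturnMessage:
    """Hypothetical tool-response message; its content must be JSON text."""
    tool_name: str
    content: str


@dataclass
class UserFileMessage:
    """Hypothetical follow-up message that looks like it came from the user."""
    file: DocumentUrl


def split_tool_return(tool_name: str, value: Any) -> List[Union[ToolReturnMessage, UserFileMessage]]:
    """Turn a tool's return value into the messages sent back to the model."""
    if isinstance(value, DocumentUrl):
        # The function response stays JSON and only points at the next message;
        # the file itself travels in a separate, user-style message.
        placeholder = json.dumps({"result": "See the file in the next message."})
        return [ToolReturnMessage(tool_name, placeholder), UserFileMessage(value)]
    # Plain values are serialized to JSON as a single tool response.
    return [ToolReturnMessage(tool_name, json.dumps(value))]


messages = split_tool_return("get_document", DocumentUrl("https://example.com/report.pdf"))
```

With this shape the provider receives two back-to-back messages for a file-returning tool, and a single JSON response for everything else.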
@DouweM Thank you so much for the fix! The PR looks great 🎉
Nice one. Bedrock and Anthropic support images in tool returns, so I guess both can return images directly: #1339
Description
I'm not entirely sure if this is a bug or a missing feature, but I think it would make sense to be able to return multimodal types. I believe types such as DocumentUrl currently get serialized as JSON. For example:
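A minimal sketch of the behavior described, using a stand-in `DocumentUrl` dataclass and a hypothetical `serialize_tool_return` helper rather than the library's actual classes. It assumes the tool return value is simply dumped to JSON, which is what the report suggests happens:

```python
import json
from dataclasses import asdict, dataclass, is_dataclass
from typing import Any


@dataclass
class DocumentUrl:
    """Stand-in for the multimodal type (hypothetical, not the library class)."""
    url: str


def serialize_tool_return(value: Any) -> str:
    """Hypothetical helper: serialize whatever the tool returned to JSON text."""
    if is_dataclass(value) and not isinstance(value, type):
        # Dataclass instances are flattened to plain dicts before dumping.
        value = asdict(value)
    return json.dumps(value)


serialized = serialize_tool_return(DocumentUrl(url="https://example.com/report.pdf"))
print(serialized)  # {"url": "https://example.com/report.pdf"}
```

In this sketch the model only ever sees the URL as a JSON string, never the document contents, which is the gap the issue is about.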
Full example (DocumentUrl returned as tool)
Full example (DocumentUrl passed in agent.run())
References