Conversation

ChinmayBansal
Contributor

Related Issues

Proposed Changes:

This PR adds multimodal (image + text) support to WatsonxChatGenerator, enabling the component to process
both text and images in chat messages. The implementation follows established patterns from the
AnthropicChatGenerator and LlamaCppChatGenerator multimodal support.

Key Features Added:

  • Image format validation for supported formats (JPEG, PNG)
  • Proper message conversion to Watson API format with base64 data URIs
  • Support for multimodal models like meta-llama/llama-3-2-11b-vision-instruct and pixtral-12b
  • Role-based image restrictions (images only allowed in user messages)
  • Comprehensive error handling for unsupported formats and edge cases
  • Pre-validation of images before processing for better error flow
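
The validation and data URI features above could be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the helper name build_image_part, the MIME strings in the tuple, the error message wording, and the OpenAI-style image_url payload shape are all assumptions made for the example.

```python
import base64
from typing import Any, Dict, Optional

# Assumption: MIME strings for the two formats the PR lists (JPEG, PNG).
IMAGE_SUPPORTED_FORMATS = ("image/jpeg", "image/png")


def build_image_part(base64_image: str, mime_type: Optional[str]) -> Dict[str, Any]:
    """Hypothetical helper: validate the MIME type, then wrap the image as a data URI."""
    # A None MIME type is rejected here too, since None is not in the tuple.
    if mime_type not in IMAGE_SUPPORTED_FORMATS:
        supported = ", ".join(IMAGE_SUPPORTED_FORMATS)
        raise ValueError(
            f"Unsupported image format: {mime_type}. Supported formats: {supported}"
        )
    # Assumption: the part is shaped like an OpenAI-style "image_url" entry.
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{base64_image}"},
    }


# Usage: encode some placeholder bytes and build the content part.
encoded = base64.b64encode(b"fake image bytes").decode("ascii")
part = build_image_part(encoded, "image/png")
```

Rejecting None through the same membership check keeps a single error path for both missing and unsupported MIME types.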

Implementation Details:

  • Updated _prepare_api_call() method to handle multimodal content while preserving order
  • Added image format validation constants ImageFormat and IMAGE_SUPPORTED_FORMATS
  • Enhanced component docstring with detailed usage examples for multimodal scenarios
  • Added proper type annotations and removed the type: ignore directive
  • Pre-validate all images upfront before content processing (following LlamaCpp pattern)
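
As a rough sketch of the flow described above (role check, upfront validation of every image, then an order-preserving conversion), consider the following. The TextPart, ImagePart, and Message classes and the convert_message function are hypothetical stand-ins, not Haystack or Watson SDK types, and the output dictionary shape is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Union

SUPPORTED_MIME_TYPES = ("image/jpeg", "image/png")  # per the PR: JPEG and PNG only


@dataclass
class TextPart:  # hypothetical stand-in for a text content part
    text: str


@dataclass
class ImagePart:  # hypothetical stand-in for an image content part
    base64_image: str
    mime_type: Optional[str]


@dataclass
class Message:  # hypothetical stand-in for a chat message
    role: str
    parts: List[Union[TextPart, ImagePart]]


def convert_message(message: Message) -> Dict[str, Any]:
    """Convert one message to an assumed API payload, keeping part order."""
    images = [p for p in message.parts if isinstance(p, ImagePart)]

    # Role restriction: images are only allowed in user messages.
    if images and message.role != "user":
        raise ValueError("Image content is only supported for user messages")

    # Pre-validate every image before building any output.
    for image in images:
        if image.mime_type not in SUPPORTED_MIME_TYPES:
            raise ValueError(f"Unsupported image format: {image.mime_type}")

    # A single pass over the parts preserves the original interleaving.
    content: List[Dict[str, Any]] = []
    for part in message.parts:
        if isinstance(part, TextPart):
            content.append({"type": "text", "text": part.text})
        else:
            url = f"data:{part.mime_type};base64,{part.base64_image}"
            content.append({"type": "image_url", "image_url": {"url": url}})
    return {"role": message.role, "content": content}
```

Validating before the conversion loop means a bad image in the last position fails the call without any partially built content.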

How did you test it?

Unit Tests:

  • test_prepare_api_call_with_image() - Tests proper multimodal message conversion
  • test_prepare_api_call_with_unsupported_mime_type() - Tests error handling for
    unsupported formats
  • test_prepare_api_call_with_none_mime_type() - Tests edge case with None mime type
  • test_prepare_api_call_image_in_non_user_message() - Tests role-based restrictions
  • test_multimodal_message_processing() - Tests end-to-end multimodal processing with mocked model
  • test_supported_image_formats() - Tests all supported formats (JPEG, PNG)
  • test_multiple_images_in_single_message() - Tests multiple image support

Integration Tests:

  • test_live_run_multimodal() - Tests live API calls with real Watson multimodal models

Code Quality Verification:

  • ✅ All linting checks pass: hatch run fmt
  • ✅ All type checking passes: hatch run test:types
  • ✅ All unit tests pass: hatch run test:unit

Manual Verification:

  • Tested multimodal message creation and conversion
  • Verified proper error messages for validation failures
  • Confirmed Watson API format compatibility with data URI structure

Notes for the reviewer

  • The implementation closely follows the patterns established in AnthropicChatGenerator and
    LlamaCppChatGenerator
  • Image validation uses the same error message format as other integrations for consistency
  • The Watson API requires images to be passed as data URIs of the form data:<mime-type>;base64,<data>, wrapped in an image_url structure, for multimodal processing
  • Added comprehensive test coverage that matches and exceeds the patterns used in Anthropic and LlamaCpp tests
  • All edge cases are properly handled including None mime types and role restrictions
  • Watson supports fewer image formats (JPEG, PNG only) compared to Anthropic/LlamaCpp (which also support GIF, WebP)

Checklist

@ChinmayBansal ChinmayBansal requested a review from a team as a code owner August 22, 2025 19:26
@ChinmayBansal ChinmayBansal requested review from sjrl and removed request for a team August 22, 2025 19:26
@github-actions github-actions bot added integration:watsonx type:documentation Improvements or additions to documentation labels Aug 22, 2025
@ChinmayBansal ChinmayBansal changed the title Feat/watsonx multimodal support feat:watsonx multimodal support Aug 22, 2025
@ChinmayBansal ChinmayBansal changed the title feat:watsonx multimodal support feat: watsonx multimodal support Aug 22, 2025
@sjrl
Contributor

sjrl commented Aug 26, 2025

@ChinmayBansal thanks for your work on this!

One high-level comment:

  • Let's make sure to update the minimum version of haystack-ai in pyproject.toml to haystack-ai>=2.17.1 since ImageContent was only introduced in the latest release

@ChinmayBansal
Contributor Author

Hi @sjrl,

I have addressed your feedback. Could you review?

Thanks!

Contributor

@sjrl sjrl left a comment

Thanks!

@sjrl sjrl merged commit 2ef5c82 into deepset-ai:main Aug 27, 2025
11 checks passed
@ChinmayBansal ChinmayBansal deleted the feat/watsonx-multimodal-support branch August 27, 2025 15:47
Development

Successfully merging this pull request may close these issues.

Image support in WatsonxChatGenerator
2 participants