Description
Did you check the docs?
- I have read all the NeMo-Guardrails docs
Is your feature request related to a problem? Please describe.
Hi,
I'm encountering an issue with the Pixtral model in the context of multimodal input support via the LangChain + NeMo Guardrails setup using VisionRails.
I have a working integration where I'm sending a chat completion payload that includes both base64-encoded image data and text input as part of the same message. This input format works perfectly with gpt-4o when called through my LLM Gateway using the ChatCompletions-compatible API.
potentially_unsafe_message = [{
    "role": "user",
    "content": [
        {
            "type": "text",
            "text": "describe the image?",
        },
        {
            "type": "image",
            "source_type": "base64",
            "data": base64_image,
            "mime_type": "image/jpeg",
        }
    ],
}]
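For completeness, here is a minimal sketch of how `base64_image` and the message above can be built from raw image bytes. The helper name `build_image_message` is hypothetical and only illustrates the payload shape; it is not part of LangChain or NeMo Guardrails.

```python
import base64


def build_image_message(image_bytes: bytes, prompt: str,
                        mime_type: str = "image/jpeg") -> list:
    """Hypothetical helper: wrap a text prompt and an image in one user message."""
    # Base64-encode the raw bytes and decode to str so the payload is JSON-serializable.
    base64_image = base64.b64encode(image_bytes).decode("ascii")
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image",
                "source_type": "base64",
                "data": base64_image,
                "mime_type": mime_type,
            },
        ],
    }]
```

In my setup the bytes come from reading a JPEG file in binary mode, for example `open(path, "rb").read()`.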
On checking the logs, we found that the base64 image data is filtered out of the payload before the request is passed on through NeMo Guardrails using ChatCompletions.
The response given is: "Sorry I dont know which image you are talking about."
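# Imports assume the langchain_openai and nemoguardrails packages
from langchain_openai import ChatOpenAI
from nemoguardrails import LLMRails, RailsConfig

# LLM_GW_ENDPOINT and headers are defined elsewhere in my setup
vision_model = ChatOpenAI(
    api_key="None",
    base_url=LLM_GW_ENDPOINT,
    model="Pixtral",
    default_headers=headers,
    streaming=False,
)

# Load configuration
config = RailsConfig.from_path("./config/")

# Create the rails with the vision model
rails = LLMRails(config, llm=vision_model, verbose=False)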
Is there an alternative payload format or preprocessing step required to use Pixtral with NeMo Guardrails?
Thanks in advance for your help!
Describe the solution you'd like
Is there an alternative payload format or preprocessing step required to send images along with text to Pixtral through NeMo Guardrails?
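To make the question concrete: the OpenAI Chat Completions API also accepts images as `image_url` content parts carrying a data URL. Below is a rough sketch of converting the message above into that shape. The function name `to_openai_image_parts` is hypothetical, and whether this form survives the NeMo Guardrails pipeline or is accepted by Pixtral behind the gateway is an assumption I have not verified.

```python
def to_openai_image_parts(messages: list) -> list:
    """Hypothetical helper: rewrite base64 image blocks as image_url data URLs."""
    converted = []
    for message in messages:
        content = message.get("content")
        # Plain-string content has no image parts, so copy it unchanged.
        if isinstance(content, str):
            converted.append(dict(message))
            continue
        parts = []
        for part in content:
            if part.get("type") == "image" and part.get("source_type") == "base64":
                # Embed the image as a data URL: data:<mime>;base64,<payload>
                url = f"data:{part['mime_type']};base64,{part['data']}"
                parts.append({"type": "image_url", "image_url": {"url": url}})
            else:
                parts.append(dict(part))
        converted.append({**message, "content": parts})
    return converted
```

The input list is left untouched, so both payload variants can be sent side by side when comparing gpt-4o and Pixtral.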
Describe alternatives you've considered
What I’ve verified:
- The payload format is valid and works with GPT-4o.
- The base64-encoded image is a standard JPEG, loaded correctly from file.
- Switching only the model name from gpt-4o to Pixtral causes the issue.
Additional context
No response