Vision Image Input
Arian Amiramjadi edited this page Dec 24, 2025 · 1 revision
Send images to multimodal models like GPT-4o, Claude, and Gemini.
```go
// Single image
ai.GPT4o().
	Image("screenshot.png").
	Ask("What's shown in this screenshot?")

// Multiple images
ai.Claude().
	Images("before.png", "after.png").
	Ask("What changed between these two images?")
```

Images can also be loaded from a URL:

```go
ai.Gemini().
	ImageURL("https://example.com/diagram.png").
	Ask("Explain this diagram")
```

Control quality versus token usage with the detail level:
```go
// Low detail = faster, fewer tokens
ai.GPT4o().
	ImageWithDetail("large-image.jpg", ai.ImageDetailLow).
	Ask("What's the main subject?")

// High detail = more accurate, more tokens
ai.GPT4o().
	ImageWithDetail("complex-chart.png", ai.ImageDetailHigh).
	Ask("Extract all data points from this chart")

// Auto (default) = model decides
ai.GPT4o().
	Image("photo.jpg"). // Uses ImageDetailAuto
	Ask("Describe this")
```

For programmatic use, pass the image as base64-encoded data together with its MIME type:
```go
// rawBytes holds the image file contents as a []byte.
imageData := base64.StdEncoding.EncodeToString(rawBytes)
ai.Claude().
	ImageBase64(imageData, "image/png").
	Ask("Analyze this image")
```

Vision works with all builder methods:
```go
ai.GPT4o().
	System("You are an expert art critic").
	Context("art-history.md").
	Image("painting.jpg").
	Temperature(0.3).
	JSON().
	Ask("Analyze this painting and return {style, period, mood}")
```

Not all models support vision. Use these for images:

- `ai.GPT4o()` / `ai.GPT5()` - OpenAI vision models
- `ai.Claude()` - Anthropic Claude with vision
- `ai.Gemini()` - Google Gemini Pro Vision

Supported image formats:

- PNG (`.png`)
- JPEG (`.jpg`, `.jpeg`)
- GIF (`.gif`)
- WebP (`.webp`)

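
When raw bytes are passed to `ImageBase64`, the MIME type should match one of the formats listed above. The following is a minimal sketch, using only the Go standard library, of how the type might be detected and checked before encoding. `encodeImage` and `supportedImageTypes` are hypothetical names introduced for illustration and are not part of the library.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"net/http"
)

// supportedImageTypes mirrors the format list above.
// Hypothetical helper for this sketch, not library API.
var supportedImageTypes = map[string]bool{
	"image/png":  true,
	"image/jpeg": true,
	"image/gif":  true,
	"image/webp": true,
}

// encodeImage sniffs the MIME type from the leading bytes and returns the
// base64 payload together with the detected type.
func encodeImage(raw []byte) (data string, mimeType string, err error) {
	mimeType = http.DetectContentType(raw)
	if !supportedImageTypes[mimeType] {
		return "", mimeType, fmt.Errorf("unsupported image type %q", mimeType)
	}
	return base64.StdEncoding.EncodeToString(raw), mimeType, nil
}

func main() {
	// The 8-byte PNG signature is enough for content sniffing.
	png := []byte("\x89PNG\r\n\x1a\n")
	data, mimeType, err := encodeImage(png)
	fmt.Println(mimeType, data != "", err)
}
```

The two returned values could then be passed to `ImageBase64(data, mimeType)` as in the base64 example earlier on this page.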
Send PDF documents to models that support them (Claude, Gemini).
```go
// Single PDF
ai.Anthropic().Claude().
	PDF("report.pdf").
	Ask("Summarize this document")

// Multiple PDFs
ai.Google().Gemini().
	PDFs("doc1.pdf", "doc2.pdf").
	Ask("Compare these two documents")
```

PDFs can also be loaded from a URL:

```go
ai.Claude().
	PDFURL("https://example.com/whitepaper.pdf").
	Ask("What are the key findings?")
```

For programmatic use, pass the PDF as base64-encoded data:
```go
// rawBytes holds the PDF file contents as a []byte.
pdfData := base64.StdEncoding.EncodeToString(rawBytes)
ai.Anthropic().Claude().
	PDFBase64(pdfData).
	Ask("Extract the main points")
```

Combine documents and images in one request:
```go
ai.Google().GeminiPro().
	PDF("report.pdf").
	Image("chart.png").
	Ask("Explain the chart in context of the report")
```

`Document` auto-detects the document type:
```go
ai.Claude().
	Document("file.pdf"). // Auto-detects PDF
	Ask("Summarize this")
```

Not all providers support PDF input:

| Provider | PDF Support |
|---|---|
| Anthropic (Claude) | ✅ |
| Google (Gemini) | ✅ |
| OpenAI | ❌ |
| OpenRouter | ❌ |
| Ollama | ❌ |
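
Because PDF support differs by provider, it can help to check before building a request. The sketch below encodes the table above as a lookup. `pdfSupport` and `supportsPDF` are hypothetical names for this example and are not part of the library; the provider keys are likewise assumed spellings.

```go
package main

import (
	"fmt"
	"strings"
)

// pdfSupport mirrors the provider table above.
// Illustrative only, not library API.
var pdfSupport = map[string]bool{
	"anthropic":  true,
	"google":     true,
	"openai":     false,
	"openrouter": false,
	"ollama":     false,
}

// supportsPDF reports whether the named provider accepts PDF input.
// Unknown providers are treated as unsupported.
func supportsPDF(provider string) bool {
	return pdfSupport[strings.ToLower(provider)]
}

func main() {
	for _, p := range []string{"Anthropic", "Google", "OpenAI"} {
		fmt.Printf("%s: %v\n", p, supportsPDF(p))
	}
}
```

Treating unknown providers as unsupported is a conservative default: a request falls back to another input path instead of failing at the provider.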

Supported document formats:

- PDF (`.pdf`)
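
This page does not describe how `Document` detects the file type. One plausible approach, shown here purely as an assumption, is to check the file extension and fall back to the `%PDF-` signature at the start of the content. `looksLikePDF` is a hypothetical helper, not the library's implementation.

```go
package main

import (
	"bytes"
	"fmt"
	"path/filepath"
	"strings"
)

// looksLikePDF guesses whether a file is a PDF from its name or, failing
// that, from the "%PDF-" signature that a PDF file normally starts with.
func looksLikePDF(name string, head []byte) bool {
	if strings.EqualFold(filepath.Ext(name), ".pdf") {
		return true
	}
	return bytes.HasPrefix(head, []byte("%PDF-"))
}

func main() {
	fmt.Println(looksLikePDF("file.pdf", nil))                           // true
	fmt.Println(looksLikePDF("upload.bin", []byte("%PDF-1.7\n")))        // true
	fmt.Println(looksLikePDF("chart.png", []byte("\x89PNG\r\n\x1a\n"))) // false
}
```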