Vision / Image Input

Send images to multimodal models like GPT-4o, Claude, and Gemini.

Local Files

// Single image
ai.GPT4o().
    Image("screenshot.png").
    Ask("What's shown in this screenshot?")

// Multiple images
ai.Claude().
    Images("before.png", "after.png").
    Ask("What changed between these two images?")

URLs

ai.Gemini().
    ImageURL("https://example.com/diagram.png").
    Ask("Explain this diagram")

Image Detail Level

Control the trade-off between image detail and token usage:

// Low detail = faster, fewer tokens
ai.GPT4o().
    ImageWithDetail("large-image.jpg", ai.ImageDetailLow).
    Ask("What's the main subject?")

// High detail = more accurate, more tokens
ai.GPT4o().
    ImageWithDetail("complex-chart.png", ai.ImageDetailHigh).
    Ask("Extract all data points from this chart")

// Auto (default) = model decides
ai.GPT4o().
    Image("photo.jpg").  // Uses ImageDetailAuto
    Ask("Describe this")

Base64 Images

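When the image bytes are already in memory, pass them as a base64-encoded string together with the MIME type:

// rawBytes holds the image file's contents as a []byte
imageData := base64.StdEncoding.EncodeToString(rawBytes)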

ai.Claude().
    ImageBase64(imageData, "image/png").
    Ask("Analyze this image")
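
If the MIME type is not known in advance, it can be sniffed from the bytes before encoding. The following is a minimal sketch using only the Go standard library; `encodeImage` is a hypothetical helper, not part of the `ai` package, and its two return values are meant to be passed on to `ImageBase64`.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"net/http"
)

// encodeImage is a hypothetical helper (not part of the ai package).
// It returns the base64 payload and a MIME type sniffed from the bytes.
func encodeImage(raw []byte) (data string, mimeType string) {
	// DetectContentType inspects at most the first 512 bytes and falls
	// back to "application/octet-stream" when nothing matches.
	mimeType = http.DetectContentType(raw)
	data = base64.StdEncoding.EncodeToString(raw)
	return data, mimeType
}

func main() {
	// The 8-byte PNG signature stands in for a real file's contents.
	pngHeader := []byte{0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A}
	data, mimeType := encodeImage(pngHeader)
	fmt.Println(mimeType, data)
}
```

In real use, `raw` would typically come from `os.ReadFile` or an HTTP response body, and a result of `application/octet-stream` would be a signal to reject the input.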

Mixing with Other Features

Vision works with all builder methods:

ai.GPT4o().
    System("You are an expert art critic").
    Context("art-history.md").
    Image("painting.jpg").
    Temperature(0.3).
    JSON().
    Ask("Analyze this painting and return {style, period, mood}")

Supported Models

Not all models support vision. Use these for images:

ai.GPT4o() / ai.GPT5() - OpenAI vision models
ai.Claude() - Anthropic Claude with vision
ai.Gemini() - Google Gemini Pro Vision

Supported Image Formats

PNG (.png)
JPEG (.jpg, .jpeg)
GIF (.gif)
WebP (.webp)
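
A caller may want to reject unsupported files before building a request. The sketch below checks a path's extension against the list above; `isSupportedImage` and `supportedImageExts` are hypothetical names introduced for illustration, and whether the library performs a similar check itself is not stated here.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// supportedImageExts mirrors the format list in this section.
var supportedImageExts = map[string]bool{
	".png":  true,
	".jpg":  true,
	".jpeg": true,
	".gif":  true,
	".webp": true,
}

// isSupportedImage is a hypothetical helper (not part of the ai package).
// It compares the file extension case-insensitively against the list.
func isSupportedImage(path string) bool {
	ext := strings.ToLower(filepath.Ext(path))
	return supportedImageExts[ext]
}

func main() {
	for _, p := range []string{"screenshot.png", "photo.JPEG", "scan.tiff"} {
		fmt.Printf("%s supported: %v\n", p, isSupportedImage(p))
	}
}
```

An extension check is only a cheap first filter; it says nothing about the actual file contents.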

PDF / Document Input

Send PDF documents to models that support them (Claude, Gemini).

Local PDF Files

// Single PDF
ai.Anthropic().Claude().
    PDF("report.pdf").
    Ask("Summarize this document")

// Multiple PDFs
ai.Google().Gemini().
    PDFs("doc1.pdf", "doc2.pdf").
    Ask("Compare these two documents")

PDF URLs

ai.Claude().
    PDFURL("https://example.com/whitepaper.pdf").
    Ask("What are the key findings?")

Base64 PDFs

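When the PDF bytes are already in memory, pass them as a base64-encoded string:

// rawBytes holds the PDF file's contents as a []byte
pdfData := base64.StdEncoding.EncodeToString(rawBytes)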

ai.Anthropic().Claude().
    PDFBase64(pdfData).
    Ask("Extract the main points")
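
Because `PDFBase64` takes no MIME type, it can be worth confirming that the bytes really are a PDF before encoding them. The following sketch relies on the fact that PDF files begin with the `%PDF-` header; `encodePDF` is a hypothetical helper, not part of the `ai` package.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"errors"
	"fmt"
)

// encodePDF is a hypothetical helper (not part of the ai package).
// It rejects data that lacks the PDF header, then base64-encodes it.
func encodePDF(raw []byte) (string, error) {
	if !bytes.HasPrefix(raw, []byte("%PDF-")) {
		return "", errors.New("data does not start with the PDF header")
	}
	return base64.StdEncoding.EncodeToString(raw), nil
}

func main() {
	// An HTML error page saved under a .pdf name is caught here.
	if _, err := encodePDF([]byte("<html>not a pdf</html>")); err != nil {
		fmt.Println("rejected:", err)
	}

	// A minimal header stands in for a real document's contents.
	data, err := encodePDF([]byte("%PDF-1.7\n"))
	fmt.Println(data, err)
}
```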

Mixing PDFs with Images

Combine documents and images in one request:

ai.Google().GeminiPro().
    PDF("report.pdf").
    Image("chart.png").
    Ask("Explain the chart in context of the report")

Generic Document Method

The Document method auto-detects the document type:

ai.Claude().
    Document("file.pdf").  // Auto-detects PDF
    Ask("Summarize this")

Supported Providers

Not all providers support PDF input:

Provider	PDF Support
Anthropic (Claude)	✅
Google (Gemini)	✅
OpenAI	❌
OpenRouter	❌
Ollama	❌
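
Application code that switches providers at runtime could encode the table above as a lookup and fail early. This is only a sketch: the provider key strings and the `checkPDFSupport` helper are assumptions made for illustration, not identifiers from the `ai` package.

```go
package main

import "fmt"

// pdfSupport mirrors the provider table in this section.
// The lowercase keys are assumed names, chosen for this example.
var pdfSupport = map[string]bool{
	"anthropic":  true,
	"google":     true,
	"openai":     false,
	"openrouter": false,
	"ollama":     false,
}

// checkPDFSupport is a hypothetical guard. Providers missing from the
// map are treated as unsupported, the same as those marked false.
func checkPDFSupport(provider string) error {
	if !pdfSupport[provider] {
		return fmt.Errorf("provider %q does not accept PDF input", provider)
	}
	return nil
}

func main() {
	for _, p := range []string{"anthropic", "openai"} {
		if err := checkPDFSupport(p); err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("provider %q accepts PDF input\n", p)
	}
}
```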

Supported Document Formats

PDF (.pdf)
