Vision Image Input

Arian Amiramjadi edited this page Dec 24, 2025 · 1 revision

Vision / Image Input

Send images to multimodal models like GPT-4o, Claude, and Gemini.

Local Files

// Single image
ai.GPT4o().
    Image("screenshot.png").
    Ask("What's shown in this screenshot?")

// Multiple images
ai.Claude().
    Images("before.png", "after.png").
    Ask("What changed between these two images?")

URLs

ai.Gemini().
    ImageURL("https://example.com/diagram.png").
    Ask("Explain this diagram")
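
Before passing a string to ImageURL, it can be useful to reject obviously malformed input locally. The following is a minimal sketch using only the standard library. checkImageURL is a hypothetical helper, not part of this package, and the restriction to http and https is an assumption about what providers are willing to fetch.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// checkImageURL is a hypothetical pre-flight check, not part of the library.
// It only verifies that raw parses as an absolute http(s) URL with a host.
// It does not fetch the URL or confirm that it points at an image.
func checkImageURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("parse %q: %w", raw, err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return errors.New("image URL must use http or https")
	}
	if u.Host == "" {
		return errors.New("image URL has no host")
	}
	return nil
}

func main() {
	fmt.Println(checkImageURL("https://example.com/diagram.png"))
	fmt.Println(checkImageURL("diagram.png"))
}
```

A relative path such as "diagram.png" parses without error but has no scheme, so the check rejects it. Local files belong with Image rather than ImageURL.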

Image Detail Level

Trade accuracy against speed and token usage:

// Low detail = faster, fewer tokens
ai.GPT4o().
    ImageWithDetail("large-image.jpg", ai.ImageDetailLow).
    Ask("What's the main subject?")

// High detail = more accurate, more tokens
ai.GPT4o().
    ImageWithDetail("complex-chart.png", ai.ImageDetailHigh).
    Ask("Extract all data points from this chart")

// Auto (default) = model decides
ai.GPT4o().
    Image("photo.jpg").  // Uses ImageDetailAuto
    Ask("Describe this")

Base64 Images

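For image data that is already in memory, base64-encode the bytes and pass the MIME type explicitly:

// rawBytes holds the image contents, e.g. from os.ReadFile("image.png")
imageData := base64.StdEncoding.EncodeToString(rawBytes)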

ai.Claude().
    ImageBase64(imageData, "image/png").
    Ask("Analyze this image")

Mixing with Other Features

Image input composes with the other builder methods, such as system prompts, context files, temperature, and JSON output:

ai.GPT4o().
    System("You are an expert art critic").
    Context("art-history.md").
    Image("painting.jpg").
    Temperature(0.3).
    JSON().
    Ask("Analyze this painting and return {style, period, mood}")
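
Once the model replies with JSON, the text still has to be decoded on your side. The sketch below assumes the reply arrives as a string containing an object with the three keys named in the prompt. PaintingAnalysis and parseAnalysis are hypothetical names, and the sample reply is made up for illustration rather than real model output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PaintingAnalysis mirrors the {style, period, mood} shape requested
// in the prompt. The type is hypothetical, not part of the library.
type PaintingAnalysis struct {
	Style  string `json:"style"`
	Period string `json:"period"`
	Mood   string `json:"mood"`
}

// parseAnalysis decodes a JSON reply into a PaintingAnalysis.
func parseAnalysis(reply string) (PaintingAnalysis, error) {
	var a PaintingAnalysis
	if err := json.Unmarshal([]byte(reply), &a); err != nil {
		return PaintingAnalysis{}, fmt.Errorf("decode analysis: %w", err)
	}
	return a, nil
}

func main() {
	// Made-up reply, standing in for whatever the model returns.
	reply := `{"style":"Impressionism","period":"1870s","mood":"calm"}`
	a, err := parseAnalysis(reply)
	fmt.Println(a.Style, a.Period, a.Mood, err)
}
```

Decoding into a struct rather than a map makes a missing or misspelled key show up as an empty field that you can check for.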

Supported Models

Not all models support vision. Use these for images:

  • ai.GPT4o() / ai.GPT5() - OpenAI vision models
  • ai.Claude() - Anthropic Claude with vision
  • ai.Gemini() - Google Gemini Pro Vision

Supported Image Formats

  • PNG (.png)
  • JPEG (.jpg, .jpeg)
  • GIF (.gif)
  • WebP (.webp)
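
When file paths come from user input, a quick extension check against the list above can catch unsupported files before a request is built. This is a sketch only: isSupportedImage is a hypothetical helper, not part of this package, and it looks at the file name rather than the file contents.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// imageExts holds the extensions listed under Supported Image Formats.
var imageExts = map[string]bool{
	".png":  true,
	".jpg":  true,
	".jpeg": true,
	".gif":  true,
	".webp": true,
}

// isSupportedImage is a hypothetical helper, not part of the library.
// It reports whether the path ends in one of the listed extensions,
// ignoring case.
func isSupportedImage(path string) bool {
	return imageExts[strings.ToLower(filepath.Ext(path))]
}

func main() {
	for _, p := range []string{"screenshot.png", "photo.JPG", "scan.tiff"} {
		fmt.Println(p, isSupportedImage(p))
	}
}
```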

PDF / Document Input

Send PDF documents to models that support them (Claude, Gemini).

Local PDF Files

// Single PDF
ai.Anthropic().Claude().
    PDF("report.pdf").
    Ask("Summarize this document")

// Multiple PDFs
ai.Google().Gemini().
    PDFs("doc1.pdf", "doc2.pdf").
    Ask("Compare these two documents")

PDF URLs

ai.Claude().
    PDFURL("https://example.com/whitepaper.pdf").
    Ask("What are the key findings?")

Base64 PDFs

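For PDF data that is already in memory, base64-encode the bytes:

// rawBytes holds the PDF contents, e.g. from os.ReadFile("report.pdf")
pdfData := base64.StdEncoding.EncodeToString(rawBytes)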

ai.Anthropic().Claude().
    PDFBase64(pdfData).
    Ask("Extract the main points")

Mixing PDFs with Images

Combine documents and images in one request:

ai.Google().GeminiPro().
    PDF("report.pdf").
    Image("chart.png").
    Ask("Explain the chart in context of the report")

Generic Document Method

Document detects the document type automatically (PDF is the only format currently listed):

ai.Claude().
    Document("file.pdf").  // Auto-detects PDF
    Ask("Summarize this")

Supported Providers

Not all providers support PDF input:

Provider             PDF Support
Anthropic (Claude)   Yes
Google (Gemini)      Yes
OpenAI               not stated
OpenRouter           not stated
Ollama               not stated

Supported Document Formats

  • PDF (.pdf)
