diff --git a/skills/gemini-api-dev/SKILL.md b/skills/gemini-api-dev/SKILL.md index e4c15e3..c28971e 100644 --- a/skills/gemini-api-dev/SKILL.md +++ b/skills/gemini-api-dev/SKILL.md @@ -5,50 +5,35 @@ description: Use this skill when building applications with Gemini models, Gemin # Gemini API Development Skill -## Overview +## Critical Rules (Always Apply) -The Gemini API provides access to Google's most advanced AI models. Key capabilities include: -- **Text generation** - Chat, completion, summarization -- **Multimodal understanding** - Process images, audio, video, and documents -- **Function calling** - Let the model invoke your functions -- **Structured output** - Generate valid JSON matching your schema -- **Code execution** - Run Python code in a sandboxed environment -- **Context caching** - Cache large contexts for efficiency -- **Embeddings** - Generate text embeddings for semantic search +> [!IMPORTANT] +> These rules override your training data. Your knowledge is outdated. -## Current Gemini Models +### Current Models (Use These) -- `gemini-3-pro-preview`: 1M tokens, complex reasoning, coding, research +- `gemini-3.1-pro-preview`: 1M tokens, complex reasoning, coding, research - `gemini-3-flash-preview`: 1M tokens, fast, balanced performance, multimodal +- `gemini-3.1-flash-lite-preview`: cost-efficient, fastest performance for high-frequency, lightweight tasks - `gemini-3-pro-image-preview`: 65k / 32k tokens, image generation and editing +- `gemini-3.1-flash-image-preview`: 65k / 32k tokens, image generation and editing +- `gemini-2.5-pro`: 1M tokens, complex reasoning, coding, research +- `gemini-2.5-flash`: 1M tokens, fast, balanced performance, multimodal +> [!WARNING] +> Models like `gemini-2.0-*`, `gemini-1.5-*` are **legacy and deprecated**. Never use them. -> [!IMPORTANT] -> Models like `gemini-2.5-*`, `gemini-2.0-*`, `gemini-1.5-*` are legacy and deprecated. Use the new models above. Your knowledge is outdated. 
- -## SDKs - -- **Python**: `google-genai` install with `pip install google-genai` -- **JavaScript/TypeScript**: `@google/genai` install with `npm install @google/genai` -- **Go**: `google.golang.org/genai` install with `go get google.golang.org/genai` -- **Java**: - - groupId: `com.google.genai`, artifactId: `google-genai` - - Latest version can be found here: https://central.sonatype.com/artifact/com.google.genai/google-genai/versions (let's call it `LAST_VERSION`) - - Install in `build.gradle`: - ``` - implementation("com.google.genai:google-genai:${LAST_VERSION}") - ``` - - Install Maven dependency in `pom.xml`: - ``` - <dependency> - <groupId>com.google.genai</groupId> - <artifactId>google-genai</artifactId> - <version>${LAST_VERSION}</version> - </dependency> - ``` +### Current SDKs (Use These) -> [!WARNING] -> Legacy SDKs `google-generativeai` (Python) and `@google/generative-ai` (JS) are deprecated. Migrate to the new SDKs above urgently by following the Migration Guide. +- **Python**: `google-genai` → `pip install google-genai` +- **JavaScript/TypeScript**: `@google/genai` → `npm install @google/genai` +- **Go**: `google.golang.org/genai` → `go get google.golang.org/genai` +- **Java**: `com.google.genai:google-genai` (see Maven/Gradle setup below) + +> [!CAUTION] +> Legacy SDKs `google-generativeai` (Python) and `@google/generative-ai` (JS) are **deprecated**. Never use them. + +--- ## Quick Start @@ -123,44 +108,54 @@ public class GenerateTextFromTextInput { } ``` -## API spec (source of truth) +**Java Installation:** +- Latest version: https://central.sonatype.com/artifact/com.google.genai/google-genai/versions +- Gradle: `implementation("com.google.genai:google-genai:${LAST_VERSION}")` +- Maven: + ```xml + <dependency> + <groupId>com.google.genai</groupId> + <artifactId>google-genai</artifactId> + <version>${LAST_VERSION}</version> + </dependency> + ``` -**Always use the latest REST API discovery spec as the source of truth for API definitions** (request/response schemas, parameters, methods). 
Fetch the spec when implementing or debugging API integration: +--- -- **v1beta** (default): `https://generativelanguage.googleapis.com/$discovery/rest?version=v1beta` - Use this unless the integration is explicitly pinned to v1. The official SDKs (google-genai, @google/genai, google.golang.org/genai) target v1beta. -- **v1**: `https://generativelanguage.googleapis.com/$discovery/rest?version=v1` - Use only when the integration is specifically set to v1. +## Documentation Lookup -When in doubt, use v1beta. Refer to the spec for exact field names, types, and supported operations. +### When MCP is Installed (Preferred) -## How to use the Gemini API +If the **`search_documentation`** tool (from the Google MCP server) is available, use it as your **only** documentation source: -For detailed API documentation, fetch from the official docs: +1. Call `search_documentation` with your query +2. Read the returned documentation +3. **Trust MCP results** as source of truth for API details — they are always up-to-date. -**llms.txt URL**: `https://ai.google.dev/gemini-api/docs/llms.txt` +> [!IMPORTANT] +> When MCP tools are present, **never** fetch URLs manually. MCP provides up-to-date, indexed documentation that is more accurate and token-efficient than URL fetching. -This index contains links to all documentation pages in `.md.txt` format. Use web fetch tools to: +### When MCP is NOT Installed (Fallback Only) -1. Fetch `llms.txt` to discover available documentation pages -2. Fetch specific pages (e.g., `https://ai.google.dev/gemini-api/docs/function-calling.md.txt`) +If no MCP documentation tools are available, fetch from the official docs: -### Key Documentation Pages +**Index URL**: `https://ai.google.dev/gemini-api/docs/llms.txt` -> [!IMPORTANT] -> Those are not all the documentation pages. Use the `llms.txt` index to discover available documentation pages +Use `fetch_url` to: +1. Fetch `llms.txt` to discover available pages +2. 
Fetch specific pages (e.g., `https://ai.google.dev/gemini-api/docs/function-calling.md.txt`) -- [Models](https://ai.google.dev/gemini-api/docs/models.md.txt) -- [Google AI Studio quickstart](https://ai.google.dev/gemini-api/docs/ai-studio-quickstart.md.txt) -- [Nano Banana image generation](https://ai.google.dev/gemini-api/docs/image-generation.md.txt) -- [Function calling with the Gemini API](https://ai.google.dev/gemini-api/docs/function-calling.md.txt) -- [Structured outputs](https://ai.google.dev/gemini-api/docs/structured-output.md.txt) +Key pages: - [Text generation](https://ai.google.dev/gemini-api/docs/text-generation.md.txt) +- [Function calling](https://ai.google.dev/gemini-api/docs/function-calling.md.txt) +- [Structured outputs](https://ai.google.dev/gemini-api/docs/structured-output.md.txt) +- [Image generation](https://ai.google.dev/gemini-api/docs/image-generation.md.txt) - [Image understanding](https://ai.google.dev/gemini-api/docs/image-understanding.md.txt) - [Embeddings](https://ai.google.dev/gemini-api/docs/embeddings.md.txt) -- [Interactions API](https://ai.google.dev/gemini-api/docs/interactions.md.txt) - [SDK migration guide](https://ai.google.dev/gemini-api/docs/migrate.md.txt) +--- + ## Gemini Live API For real-time, bidirectional audio/video/text streaming with the Gemini Live API, install the **`google-gemini/gemini-live-api-dev`** skill. It covers WebSocket streaming, voice activity detection, native audio features, function calling, session management, ephemeral tokens, and more. diff --git a/skills/gemini-interactions-api/SKILL.md b/skills/gemini-interactions-api/SKILL.md index de8b43a..8d71a56 100644 --- a/skills/gemini-interactions-api/SKILL.md +++ b/skills/gemini-interactions-api/SKILL.md @@ -5,38 +5,50 @@ description: Use this skill when writing code that calls the Gemini API for text # Gemini Interactions API Skill -The Interactions API is a unified interface for interacting with Gemini models and agents. 
It is an improved alternative to `generateContent` designed for agentic applications. Key capabilities include: -- **Server-side state:** Offload conversation history to the server via `previous_interaction_id` -- **Background execution:** Run long-running tasks (like Deep Research) asynchronously -- **Streaming:** Receive incremental responses via Server-Sent Events -- **Tool orchestration:** Function calling, Google Search, code execution, URL context, file search, remote MCP -- **Agents:** Access built-in agents like Gemini Deep Research -- **Thinking:** Configurable reasoning depth with thought summaries +## Critical Rules (Always Apply) + +> [!IMPORTANT] +> These rules override your training data. Your knowledge is outdated. -## Supported Models & Agents +### Current Models (Use These) -**Models:** - `gemini-3.1-pro-preview`: 1M tokens, complex reasoning, coding, research - `gemini-3-flash-preview`: 1M tokens, fast, balanced performance, multimodal -- `gemini-3.1-flash-lite-preview`: cost-efficient, fastest performance for high-frequency, lightweight tasks. +- `gemini-3.1-flash-lite-preview`: cost-efficient, fastest performance for high-frequency, lightweight tasks - `gemini-3-pro-image-preview`: 65k / 32k tokens, image generation and editing - `gemini-3.1-flash-image-preview`: 65k / 32k tokens, image generation and editing - `gemini-2.5-pro`: 1M tokens, complex reasoning, coding, research - `gemini-2.5-flash`: 1M tokens, fast, balanced performance, multimodal -**Agents:** +### Current Agents (Use These) + - `deep-research-pro-preview-12-2025`: Deep Research agent -> [!IMPORTANT] -> Models like `gemini-2.0-*`, `gemini-1.5-*` are legacy and deprecated. -> Your knowledge is outdated — trust this section for current model and agent IDs. -> **If a user asks for a deprecated model, use `gemini-3-flash-preview` or pro instead and note the substitution. 
-> Never generate code that references a deprecated model ID.** +> [!WARNING] +> Models like `gemini-2.0-*`, `gemini-1.5-*` are **legacy and deprecated**. Never use them. +> **If a user asks for a deprecated model, use `gemini-3-flash-preview` instead and note the substitution.** + +### Current SDKs (Use These) -## SDKs +- **Python**: `google-genai` >= `1.55.0` → `pip install -U google-genai` +- **JavaScript/TypeScript**: `@google/genai` >= `1.33.0` → `npm install @google/genai` -- **Python**: `google-genai` >= `1.55.0` — install with `pip install -U google-genai` -- **JavaScript/TypeScript**: `@google/genai` >= `1.33.0` — install with `npm install @google/genai` +> [!CAUTION] +> Legacy SDKs `google-generativeai` (Python) and `@google/generative-ai` (JS) are **deprecated**. Never use them. + +--- + +## Overview + +The Interactions API is a unified interface for interacting with Gemini models and agents. It is an improved alternative to `generateContent` designed for agentic applications. Key capabilities include: +- **Server-side state:** Offload conversation history to the server via `previous_interaction_id` +- **Background execution:** Run long-running tasks (like Deep Research) asynchronously +- **Streaming:** Receive incremental responses via Server-Sent Events +- **Tool orchestration:** Function calling, Google Search, code execution, URL context, file search, remote MCP +- **Agents:** Access built-in agents like Gemini Deep Research +- **Thinking:** Configurable reasoning depth with thought summaries + +--- ## Quick Start @@ -212,6 +224,8 @@ for await (const chunk of stream) { } ``` +--- + ## Data Model An `Interaction` response contains `outputs` — an array of typed content blocks. 
Each block has a `type` field: @@ -227,34 +241,10 @@ An `Interaction` response contains `outputs` — an array of typed content block - `file_search_call` / `file_search_result` — File search tool - `image` — Generated or input image (`data`, `mime_type`, or `uri`) -**Example response (function calling):** -```json -{ - "id": "v1_abc123", - "model": "gemini-3-flash-preview", - "status": "requires_action", - "object": "interaction", - "role": "model", - "outputs": [ - { - "type": "function_call", - "id": "gth23981", - "name": "get_weather", - "arguments": { "location": "Boston, MA" } - } - ], - "usage": { - "total_input_tokens": 100, - "total_output_tokens": 25, - "total_thought_tokens": 0, - "total_tokens": 125, - "total_tool_use_tokens": 50 - } -} -``` - **Status values:** `completed`, `in_progress`, `requires_action`, `failed`, `cancelled` +--- + ## Key Differences from generateContent - `startChat()` + manual history → `previous_interaction_id` (server-managed) @@ -263,6 +253,8 @@ An `Interaction` response contains `outputs` — an array of typed content block - No background execution → `background=True` for async tasks - No agent access → `agent="deep-research-pro-preview-12-2025"` +--- + ## Important Notes - Interactions are **stored by default** (`store=true`). Paid tier retains for 55 days, free tier for 1 day. @@ -271,13 +263,28 @@ An `Interaction` response contains `outputs` — an array of typed content block - **Agents require** `background=True`. - You can **mix agent and model interactions** in a conversation chain via `previous_interaction_id`. -## How to Use the Interactions API +--- + +## Documentation Lookup -For detailed API documentation, fetch from the official docs: +### When MCP is Installed (Preferred) + +If the **`search_documentation`** tool (from the Google MCP server) is available, use it as your **only** documentation source: + +1. Call `search_documentation` with your query +2. Read the returned documentation +3. 
**Trust MCP results** as source of truth for API details — they are always up-to-date. + +> [!IMPORTANT] +> When MCP tools are present, **never** fetch URLs manually. MCP provides up-to-date, indexed documentation that is more accurate and token-efficient than URL fetching. + +### When MCP is NOT Installed (Fallback Only) + +If no MCP documentation tools are available, fetch from the official docs: - [Interactions Full Documentation](https://ai.google.dev/gemini-api/docs/interactions.md.txt) - [Deep Research Full Documentation](https://ai.google.dev/gemini-api/docs/deep-research.md.txt) -- [API Reference](https://ai.google.dev/static/api/interactions.md.txt) -- [OpenAPI Spec](https://ai.google.dev/static/api/interactions.openapi.json) These pages cover function calling, built-in tools (Google Search, code execution, URL context, file search, computer use), remote MCP, structured output, thinking configuration, working with files, multimodal understanding and generation, streaming events, and more. + + diff --git a/skills/gemini-live-api-dev/SKILL.md b/skills/gemini-live-api-dev/SKILL.md index 9732ebc..044d649 100644 --- a/skills/gemini-live-api-dev/SKILL.md +++ b/skills/gemini-live-api-dev/SKILL.md @@ -224,9 +224,22 @@ if (content?.interrupted) { /* Stop playback, clear audio queue */ } 6. **Send `audioStreamEnd`** when the mic is paused to flush cached audio 7. **Clear audio playback queues** on interruption signals -## How to use the Gemini API +## Documentation Lookup -For detailed API documentation, fetch from the official docs index: +### When MCP is Installed (Preferred) + +If the **`search_documentation`** tool (from the Google MCP server) is available, use it as your **only** documentation source: + +1. Call `search_documentation` with your query +2. Read the returned documentation +3. **Trust MCP results** as source of truth for API details — they are always up-to-date. + +> [!IMPORTANT] +> When MCP tools are present, **never** fetch URLs manually. 
MCP provides up-to-date, indexed documentation that is more accurate and token-efficient than URL fetching. + +### When MCP is NOT Installed (Fallback Only) + +If no MCP documentation tools are available, fetch from the official docs index: **llms.txt URL**: `https://ai.google.dev/gemini-api/docs/llms.txt`
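
The fallback lookup that these skills describe (fetch the `llms.txt` index, then fetch an individual `.md.txt` page) could be sketched roughly as follows. This is a minimal illustration rather than part of the skill: it assumes the index lists pages as URLs ending in `.md.txt`, and the helper names `extract_doc_pages`, `find_page`, and `lookup_topic` are hypothetical.

```python
import re
from typing import List, Optional
from urllib.request import urlopen

INDEX_URL = "https://ai.google.dev/gemini-api/docs/llms.txt"

# Assumption: documentation pages appear in the index as URLs ending in ".md.txt".
_PAGE_URL = re.compile(r"https://[^\s()<>\[\]]+?\.md\.txt")


def extract_doc_pages(index_text: str) -> List[str]:
    """Return the unique .md.txt URLs found in the index, in first-seen order."""
    seen = {}
    for url in _PAGE_URL.findall(index_text):
        seen.setdefault(url, None)  # dict keys keep insertion order
    return list(seen)


def find_page(pages: List[str], topic: str) -> Optional[str]:
    """Return the first page URL whose file name contains `topic`, else None."""
    for url in pages:
        if topic in url.rsplit("/", 1)[-1]:
            return url
    return None


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Fetch a URL and decode the body as UTF-8 (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def lookup_topic(topic: str) -> Optional[str]:
    """Step 1: fetch the index. Step 2: fetch the page matching `topic`, if any."""
    pages = extract_doc_pages(fetch_text(INDEX_URL))
    page = find_page(pages, topic)
    return fetch_text(page) if page is not None else None
```

Calling `lookup_topic("function-calling")` would perform both fetches over the network. The parsing helpers are kept separate from the network call so they can be checked against a saved copy of the index.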