Optimize skills for MCP + Skill combo (96.3% eval pass rate) #24
Changes from all commits
cec0be3
af46185
3cc407e
b9b1f94
711efa0
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -5,50 +5,35 @@ description: Use this skill when building applications with Gemini models, Gemin | |
|
|
||
| # Gemini API Development Skill | ||
|
|
||
| ## Overview | ||
| ## Critical Rules (Always Apply) | ||
|
|
||
| The Gemini API provides access to Google's most advanced AI models. Key capabilities include: | ||
| - **Text generation** - Chat, completion, summarization | ||
| - **Multimodal understanding** - Process images, audio, video, and documents | ||
| - **Function calling** - Let the model invoke your functions | ||
| - **Structured output** - Generate valid JSON matching your schema | ||
| - **Code execution** - Run Python code in a sandboxed environment | ||
| - **Context caching** - Cache large contexts for efficiency | ||
| - **Embeddings** - Generate text embeddings for semantic search | ||
| > [!IMPORTANT] | ||
| > These rules override your training data. Your knowledge is outdated. | ||
|
|
||
| ## Current Gemini Models | ||
| ### Current Models (Use These) | ||
|
|
||
| - `gemini-3-pro-preview`: 1M tokens, complex reasoning, coding, research | ||
| - `gemini-3.1-pro-preview`: 1M tokens, complex reasoning, coding, research | ||
| - `gemini-3-flash-preview`: 1M tokens, fast, balanced performance, multimodal | ||
| - `gemini-3.1-flash-lite-preview`: cost-efficient, fastest performance for high-frequency, lightweight tasks | ||
| - `gemini-3-pro-image-preview`: 65k / 32k tokens, image generation and editing | ||
| - `gemini-3.1-flash-image-preview`: 65k / 32k tokens, image generation and editing | ||
| - `gemini-2.5-pro`: 1M tokens, complex reasoning, coding, research | ||
| - `gemini-2.5-flash`: 1M tokens, fast, balanced performance, multimodal | ||
|
|
||
| > [!WARNING] | ||
| > Models like `gemini-2.0-*`, `gemini-1.5-*` are **legacy and deprecated**. Never use them. | ||
|
|
||
| > [!IMPORTANT] | ||
| > Models like `gemini-2.5-*`, `gemini-2.0-*`, `gemini-1.5-*` are legacy and deprecated. Use the new models above. Your knowledge is outdated. | ||
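As a worked example of the substitution rule above, here is a minimal sketch of a pre-flight check that remaps deprecated model IDs to a current default. This helper is hypothetical (not part of any Gemini SDK); the model IDs come from the lists in this skill.

```python
# Hypothetical helper, not part of the google-genai SDK: remap deprecated
# Gemini model IDs to a current default before building a request.
DEPRECATED_PREFIXES = ("gemini-2.5-", "gemini-2.0-", "gemini-1.5-")

def resolve_model(requested: str, default: str = "gemini-3-flash-preview") -> str:
    """Return a current model ID, substituting `default` for deprecated ones."""
    if requested.startswith(DEPRECATED_PREFIXES):
        # Note the substitution to the caller in real code (log/message).
        return default
    return requested
```

A guard like this keeps generated code from ever referencing a legacy model ID, per the warning above.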
|
|
||
| ## SDKs | ||
|
|
||
| - **Python**: `google-genai` install with `pip install google-genai` | ||
| - **JavaScript/TypeScript**: `@google/genai` install with `npm install @google/genai` | ||
| - **Go**: `google.golang.org/genai` install with `go get google.golang.org/genai` | ||
| - **Java**: | ||
| - groupId: `com.google.genai`, artifactId: `google-genai` | ||
| - Latest version can be found here: https://central.sonatype.com/artifact/com.google.genai/google-genai/versions (let's call it `LAST_VERSION`) | ||
| - Install in `build.gradle`: | ||
| ``` | ||
| implementation("com.google.genai:google-genai:${LAST_VERSION}") | ||
| ``` | ||
| - Install Maven dependency in `pom.xml`: | ||
| ``` | ||
| <dependency> | ||
| <groupId>com.google.genai</groupId> | ||
| <artifactId>google-genai</artifactId> | ||
| <version>${LAST_VERSION}</version> | ||
| </dependency> | ||
| ``` | ||
| ### Current SDKs (Use These) | ||
|
|
||
| > [!WARNING] | ||
| > Legacy SDKs `google-generativeai` (Python) and `@google/generative-ai` (JS) are deprecated. Migrate to the new SDKs above urgently by following the Migration Guide. | ||
| - **Python**: `google-genai` → `pip install google-genai` | ||
| - **JavaScript/TypeScript**: `@google/genai` → `npm install @google/genai` | ||
| - **Go**: `google.golang.org/genai` → `go get google.golang.org/genai` | ||
| - **Java**: `com.google.genai:google-genai` (see Maven/Gradle setup below) | ||
|
|
||
| > [!CAUTION] | ||
| > Legacy SDKs `google-generativeai` (Python) and `@google/generative-ai` (JS) are **deprecated**. Never use them. | ||
|
|
||
| --- | ||
|
|
||
| ## Quick Start | ||
|
|
||
|
|
@@ -123,44 +108,54 @@ public class GenerateTextFromTextInput { | |
| } | ||
| ``` | ||
|
|
||
| ## API spec (source of truth) | ||
| **Java Installation:** | ||
| - Latest version: https://central.sonatype.com/artifact/com.google.genai/google-genai/versions | ||
| - Gradle: `implementation("com.google.genai:google-genai:${LAST_VERSION}")` | ||
|
Comment on lines +112 to +113

**Contributor:** The instruction for finding the latest Java SDK version requires visiting a webpage to find the value for `LAST_VERSION`.
||
| - Maven: | ||
| ```xml | ||
| <dependency> | ||
| <groupId>com.google.genai</groupId> | ||
| <artifactId>google-genai</artifactId> | ||
| <version>${LAST_VERSION}</version> | ||
| </dependency> | ||
| ``` | ||
|
|
||
| **Always use the latest REST API discovery spec as the source of truth for API definitions** (request/response schemas, parameters, methods). Fetch the spec when implementing or debugging API integration: | ||
| --- | ||
|
|
||
| - **v1beta** (default): `https://generativelanguage.googleapis.com/$discovery/rest?version=v1beta` | ||
| Use this unless the integration is explicitly pinned to v1. The official SDKs (google-genai, @google/genai, google.golang.org/genai) target v1beta. | ||
| - **v1**: `https://generativelanguage.googleapis.com/$discovery/rest?version=v1` | ||
| Use only when the integration is specifically set to v1. | ||
| ## Documentation Lookup | ||
|
|
||
| When in doubt, use v1beta. Refer to the spec for exact field names, types, and supported operations. | ||
| ### When MCP is Installed (Preferred) | ||
|
|
||
| ## How to use the Gemini API | ||
| If the **`search_documentation`** tool (from the Google MCP server) is available, use it as your **only** documentation source: | ||
|
|
||
| For detailed API documentation, fetch from the official docs index: | ||
| 1. Call `search_documentation` with your query | ||
| 2. Read the returned documentation | ||
| 3. **Trust MCP results** as source of truth for API details — they are always up-to-date. | ||
|
|
||
| **llms.txt URL**: `https://ai.google.dev/gemini-api/docs/llms.txt` | ||
| > [!IMPORTANT] | ||
| > When MCP tools are present, **never** fetch URLs manually. MCP provides up-to-date, indexed documentation that is more accurate and token-efficient than URL fetching. | ||
|
|
||
| This index contains links to all documentation pages in `.md.txt` format. Use web fetch tools to: | ||
| ### When MCP is NOT Installed (Fallback Only) | ||
|
|
||
| 1. Fetch `llms.txt` to discover available documentation pages | ||
| 2. Fetch specific pages (e.g., `https://ai.google.dev/gemini-api/docs/function-calling.md.txt`) | ||
| If no MCP documentation tools are available, fetch from the official docs: | ||
|
|
||
| ### Key Documentation Pages | ||
| **Index URL**: `https://ai.google.dev/gemini-api/docs/llms.txt` | ||
|
|
||
| > [!IMPORTANT] | ||
| > Those are not all the documentation pages. Use the `llms.txt` index to discover available documentation pages | ||
| Use `fetch_url` to: | ||
| 1. Fetch `llms.txt` to discover available pages | ||
| 2. Fetch specific pages (e.g., `https://ai.google.dev/gemini-api/docs/function-calling.md.txt`) | ||
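The two-step fallback above can be sketched as a small index parser. This is a hypothetical snippet, assuming the `llms.txt` index uses standard markdown `[title](url)` links (the sample text below is illustrative, not the real index contents):

```python
import re

# Illustrative stand-in for the fetched llms.txt body.
SAMPLE_INDEX = """\
# Gemini API docs
- [Function calling](https://ai.google.dev/gemini-api/docs/function-calling.md.txt)
- [Structured outputs](https://ai.google.dev/gemini-api/docs/structured-output.md.txt)
"""

def index_pages(text: str) -> dict:
    """Map page titles to URLs from a markdown link index."""
    return dict(re.findall(r"\[([^\]]+)\]\((https?://[^)]+)\)", text))

pages = index_pages(SAMPLE_INDEX)
```

Once the index is parsed, fetch only the specific `.md.txt` pages you need rather than the whole docs set.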
|
|
||
| - [Models](https://ai.google.dev/gemini-api/docs/models.md.txt) | ||
| - [Google AI Studio quickstart](https://ai.google.dev/gemini-api/docs/ai-studio-quickstart.md.txt) | ||
| - [Nano Banana image generation](https://ai.google.dev/gemini-api/docs/image-generation.md.txt) | ||
| - [Function calling with the Gemini API](https://ai.google.dev/gemini-api/docs/function-calling.md.txt) | ||
| - [Structured outputs](https://ai.google.dev/gemini-api/docs/structured-output.md.txt) | ||
| Key pages: | ||
| - [Text generation](https://ai.google.dev/gemini-api/docs/text-generation.md.txt) | ||
| - [Function calling](https://ai.google.dev/gemini-api/docs/function-calling.md.txt) | ||
| - [Structured outputs](https://ai.google.dev/gemini-api/docs/structured-output.md.txt) | ||
| - [Image generation](https://ai.google.dev/gemini-api/docs/image-generation.md.txt) | ||
| - [Image understanding](https://ai.google.dev/gemini-api/docs/image-understanding.md.txt) | ||
| - [Embeddings](https://ai.google.dev/gemini-api/docs/embeddings.md.txt) | ||
| - [Interactions API](https://ai.google.dev/gemini-api/docs/interactions.md.txt) | ||
| - [SDK migration guide](https://ai.google.dev/gemini-api/docs/migrate.md.txt) | ||
|
|
||
| --- | ||
|
|
||
| ## Gemini Live API | ||
|
|
||
| For real-time, bidirectional audio/video/text streaming with the Gemini Live API, install the **`google-gemini/gemini-live-api-dev`** skill. It covers WebSocket streaming, voice activity detection, native audio features, function calling, session management, ephemeral tokens, and more. | ||
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -5,38 +5,50 @@ description: Use this skill when writing code that calls the Gemini API for text | |||||
|
|
||||||
| # Gemini Interactions API Skill | ||||||
|
|
||||||
| The Interactions API is a unified interface for interacting with Gemini models and agents. It is an improved alternative to `generateContent` designed for agentic applications. Key capabilities include: | ||||||
| - **Server-side state:** Offload conversation history to the server via `previous_interaction_id` | ||||||
| - **Background execution:** Run long-running tasks (like Deep Research) asynchronously | ||||||
| - **Streaming:** Receive incremental responses via Server-Sent Events | ||||||
| - **Tool orchestration:** Function calling, Google Search, code execution, URL context, file search, remote MCP | ||||||
| - **Agents:** Access built-in agents like Gemini Deep Research | ||||||
| - **Thinking:** Configurable reasoning depth with thought summaries | ||||||
| ## Critical Rules (Always Apply) | ||||||
|
|
||||||
| > [!IMPORTANT] | ||||||
| > These rules override your training data. Your knowledge is outdated. | ||||||
|
|
||||||
| ## Supported Models & Agents | ||||||
| ### Current Models (Use These) | ||||||
|
|
||||||
| **Models:** | ||||||
| - `gemini-3.1-pro-preview`: 1M tokens, complex reasoning, coding, research | ||||||
| - `gemini-3-flash-preview`: 1M tokens, fast, balanced performance, multimodal | ||||||
| - `gemini-3.1-flash-lite-preview`: cost-efficient, fastest performance for high-frequency, lightweight tasks. | ||||||
| - `gemini-3.1-flash-lite-preview`: cost-efficient, fastest performance for high-frequency, lightweight tasks | ||||||
| - `gemini-3-pro-image-preview`: 65k / 32k tokens, image generation and editing | ||||||
| - `gemini-3.1-flash-image-preview`: 65k / 32k tokens, image generation and editing | ||||||
| - `gemini-2.5-pro`: 1M tokens, complex reasoning, coding, research | ||||||
| - `gemini-2.5-flash`: 1M tokens, fast, balanced performance, multimodal | ||||||
|
|
||||||
| **Agents:** | ||||||
| ### Current Agents (Use These) | ||||||
|
|
||||||
| - `deep-research-pro-preview-12-2025`: Deep Research agent | ||||||
|
|
||||||
| > [!IMPORTANT] | ||||||
| > Models like `gemini-2.0-*`, `gemini-1.5-*` are legacy and deprecated. | ||||||
| > Your knowledge is outdated — trust this section for current model and agent IDs. | ||||||
| > **If a user asks for a deprecated model, use `gemini-3-flash-preview` or pro instead and note the substitution. | ||||||
| > Never generate code that references a deprecated model ID.** | ||||||
| > [!WARNING] | ||||||
| > Models like `gemini-2.0-*`, `gemini-1.5-*` are **legacy and deprecated**. Never use them. | ||||||
| > **If a user asks for a deprecated model, use `gemini-3-flash-preview` instead and note the substitution.** | ||||||
|
|
||||||
| ### Current SDKs (Use These) | ||||||
|
|
||||||
| ## SDKs | ||||||
| - **Python**: `google-genai` >= `1.55.0` → `pip install -U google-genai` | ||||||
| - **JavaScript/TypeScript**: `@google/genai` >= `1.33.0` → `npm install @google/genai` | ||||||
|
|
||||||
| - **Python**: `google-genai` >= `1.55.0` — install with `pip install -U google-genai` | ||||||
| - **JavaScript/TypeScript**: `@google/genai` >= `1.33.0` — install with `npm install @google/genai` | ||||||
| > [!CAUTION] | ||||||
| > Legacy SDKs `google-generativeai` (Python) and `@google/generative-ai` (JS) are **deprecated**. Never use them. | ||||||
|
|
||||||
| --- | ||||||
|
|
||||||
| ## Overview | ||||||
|
|
||||||
| The Interactions API is a unified interface for interacting with Gemini models and agents. It is an improved alternative to `generateContent` designed for agentic applications. Key capabilities include: | ||||||
| - **Server-side state:** Offload conversation history to the server via `previous_interaction_id` | ||||||
| - **Background execution:** Run long-running tasks (like Deep Research) asynchronously | ||||||
| - **Streaming:** Receive incremental responses via Server-Sent Events | ||||||
| - **Tool orchestration:** Function calling, Google Search, code execution, URL context, file search, remote MCP | ||||||
| - **Agents:** Access built-in agents like Gemini Deep Research | ||||||
| - **Thinking:** Configurable reasoning depth with thought summaries | ||||||
|
|
||||||
| --- | ||||||
|
|
||||||
| ## Quick Start | ||||||
|
|
||||||
|
|
@@ -212,6 +224,8 @@ for await (const chunk of stream) { | |||||
| } | ||||||
| ``` | ||||||
|
|
||||||
| --- | ||||||
|
|
||||||
| ## Data Model | ||||||
|
|
||||||
| An `Interaction` response contains `outputs` — an array of typed content blocks. Each block has a `type` field: | ||||||
|
|
@@ -227,34 +241,10 @@ An `Interaction` response contains `outputs` — an array of typed content block | |||||
| - `file_search_call` / `file_search_result` — File search tool | ||||||
| - `image` — Generated or input image (`data`, `mime_type`, or `uri`) | ||||||
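Dispatching over the typed blocks above can be sketched as plain dict handling. The field names here follow the data model in this section; the literal response below is a made-up example, not SDK output:

```python
# Sketch: filter an Interaction's typed output blocks. The shape follows
# the data model above; this dict is an illustrative example.
interaction = {
    "status": "requires_action",
    "outputs": [
        {"type": "text", "text": "Checking the weather..."},
        {"type": "function_call", "id": "call_1", "name": "get_weather",
         "arguments": {"location": "Boston, MA"}},
    ],
}

# Collect blocks by type; a requires_action status usually means there are
# function_call blocks the client must answer.
pending_calls = [b for b in interaction["outputs"] if b["type"] == "function_call"]
texts = [b["text"] for b in interaction["outputs"] if b["type"] == "text"]
```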
|
|
||||||
| **Example response (function calling):** | ||||||
| ```json | ||||||
| { | ||||||
| "id": "v1_abc123", | ||||||
| "model": "gemini-3-flash-preview", | ||||||
| "status": "requires_action", | ||||||
| "object": "interaction", | ||||||
| "role": "model", | ||||||
| "outputs": [ | ||||||
| { | ||||||
| "type": "function_call", | ||||||
| "id": "gth23981", | ||||||
| "name": "get_weather", | ||||||
| "arguments": { "location": "Boston, MA" } | ||||||
| } | ||||||
| ], | ||||||
| "usage": { | ||||||
| "total_input_tokens": 100, | ||||||
| "total_output_tokens": 25, | ||||||
| "total_thought_tokens": 0, | ||||||
| "total_tokens": 125, | ||||||
| "total_tool_use_tokens": 50 | ||||||
| } | ||||||
| } | ||||||
| ``` | ||||||
|
|
||||||
| **Status values:** `completed`, `in_progress`, `requires_action`, `failed`, `cancelled` | ||||||
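The status values above split into terminal and non-terminal states. A minimal sketch of that classification (a hypothetical helper, useful when polling background tasks):

```python
# Terminal statuses from the list above; in_progress and requires_action
# both mean the client still has work to do (poll, or return tool results).
TERMINAL = {"completed", "failed", "cancelled"}

def is_done(status: str) -> bool:
    """True once the interaction has reached a terminal status."""
    return status in TERMINAL
```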
|
|
||||||
| --- | ||||||
|
|
||||||
| ## Key Differences from generateContent | ||||||
|
|
||||||
| - `startChat()` + manual history → `previous_interaction_id` (server-managed) | ||||||
|
|
@@ -263,6 +253,8 @@ An `Interaction` response contains `outputs` — an array of typed content block | |||||
| - No background execution → `background=True` for async tasks | ||||||
| - No agent access → `agent="deep-research-pro-preview-12-2025"` | ||||||
|
|
||||||
| --- | ||||||
|
|
||||||
| ## Important Notes | ||||||
|
|
||||||
| - Interactions are **stored by default** (`store=true`). Paid tier retains for 55 days, free tier for 1 day. | ||||||
|
|
@@ -271,13 +263,28 @@ An `Interaction` response contains `outputs` — an array of typed content block | |||||
| - **Agents require** `background=True`. | ||||||
| - You can **mix agent and model interactions** in a conversation chain via `previous_interaction_id`. | ||||||
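The chaining pattern in the notes above can be sketched with the transport stubbed out. `send` below stands in for whatever SDK or HTTP call performs the request (its name and the request-dict shape are assumptions for illustration); the point is the threading of each response `id` into the next request's `previous_interaction_id`:

```python
# Sketch: chain turns via previous_interaction_id, with the API call
# stubbed out as `send` (hypothetical; substitute your real client call).
def chain(turns, send, model="gemini-3-flash-preview"):
    """Send each turn, threading the prior interaction id so the
    server-side history carries the conversation state."""
    last_id = None
    responses = []
    for user_input in turns:
        req = {"model": model, "input": user_input}
        if last_id is not None:
            req["previous_interaction_id"] = last_id
        resp = send(req)          # server stores the history
        last_id = resp["id"]
        responses.append(resp)
    return responses
```

Because the server holds the history, only the new turn and the previous id go over the wire each time.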
|
|
||||||
| ## How to Use the Interactions API | ||||||
| --- | ||||||
|
|
||||||
| ## Documentation Lookup | ||||||
|
|
||||||
| For detailed API documentation, fetch from the official docs: | ||||||
| ### When MCP is Installed (Preferred) | ||||||
|
|
||||||
| If the **`search_documentation`** tool (from the Google MCP server) is available, use it as your **only** documentation source: | ||||||
|
|
||||||
| 1. Call `search_documentation` with your query | ||||||
| 2. Read the returned documentation | ||||||
| 3. **Trust MCP results** as source of truth for API details — they are always up-to-date. | ||||||
|
|
||||||
| > [!IMPORTANT] | ||||||
| > When MCP tools are present, **never** fetch URLs manually. MCP provides up-to-date, indexed documentation that is more accurate and token-efficient than URL fetching. | ||||||
|
|
||||||
| ### When MCP is NOT Installed (Fallback Only) | ||||||
|
|
||||||
| If no MCP documentation tools are available, fetch from the official docs: | ||||||
|
**Contributor:** The fallback documentation lookup instructions are inconsistent with …
||||||
|
|
||||||
| - [Interactions Full Documentation](https://ai.google.dev/gemini-api/docs/interactions.md.txt) | ||||||
| - [Deep Research Full Documentation](https://ai.google.dev/gemini-api/docs/deep-research.md.txt) | ||||||
| - [API Reference](https://ai.google.dev/static/api/interactions.md.txt) | ||||||
| - [OpenAPI Spec](https://ai.google.dev/static/api/interactions.openapi.json) | ||||||
|
|
||||||
| These pages cover function calling, built-in tools (Google Search, code execution, URL context, file search, computer use), remote MCP, structured output, thinking configuration, working with files, multimodal understanding and generation, streaming events, and more. | ||||||
|
|
||||||
|
|
||||||