@usnavy13 commented Nov 25, 2025

Summary

Adds a new image generation tool that integrates Google's Gemini image models, supporting both text-to-image generation and context-aware image editing.

Key Features:

  • Dual API Support: Works with both the Gemini API (simple API key) and Vertex AI (service account)
  • Configurable Model: Use the `GEMINI_IMAGE_MODEL` env var to switch between models (default: `gemini-2.0-flash-exp`; also supports `gemini-3-pro-image-preview`)
  • Image Context: Can use existing images as context/inspiration for generation
  • Multi-Storage Support: Compatible with local, S3, Azure, and Firebase storage strategies
  • Safety Filtering: User-friendly error messages for content policy violations
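The image-context feature relies on the Gemini request format, in which a prompt and reference images can be sent together as parts of a single user turn, with each image passed as base64 `inlineData`. The sketch below assumes that shape; `ContextImage`, `buildImageRequestContents` and the prompt-first part ordering are illustrative choices, not the PR's actual code.

```typescript
// Hypothetical input type for a context image (not taken from the PR).
interface ContextImage {
  mimeType: string; // e.g. "image/png"
  base64Data: string; // base64 payload, possibly with a "data:" URL prefix
}

// Parts mirror the Gemini request shape: either text or inline binary data.
type Part =
  | { text: string }
  | { inlineData: { mimeType: string; data: string } };

interface Content {
  role: "user";
  parts: Part[];
}

// Builds a single user turn: the text prompt first, then any context images.
// With no images this reduces to plain text-to-image generation.
function buildImageRequestContents(
  prompt: string,
  images: ContextImage[] = [],
): Content[] {
  const trimmed = prompt.trim();
  if (trimmed.length === 0) {
    throw new Error("Prompt must not be empty");
  }

  const parts: Part[] = [{ text: trimmed }];
  for (const img of images) {
    // Strip a data-URL prefix if present so only raw base64 is sent.
    const data = img.base64Data.replace(/^data:[^;]+;base64,/, "");
    parts.push({ inlineData: { mimeType: img.mimeType, data } });
  }
  return [{ role: "user", parts }];
}
```

The returned array is the kind of value that would be passed as `contents` to `ai.models.generateContent` in `@google/genai`, alongside the model name; with an empty `images` list the same call covers plain text-to-image generation.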

Configuration:
```env
# Option 1: Gemini API (recommended for most users)
GEMINI_API_KEY=your-api-key

# Option 2: Vertex AI
GOOGLE_SERVICE_KEY_FILE=/path/to/service-account.json
GOOGLE_CLOUD_LOCATION=us-central1

# Optional: Change model (default: gemini-2.5-flash-image)
GEMINI_IMAGE_MODEL=gemini-3-pro-image-preview
```
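One way the variables above could be turned into a client configuration is sketched below. It assumes `GEMINI_API_KEY` wins when both options are set (Option 1 is the recommended path), that `GOOGLE_CLOUD_LOCATION` falls back to `us-central1`, and that `DEFAULT_MODEL` matches the `Optional: Change model` comment; `resolveGeminiConfig` and its return type are hypothetical and may differ from how the tool actually reads its settings.

```typescript
// Environment passed in as a plain object (e.g. process.env in Node).
type Env = Record<string, string | undefined>;

// Assumed default, taken from the "Optional: Change model" config comment.
const DEFAULT_MODEL = "gemini-2.5-flash-image";

type GeminiConfig =
  | { mode: "gemini-api"; apiKey: string; model: string }
  | { mode: "vertex-ai"; keyFile: string; location: string; model: string };

// Treats unset and whitespace-only values the same way.
function readVar(env: Env, name: string): string | undefined {
  const value = env[name]?.trim();
  return value ? value : undefined;
}

function resolveGeminiConfig(env: Env): GeminiConfig {
  const model = readVar(env, "GEMINI_IMAGE_MODEL") ?? DEFAULT_MODEL;

  // Option 1 (assumed to take precedence): Gemini API with an API key.
  const apiKey = readVar(env, "GEMINI_API_KEY");
  if (apiKey) {
    return { mode: "gemini-api", apiKey, model };
  }

  // Option 2: Vertex AI authenticated with a service-account JSON file.
  const keyFile = readVar(env, "GOOGLE_SERVICE_KEY_FILE");
  if (keyFile) {
    const location = readVar(env, "GOOGLE_CLOUD_LOCATION") ?? "us-central1";
    return { mode: "vertex-ai", keyFile, location, model };
  }

  throw new Error(
    "Gemini image tool is not configured: set GEMINI_API_KEY or GOOGLE_SERVICE_KEY_FILE",
  );
}
```

In Node this would typically be called as `resolveGeminiConfig(process.env)`; taking the environment as a plain object keeps the function easy to test without mutating global state.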

Builds upon and addresses feedback from #9538

cc @devilb2103 @danny-avila

Change Type

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Testing

Tested locally with both Gemini API and Vertex AI configurations:

  1. Text-to-image generation with various prompts
  2. Image editing/context-aware generation using existing images
  3. Safety filter handling for blocked content
  4. Local and cloud storage strategies
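For the safety-filter case in step 3, a blocked request can surface in two places in a `generateContent` response: `promptFeedback.blockReason` when the prompt itself is rejected, and a candidate's `finishReason` when generation is stopped. The sketch below assumes that response shape and maps both to one user-friendly message; `interpretImageResponse`, `POLICY_REASONS` and the message wording are hypothetical, and the reason strings are a subset of those the API documents. Because some models return text alongside images, it also collects any text parts.

```typescript
// Assumed subset of a Gemini generateContent response: only the fields read below.
interface InlineData {
  mimeType?: string;
  data?: string;
}
interface ResponsePart {
  text?: string;
  inlineData?: InlineData;
}
interface GeminiResponseLike {
  promptFeedback?: { blockReason?: string };
  candidates?: Array<{
    finishReason?: string;
    content?: { parts?: ResponsePart[] };
  }>;
}

type ImageResult =
  | { ok: true; images: InlineData[]; text: string }
  | { ok: false; message: string };

// Hypothetical choice of reasons treated as content-policy blocks.
const POLICY_REASONS = new Set([
  "SAFETY",
  "IMAGE_SAFETY",
  "PROHIBITED_CONTENT",
  "BLOCKLIST",
  "SPII",
]);

const POLICY_MESSAGE =
  "This image could not be generated because the request was flagged by content safety filters. Try rephrasing your prompt.";

function interpretImageResponse(res: GeminiResponseLike): ImageResult {
  // Case 1: the prompt was rejected before any candidate was produced.
  const blockReason = res.promptFeedback?.blockReason;
  if (blockReason) {
    return {
      ok: false,
      message: POLICY_REASONS.has(blockReason)
        ? POLICY_MESSAGE
        : `The request was blocked (${blockReason}).`,
    };
  }

  // Case 2: generation started but the candidate was stopped by a filter.
  const candidate = res.candidates?.[0];
  const finishReason = candidate?.finishReason;
  if (finishReason && POLICY_REASONS.has(finishReason)) {
    return { ok: false, message: POLICY_MESSAGE };
  }

  // Otherwise collect returned images and any accompanying text.
  const images: InlineData[] = [];
  const textParts: string[] = [];
  for (const part of candidate?.content?.parts ?? []) {
    if (part.inlineData && part.inlineData.data) {
      images.push(part.inlineData);
    } else if (part.text) {
      textParts.push(part.text);
    }
  }

  // Assumption: for an image tool, a response without any image is a failure.
  if (images.length === 0) {
    return { ok: false, message: "No image was returned. Try a more specific prompt." };
  }
  return { ok: true, images, text: textParts.join("\n") };
}
```

Keeping the mapping in one pure function means the friendly messages can be unit-tested with hand-built response objects, without calling the API.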

Test Configuration:

  • Node.js v20
  • Local file storage strategy
  • Gemini API with gemini-3-pro-image-preview

Checklist

  • My code adheres to this project's style guidelines
  • I have performed a self-review of my own code
  • I have commented in any complex areas of my code
  • I have made pertinent documentation changes
  • My changes do not introduce new warnings
  • Local unit tests pass with my changes
  • A pull request for updating the documentation has been submitted.

devilb2103 and others added 13 commits September 10, 2025 15:31
* Refactored the credentials path to follow a consistent pattern with other Google service integrations, allowing for an environment variable override.
* Updated documentation in README-GeminiNanoBanana.md to reflect the new credentials handling approach and removed references to hardcoded paths.
- Bump @google/genai package version to ^1.19.0 for improved functionality.
- Refactor GeminiImageGen to createGeminiImageTool for better clarity and consistency.
- Enhance manifest.json for Gemini Image Tools with updated descriptions and icon.
- Add SVG icon for Gemini Image Tools.
- Implement progress tracking for Gemini image generation in the UI.
- Introduce new toolkit and context handling for image generation tools.

This update improves the Gemini image generation capabilities and user experience.
…icon

- Deleted the obsolete PNG file for Gemini image generation.
- Updated the SVG icon with a new design featuring a gradient and shadow effect, enhancing visual appeal and consistency.
@usnavy13 (PR author)

@danny-avila Corresponding docs PR: LibreChat-AI/librechat.ai#452

@KiGamji commented Nov 26, 2025

shouldn't it also work natively?

@KiGamji commented Nov 26, 2025

[screenshot]

like this

@KiGamji commented Nov 26, 2025

nvm, that should invoke tools too lmao

@usnavy13 (PR author)

> shouldn't it also work natively?

I was thinking about that, but it would be a departure from how the project handles image tools. I organized it similarly to the OpenAI tools so the workflows stay the same for users.

@KiGamji commented Nov 26, 2025

@danny-avila With native multimodal image generation models appearing, it would actually be great to implement this functionality!

[screenshot]

like this

@avimar commented Nov 30, 2025

This is great that it's a tool - it can be called by other models.

But yes, there are more models that can natively return text AND images (and audio?), so it would be good if it could handle that too.
