@devilb2103
Contributor

@devilb2103 devilb2103 commented Sep 10, 2025

Added Gemini Flash Image Generation Tool (Nano Banana + Nano Banana Pro)

[Video attachment: trim.6A4F5154-48D1-4547-ABEF-C3A188733218.MOV]

Summary

This PR addresses #9283 and #6065. It implements a new image generation tool using Google's Gemini 2.5 Flash Image model with Vertex AI integration. The tool supports both text-only image generation and image context-aware generation for editing/modification requests. It handles tool calls rejected for content policy violations, supports multiple file storage strategies, and integrates with LibreChat's existing agent tool ecosystem.

Key Features:

  • Text-to-image generation using Gemini 2.5 Flash Image Model
  • Image context support for editing existing images
  • Multiple storage strategies (local, S3, Azure, Firebase)
  • Safety filtering with user-friendly error messages
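
The safety filtering feature can be illustrated with a minimal sketch. This is not the code from this PR: `describeSafetyBlock` and the message wording are hypothetical, and the sketch assumes the response carries `promptFeedback.blockReason` and `candidates[].finishReason` as in the Gemini `generateContent` response format.

```javascript
// Hypothetical helper (not the PR's implementation): turns a safety-related
// rejection in a generateContent-style response into a user-facing message.
const BLOCKED_REASONS = new Set(['SAFETY', 'PROHIBITED_CONTENT']);

function describeSafetyBlock(response) {
  // Prompt-level block: the request was rejected before generation started.
  const blockReason = response?.promptFeedback?.blockReason;
  if (blockReason) {
    return `Your request was blocked by the content policy (${blockReason}). Please rephrase the prompt.`;
  }
  // Candidate-level block: generation stopped for a safety-related reason.
  const finishReason = response?.candidates?.[0]?.finishReason;
  if (BLOCKED_REASONS.has(finishReason)) {
    return `The image could not be generated because of the content policy (${finishReason}).`;
  }
  return null; // Not a safety rejection.
}
```

A caller would return this message as the tool result instead of an image whenever the function returns a non-null value.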

Configuration Requirements:

  • Google Cloud service account JSON file required at api/data/auth.json
  • Docker volume mapping: ./api/data/auth.json:/app/api/data/auth.json
  • Vertex AI API permissions for the service account
  • LibreChat YAML configuration update required

Change Type

  • New feature (non-breaking change which adds functionality)

Testing

The feature has been tested for:

  • ESLint compliance (all files pass)
  • Tool registration and integration
  • Docker container compatibility
  • File path resolution in containerized environment

Test Configuration:

  • Environment: Docker containerized deployment
  • Dependencies: @google/genai: ^1.17.0 added to api/package.json
  • Authentication: Google Cloud service account with Vertex AI permissions
  • Storage: Local and S3 file strategies tested (other strategies are compatible; they follow the OpenAI Image Tools logic)

Test Steps:

  1. Build Docker container with new dependencies
  2. Configure Google service account credentials
  3. Enable gemini_image_gen tool in an agent
  4. Test text-to-image generation
  5. Test image context-aware generation
  6. Verify safety filtering functionality
  7. Verify that generated images show up in the attachments dropdown in the collapsible LLM Tools section on the right-hand side of the UI

Checklist

  • My code adheres to this project's style guidelines
  • I have performed a self-review of my own code
  • I have commented in any complex areas of my code
  • My changes do not introduce new warnings
  • Local unit tests pass with my changes
  • Any changes dependent on mine have been merged and published in downstream modules.

Files Modified

Core Implementation

  • api/app/clients/tools/structured/GeminiImageGen.js - Main tool implementation
  • api/package.json - Added @google/genai dependency
  • packages/data-provider/src/config.ts - Added to imageGenTools set

Integration

  • api/app/clients/tools/index.js - Tool export
  • api/app/clients/tools/manifest.json - Tool manifest entry
  • api/app/clients/tools/util/handleTools.js - Tool registration and configuration
  • api/server/services/ToolService.js - Import statement

Documentation

  • api/app/clients/tools/structured/README-GeminiNanoBanana.md - Implementation guide

Breaking Changes

None. This is a purely additive feature that doesn't modify existing functionality.

Configuration Notes

Required Docker Configuration:

volumes:
  - ./api/data/auth.json:/app/api/data/auth.json

LibreChat YAML Configuration:
Add gemini_image_gen to the includedTools array in your librechat.yaml:

includedTools: ['gemini_image_gen']

Environment Variables

Add these to your .env file to configure the Gemini Image Generation tool:

| Variable | Description | Default | Required |
|---|---|---|---|
| GEMINI_IMAGE_PROVIDER | Provider to use: vertex (service account) or gemini (API key) | Auto-detected | No |
| GEMINI_IMAGE_MODEL | Model ID for image generation | gemini-2.5-flash-image-preview | No |
| GEMINI_API_KEY | Gemini API key (required if using gemini provider) | - | If provider=gemini |
| GOOGLE_SERVICE_KEY_FILE | Path to service account JSON (for Vertex AI) | api/data/auth.json | If provider=vertex |
| GOOGLE_CLOUD_LOCATION | Google Cloud region for Vertex AI | global | No |
| GOOGLE_APPLICATION_CREDENTIALS | Vertex AI service account credentials path | None | Yes |

Note: If GEMINI_IMAGE_PROVIDER is not set, the tool auto-detects:

  • Uses Vertex AI if GOOGLE_SERVICE_KEY_FILE exists or api/data/auth.json is present
  • Uses Gemini API if GEMINI_API_KEY is set
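
The auto-detection rules above could be sketched roughly as follows. The function name `detectProvider`, the injected `fileExists` callback, and the precedence (explicit setting first, then Vertex AI, then Gemini API) are assumptions for illustration; the PR's actual logic may differ.

```javascript
// Hypothetical sketch of provider auto-detection; not the PR's code.
// `env` is an object like process.env; `fileExists` is injected so the
// function can be tested without touching the file system.
function detectProvider(env, fileExists) {
  // An explicit GEMINI_IMAGE_PROVIDER always wins (assumed precedence).
  const explicit = (env.GEMINI_IMAGE_PROVIDER || '').toLowerCase();
  if (explicit === 'vertex' || explicit === 'gemini') {
    return explicit;
  }
  // Vertex AI if the service account file is present.
  const keyFile = env.GOOGLE_SERVICE_KEY_FILE || 'api/data/auth.json';
  if (fileExists(keyFile)) {
    return 'vertex';
  }
  // Otherwise the Gemini API if an API key is configured.
  if (env.GEMINI_API_KEY) {
    return 'gemini';
  }
  return null; // Nothing usable is configured.
}
```

In a real process this might be called as `detectProvider(process.env, require('node:fs').existsSync)`.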

Note: Service Account Credentials must be manually set in .env in the following manner:

GOOGLE_APPLICATION_CREDENTIALS="data/auth.json"

Google Cloud Setup:

  1. Create a service account with Vertex AI permissions
  2. Download the service account key as JSON
  3. Place the file at api/data/auth.json
  4. Ensure the project ID matches your Google Cloud project

@devilb2103 devilb2103 changed the base branch from main to dev September 10, 2025 10:27

@falkenbt falkenbt left a comment

Really nice, I got this to work on my machine. There is just a small issue with the credentials path, see inline comment.

* Refactored the credentials path to follow a consistent pattern with other Google service integrations, allowing for an environment variable override.
* Updated documentation in README-GeminiNanoBanana.md to reflect the new credentials handling approach and removed references to hardcoded paths.
@devilb2103
Contributor Author

@falkenbt The issues are addressed. Thank you for the comments

@devilb2103 devilb2103 requested a review from falkenbt September 13, 2025 14:40
@lukaswelte

It might only be my local configuration, but for me it says the image is generated, but it then does not show. I also can't find it anywhere on the local disk (using local storage strategy)

Do you have good pointers how I could further debug?

Thanks a lot for contributing this btw. - eagerly awaiting it :)

@petervcook

A future related feature request after this ships: Nano Banana is supported by OpenRouter and it would be great to support Nano-Banana via OpenRouter in addition to Vertex API.

@devilb2103
Contributor Author

> It might only be my local configuration, but for me it says the image is generated, but it then does not show. I also can't find it anywhere on the local disk (using local storage strategy)
>
> Do you have good pointers how I could further debug?
>
> Thanks a lot for contributing this btw. - eagerly awaiting it :)

Could you please share the logs that appear in the librechat container when you try to generate an image?

@lukaswelte

@devilb2103 thanks for that trivial hint - sorry I hadn't checked that 🤦
Error was in IAM permissions on GCP - problem solved, it works like a charm 🚀


@falkenbt falkenbt left a comment

Looks good to me now, thanks a lot. I cannot approve though, I'm just a user without elevated access.

@falkenbt

@danny-avila any chance this could make it into the 0.80 release? This is really useful for a lot of people.

@devilb2103
Contributor Author

hey @danny-avila do let me know what can be improved here so that this contribution can be merged without much effort from the team 😄

@Dual-0

Dual-0 commented Sep 21, 2025

Why is Vertex AI API used here? This also works with Gemini API

curl -s -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [
        {"text": "Create a picture of a nano banana dish in a fancy restaurant with a Gemini theme"}
      ]
    }]
  }' \
  | grep -o '"data": "[^"]*"' \
  | cut -d'"' -f4 \
  | base64 --decode > gemini-native-image.png
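
The `grep`/`cut` pipeline in the command above picks the base64 payload out of the raw JSON text. A JavaScript equivalent that walks the parsed response might look like the sketch below. `extractInlineImages` is a hypothetical name, and the `image/png` default for a missing MIME type is an assumption; the sketch relies on the response carrying images as `inlineData` parts inside `candidates[].content.parts`.

```javascript
// Hypothetical helper: collects every inline image from a parsed
// generateContent response and decodes it from base64 into a Buffer.
function extractInlineImages(response) {
  const images = [];
  for (const candidate of response?.candidates ?? []) {
    for (const part of candidate?.content?.parts ?? []) {
      const inline = part?.inlineData;
      if (inline?.data) {
        images.push({
          // Assumed default when the API omits the MIME type.
          mimeType: inline.mimeType ?? 'image/png',
          buffer: Buffer.from(inline.data, 'base64'),
        });
      }
    }
  }
  return images; // Empty when the response contains only text parts.
}
```

Unlike the shell pipeline, this skips text parts explicitly and does not depend on how the JSON is whitespace-formatted.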

@devilb2103
Contributor Author

devilb2103 commented Sep 21, 2025

> Why is Vertex AI API used here? This also works with Gemini API
>
> curl -s -X POST \
>   "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \
>   -H "x-goog-api-key: $GEMINI_API_KEY" \
>   -H "Content-Type: application/json" \
>   -d '{
>     "contents": [{
>       "parts": [
>         {"text": "Create a picture of a nano banana dish in a fancy restaurant with a Gemini theme"}
>       ]
>     }]
>   }' \
>   | grep -o '"data": "[^"]*"' \
>   | cut -d'"' -f4 \
>   | base64 --decode > gemini-native-image.png

It should be fairly easy to add another LibreChat manifest variable to switch between the already supported Vertex AI option and the Gemini API.

This implementation makes it convenient for those who have access to Google's Vertex AI platform.
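
As a rough illustration of such a switch, the sketch below builds the options object for the `GoogleGenAI` constructor from `@google/genai`, which accepts either `{ apiKey }` or `{ vertexai: true, project, location }`. The helper name `buildGenAIOptions` and its parameters are hypothetical, and how the project ID is obtained is left open.

```javascript
// Hypothetical helper: chooses client options based on the provider.
// The returned object is meant to be passed to `new GoogleGenAI(...)`.
function buildGenAIOptions(provider, { apiKey, project, location = 'global' } = {}) {
  if (provider === 'vertex') {
    if (!project) {
      throw new Error('Vertex AI requires a Google Cloud project ID');
    }
    // Service account credentials are assumed to be picked up via
    // GOOGLE_APPLICATION_CREDENTIALS, as described in the PR.
    return { vertexai: true, project, location };
  }
  if (provider === 'gemini') {
    if (!apiKey) {
      throw new Error('The Gemini API requires GEMINI_API_KEY');
    }
    return { apiKey };
  }
  throw new Error(`Unknown provider: ${provider}`);
}
```

For example, `new GoogleGenAI(buildGenAIOptions('gemini', { apiKey: process.env.GEMINI_API_KEY }))` would create an API-key client.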

@falkenbt

@danny-avila is this on your radar? It would be great to have an indication when this could be merged and released, from the reactions here and my surroundings a lot of people are eagerly awaiting this. Thanks!

@djp1971

djp1971 commented Oct 21, 2025

We would love to see this in the next release. I have a ton of users calling for it. I tried making an action for this using an OpenAPI spec, and it fails because it passes back base64 in a format that it can't decode. So this would be great as a tool.

@marlonka
Contributor

@danny-avila hate to annoy you but once again asking if you could accept this PR into main plz since there is huge demand for nano banana in our enterprise

@raymond9zhou

@danny-avila Just wanted to reiterate the importance of this feature. Adding native support for Gemini 2.5 is a major value-add for the platform. The PR is stable, tested, and ready to go. Merging this now would greatly benefit the user base who are waiting to utilize these image generation tools. Hoping to see this live soon!

@ZenDevMaster

It must be discouraging for contributors to submit pull requests and then be ignored.

@RepLicanT-UHD

RepLicanT-UHD commented Nov 20, 2025

> @devilb2103 thanks for that trivial hint - sorry I hadn't checked that 🤦 Error was in IAM permissions on GCP - problem solved, it works like a charm 🚀

@lukaswelte Could you please tell the exact problem with IAM Permissions? I have Vertex AI Admin role and I'm still experiencing the same problem.

@devilb2103 Thank you for the great job! I'd just like to ask whether this is normal or not:

warn: [GeminiImageGen] Missing required parameters for storage, falling back to data URL
warn: [GeminiImageGen] Could not save to storage, using data URL
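
For context, the two warnings above suggest a fallback path in which the image is returned as a data URL when it cannot be written to storage. The sketch below shows one way such a fallback could work. It is an assumption for illustration, not the PR's code: `resolveImageUrl`, `saveImage`, and the `userId` check are hypothetical.

```javascript
// Hypothetical sketch of a storage fallback: try to persist the image,
// and fall back to an inline data URL when that is not possible.
async function resolveImageUrl({ base64, mimeType = 'image/png', userId, saveImage }) {
  const dataUrl = `data:${mimeType};base64,${base64}`;
  // Missing parameters for storage: skip the save attempt entirely.
  if (!userId || typeof saveImage !== 'function') {
    return { url: dataUrl, stored: false };
  }
  try {
    const url = await saveImage({ userId, buffer: Buffer.from(base64, 'base64'), mimeType });
    return { url, stored: true };
  } catch (err) {
    // Saving failed: the image is still usable as a data URL.
    return { url: dataUrl, stored: false };
  }
}
```

Under this reading, the warnings would be harmless for display purposes, although the image would not appear in file storage.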

@lukaswelte

@RepLicanT-UHD in my case I literally didn't have the role on the service account that I gave to librechat. Are you sure it picks up the service-account.json in your case?

@RepLicanT-UHD

> @RepLicanT-UHD in my case I literally didn't have the role on the service account that I gave to librechat. Are you sure it picks up the service-account.json in your case?

Thanks for your response! Yes it does. When I select any model other than 2.5 Flash for the Agent to execute requests via the 'gemini_image_gen' tool, the errors disappear (surprisingly, o4-mini handles this role particularly well). The only warnings remaining are warn: [GeminiImageGen] Missing required parameters for storage, falling back to data URL and warn: [GeminiImageGen] Could not save to storage, using data URL. Meanwhile, my File Strategy is set to local, and images are displayed and edited correctly.

However, as soon as I switch back to 2.5 Flash with any combination of reasoning and input/output tokens, it claims inside the thought process that the image was generated and shown to the user, even though that's not the case.

@marlonka
Contributor

marlonka commented Nov 20, 2025

Does it support Gemini 3 Pro Image (aka Nano Banana Pro) already? Would be awesome for admins to choose in env or librechat.yaml the model or have a dropdown in the tool UI: https://docs.cloud.google.com/vertex-ai/generative-ai/docs/models/gemini/3-pro-image

Contributor

Copilot AI left a comment

Pull request overview

This PR adds a new image generation tool that integrates Google's Gemini 2.5 Flash Image Model with Vertex AI, supporting both text-to-image generation and image context-aware editing. The implementation follows existing LibreChat patterns for image generation tools and includes comprehensive safety filtering and multi-storage strategy support.

Key Changes

  • Added Gemini image generation tool with support for text prompts and image context editing
  • Integrated with existing LibreChat file storage strategies (local, S3, Azure, Firebase)
  • Implemented safety filtering with user-friendly error messages for content policy violations

Reviewed changes

Copilot reviewed 8 out of 10 changed files in this pull request and generated 8 comments.

| File | Description |
|---|---|
| api/app/clients/tools/structured/GeminiImageGen.js | Core implementation of Gemini image generation tool with image context support and safety filtering |
| api/package.json | Added @google/genai dependency for Vertex AI integration |
| package-lock.json | Locked dependency versions including @google/[email protected] and unrelated dicebear changes |
| api/app/clients/tools/index.js | Exported GeminiImageGen tool for registration |
| api/app/clients/tools/manifest.json | Added Gemini Image Tools manifest entry |
| api/app/clients/tools/util/handleTools.js | Registered tool constructor with custom initialization logic and unrelated serpapi configuration |
| api/server/services/ToolService.js | Added commented-out import (cleanup needed) |
| packages/data-provider/src/config.ts | Added gemini_image_gen to image generation tools set |
| api/app/clients/tools/structured/README-GeminiNanoBanana.md | Comprehensive documentation for setup and usage |

@danny-avila
Owner

danny-avila commented Nov 25, 2025

I understand the frustration with review delays. As maintainers, we need to balance responsiveness with ensuring thorough reviews, since incomplete features can create significant long-term maintenance burden. I'm reviewing this PR now and will provide detailed feedback shortly.

To start, please see the GitHub Copilot reviews, which are actually worth considering.

I will add a configurable yaml key for this soon.

This is a "must implement" for merging and corresponding documentation is required as well in https://github.com/LibreChat-AI/librechat.ai

@devilb2103
Contributor Author

devilb2103 commented Nov 25, 2025

@danny-avila Thanks for the comments ✌️.

@usnavy13
Contributor

> @danny-avila Thanks for the comments ✌️. I plan to implement said changes after a couple of weeks

I can take a shot at it this week if you add me

@danny-avila danny-avila marked this pull request as draft November 25, 2025 19:51
@usnavy13
Contributor

@devilb2103 @danny-avila
I implemented the suggested changes in #10676 since I couldn't commit to his branch.
I also added geminiAPI support and some other small improvements.

…n with SVG for Gemini Image Tools

- Updated the @google/genai dependency in package-lock.json and package.json to version 1.19.0.
- Enhanced the Gemini Image Tools description and changed the icon from PNG to SVG for better scalability and design.
- Refactored GeminiImageGen.js to support provider detection and model ID configuration for improved flexibility in image generation.
- Removed deprecated references and cleaned up the handleTools.js file by eliminating unnecessary parameters.
- Added a new utility function `createImageToolContext` to streamline the creation of context strings for image generation and editing tools.
- Refactored `handleTools.js` to utilize the new function, improving code readability and maintainability.
- Created a new file `imageContext.ts` to define the interface and implementation for the image tool context functionality.
- Updated the index file to export the new image context utilities.
…e related files

- Migrated `GeminiImageGen` tool implementation from CommonJS to TS.
- Updated the `index.js` `GeminiImageGen` references.
- Enhanced the README documentation for the Gemini Image Generation Tool, reflecting the new structure and features.
- Adjusted the `handleTools.js` and `tools.js` files to ensure proper integration of remaining tools and maintain functionality.
- Added constants and types for the Gemini tool to improve code organization and clarity.
- Added support for the `gemini_image_gen` tool in the message rendering logic.
- Updated progress text handling for `gemini_image_gen` to provide detailed status messages during image creation.
- Enhanced user feedback with localized messages reflecting the progress of image generation.
- Deleted the README-GeminiNanoBanana.md file as it is no longer relevant to the current implementation of the Gemini image generation tool.
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 15 out of 17 changed files in this pull request and generated 5 comments.


…ageGen

- Enhanced auto-detection logic to prefer Vertex AI only if the GOOGLE_SERVICE_KEY_FILE exists and is valid.
- Updated the credentials path to use the current working directory for the default auth.json file.
- Added error handling for malformed JSON in the Google service account credentials file, providing clearer error messages.
- Updated the method for retrieving the credentials file path to streamline the logic and remove unnecessary environment variable checks.
- Adjusted the default path for the auth.json file to reflect the current working directory more accurately.
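
The malformed-JSON handling mentioned in the commit notes above could look roughly like this sketch. `loadServiceAccount` and the error wording are hypothetical; only the idea of separating "file not readable" from "file is not valid JSON" is taken from the commit description.

```javascript
const fs = require('node:fs');

// Hypothetical loader: reads a service account key file and reports
// unreadable files and malformed JSON with distinct, clearer messages.
function loadServiceAccount(filePath) {
  let raw;
  try {
    raw = fs.readFileSync(filePath, 'utf8');
  } catch (err) {
    throw new Error(`Service account file not readable at ${filePath}: ${err.message}`);
  }
  try {
    return JSON.parse(raw);
  } catch (err) {
    throw new Error(`Service account file at ${filePath} is not valid JSON: ${err.message}`);
  }
}
```

Keeping the two failure modes apart makes it easier to tell a wrong path or volume mapping from a corrupted key file.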
@devilb2103 devilb2103 changed the title ✨ feat: Added the Flagship Gemini 2.5 Flash Image (🍌 Nano Banana) model for Image generation and editing ✨ feat: Added Support for Flagship Gemini Image (🍌 Nano Banana) models for Image generation and editing Nov 26, 2025