refactor: use gemini endpoint for fine tuned models in vertex --skip-pipeline #1144
Conversation
Caution: Review failed. The pull request is closed.

📝 Walkthrough (Summary by CodeRabbit)

Adds a Gemini endpoint URL builder and routes Gemini and all-digit deployments (AllDigitsASCII) through it across ChatCompletion, ChatCompletionStream, Responses, and ResponsesStream; switches those paths to use generateContent/streamGenerateContent URL patterns and adds project-number validation for digit-only deployments.

Changes
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
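The all-digit deployment check the walkthrough refers to (`schemas.IsAllDigitsASCII`) can be sketched as follows. This is a standalone approximation, not the repository's actual implementation: Vertex identifies custom/fine-tuned endpoints by purely numeric deployment IDs, so an ASCII-digit check is enough to route them to the Gemini endpoint builder.

```go
package main

import "fmt"

// isAllDigitsASCII reports whether the deployment ID consists solely of
// ASCII digits, which Vertex uses for custom/fine-tuned endpoint IDs.
// (Sketch of schemas.IsAllDigitsASCII; the real one lives in core/schemas/utils.go.)
func isAllDigitsASCII(s string) bool {
	if s == "" {
		return false
	}
	for _, c := range s {
		if c < '0' || c > '9' {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(isAllDigitsASCII("1234567890"))     // fine-tuned endpoint ID: true
	fmt.Println(isAllDigitsASCII("gemini-1.5-pro")) // standard model name: false
}
```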
Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
📜 Recent review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
🧪 Test Suite Available: This PR can be tested by a repository admin.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
core/providers/vertex/vertex.go (1)
1040-1050: Duplicate `ModelDeployment` assignment.

Lines 1040-1042 and 1048-1050 perform the same assignment. One of these can be removed.

🔎 Proposed fix

```diff
 response.ExtraFields.Latency = latency.Milliseconds()

-if request.Model != deployment {
-	response.ExtraFields.ModelDeployment = deployment
-}
-
 // Set raw response if enabled
 if providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse) {
 	response.ExtraFields.RawResponse = rawResponse
 }

 if request.Model != deployment {
 	response.ExtraFields.ModelDeployment = deployment
 }
```
🧹 Nitpick comments (2)
core/providers/vertex/utils.go (1)
88-110: Well-designed helper function for URL construction.

The function correctly:
- Distinguishes custom/fine-tuned models (all-digits) from standard Gemini models
- Uses appropriate API versions (v1beta1 for custom, v1 for Gemini)
- Handles both global and regional endpoints
- Selects proper method suffix for streaming vs non-streaming
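Based on the URL patterns quoted throughout this review, the helper could look roughly like the sketch below. The signature and parameter order are assumptions; the actual implementation lives in core/providers/vertex/utils.go (lines 88-110) and may differ.

```go
package main

import "fmt"

// isAllDigitsASCII stands in for schemas.IsAllDigitsASCII: numeric
// deployment IDs denote custom/fine-tuned Vertex endpoints.
func isAllDigitsASCII(s string) bool {
	if s == "" {
		return false
	}
	for _, c := range s {
		if c < '0' || c > '9' {
			return false
		}
	}
	return true
}

// getCompleteURLForGeminiEndpoint sketches the helper described above:
// numeric deployments use v1beta1 /endpoints/ addressed by project number,
// standard Gemini models use v1 /publishers/google/models/ addressed by
// project ID, with global vs regional hosts and a streaming-aware suffix.
func getCompleteURLForGeminiEndpoint(deployment, region, projectID, projectNumber string, streaming bool) string {
	method := "generateContent"
	if streaming {
		method = "streamGenerateContent"
	}
	host := "aiplatform.googleapis.com"
	if region != "global" {
		host = fmt.Sprintf("%s-aiplatform.googleapis.com", region)
	}
	if isAllDigitsASCII(deployment) {
		// Custom/fine-tuned model: project *number* plus endpoint ID.
		return fmt.Sprintf("https://%s/v1beta1/projects/%s/locations/%s/endpoints/%s:%s",
			host, projectNumber, region, deployment, method)
	}
	// Standard Gemini model: project ID under the google publisher.
	return fmt.Sprintf("https://%s/v1/projects/%s/locations/%s/publishers/google/models/%s:%s",
		host, projectID, region, deployment, method)
}

func main() {
	// Fine-tuned, global endpoint:
	fmt.Println(getCompleteURLForGeminiEndpoint("1234567890", "global", "my-project", "987654321", false))
	// Standard Gemini, regional, streaming:
	fmt.Println(getCompleteURLForGeminiEndpoint("gemini-1.5-pro", "us-central1", "my-project", "", true))
}
```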
One observation: this helper is used in `ChatCompletionStream`, `Responses`, and `ResponsesStream`, but the non-streaming `ChatCompletion` method constructs URLs inline (lines 385-388 in vertex.go). Consider refactoring `ChatCompletion` to use this helper for consistency.

core/providers/vertex/vertex.go (1)
375-418: Inconsistent URL construction: consider using the helper function.

The `ChatCompletion` method constructs URLs inline for fine-tuned models (lines 385-388) and Gemini models (lines 409-412), while `ChatCompletionStream`, `Responses`, and `ResponsesStream` use `getCompleteURLForGeminiEndpoint`. This creates potential for drift and makes maintenance harder.

🔎 Proposed refactor to use the helper

```diff
 if schemas.IsAllDigitsASCII(deployment) {
-	// Custom Fine-tuned models use OpenAPI endpoint
 	projectNumber := key.VertexKeyConfig.ProjectNumber
 	if projectNumber == "" {
 		return nil, providerUtils.NewConfigurationError("project number is not set for fine-tuned models", providerName)
 	}
 	if key.Value != "" {
 		authQuery = fmt.Sprintf("key=%s", url.QueryEscape(key.Value))
 	}
-	if region == "global" {
-		completeURL = fmt.Sprintf("https://aiplatform.googleapis.com/v1beta1/projects/%s/locations/global/endpoints/%s:generateContent", projectNumber, deployment)
-	} else {
-		completeURL = fmt.Sprintf("https://%s-aiplatform.googleapis.com/v1beta1/projects/%s/locations/%s/endpoints/%s:generateContent", region, projectNumber, region, deployment)
-	}
+	completeURL = getCompleteURLForGeminiEndpoint(deployment, region, projectID, projectNumber, false)
 } else if schemas.IsAnthropicModel(deployment) {
 	// ... unchanged ...
 } else if schemas.IsGeminiModel(deployment) {
-	// Gemini models support api key
 	if key.Value != "" {
 		authQuery = fmt.Sprintf("key=%s", url.QueryEscape(key.Value))
 	}
-	if region == "global" {
-		completeURL = fmt.Sprintf("https://aiplatform.googleapis.com/v1/projects/%s/locations/global/publishers/google/models/%s:generateContent", projectID, deployment)
-	} else {
-		completeURL = fmt.Sprintf("https://%s-aiplatform.googleapis.com/v1/projects/%s/locations/%s/publishers/google/models/%s:generateContent", region, projectID, region, deployment)
-	}
+	completeURL = getCompleteURLForGeminiEndpoint(deployment, region, projectID, "", false)
 } else {
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- core/providers/vertex/utils.go (1 hunks)
- core/providers/vertex/vertex.go (10 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)
Files:
core/providers/vertex/utils.go
core/providers/vertex/vertex.go
🧠 Learnings (2)
📚 Learning: 2025-12-09T17:07:42.007Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/schemas/account.go:9-18
Timestamp: 2025-12-09T17:07:42.007Z
Learning: In core/schemas/account.go, the HuggingFaceKeyConfig field within the Key struct is currently unused and reserved for future Hugging Face inference endpoint deployments. Do not flag this field as missing from OpenAPI documentation or require its presence in the API spec until the feature is actively implemented and used. When the feature is added, update the OpenAPI docs accordingly; otherwise, treat this field as non-breaking and not part of the current API surface.
Applied to files:
core/providers/vertex/utils.go
core/providers/vertex/vertex.go
📚 Learning: 2025-12-19T09:26:54.961Z
Learnt from: qwerty-dvorak
Repo: maximhq/bifrost PR: 1006
File: core/providers/utils/utils.go:1050-1051
Timestamp: 2025-12-19T09:26:54.961Z
Learning: Update streaming end-marker handling so HuggingFace is treated as a non-[DONE] provider for backends that do not emit a DONE marker (e.g., meta llama on novita). In core/providers/utils/utils.go, adjust ProviderSendsDoneMarker() (or related logic) to detect providers that may not emit DONE and avoid relying on DONE as the sole end signal. Add tests to cover both DONE-emitting and non-DONE backends, with clear documentation in code comments explaining the rationale and any fallback behavior.
Applied to files:
core/providers/vertex/utils.go
core/providers/vertex/vertex.go
🧬 Code graph analysis (2)
core/providers/vertex/utils.go (1)
core/schemas/utils.go (1)
IsAllDigitsASCII(39-50)
core/providers/vertex/vertex.go (3)
core/schemas/utils.go (3)
IsGeminiModel (1057-1059)
IsAllDigitsASCII (39-50)
IsMistralModel (1053-1055)
core/schemas/account.go (1)
VertexKeyConfig (32-38)
core/providers/utils/utils.go (1)
NewConfigurationError (762-772)
🔇 Additional comments (6)
core/providers/vertex/vertex.go (6)
314-329: LGTM!

Correctly extends the Gemini model handling to include all-digit (fine-tuned) deployments, ensuring they use the appropriate Gemini request format.
505-530: LGTM!

Correctly extends Gemini response parsing to include fine-tuned model responses.
678-754: LGTM!

The streaming path correctly:
- Extends Gemini handling to fine-tuned models
- Validates `projectNumber` is set for fine-tuned models
- Uses the `getCompleteURLForGeminiEndpoint` helper for consistent URL construction
- Properly handles auth via query parameter or OAuth2 token
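The conditional project-number validation mentioned in these comments can be sketched as follows. Names here are illustrative, not the repository's actual API: a project number is mandatory only when the deployment is a numeric (fine-tuned) endpoint ID, while standard Gemini models need none.

```go
package main

import (
	"errors"
	"fmt"
)

// validateGeminiKeyConfig sketches the check the review describes:
// reject fine-tuned (all-digit) deployments when no project number
// is configured, and accept everything else unchanged.
func validateGeminiKeyConfig(deployment, projectNumber string) error {
	allDigits := len(deployment) > 0
	for _, c := range deployment {
		if c < '0' || c > '9' {
			allDigits = false
			break
		}
	}
	if allDigits && projectNumber == "" {
		return errors.New("project number is not set for fine-tuned models")
	}
	return nil
}

func main() {
	fmt.Println(validateGeminiKeyConfig("1234567890", ""))     // error: project number missing
	fmt.Println(validateGeminiKeyConfig("gemini-1.5-pro", "")) // <nil>: not required for Gemini models
}
```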
755-778: LGTM!

The else branch now correctly handles only Mistral and other OpenAI-compatible models, since fine-tuned models are routed through the Gemini streaming path above.
937-977: LGTM!

The `Responses` method correctly:
- Routes fine-tuned models through Gemini handling
- Validates `projectNumber` for fine-tuned models
- Uses the centralized URL helper
1160-1253: LGTM!

The `ResponsesStream` method correctly:
- Routes fine-tuned models through Gemini streaming handling
- Validates `projectNumber` for fine-tuned models
- Uses the centralized URL helper
- Handles auth consistently with other streaming methods
Force-pushed from 05d0e57 to 7b68adf
Actionable comments posted: 1
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- core/providers/vertex/utils.go
- core/providers/vertex/vertex.go
🚧 Files skipped from review as they are similar to previous changes (1)
- core/providers/vertex/vertex.go
🧰 Additional context used
📓 Path-based instructions (1)
**
Files:
core/providers/vertex/utils.go
🧬 Code graph analysis (1)
core/providers/vertex/utils.go (1)
core/schemas/utils.go (1)
IsAllDigitsASCII(39-50)
🔇 Additional comments (2)
core/providers/vertex/utils.go (2)
94-100: LGTM! Custom model URL construction follows Vertex AI conventions.

The URL construction logic correctly:
- Uses `projectNumber` for custom/fine-tuned models (numeric deployment IDs)
- Uses the v1beta1 API version appropriate for custom endpoints
- Handles global vs regional endpoint patterns correctly
- Includes the appropriate method suffix for generateContent operations
101-108: LGTM! Gemini model URL construction follows Google's API conventions.

The URL construction correctly:
- Uses `projectID` for standard Gemini models
- Uses the stable v1 API version
- Includes the proper `/publishers/google/models/` path segment
- Handles global and regional endpoint patterns correctly
The different API versions (v1 for Gemini vs v1beta1 for custom models) align with Google's Vertex AI API structure.
Force-pushed from 7b68adf to 2cf99d2
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
core/providers/vertex/vertex.go (1)
1040-1050: Duplicate `ModelDeployment` assignment.

The `ModelDeployment` field is set at lines 1040-1041 and again at lines 1048-1050. Remove the duplicate.

🔎 Proposed fix

```diff
 if request.Model != deployment {
 	response.ExtraFields.ModelDeployment = deployment
 }

 // Set raw response if enabled
 if providerUtils.ShouldSendBackRawResponse(ctx, provider.sendBackRawResponse) {
 	response.ExtraFields.RawResponse = rawResponse
 }

-if request.Model != deployment {
-	response.ExtraFields.ModelDeployment = deployment
-}
 if providerUtils.ShouldSendBackRawRequest(ctx, provider.sendBackRawRequest) {
 	response.ExtraFields.RawRequest = rawRequest
 }
```
🧹 Nitpick comments (1)
core/providers/vertex/vertex.go (1)
375-418: Consider using the `getCompleteURLForGeminiEndpoint` helper for consistency.

The `ChatCompletion` method constructs URLs inline for both fine-tuned (AllDigitsASCII) and Gemini models, while `ChatCompletionStream`, `Responses`, and `ResponsesStream` use the centralized `getCompleteURLForGeminiEndpoint` helper. This inconsistency may lead to maintenance burden if URL patterns change.

🔎 Suggested refactor for AllDigitsASCII and Gemini branches

```diff
-if schemas.IsAllDigitsASCII(deployment) {
-	// Custom Fine-tuned models use OpenAPI endpoint
-	projectNumber := key.VertexKeyConfig.ProjectNumber
-	if projectNumber == "" {
-		return nil, providerUtils.NewConfigurationError("project number is not set for fine-tuned models", providerName)
-	}
-	if key.Value != "" {
-		authQuery = fmt.Sprintf("key=%s", url.QueryEscape(key.Value))
-	}
-	if region == "global" {
-		completeURL = fmt.Sprintf("https://aiplatform.googleapis.com/v1beta1/projects/%s/locations/global/endpoints/%s:generateContent", projectNumber, deployment)
-	} else {
-		completeURL = fmt.Sprintf("https://%s-aiplatform.googleapis.com/v1beta1/projects/%s/locations/%s/endpoints/%s:generateContent", region, projectNumber, region, deployment)
-	}
-} else if schemas.IsAnthropicModel(deployment) {
+if schemas.IsAllDigitsASCII(deployment) || schemas.IsGeminiModel(deployment) {
+	// Fine-tuned models require project number validation
+	projectNumber := key.VertexKeyConfig.ProjectNumber
+	if schemas.IsAllDigitsASCII(deployment) && projectNumber == "" {
+		return nil, providerUtils.NewConfigurationError("project number is not set for fine-tuned models", providerName)
+	}
+	// Gemini and fine-tuned models support API key auth
+	if key.Value != "" {
+		authQuery = fmt.Sprintf("key=%s", url.QueryEscape(key.Value))
+	}
+	completeURL = getCompleteURLForGeminiEndpoint(deployment, region, projectID, projectNumber, false)
+} else if schemas.IsAnthropicModel(deployment) {
 	// ... existing Anthropic handling
-} else if schemas.IsGeminiModel(deployment) {
-	// Gemini models support api key
-	if key.Value != "" {
-		authQuery = fmt.Sprintf("key=%s", url.QueryEscape(key.Value))
-	}
-	if region == "global" {
-		completeURL = fmt.Sprintf("https://aiplatform.googleapis.com/v1/projects/%s/locations/global/publishers/google/models/%s:generateContent", projectID, deployment)
-	} else {
-		completeURL = fmt.Sprintf("https://%s-aiplatform.googleapis.com/v1/projects/%s/locations/%s/publishers/google/models/%s:generateContent", region, projectID, region, deployment)
-	}
-} else {
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- core/providers/vertex/utils.go
- core/providers/vertex/vertex.go
🚧 Files skipped from review as they are similar to previous changes (1)
- core/providers/vertex/utils.go
🧰 Additional context used
📓 Path-based instructions (1)
**
Files:
core/providers/vertex/vertex.go
🧬 Code graph analysis (1)
core/providers/vertex/vertex.go (2)
core/schemas/utils.go (3)
IsGeminiModel (1057-1059)
IsAllDigitsASCII (39-50)
IsMistralModel (1053-1055)
core/schemas/account.go (1)
VertexKeyConfig (32-38)
🔇 Additional comments (6)
core/providers/vertex/vertex.go (6)
314-329: LGTM!

The condition correctly routes fine-tuned models (all-digit deployment IDs) through the Gemini request formatting path, which aligns with the PR objective.
505-530: LGTM!

The response handling correctly parses fine-tuned model responses using the Gemini response format, consistent with the request path.
704-711: LGTM!

The project number validation and centralized URL construction via `getCompleteURLForGeminiEndpoint` are correctly implemented. The conditional validation ensures `projectNumber` is only required for fine-tuned models.
755-778: LGTM!

The else branch correctly handles Mistral and other OpenAI-compatible models after moving fine-tuned model handling to the Gemini branch.
971-977: LGTM!

Project number validation and URL construction are correctly implemented for the Responses method, consistent with the streaming methods.
1196-1203: LGTM!

The ResponsesStream implementation correctly validates project number for fine-tuned models and uses the centralized URL helper, maintaining consistency across all streaming methods.
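The API-key-as-query-parameter auth path these methods share follows the pattern in the sketch below. The function name is hypothetical; the escaping mirrors the `fmt.Sprintf("key=%s", url.QueryEscape(key.Value))` calls quoted in the diffs, with an empty result signaling that the caller should fall back to an OAuth2 bearer token instead.

```go
package main

import (
	"fmt"
	"net/url"
)

// buildAuthQuery sketches the shared auth handling: when a plain API key
// is configured, pass it as an escaped ?key=... query parameter; when it
// is absent, return "" so the caller uses an OAuth2 token header.
func buildAuthQuery(apiKey string) string {
	if apiKey == "" {
		return "" // caller falls back to OAuth2 bearer token
	}
	return fmt.Sprintf("key=%s", url.QueryEscape(apiKey))
}

func main() {
	fmt.Println(buildAuthQuery("AIza+example/key")) // key=AIza%2Bexample%2Fkey
	fmt.Println(buildAuthQuery(""))                 // empty: use OAuth2 instead
}
```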
Merge activity
Force-pushed from 2cf99d2 to a70af60

Summary
Add support for fine-tuned Gemini models in Vertex AI by enabling custom models with numeric IDs to use the Gemini API endpoints.
Changes
- Added `getCompleteURLForGeminiEndpoint` to centralize URL construction logic for both standard Gemini models and custom/fine-tuned models
- Use the `:generateContent` and `:streamGenerateContent` methods instead of `/chat/completions`

Type of change
Affected areas
How to test
Breaking changes
Related issues
Enables support for fine-tuned Gemini models in Vertex AI
Security considerations
No new security implications. The code continues to use the existing authentication mechanisms.
Checklist