-
-
Notifications
You must be signed in to change notification settings - Fork 41
v6.5.32 #2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
v6.5.32 #2
Conversation
…o alternatives - Add AmpModelMapping config to route models like 'claude-opus-4.5' to 'claude-sonnet-4' - Add ModelMapper interface and DefaultModelMapper implementation with hot-reload support - Enhance FallbackHandler to apply model mappings before falling back to ampcode.com - Add structured logging for routing decisions (local provider, mapping, amp credits) - Update config.example.yaml with amp-model-mappings documentation
- Add model mapping feature to README.md Amp CLI section - Add detailed Model Mapping Configuration section to amp-cli-integration.md - Update architecture diagram to show model mapping flow - Update Model Fallback Behavior to include mapping step - Add Table of Contents entry for model mapping
- Store ampModule in Server struct to access it during config updates - Call ampModule.OnConfigUpdated() in UpdateClients() for hot reload - Watch config directory instead of file to handle atomic saves (vim, VSCode, etc.) - Improve config file event detection with basename matching - Add diagnostic logging for config reload tracing
feat(amp): add model mapping support for routing unavailable models to alternatives
feat: consolidate Vertex AI compatibility with API key support in Gemini
…ertex API key handling** - Removed `vertex-compat` executor and related configuration. - Consolidated Vertex compatibility checks into `vertex` handling with `apikey`-based model resolution. - Streamlined model generation logic for Vertex API key entries.
Summary of ChangesHello @luispater, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly upgrades the CLIProxyAPI's integration capabilities and flexibility. The core enhancement is the introduction of model mapping for Amp CLI, which intelligently redirects requests for unavailable models to suitable local alternatives, thereby optimizing resource usage and cost. Complementing this, the proxy now supports Vertex AI-compatible API keys, expanding its compatibility with various third-party services. Minor but important improvements include a more robust configuration file watcher and detailed logging for Amp routing decisions, enhancing overall system stability and observability. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces two significant features: model mapping for Amp CLI integration and support for Vertex-compatible API key providers. The changes are well-implemented, with good documentation for the model mapping feature and thorough unit tests for the new model mapper component. The structured logging for Amp routing decisions is a great addition for observability.
I have two suggestions for improving maintainability:
- Refactoring the
WrapHandlerinfallback_handlers.goto avoid using agotostatement. - Reducing code duplication in
gemini_vertex_executor.goby extracting common request preparation and response handling logic into helper functions.
Overall, this is a solid update that adds valuable new functionality.
| func (e *GeminiVertexExecutor) countTokensWithServiceAccount(ctx context.Context, auth *cliproxyauth.Auth, req cliproxyexecutor.Request, opts cliproxyexecutor.Options, projectID, location string, saJSON []byte) (cliproxyexecutor.Response, error) { | ||
| from := opts.SourceFormat | ||
| to := sdktranslator.FromString("gemini") | ||
| translatedReq := sdktranslator.TranslateRequest(from, to, req.Model, bytes.Clone(req.Payload), false) | ||
| if budgetOverride, includeOverride, ok := util.GeminiThinkingFromMetadata(req.Metadata); ok && util.ModelSupportsThinking(req.Model) { | ||
| if budgetOverride != nil { | ||
| norm := util.NormalizeThinkingBudget(req.Model, *budgetOverride) | ||
| budgetOverride = &norm | ||
| } | ||
| translatedReq = util.ApplyGeminiThinkingConfig(translatedReq, budgetOverride, includeOverride) | ||
| } | ||
| translatedReq = util.StripThinkingConfigIfUnsupported(req.Model, translatedReq) | ||
| translatedReq = fixGeminiImageAspectRatio(req.Model, translatedReq) | ||
| respCtx := context.WithValue(ctx, "alt", opts.Alt) | ||
| translatedReq, _ = sjson.DeleteBytes(translatedReq, "tools") | ||
| translatedReq, _ = sjson.DeleteBytes(translatedReq, "generationConfig") | ||
| translatedReq, _ = sjson.DeleteBytes(translatedReq, "safetySettings") | ||
|
|
||
| baseURL := vertexBaseURL(location) | ||
| url := fmt.Sprintf("%s/%s/projects/%s/locations/%s/publishers/google/models/%s:%s", baseURL, vertexAPIVersion, projectID, location, req.Model, "countTokens") | ||
|
|
||
| httpReq, errNewReq := http.NewRequestWithContext(respCtx, http.MethodPost, url, bytes.NewReader(translatedReq)) | ||
| if errNewReq != nil { | ||
| return cliproxyexecutor.Response{}, errNewReq | ||
| } | ||
| httpReq.Header.Set("Content-Type", "application/json") | ||
| if token, errTok := vertexAccessToken(ctx, e.cfg, auth, saJSON); errTok == nil && token != "" { | ||
| httpReq.Header.Set("Authorization", "Bearer "+token) | ||
| } else if errTok != nil { | ||
| log.Errorf("vertex executor: access token error: %v", errTok) | ||
| return cliproxyexecutor.Response{}, statusErr{code: 500, msg: "internal server error"} | ||
| } | ||
| applyGeminiHeaders(httpReq, auth) | ||
|
|
||
| var authID, authLabel, authType, authValue string | ||
| if auth != nil { | ||
| authID = auth.ID | ||
| authLabel = auth.Label | ||
| authType, authValue = auth.AccountInfo() | ||
| } | ||
| recordAPIRequest(ctx, e.cfg, upstreamRequestLog{ | ||
| URL: url, | ||
| Method: http.MethodPost, | ||
| Headers: httpReq.Header.Clone(), | ||
| Body: translatedReq, | ||
| Provider: e.Identifier(), | ||
| AuthID: authID, | ||
| AuthLabel: authLabel, | ||
| AuthType: authType, | ||
| AuthValue: authValue, | ||
| }) | ||
|
|
||
| httpClient := newProxyAwareHTTPClient(ctx, e.cfg, auth, 0) | ||
| httpResp, errDo := httpClient.Do(httpReq) | ||
| if errDo != nil { | ||
| recordAPIResponseError(ctx, e.cfg, errDo) | ||
| return cliproxyexecutor.Response{}, errDo | ||
| } | ||
| defer func() { | ||
| if errClose := httpResp.Body.Close(); errClose != nil { | ||
| log.Errorf("vertex executor: close response body error: %v", errClose) | ||
| } | ||
| }() | ||
| recordAPIResponseMetadata(ctx, e.cfg, httpResp.StatusCode, httpResp.Header.Clone()) | ||
| if httpResp.StatusCode < 200 || httpResp.StatusCode >= 300 { | ||
| b, _ := io.ReadAll(httpResp.Body) | ||
| appendAPIResponseChunk(ctx, e.cfg, b) | ||
| log.Debugf("request error, error status: %d, error body: %s", httpResp.StatusCode, summarizeErrorBody(httpResp.Header.Get("Content-Type"), b)) | ||
| return cliproxyexecutor.Response{}, statusErr{code: httpResp.StatusCode, msg: string(b)} | ||
| } | ||
| data, errRead := io.ReadAll(httpResp.Body) | ||
| if errRead != nil { | ||
| recordAPIResponseError(ctx, e.cfg, errRead) | ||
| return cliproxyexecutor.Response{}, errRead | ||
| } | ||
| appendAPIResponseChunk(ctx, e.cfg, data) | ||
| if httpResp.StatusCode < 200 || httpResp.StatusCode >= 300 { | ||
| log.Debugf("request error, error status: %d, error body: %s", httpResp.StatusCode, summarizeErrorBody(httpResp.Header.Get("Content-Type"), data)) | ||
| return cliproxyexecutor.Response{}, statusErr{code: httpResp.StatusCode, msg: string(data)} | ||
| } | ||
| count := gjson.GetBytes(data, "totalTokens").Int() | ||
| out := sdktranslator.TranslateTokenCount(ctx, to, from, count, data) | ||
| return cliproxyexecutor.Response{Payload: []byte(out)}, nil | ||
| } | ||
|
|
||
| // countTokensWithAPIKey handles token counting using API key credentials. | ||
| func (e *GeminiVertexExecutor) countTokensWithAPIKey(ctx context.Context, auth *cliproxyauth.Auth, req cliproxyexecutor.Request, opts cliproxyexecutor.Options, apiKey, baseURL string) (cliproxyexecutor.Response, error) { | ||
| from := opts.SourceFormat | ||
| to := sdktranslator.FromString("gemini") | ||
| translatedReq := sdktranslator.TranslateRequest(from, to, req.Model, bytes.Clone(req.Payload), false) | ||
| if budgetOverride, includeOverride, ok := util.GeminiThinkingFromMetadata(req.Metadata); ok && util.ModelSupportsThinking(req.Model) { | ||
| if budgetOverride != nil { | ||
| norm := util.NormalizeThinkingBudget(req.Model, *budgetOverride) | ||
| budgetOverride = &norm | ||
| } | ||
| translatedReq = util.ApplyGeminiThinkingConfig(translatedReq, budgetOverride, includeOverride) | ||
| } | ||
| translatedReq = util.StripThinkingConfigIfUnsupported(req.Model, translatedReq) | ||
| translatedReq = fixGeminiImageAspectRatio(req.Model, translatedReq) | ||
| respCtx := context.WithValue(ctx, "alt", opts.Alt) | ||
| translatedReq, _ = sjson.DeleteBytes(translatedReq, "tools") | ||
| translatedReq, _ = sjson.DeleteBytes(translatedReq, "generationConfig") | ||
| translatedReq, _ = sjson.DeleteBytes(translatedReq, "safetySettings") | ||
|
|
||
| // For API key auth, use simpler URL format without project/location | ||
| if baseURL == "" { | ||
| baseURL = "https://generativelanguage.googleapis.com" | ||
| } | ||
| url := fmt.Sprintf("%s/%s/publishers/google/models/%s:%s", baseURL, vertexAPIVersion, req.Model, "countTokens") | ||
|
|
||
| httpReq, errNewReq := http.NewRequestWithContext(respCtx, http.MethodPost, url, bytes.NewReader(translatedReq)) | ||
| if errNewReq != nil { | ||
| return cliproxyexecutor.Response{}, errNewReq | ||
| } | ||
| httpReq.Header.Set("Content-Type", "application/json") | ||
| if apiKey != "" { | ||
| httpReq.Header.Set("x-goog-api-key", apiKey) | ||
| } | ||
| applyGeminiHeaders(httpReq, auth) | ||
|
|
||
| var authID, authLabel, authType, authValue string | ||
| if auth != nil { | ||
| authID = auth.ID | ||
| authLabel = auth.Label | ||
| authType, authValue = auth.AccountInfo() | ||
| } | ||
| recordAPIRequest(ctx, e.cfg, upstreamRequestLog{ | ||
| URL: url, | ||
| Method: http.MethodPost, | ||
| Headers: httpReq.Header.Clone(), | ||
| Body: translatedReq, | ||
| Provider: e.Identifier(), | ||
| AuthID: authID, | ||
| AuthLabel: authLabel, | ||
| AuthType: authType, | ||
| AuthValue: authValue, | ||
| }) | ||
|
|
||
| httpClient := newProxyAwareHTTPClient(ctx, e.cfg, auth, 0) | ||
| httpResp, errDo := httpClient.Do(httpReq) | ||
| if errDo != nil { | ||
| recordAPIResponseError(ctx, e.cfg, errDo) | ||
| return cliproxyexecutor.Response{}, errDo | ||
| } | ||
| defer func() { | ||
| if errClose := httpResp.Body.Close(); errClose != nil { | ||
| log.Errorf("vertex executor: close response body error: %v", errClose) | ||
| } | ||
| }() | ||
| recordAPIResponseMetadata(ctx, e.cfg, httpResp.StatusCode, httpResp.Header.Clone()) | ||
| if httpResp.StatusCode < 200 || httpResp.StatusCode >= 300 { | ||
| b, _ := io.ReadAll(httpResp.Body) | ||
| appendAPIResponseChunk(ctx, e.cfg, b) | ||
| log.Debugf("request error, error status: %d, error body: %s", httpResp.StatusCode, summarizeErrorBody(httpResp.Header.Get("Content-Type"), b)) | ||
| return cliproxyexecutor.Response{}, statusErr{code: httpResp.StatusCode, msg: string(b)} | ||
| } | ||
| data, errRead := io.ReadAll(httpResp.Body) | ||
| if errRead != nil { | ||
| recordAPIResponseError(ctx, e.cfg, errRead) | ||
| return cliproxyexecutor.Response{}, errRead | ||
| } | ||
| appendAPIResponseChunk(ctx, e.cfg, data) | ||
| if httpResp.StatusCode < 200 || httpResp.StatusCode >= 300 { | ||
| log.Debugf("request error, error status: %d, error body: %s", httpResp.StatusCode, summarizeErrorBody(httpResp.Header.Get("Content-Type"), data)) | ||
| return cliproxyexecutor.Response{}, statusErr{code: httpResp.StatusCode, msg: string(data)} | ||
| } | ||
| count := gjson.GetBytes(data, "totalTokens").Int() | ||
| out := sdktranslator.TranslateTokenCount(ctx, to, from, count, data) | ||
| return cliproxyexecutor.Response{Payload: []byte(out)}, nil | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The functions countTokensWithServiceAccount and countTokensWithAPIKey share a significant amount of duplicated code for request preparation and response handling. This pattern of duplication is also present in the execute... and executeStream... function pairs. To improve maintainability and reduce redundancy, consider refactoring the common logic into helper functions. For example, a helper could handle request translation and modification, while another could handle the HTTP response.
Simplifies routing logic by delegating all provider/mapping/proxy decisions to FallbackHandler. Previously, the route checked for provider/mapping availability before calling the handler, then FallbackHandler performed the same checks again. Changes: - Remove model extraction and provider checking from route (lines 182-201) - Route now only checks if request is POST with /models/ path - FallbackHandler handles provider -> mapping -> proxy fallback - Remove unused internal/util import Benefits: - Eliminates duplicate checks (addresses PR review feedback #2) - Centralizes all provider/mapping logic in FallbackHandler - Reduces routing code by ~20 lines - Aligns with how other /api/provider routes work Performance: No impact (checks still happen once in FallbackHandler)
No description provided.