feat(usage): integrate LiteLLM pricing catalog and tiered token pricing for proxy usage #4470
allenxu09 wants to merge 3 commits into
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7b09adb695
    if let Some(above) = above_200k.filter(|_| tokens > THRESHOLD) {
        Decimal::from(THRESHOLD) * base + Decimal::from(tokens - THRESHOLD) * above
Apply long-context rates to the whole request
For models such as Gemini 2.5 Pro, the LiteLLM *_above_200k_tokens fields give the rate to use when the prompt/context exceeds 200k tokens; they are not a progressive tier within each token bucket. With a 250k-token Gemini prompt and a small output, this code charges only the last 50k input tokens at the high input rate and leaves output at the base rate because output_tokens <= 200k, so large-context requests are materially underreported. The calculation needs to select the above-200k input/cache/output rates based on the request context size instead of tiering each component independently.
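To make the difference concrete, here is a minimal sketch comparing the two interpretations. It is not the PR's code: the `Rate` struct, the function names, and the integer nano-USD-per-token units are hypothetical, and it assumes the long-context decision is made from the input token count alone (whether cached tokens count toward the threshold is left open).

```rust
/// Hypothetical rate pair for one token category, in nano-USD per token.
#[derive(Clone, Copy)]
struct Rate {
    base: u64,
    above_200k: Option<u64>,
}

const THRESHOLD: u64 = 200_000;

/// Progressive tiering, as in the reviewed diff: only the tokens past the
/// threshold pay the higher rate, and each component is judged on its own.
fn progressive_cost(tokens: u64, rate: Rate) -> u64 {
    match rate.above_200k {
        Some(above) if tokens > THRESHOLD => {
            THRESHOLD * rate.base + (tokens - THRESHOLD) * above
        }
        _ => tokens * rate.base,
    }
}

/// Whole-request selection, as the review suggests: the caller decides once
/// whether the request is long-context, and every token in the component is
/// then billed at the matching rate.
fn component_cost(tokens: u64, rate: Rate, long_context: bool) -> u64 {
    let per_token = if long_context {
        rate.above_200k.unwrap_or(rate.base)
    } else {
        rate.base
    };
    tokens * per_token
}

/// Assumption: "context size" is approximated by the input token count.
fn request_cost(input_tokens: u64, output_tokens: u64, input: Rate, output: Rate) -> u64 {
    let long_context = input_tokens > THRESHOLD;
    component_cost(input_tokens, input, long_context)
        + component_cost(output_tokens, output, long_context)
}
```

With illustrative rates of 1250/2500 (input) and 10000/15000 (output) nano-USD per token, a 250k-token prompt with 1k output tokens costs 385,000,000 nano-USD under progressive tiering and 640,000,000 under whole-request selection, which is the size of the underreporting the comment describes.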
    let model = model.rsplit('/').next().unwrap_or(model);
    let model = model.split(':').next().unwrap_or(model);
Preserve provider and model qualifiers in pricing keys
These truncations collapse distinct LiteLLM catalog entries before both insertion and lookup. For example, the embedded catalog contains anthropic.claude-haiku-4-5-20251001-v1:0 and bedrock/us-gov-west-1/anthropic.claude-haiku-4-5-20251001-v1:0 at different prices, but both normalize to anthropic-claude-haiku-4-5-20251001-v1, so one overwrites the other and standard traffic can be priced with gov-region rates. OpenAI fine-tune IDs like ft:gpt-4o-mini-... also normalize to just ft. Keep enough of the provider/suffix structure to avoid cross-provider and fine-tune collisions.
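The collision can be reproduced with a small sketch. `lossy_key` mirrors the two truncations quoted above plus the separator replacement and lowercasing described in the PR; `qualified_key` is a hypothetical alternative (not from the PR) that keeps the `/` and `:` structure and only lowercases, shown as one possible direction rather than the required fix.

```rust
/// Normalization as described in the review: keep only the last `/` segment,
/// drop everything from the first `:`, map `.`/`@`/`_` to `-`, lowercase.
fn lossy_key(model: &str) -> String {
    let model = model.rsplit('/').next().unwrap_or(model);
    let model = model.split(':').next().unwrap_or(model);
    model
        .chars()
        .map(|c| match c {
            '.' | '@' | '_' => '-',
            other => other.to_ascii_lowercase(),
        })
        .collect()
}

/// Hypothetical alternative: preserve provider path and suffix so that
/// region-qualified and fine-tune identifiers stay distinct.
fn qualified_key(model: &str) -> String {
    model.trim().to_ascii_lowercase()
}
```

A lookup could try the qualified key first and fall back to the lossy key only when no exact entry exists, so fuzzy matching remains available without letting one catalog entry overwrite another at insertion time.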
Summary
Proxy usage pricing now falls back to LiteLLM's model pricing catalog when no custom pricing is configured for a provider. The catalog is embedded at build time and refreshed at runtime, so unknown models no longer result in missing cost data. Additionally, cost calculation now supports above-200k token tiered pricing, aligning with upstream API pricing models.
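The fallback order described here can be sketched as follows. This is an illustration only: the `Pricing` struct, its fields, and `resolve_pricing` are hypothetical names, and the real logger may key custom pricing per provider rather than per model string.

```rust
use std::collections::HashMap;

/// Hypothetical pricing record; field names are illustrative.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Pricing {
    input_per_million: f64,
    output_per_million: f64,
}

/// Custom pricing wins when present; the LiteLLM-derived catalog is consulted
/// only as a fallback, and `None` means cost data stays missing.
fn resolve_pricing(
    custom: &HashMap<String, Pricing>,
    catalog: &HashMap<String, Pricing>,
    model: &str,
) -> Option<Pricing> {
    custom.get(model).or_else(|| catalog.get(model)).copied()
}
```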
Changes
- src-tauri/src/proxy/usage/pricing.rs: LiteLLM-backed pricing catalog with embedded JSON (litellm-pricing.json) and runtime refresh from GitHub
- src-tauri/src/proxy/usage/litellm-pricing.json: Compressed LiteLLM model pricing embedded at build time
- src-tauri/src/proxy/usage/calculator.rs: Added tiered pricing (above 200k tokens) for input, output, cache read, and cache creation costs
- src-tauri/src/proxy/usage/logger.rs: Falls back to LiteLLM pricing lookup when no custom pricing is found per provider
- src-tauri/src/proxy/usage/mod.rs: Registered pricing module
- src-tauri/src/proxy/providers/transform_codex_chat.rs: Rust 1.95 formatting alignment
- src/components/usage/PricingConfigPanel.tsx: Updated pricing display to reflect tiered cost structure
- src/i18n/locales/*.json (en, zh, zh-TW, ja): Updated pricing-related UI strings
How It Works
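- The catalog is embedded at build time via include_str! and refreshed at runtime
- Lookup normalizes the model name (strips the / prefix and : suffix, replaces ./@/_ → -, lowercases) and performs prefix-based fuzzy matching
- Cost calculation applies *_above_200k_tokens rates when available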
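The PR text does not spell out how the prefix-based fuzzy matching works, so the following is only one plausible reading: among catalog keys that the already-normalized model name starts with, pick the longest. The function name `fuzzy_lookup` and the slice-of-pairs catalog shape are assumptions made for the sketch.

```rust
/// Returns the value for the longest catalog key that `normalized` starts
/// with. An exact match is naturally the longest possible prefix, so it wins.
/// Assumption: keys and the query have already been normalized the same way.
fn fuzzy_lookup<'a, T>(catalog: &'a [(&'a str, T)], normalized: &str) -> Option<&'a T> {
    catalog
        .iter()
        .filter(|(key, _)| normalized.starts_with(*key))
        .max_by_key(|(key, _)| key.len())
        .map(|(_, value)| value)
}
```

Preferring the longest prefix keeps a dated ID such as gpt-4o-mini-2024-07-18 from being priced as gpt-4o when both keys are present. It does not protect against the key collisions raised in the review, since those happen before lookup.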