
feat(usage): integrate LiteLLM pricing catalog and tiered token pricing for proxy usage #4470

Open
allenxu09 wants to merge 3 commits into farion1231:main from allenxu09:main

@allenxu09

Contributor

Summary

Proxy usage pricing now falls back to LiteLLM's model pricing catalog when no custom pricing is configured for a provider. The catalog is embedded at build time and refreshed at runtime, so models without custom pricing still get cost data as long as the catalog lists them. Cost calculation also supports tiered pricing for token counts above 200k, in line with upstream API pricing.
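Here is a rough illustration of that lookup order. This is a minimal sketch, not the PR's code. It assumes custom pricing is keyed by (provider id, model) and the catalog by model name. resolve_price and ModelPrice are hypothetical names, and prices are hypothetical integer units per token.

```rust
use std::collections::HashMap;

/// Hypothetical per-token prices in integer units (not the PR's type).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct ModelPrice {
    pub input: u64,
    pub output: u64,
}

/// Hypothetical resolver: custom pricing for this provider wins,
/// otherwise fall back to the catalog, otherwise no price (None).
pub fn resolve_price(
    custom: &HashMap<(String, String), ModelPrice>, // assumed key: (provider_id, model)
    catalog: &HashMap<String, ModelPrice>,          // assumed key: model name
    provider_id: &str,
    model: &str,
) -> Option<ModelPrice> {
    custom
        .get(&(provider_id.to_string(), model.to_string()))
        .copied()
        .or_else(|| catalog.get(model).copied())
}
```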

Changes

  • New: src-tauri/src/proxy/usage/pricing.rs — LiteLLM-backed pricing catalog with embedded JSON (litellm-pricing.json) and runtime refresh from GitHub
  • New: src-tauri/src/proxy/usage/litellm-pricing.json — Compressed LiteLLM model pricing embedded at build time
  • Modified: src-tauri/src/proxy/usage/calculator.rs — Added tiered pricing (above 200k tokens) for input, output, cache read, and cache creation costs
  • Modified: src-tauri/src/proxy/usage/logger.rs — Falls back to LiteLLM pricing lookup when no custom pricing is found per provider
  • Modified: src-tauri/src/proxy/usage/mod.rs — Registered pricing module
  • Modified: src-tauri/src/proxy/providers/transform_codex_chat.rs — Rust 1.95 formatting alignment
  • Modified: src/components/usage/PricingConfigPanel.tsx — Updated pricing display to reflect tiered cost structure
  • Modified: src/i18n/locales/*.json (en, zh, zh-TW, ja) — Updated pricing-related UI strings
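pricing.rs is described as pairing an embedded snapshot with a runtime refresh from GitHub. A minimal, std-only sketch of that shape follows. PricingCatalog, try_begin_refresh and apply_refresh are hypothetical names. EMBEDDED_CATALOG stands in for the embedded litellm-pricing.json data, and the network fetch itself is left out.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::RwLock;

/// Stand-in for the JSON snapshot the PR embeds at build time (hypothetical content).
const EMBEDDED_CATALOG: &str = r#"{"example-model":{"input_cost_per_token":1e-6}}"#;

/// Hypothetical holder: starts from the embedded snapshot and
/// allows at most one refresh attempt per process.
pub struct PricingCatalog {
    raw: RwLock<String>,
    refresh_started: AtomicBool,
}

impl PricingCatalog {
    pub fn new() -> Self {
        Self {
            raw: RwLock::new(EMBEDDED_CATALOG.to_string()),
            refresh_started: AtomicBool::new(false),
        }
    }

    /// Returns true only for the first caller; every later call returns false.
    pub fn try_begin_refresh(&self) -> bool {
        self.refresh_started
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
    }

    /// Swap in fetched catalog text; on a failed fetch keep the current snapshot.
    pub fn apply_refresh(&self, fetched: Result<String, String>) {
        if let Ok(text) = fetched {
            *self.raw.write().unwrap() = text;
        }
    }

    /// Current catalog text (embedded or refreshed).
    pub fn current(&self) -> String {
        self.raw.read().unwrap().clone()
    }
}
```

In this sketch, a background task would call try_begin_refresh on the first pricing lookup and fetch only when it returns true. A failed fetch leaves the embedded snapshot in place.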

How It Works

  1. Embedded fallback: A compressed LiteLLM pricing catalog is embedded in the binary via include_str!
  2. Runtime refresh: On first pricing lookup, a background task fetches the latest catalog from GitHub (at most once per process lifetime)
  3. Model matching: Normalizes model names (strips any /prefix and :suffix, replaces ., @ and _ with -, lowercases) and performs prefix-based fuzzy matching
  4. Tiered pricing: Within each token bucket, tokens beyond the first 200k are billed at the *_above_200k_tokens rate when the catalog provides one
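Step 3 can be sketched as follows. normalize mirrors the truncation lines quoted in the review further down, plus the character mapping listed above. The fuzzy rule is an assumption, because the description does not spell it out: this sketch picks the longest normalized catalog key that is a prefix of the query. Both function names are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical normalizer following the steps described in the PR.
pub fn normalize(model: &str) -> String {
    // Keep only the segment after the last '/' (drops provider/region prefixes).
    let model = model.rsplit('/').next().unwrap_or(model);
    // Drop everything from the first ':' (drops version/fine-tune suffixes).
    let model = model.split(':').next().unwrap_or(model);
    // Map '.', '@', '_' to '-' and lowercase the rest.
    model
        .chars()
        .map(|c| match c {
            '.' | '@' | '_' => '-',
            other => other.to_ascii_lowercase(),
        })
        .collect()
}

/// Hypothetical lookup: exact normalized match first, then the longest
/// catalog key that prefixes the query. Catalog keys are assumed normalized.
pub fn lookup<'a, V>(catalog: &'a HashMap<String, V>, model: &str) -> Option<&'a V> {
    let key = normalize(model);
    if let Some(v) = catalog.get(&key) {
        return Some(v);
    }
    catalog
        .iter()
        .filter(|(k, _)| key.starts_with(k.as_str()))
        .max_by_key(|(k, _)| k.len())
        .map(|(_, v)| v)
}
```

Longest-prefix matching lets a dated ID such as gpt-4o-mini-2024-07-18 resolve to gpt-4o-mini rather than gpt-4o when both keys exist.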

@allenxu09

Contributor Author

@codex review

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7b09adb695

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +128 to +129
if let Some(above) = above_200k.filter(|_| tokens > THRESHOLD) {
Decimal::from(THRESHOLD) * base + Decimal::from(tokens - THRESHOLD) * above
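For reference, the reviewed lines amount to per-bucket tiering. A standalone version is sketched below. The else branch, the name tiered_cost and the integer price units are assumptions; the snippet itself uses a Decimal type.

```rust
/// Tier boundary used by the reviewed code.
const THRESHOLD: u64 = 200_000;

/// Hypothetical per-bucket tiering: the first 200k tokens of one bucket use
/// `base`; only the remainder uses `above_200k`, if the catalog provides it.
pub fn tiered_cost(tokens: u64, base: u64, above_200k: Option<u64>) -> u64 {
    if let Some(above) = above_200k.filter(|_| tokens > THRESHOLD) {
        THRESHOLD * base + (tokens - THRESHOLD) * above
    } else {
        // Assumed fallback: the whole bucket at the base rate.
        tokens * base
    }
}
```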

P2: Apply long-context rates to the whole request

For models such as Gemini 2.5 Pro, the LiteLLM *_above_200k_tokens fields give the rate to use when the prompt/context exceeds 200k tokens; they are not a progressive tier within each token bucket. With a 250k-token Gemini prompt and a small output, this code charges only the last 50k input tokens at the high input rate and leaves output at the base rate (because output_tokens <= 200k), so large-context requests are materially underreported. The calculation should select the above-200k input, cache and output rates based on the request's context size instead of tiering each component independently.
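Below is a minimal sketch of the suggested alternative. It makes two assumptions. First, the context size is the sum of fresh, cache-read and cache-creation input tokens. Second, a bucket without a long-context rate keeps its base rate. Rates, Usage and request_cost are hypothetical names, and prices are integer units per token.

```rust
/// Long-context boundary, matching the 200k threshold in the PR.
const LONG_CONTEXT_THRESHOLD: u64 = 200_000;

/// Hypothetical rate card; `*_above_200k` play the role of LiteLLM's
/// `*_above_200k_tokens` fields.
pub struct Rates {
    pub input: u64,
    pub output: u64,
    pub cache_read: u64,
    pub cache_creation: u64,
    pub input_above_200k: Option<u64>,
    pub output_above_200k: Option<u64>,
    pub cache_read_above_200k: Option<u64>,
    pub cache_creation_above_200k: Option<u64>,
}

/// Hypothetical token counts for one request.
pub struct Usage {
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub cache_read_tokens: u64,
    pub cache_creation_tokens: u64,
}

pub fn request_cost(r: &Rates, u: &Usage) -> u64 {
    // Assumption: context size = all input-side tokens, fresh or cached.
    let context = u.input_tokens + u.cache_read_tokens + u.cache_creation_tokens;
    let long_context = context > LONG_CONTEXT_THRESHOLD;
    // One tier for the whole request; a missing long-context rate falls back to base.
    let rate = |base: u64, above: Option<u64>| {
        if long_context { above.unwrap_or(base) } else { base }
    };
    u.input_tokens * rate(r.input, r.input_above_200k)
        + u.output_tokens * rate(r.output, r.output_above_200k)
        + u.cache_read_tokens * rate(r.cache_read, r.cache_read_above_200k)
        + u.cache_creation_tokens * rate(r.cache_creation, r.cache_creation_above_200k)
}
```

Take hypothetical rates of input 2 (4 above 200k) and output 10 (15 above 200k). A 250k-token prompt with 1k output tokens then costs 1,015,000 units here. Per-bucket tiering gives 610,000 units for the same request (600,000 input + 10,000 output).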

Comment on lines +135 to +136
let model = model.rsplit('/').next().unwrap_or(model);
let model = model.split(':').next().unwrap_or(model);
P2: Preserve provider and model qualifiers in pricing keys

These truncations collapse distinct LiteLLM catalog entries before both insertion and lookup. For example, the embedded catalog contains anthropic.claude-haiku-4-5-20251001-v1:0 and bedrock/us-gov-west-1/anthropic.claude-haiku-4-5-20251001-v1:0 at different prices, but both normalize to anthropic-claude-haiku-4-5-20251001-v1, so one overwrites the other and standard traffic can be priced with gov-region rates; OpenAI fine-tune IDs like ft:gpt-4o-mini-... also normalize to just ft. Keep enough of the provider/suffix structure to avoid cross-provider and fine-tune collisions.
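One possible approach is sketched below as an assumption, not a drop-in fix. Normalize without stripping / or :, then at lookup time try candidates from most to least specific: drop leading /-segments, and for each of those drop trailing :-segments. normalize_key, candidates and lookup are hypothetical names.

```rust
use std::collections::HashMap;

/// Hypothetical key normalizer: lowercase and map '.', '@', '_' to '-',
/// but keep '/' and ':' so provider, region and fine-tune qualifiers survive.
pub fn normalize_key(model: &str) -> String {
    model
        .chars()
        .map(|c| match c {
            '.' | '@' | '_' => '-',
            other => other.to_ascii_lowercase(),
        })
        .collect()
}

/// Candidate keys from most to least specific.
pub fn candidates(model: &str) -> Vec<String> {
    let key = normalize_key(model);
    let mut out = Vec::new();
    let mut path: &str = &key;
    loop {
        // For this path suffix, drop trailing ':' segments one at a time.
        let mut k = path;
        loop {
            out.push(k.to_string());
            match k.rfind(':') {
                Some(i) => k = &k[..i],
                None => break,
            }
        }
        // Then drop the leading '/' segment and repeat.
        match path.find('/') {
            Some(i) => path = &path[i + 1..],
            None => break,
        }
    }
    out
}

/// Hypothetical lookup: first candidate present in the catalog wins.
/// Catalog keys are assumed to be built with normalize_key.
pub fn lookup<'a, V>(catalog: &'a HashMap<String, V>, model: &str) -> Option<&'a V> {
    for k in candidates(model) {
        if let Some(v) = catalog.get(&k) {
            return Some(v);
        }
    }
    None
}
```

With this sketch, the plain and bedrock/us-gov-west-1 entries stay distinct. An unknown prefix such as myproxy/anthropic.claude-haiku-4-5-20251001-v1:0 falls back to the plain entry rather than the gov-region one. A fine-tune ID reaches its ft:gpt-4o-mini-... entry before it could reach a bare ft key.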
