Observed on an interrupted stream with Anthropic models: usage comes back strangely low (e.g., 26 input tokens) on a very long chat. Perhaps the cache is getting thrashed and we're somehow not counting the cache-related tokens?
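If the low number is just `input_tokens` without the cache fields, the fix might be as simple as the sketch below. This is an assumption about where our accounting drops them; the field names are the ones Anthropic's Messages API usage object reports for prompt caching.

```python
# Sketch (assumption): Anthropic reports prompt-cache activity in separate usage
# fields, so "input_tokens" alone can look tiny on a long cached chat.
def total_input_tokens(usage: dict) -> int:
    return (
        (usage.get("input_tokens") or 0)
        + (usage.get("cache_creation_input_tokens") or 0)
        + (usage.get("cache_read_input_tokens") or 0)
    )
```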
Observed on GPT-5-Pro: I believe we're estimating reasoning cost by counting tokens in the reasoning trace. That might be roughly accurate for Anthropic models, where the trace is highly detailed, but it falls flat on its face with GPT-5-Pro and the costs come out an order of magnitude off. We should check whether the API reports the number of reasoning tokens by some other means.
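OpenAI's usage payload does appear to expose an explicit reasoning-token count; a hedged sketch of reading it, assuming the `completion_tokens_details` / `output_tokens_details` shapes from the Chat Completions and Responses APIs (not verified against GPT-5-Pro specifically):

```python
# Sketch (assumption): prefer an explicit reasoning-token count from the
# provider's usage payload before falling back to estimating from the trace.
def reported_reasoning_tokens(usage: dict) -> int | None:
    details = (
        usage.get("completion_tokens_details")  # Chat Completions style
        or usage.get("output_tokens_details")   # Responses API style
        or {}
    )
    tokens = details.get("reasoning_tokens")
    return int(tokens) if tokens is not None else None
```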
One idea is to compute reasoning_tokens = total_output_tokens - parsed_text_tokens. That might be provider-agnostic.
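A minimal sketch of that fallback; the `count_tokens` helper is illustrative (whatever tokenizer-backed counter we already use for text), not a reference to existing code:

```python
# Sketch (assumption): when no explicit reasoning count is reported, estimate it
# as total output tokens minus the tokens in the parsed (visible) text.
def estimated_reasoning_tokens(total_output_tokens: int, parsed_text: str, count_tokens) -> int:
    # Clamp at zero in case the tokenizer we count with differs from the provider's.
    return max(total_output_tokens - count_tokens(parsed_text), 0)
```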