Fix duplicate Codex session usage in aggregate stats#2371
Fix duplicate Codex session usage in aggregate stats#2371xandersbell wants to merge 1 commit intofarion1231:mainfrom
Conversation
7dc2010 to
f4922ec
Compare
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f4922ecfec
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| AND p.created_at BETWEEN {created_at} - {CODEX_SESSION_DEDUP_WINDOW_SECS} | ||
| AND {created_at} + {CODEX_SESSION_DEDUP_WINDOW_SECS} |
There was a problem hiding this comment.
Constrain dedup match to the caller's time range
The duplicate filter can remove in-range Codex session rows based on proxy rows that are outside the requested window, because the EXISTS clause only checks p.created_at against l.created_at ± 90s and never against the query’s start_date/end_date. In bounded queries (e.g., dashboard ranges or hourly/day buckets), this causes undercounting near range edges when a session row is inside the range but its matching proxy row is just outside it; the same helper is reused by summary, trends, provider, and model aggregates, so the skew propagates across all of them.
Useful? React with 👍 / 👎.
Summary
This PR applies a minimal, low-intrusion fix for duplicate Codex usage appearing in aggregate statistics.
I encountered an issue where the macOS app usage dashboard showed a pseudo provider named
Codex (Session)alongside the actual manually imported Codex provider. In many cases, both rows had the same request/token/cost values, whileCodex (Session)had no latency / first-token timing data.After tracing the code, the likely cause is that the same Codex CLI request can be recorded from two sources:
_codex_sessionBecause Codex does not currently have a shared stable request id between these two sources, both records can be counted in aggregate stats. Codex session usage is also synced periodically, so the matching window needs to tolerate some timestamp skew between proxy logging and session-log import.
What changed
To respect the existing interface and minimize the patch size, this PR does not change:
Instead, it only adjusts aggregate usage queries to exclude a
codex_sessionrow when a matching real Codex proxy row exists within a small time window.The duplicate check currently requires:
app_type = codex_codex_sessionRaw request logs still show both records for inspection.
Why this is intentionally limited
This is a patch-level fix, not a redesign of the usage data model.
A more fundamental fix would likely require the project owner to decide how Codex proxy logs and Codex session logs should share identity, deduplicate at import time, or model multiple usage sources consistently through rollups. That may involve schema changes or broader product decisions, so I kept this PR narrowly scoped.
Known limitation: historical data already rolled up into
usage_daily_rollupscannot be precisely deduplicated by this patch because the rollup table does not retain enough source/detail fields.Tests
Ran:
cargo fmt --manifest-path src-tauri/Cargo.toml --check cargo test --manifest-path src-tauri/Cargo.toml services::usage_stats --libResult:
Thanks for maintaining the project.