Skip to content

Fix duplicate Codex session usage in aggregate stats#2371

Open
xandersbell wants to merge 1 commit intofarion1231:mainfrom
xandersbell:fix/codex-session-stat-dedup
Open

Fix duplicate Codex session usage in aggregate stats#2371
xandersbell wants to merge 1 commit intofarion1231:mainfrom
xandersbell:fix/codex-session-stat-dedup

Conversation

@xandersbell
Copy link
Copy Markdown

@xandersbell xandersbell commented Apr 26, 2026

Summary

This PR applies a minimal, low-intrusion fix for duplicate Codex usage appearing in aggregate statistics.

I encountered an issue where the macOS app usage dashboard showed a pseudo provider named Codex (Session) alongside the actual manually imported Codex provider. In many cases, both rows had the same request/token/cost values, while Codex (Session) had no latency / first-token timing data.

After tracing the code, the likely cause is that the same Codex CLI request can be recorded from two sources:

  • the real proxy request log, under the actual provider id
  • the Codex session log import, under _codex_session

Because Codex does not currently have a shared stable request id between these two sources, both records can be counted in aggregate stats. Codex session usage is also synced periodically, so the matching window needs to tolerate some timestamp skew between proxy logging and session-log import.

What changed

To respect the existing interface and minimize the patch size, this PR does not change:

  • public Tauri command interfaces
  • frontend data types
  • database schema
  • session import behavior
  • raw request log visibility

Instead, it only adjusts aggregate usage queries to exclude a codex_session row when a matching real Codex proxy row exists within a small time window.

The duplicate check currently requires:

  • app_type = codex
  • session row provider id is _codex_session
  • matching real proxy row exists
  • same status code
  • same model
  • same input/output/cache token counts
  • timestamps within 90 seconds

Raw request logs still show both records for inspection.

Why this is intentionally limited

This is a patch-level fix, not a redesign of the usage data model.

A more fundamental fix would likely require the project owner to decide how Codex proxy logs and Codex session logs should share identity, deduplicate at import time, or model multiple usage sources consistently through rollups. That may involve schema changes or broader product decisions, so I kept this PR narrowly scoped.

Known limitation: historical data already rolled up into usage_daily_rollups cannot be precisely deduplicated by this patch because the rollup table does not retain enough source/detail fields.

Tests

Ran:

cargo fmt --manifest-path src-tauri/Cargo.toml --check
cargo test --manifest-path src-tauri/Cargo.toml services::usage_stats --lib

Result:

11 passed; 0 failed

Thanks for maintaining the project.

@xandersbell xandersbell force-pushed the fix/codex-session-stat-dedup branch from 7dc2010 to f4922ec Compare April 26, 2026 20:24
Copy link
Copy Markdown
Owner

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f4922ecfec

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +175 to +176
AND p.created_at BETWEEN {created_at} - {CODEX_SESSION_DEDUP_WINDOW_SECS}
AND {created_at} + {CODEX_SESSION_DEDUP_WINDOW_SECS}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Constrain dedup match to the caller's time range

The duplicate filter can remove in-range Codex session rows based on proxy rows that are outside the requested window, because the EXISTS clause only checks p.created_at against l.created_at ± 90s and never against the query’s start_date/end_date. In bounded queries (e.g., dashboard ranges or hourly/day buckets), this causes undercounting near range edges when a session row is inside the range but its matching proxy row is just outside it; the same helper is reused by summary, trends, provider, and model aggregates, so the skew propagates across all of them.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants