feat: Token Analytics Pipeline (#55, #56, #65) #93

Merged
dean0x merged 18 commits into main from wave/5
Mar 25, 2026

dean0x commented Mar 24, 2026

Summary

  • Add SQLite-backed token analytics persistence layer with WAL mode, auto-pruning, and background recording
  • Wire --show-stats flag through all subcommands (test, build, git) for unified token stats
  • Create skim stats subcommand with terminal dashboard and JSON output
  • Add --cost flag with economics estimates (default: Claude Sonnet $3/MTok)
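
The `--cost` estimate reduces to one multiplication. A minimal sketch, assuming savings are priced purely as input tokens at a flat rate per million tokens; `estimated_savings_usd` is a hypothetical name and not the PR's API:

```rust
/// Hypothetical helper: estimated dollars saved for a number of tokens
/// avoided, at a flat input price expressed in USD per million tokens.
fn estimated_savings_usd(tokens_saved: u64, usd_per_mtok: f64) -> f64 {
    (tokens_saved as f64 / 1_000_000.0) * usd_per_mtok
}
```

At the default $3/MTok, 2,000,000 saved tokens would be reported as roughly $6.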

Changes

Analytics Module (analytics/mod.rs, analytics/schema.rs)

  • SQLite database with versioned migrations (PRAGMA user_version)
  • WAL journal mode with 5-second busy timeout for concurrent access
  • Token savings recording: per-file (with counts) and per-command (fire-and-forget with background token counting)
  • Query functions: summary, daily, by-command, by-language, by-mode, tier distribution
  • Auto-prune records older than 90 days (checked daily)
  • SKIM_DISABLE_ANALYTICS env var to opt out
  • SKIM_ANALYTICS_DB env var to override database path
  • 13 unit tests covering all query paths
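
The 90-day auto-prune comes down to comparing record timestamps against a cutoff. A rough sketch of that cutoff calculation, assuming records carry Unix-second timestamps; `prune_cutoff_secs` and `RETENTION` are illustrative names, not taken from the PR:

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Assumed retention window: 90 days, expressed in seconds.
const RETENTION: Duration = Duration::from_secs(90 * 24 * 60 * 60);

/// Hypothetical helper: the Unix timestamp (seconds) before which records
/// would be pruned. Saturates at 0 rather than panicking if `now` is
/// earlier than the epoch or less than 90 days after it.
fn prune_cutoff_secs(now: SystemTime) -> u64 {
    let now_secs = now
        .duration_since(UNIX_EPOCH)
        .unwrap_or_default()
        .as_secs();
    now_secs.saturating_sub(RETENTION.as_secs())
}
```

Taking `now` as a parameter keeps the function deterministic, so a test can pass a fixed time instead of depending on the system clock.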

--show-stats for subcommands

  • Token stats now available for: skim test cargo, skim test go, skim test vitest, skim test pytest, skim build cargo, skim build clippy, skim build tsc, skim git status, skim git diff, skim git log
  • --show-stats is stripped from args before forwarding to underlying tools
  • Token counting is now always-on in process_file/process_stdin (no longer conditional on --show-stats)

skim stats subcommand

  • Terminal dashboard: summary, efficiency meter (Unicode bars), breakdowns by command/language/mode, parse tier distribution
  • --format json: machine-readable output with all breakdowns
  • --since <DURATION>: time filter (7d, 24h, 4w)
  • --cost: economics section with configurable pricing model
  • --clear: delete all analytics data
  • Uses colored crate (respects NO_COLOR)
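
The `--since` values shown above (7d, 24h, 4w) are a number followed by a unit suffix. The PR reuses an existing `parse_duration_ago` whose exact behavior is not shown here, so the following is only an illustrative parser, assuming hours, days, and weeks are the accepted units; `parse_since` is a made-up name:

```rust
use std::time::Duration;

/// Illustrative parser for values like "7d", "24h", "4w".
/// Returns None for empty input, unknown units, or overflow.
fn parse_since(s: &str) -> Option<Duration> {
    let s = s.trim();
    // The final character is the unit; everything before it is the count.
    let unit = s.chars().last()?;
    let digits = &s[..s.len() - unit.len_utf8()];
    let n: u64 = digits.parse().ok()?;
    let secs_per_unit: u64 = match unit {
        'h' => 60 * 60,
        'd' => 24 * 60 * 60,
        'w' => 7 * 24 * 60 * 60,
        _ => return None,
    };
    n.checked_mul(secs_per_unit).map(Duration::from_secs)
}
```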

Infrastructure

  • clear_cache now skips analytics.db (only removes .json cache files)
  • --disable-analytics CLI flag for per-invocation opt-out
  • --format registered as value-consuming flag to prevent subcommand mis-routing
  • rusqlite (bundled) and colored added as workspace dependencies

Test plan

  • 13 analytics unit tests pass (DB operations, queries, pruning, pricing)
  • All existing tests pass (33 test suites, 500+ unit + integration tests)
  • cargo clippy -- -D warnings clean
  • Smoke tested: skim stats, skim stats --help, skim stats --format json, skim stats --cost
  • Manual: Run skim src/main.rs then skim stats to verify recording works end-to-end
  • Manual: Verify SKIM_DISABLE_ANALYTICS=1 skim file.rs does not record

Dean Sharon and others added 9 commits March 24, 2026 13:32
- Add rusqlite (bundled) and colored workspace dependencies
- Create analytics module with SQLite-backed token savings storage
- Schema with migrations, WAL mode, and 5-second busy timeout
- Query functions: summary, daily, by-command, by-language, by-mode, tier distribution
- Fire-and-forget recording with background thread (non-blocking)
- PricingModel with env var override (SKIM_INPUT_COST_PER_MTOK)
- Auto-prune records older than 90 days (daily check)
- SKIM_DISABLE_ANALYTICS env var to opt out
- Fix clear_cache to skip analytics.db (only removes .json cache files)
- 13 unit tests covering all query paths, pruning, and pricing

Co-Authored-By: Claude <noreply@anthropic.com>

- Make count_token_pair pub(crate) for reuse across modules
- Add show_stats parameter to run_parsed_command_with_mode
- Strip --show-stats from args before forwarding to underlying tools
- Wire token stats through all test runners: cargo, go, vitest, pytest
- Wire token stats through build runners: cargo, clippy, tsc
- Wire token stats through git subcommands: status, diff, log
- Always count tokens in process_file/process_stdin (removes conditional)
- Token counts now cached for analytics pipeline in later steps

Co-Authored-By: Claude <noreply@anthropic.com>

- Create stats subcommand with terminal dashboard and JSON output
- Wire analytics recording into single-file, stdin, and multi-file paths
- Wire analytics into run_parsed_command_with_mode for test runners
- Add --disable-analytics CLI flag to opt out per-invocation
- Add --format as value-consuming flag to prevent subcommand mis-routing
- Register stats in KNOWN_SUBCOMMANDS and dispatch
- Dashboard shows: summary, efficiency meter, by-command, by-language,
  by-mode, parse tier distribution
- JSON mode includes daily breakdown
- Support --since filter using existing parse_duration_ago
- Support --clear to delete all analytics data

Co-Authored-By: Claude <noreply@anthropic.com>

- Cost estimates use PricingModel with claude-sonnet-4-6 default ($3/MTok)
- SKIM_INPUT_COST_PER_MTOK env var overrides pricing for custom models
- Terminal dashboard: model name, input cost, estimated savings in green
- JSON output: cost_estimate section with model, rate, savings, tokens
- Document environment variables in stats --help output:
  SKIM_INPUT_COST_PER_MTOK, SKIM_ANALYTICS_DB, SKIM_DISABLE_ANALYTICS

Co-Authored-By: Claude <noreply@anthropic.com>

- Add missing analytics recording to go, vitest, pytest, build, and git
  subcommands so `skim stats` shows complete data across all commands
- Replace .unwrap() with .unwrap_or_default() in prune_older_than to
  prevent panic if system clock is before epoch
- Deduplicate user_has_flag across modules by importing from crate::cmd
- Extract savings_percentage, now_unix_secs, persist_record helpers to
  reduce duplication in analytics recording functions
- Simplify dispatch match to remove dead unimplemented-subcommand code
- Use idiomatic Option<&str> instead of &Option<String> in stats dashboard
- Simplify tsc run() by removing unnecessary intermediate vector
- Add command_type parameter to run_parsed_command_with_mode instead of
  hardcoding CommandType::Test
- Add disable_analytics field to MultiFileOptions so the --disable-analytics
  flag is respected in multi-file (glob/directory) paths

Add DevFlow-generated file that protects against committing
sensitive files and context pollution. Ensures consistent
filtering across Claude Code sessions.

Co-Authored-By: Claude <noreply@anthropic.com>

dean0x commented Mar 24, 2026

Wave 5 Review Summary: Token Analytics & Stats Dashboard

🔴 Blocking Issues

1. Unconditional Token Counting Performance Regression (CRITICAL)

  • Files: crates/rskim/src/process.rs:113,169,182,265,295
  • Confidence: 95%
  • Issue: Token counting via tiktoken (BPE encoding) runs unconditionally on every file operation, even when --show-stats is not requested and analytics is disabled. This adds 5-15ms overhead per file operation, risking violation of the <50ms performance target.
  • Fix: Gate counting on whether it will be consumed:
    let (orig_tokens, trans_tokens) = if options.show_stats || crate::analytics::is_analytics_enabled() {
        count_token_pair(&contents, &final_output)
    } else {
        (None, None)
    };

2. Duplicated Analytics Recording Pattern (6 call sites)

  • Files: cmd/build/mod.rs:167-187, cmd/git.rs:245-260, cmd/test/go.rs:97-112, cmd/test/pytest.rs:82-97, cmd/test/vitest.rs:69-84, cmd/mod.rs:196-211
  • Confidence: 90%
  • Issue: Copy-pasted analytics recording block across 6 locations. Each site duplicates the is_analytics_enabled() check, current_dir().unwrap_or_default() pattern, and record_fire_and_forget() call. When the recording API changes, all 6 must be updated in lockstep.
  • Fix: Extract a helper in the analytics module:
    pub(crate) fn try_record_command(
        raw_text: String,
        compressed_text: String,
        original_cmd: String,
        command_type: CommandType,
        duration: Duration,
        parse_tier: Option<String>,
    ) {
        if !is_analytics_enabled() {
            return;
        }
        let cwd = std::env::current_dir()
            .unwrap_or_default()
            .display()
            .to_string();
        record_fire_and_forget(
            raw_text, compressed_text, original_cmd,
            command_type, duration, cwd, parse_tier,
        );
    }

3. analytics_meta Table Created Outside Migration System

  • Files: analytics/mod.rs:363-389 and analytics/schema.rs
  • Confidence: 95%
  • Issue: The analytics_meta table is created via CREATE TABLE IF NOT EXISTS in maybe_prune(), bypassing the PRAGMA user_version migration system. Future schema changes cannot be safely versioned.
  • Fix: Move table creation into schema::run_migrations():
    // In schema.rs, add to migration v1:
    "CREATE TABLE IF NOT EXISTS analytics_meta (
        key TEXT PRIMARY KEY,
        value INTEGER
    );"

4. Silent Row Deserialization Errors in Query Methods (4 occurrences)

  • Files: analytics/mod.rs:252,270,293,316
  • Confidence: 85%
  • Issue: filter_map(|r| r.ok()) silently swallows row deserialization errors. Schema drift or corrupt data would cause entire rows to vanish with no indication.
  • Fix: Propagate errors instead:
    let rows: Result<Vec<_>, _> = stmt
        .query_map(...)?
        .collect();
    Ok(rows?)
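
The difference between the two patterns can be seen without a database. In this sketch, plain integer parsing stands in for row deserialization (an assumption made to keep the example stdlib-only; the function names are illustrative):

```rust
use std::num::ParseIntError;

/// Lossy: rows that fail to parse vanish silently, as with
/// `filter_map(|r| r.ok())`.
fn parse_lossy(rows: &[&str]) -> Vec<i64> {
    rows.iter().filter_map(|r| r.parse::<i64>().ok()).collect()
}

/// Strict: collecting into a Result stops at the first failure and
/// returns it to the caller.
fn parse_strict(rows: &[&str]) -> Result<Vec<i64>, ParseIntError> {
    rows.iter().map(|r| r.parse::<i64>()).collect()
}
```

Given the input `["1", "oops", "3"]`, the lossy version returns two values with no hint that a row was dropped, while the strict version returns an error.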

5. README and CLAUDE.md Omit Analytics Feature

  • Files: README.md, CLAUDE.md
  • Confidence: 95%
  • Issue: The skim stats subcommand, --disable-analytics flag, and three environment variables are completely missing from the project's primary documentation. Users have no way to discover this feature.
  • Fix:
    • Add analytics section to README Features
    • Document skim stats in Usage with examples
    • Add a section to CLAUDE.md describing the analytics module architecture and migration pattern

6. Missing test_db() Fixture Lifetime Management

  • Files: analytics/mod.rs:515-518
  • Confidence: 90%
  • Issue: NamedTempFile is dropped before AnalyticsDb is used. On Windows/some Linux configurations, the file is deleted immediately, breaking the WAL database. This works by accident on macOS because the inode persists, but is not portable.
  • Fix: Return the NamedTempFile to keep it alive:
    fn test_db() -> (AnalyticsDb, NamedTempFile) {
        let tmp = NamedTempFile::new().unwrap();
        let db = AnalyticsDb::open(tmp.path()).unwrap();
        (db, tmp)
    }

⚠️ Should-Fix Issues (High Confidence)

String Cloning & Redundant Token Counting in Analytics Paths

  • Files: cmd/mod.rs:202-204, cmd/build/mod.rs:169-173, cmd/git.rs:247-248, etc.
  • Issue: Command subcommands clone full output strings and re-tokenize on background threads. Use pre-computed counts instead via record_with_counts().

--disable-analytics Flag Only Respected on File Operations, Not Subcommands

  • Files: main.rs:297,498-505,529
  • Issue: The flag works for skim file.rs --disable-analytics but not skim test cargo --disable-analytics. Subcommands only check the env var.
  • Fix: Set env var early in main() when flag is present, or thread the flag through subcommand dispatch.

Analytics Recording Missing from Git Passthrough Path

  • Files: cmd/git.rs:184-209
  • Issue: run_passthrough() reports --show-stats but does NOT record analytics, creating a gap in the stats dashboard for commands like skim git status --porcelain.

Vitest Records Duration::ZERO Instead of Actual Duration

  • Files: cmd/test/vitest.rs:79
  • Issue: vitest/jest runs show 0ms in skim stats, skewing timing analytics. All other runners pass output.duration.

Missing Integration Tests for skim stats Subcommand

  • Files: crates/rskim/tests/ (missing cli_stats.rs)
  • Issue: The 303-line stats subcommand has zero integration tests, while every other subcommand has a corresponding cli_*.rs file.

is_analytics_enabled() Undocumented and Untested

  • Files: analytics/mod.rs:160-162
  • Issue: The env var toggle has zero unit tests. It is called in 8 code paths and is critical to the opt-out mechanism.
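
One way to make the toggle unit-testable without mutating the process environment is to separate the value interpretation from the lookup. A sketch under stated assumptions: the values "1", "true", and "yes" opt out, matching is ASCII case-insensitive (that part is a guess), and `disables_analytics` is a hypothetical helper:

```rust
/// Hypothetical pure helper: does this raw env value mean "disable"?
/// Assumption: "1", "true", and "yes" (ASCII case-insensitive) opt out;
/// unset, empty, or any other value leaves analytics enabled.
fn disables_analytics(raw: Option<&str>) -> bool {
    match raw.map(str::trim) {
        Some(v) => ["1", "true", "yes"]
            .iter()
            .any(|truthy| v.eq_ignore_ascii_case(truthy)),
        None => false,
    }
}

/// Thin wrapper that performs the actual environment lookup.
fn is_analytics_enabled() -> bool {
    let raw = std::env::var("SKIM_DISABLE_ANALYTICS").ok();
    !disables_analytics(raw.as_deref())
}
```

Tests can then exercise `disables_analytics` directly with string inputs, which avoids cross-test interference from shared environment state.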

ℹ️ Additional Notes (Medium-Confidence / Documentation)

  • Dependency issue: rusqlite 0.31 is 8 minor versions behind the latest release (0.39). Bundled SQLite may lack security patches.
  • License issue: colored crate uses MPL-2.0, a file-level copyleft license that differs from this project's MIT license. Consider owo-colors instead.
  • Input validation: SKIM_INPUT_COST_PER_MTOK accepts negative/infinity values. Add bounds checking.
  • Query documentation: Result structs (AnalyticsSummary, TierDistribution, etc.) lack doc comments explaining JSON schema.
  • Schema documentation: Future migration authors need guidance on the PRAGMA user_version pattern.
  • Parameter bloat: run_parsed_command_with_mode has grown to 8 parameters. Consider a CommandContext struct.
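
For the SKIM_INPUT_COST_PER_MTOK note above, bounds checking could look like the following. This is a sketch, not the PR's PricingModel: `price_from_raw` is hypothetical, and both the silent fallback to the default and the acceptance of zero are assumptions:

```rust
/// Default input price in USD per million tokens.
const DEFAULT_USD_PER_MTOK: f64 = 3.0;

/// Hypothetical validator: accept only finite, non-negative prices and
/// fall back to the default for anything else (unset, unparsable, NaN,
/// infinite, or negative).
fn price_from_raw(raw: Option<&str>) -> f64 {
    raw.and_then(|s| s.trim().parse::<f64>().ok())
        .filter(|p| p.is_finite() && *p >= 0.0)
        .unwrap_or(DEFAULT_USD_PER_MTOK)
}
```

Rust's `f64` parser accepts strings such as "inf" and "NaN", which is why a successful parse alone is not a sufficient check.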

Summary

Blocking Issues: 6 (requires fixes before merge)
Should-Fix Issues: 6 (high confidence, significant impact)
Documentation Issues: 5 (blocks user/contributor discovery)

The analytics module is fundamentally sound (clean schema, proper WAL mode, parameterized queries), but integration issues are significant:

  1. Unconditional token counting threatens performance targets
  2. Cross-cutting recording pattern is fragmented and will be hard to maintain
  3. Documentation gaps mean users/contributors won't discover the feature

Recommendation: Changes requested. These issues are addressable and do not call the design into question, but they must be resolved before merge to maintain code quality and performance expectations.

Dean Sharon and others added 6 commits March 24, 2026 16:53
Add cli_stats.rs with 5 tests covering the stats subcommand:
- help output verification
- graceful empty database message
- JSON format output validation
- clear command success
- cost flag with JSON output structure

Uses SKIM_ANALYTICS_DB env var for test isolation with tempfiles.

Co-Authored-By: Claude <noreply@anthropic.com>

Unconditional tiktoken BPE token counting in process_file(),
process_stdin(), and try_cached_result() violated the <50ms
performance target on every invocation -- even when neither
--show-stats nor analytics recording was enabled.

Gate all three count_token_pair() call sites behind
`options.show_stats || crate::analytics::is_analytics_enabled()`
so the BPE encoding is skipped in the common case.

Co-Authored-By: Claude <noreply@anthropic.com>

- Fix test_db() NamedTempFile lifetime: return the handle alongside
  AnalyticsDb so the temp file survives for the test duration
- Fix is_analytics_enabled() to check for truthy values ("1", "true",
  "yes") instead of treating any env var presence as disabled
- Add 9 unit tests covering is_analytics_enabled() behavior

Co-Authored-By: Claude <noreply@anthropic.com>

- Record analytics in git passthrough path (run_passthrough) matching
  the existing pattern in run_parsed_command
- Propagate --disable-analytics flag to SKIM_DISABLE_ANALYTICS env var
  in run_file_operation so multi-file workers respect it
- Replace Duration::ZERO with actual elapsed timing in vitest parser

Co-Authored-By: Claude <noreply@anthropic.com>

- Reject Infinity in PricingModel::from_env_or_default() with is_finite() guard
- Add tests for inf and NaN pricing env var values

Refactor AnalyticsDb usage in stats command to use AnalyticsStore trait,
enabling easier testing without database dependencies. Add comprehensive
test suite for dashboard rendering logic.

Also update README with analytics and cost estimation documentation.

Co-Authored-By: Claude <noreply@anthropic.com>

.cloned()
.collect();

let runner = filtered_args[0].as_str();

CRITICAL: Panic on skim test --show-stats without runner

When invoked as skim test --show-stats (without a runner argument), this crashes with index-out-of-bounds.

The --show-stats flag is filtered from args AFTER the emptiness check on line 21. When args contains only --show-stats, the guard passes. After filtering (lines 27-31), filtered_args becomes empty. Line 33 indexes filtered_args[0], panicking.

Fix: Move the filtering BEFORE the emptiness check:

let show_stats = args.iter().any(|a| a == "--show-stats");
let filtered_args: Vec<String> = args
    .iter()
    .filter(|a| a.as_str() != "--show-stats")
    .cloned()
    .collect();

if filtered_args.is_empty() || filtered_args.iter().any(|a| matches!(a.as_str(), "--help" | "-h")) {
    print_help();
    return Ok(ExitCode::SUCCESS);
}

let runner = filtered_args[0].as_str();

This blocks merge. Reproduced: cargo run --bin skim -- test --show-stats panics.
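
An alternative that avoids indexing altogether is to destructure the filtered arguments with `split_first`. A minimal sketch, where `parse_runner` is a hypothetical helper and the `None` case stands in for printing help:

```rust
/// Hypothetical helper: strip `--show-stats`, then split the runner name
/// from the arguments forwarded to it. Returns None when no runner remains,
/// which is where the real command would print help instead of indexing.
fn parse_runner(args: &[String]) -> Option<(bool, String, Vec<String>)> {
    let show_stats = args.iter().any(|a| a == "--show-stats");
    let filtered: Vec<String> = args
        .iter()
        .filter(|a| a.as_str() != "--show-stats")
        .cloned()
        .collect();
    // split_first yields None on an empty slice, so there is nothing to panic on.
    let (runner, rest) = filtered.split_first()?;
    Some((show_stats, runner.clone(), rest.to_vec()))
}
```

With this shape, the order of the emptiness check and the filtering can no longer drift apart, because the runner is only reachable through the `Option`.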

impl AnalyticsDb {
    /// Open database at the given path, run migrations, enable WAL mode.
    pub(crate) fn open(path: &Path) -> anyhow::Result<Self> {
        let conn = Connection::open(path)?;

HIGH: SQLite database created without restrictive permissions

Connection::open(path) creates files with default umask permissions (typically 0644). This is a security issue when SKIM_ANALYTICS_DB points outside the protected ~/.cache/skim/ directory.

The database contains filesystem paths (project_path), command history (original_cmd), and usage patterns -- all world-readable if permissions are not explicitly set to 0600.

Fix: Restrict file permissions after database creation:

pub(crate) fn open(path: &Path) -> anyhow::Result<Self> {
    let conn = Connection::open(path)?;
    
    // Restrict database file permissions on Unix
    #[cfg(unix)]
    {
        use std::os::unix::fs::PermissionsExt;
        let perms = std::fs::Permissions::from_mode(0o600);
        let _ = std::fs::set_permissions(path, perms);
    }
    
    conn.busy_timeout(Duration::from_millis(5000))?;
    conn.execute_batch("PRAGMA journal_mode=WAL;")?;
    schema::run_migrations(&conn)?;
    Ok(Self { conn })
}
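
The fix above tightens permissions after SQLite has already created the file, which leaves a short window with default permissions. One possible variation, not taken from the PR, is to pre-create the file with mode 0600 before handing the path to SQLite. A Unix-only, stdlib-only sketch; `precreate_private` is a hypothetical name:

```rust
use std::fs::OpenOptions;
use std::io;
use std::path::Path;

/// Hypothetical helper: make sure `path` exists and is private to the owner.
#[cfg(unix)]
fn precreate_private(path: &Path) -> io::Result<()> {
    use std::os::unix::fs::{OpenOptionsExt, PermissionsExt};

    // `mode` only applies when the file is newly created.
    let file = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(false) // never clobber an existing database
        .mode(0o600)
        .open(path)?;
    // Tighten a pre-existing file that may carry looser permissions.
    file.set_permissions(std::fs::Permissions::from_mode(0o600))
}
```

Calling this before `Connection::open(path)` would mean the database file never exists with broader permissions, at the cost of one extra open per startup.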

// Propagate --disable-analytics to env var so that all code paths
// (including multi-file workers) respect it via is_analytics_enabled().
if args.disable_analytics {
    std::env::set_var("SKIM_DISABLE_ANALYTICS", "1");

HIGH: --disable-analytics flag silently dropped for subcommands

The flag is consumed in the pre-parser (lines 111-124) but NOT included in remaining_args, so subcommands never see it. The env var set_var("SKIM_DISABLE_ANALYTICS", "1") only runs in run_file_operation() (line 478), NOT reached for subcommands.

Result: skim --disable-analytics test cargo is silently ignored -- the test still records analytics. Users who explicitly opt out are still tracked.

Fix: Move the set_var call to main() BEFORE routing:

fn main() -> ExitCode {
    // Check for --disable-analytics in raw args before routing
    let raw_args: Vec<String> = std::env::args().collect();
    if raw_args.iter().any(|a| a == "--disable-analytics") {
        std::env::set_var("SKIM_DISABLE_ANALYTICS", "1");
    }
    
    let result: anyhow::Result<ExitCode> = match resolve_invocation() {
        Invocation::FileOperation => run_file_operation().map(|()| ExitCode::SUCCESS),
        Invocation::Subcommand { name, args } => cmd::dispatch(&name, &args),
    };
    // ...
}

// Propagate --disable-analytics to env var so that all code paths
// (including multi-file workers) respect it via is_analytics_enabled().
if args.disable_analytics {
    std::env::set_var("SKIM_DISABLE_ANALYTICS", "1");

HIGH: std::env::set_var unsound in multi-threaded context

set_var("SKIM_DISABLE_ANALYTICS", "1") modifies shared process state without synchronization. While this call likely occurs early enough that threads have not started, the approach is technically unsound (set_var is an unsafe fn in the Rust 2024 edition) and may fail under future Rust editions or with Miri.

Fix: Use an AtomicBool instead:

// In analytics/mod.rs
use std::sync::atomic::{AtomicBool, Ordering};
static ANALYTICS_FORCE_DISABLED: AtomicBool = AtomicBool::new(false);

pub(crate) fn force_disable_analytics() {
    ANALYTICS_FORCE_DISABLED.store(true, Ordering::Release);
}

pub(crate) fn is_analytics_enabled() -> bool {
    if ANALYTICS_FORCE_DISABLED.load(Ordering::Acquire) {
        return false;
    }
    // ... rest of logic
}

Then call analytics::force_disable_analytics() from main instead of using set_var.


let sub = args.first().map(String::as_str);
let remaining = if args.len() > 1 { &args[1..] } else { &[] };
let show_stats = args.iter().any(|a| a == "--show-stats");

HIGH: Duplicated --show-stats filtering pattern across 3 subcommand dispatchers

The exact same 6-line pattern appears in build/mod.rs, git.rs, and test/mod.rs:

let show_stats = args.iter().any(|a| a == "--show-stats");
let filtered_args: Vec<String> = args
    .iter()
    .filter(|a| a.as_str() != "--show-stats")
    .cloned()
    .collect();

This is the same class of problem as the analytics DRY violation that was already fixed with try_record_command.

Fix: Extract a shared helper in cmd/mod.rs:

pub(crate) struct ParsedSubcommandArgs {
    pub args: Vec<String>,
    pub show_stats: bool,
}

pub(crate) fn extract_skim_flags(args: &[String]) -> ParsedSubcommandArgs {
    ParsedSubcommandArgs {
        show_stats: args.iter().any(|a| a == "--show-stats"),
        args: args.iter()
            .filter(|a| !matches!(a.as_str(), "--show-stats"))
            .cloned()
            .collect(),
    }
}

This also provides the extension point for threading --disable-analytics through subcommands.
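
As an illustration of that extension point, the helper could recognize both flags in a single pass. This is a sketch only: the `disable_analytics` field and the `SkimFlags` name are hypothetical and do not appear in the comment above:

```rust
/// Skim-level flags separated from the arguments forwarded to the tool.
#[derive(Debug, Default, PartialEq)]
struct SkimFlags {
    show_stats: bool,
    disable_analytics: bool, // hypothetical extension, not in the comment above
    args: Vec<String>,
}

/// Single pass: record skim's own flags and keep everything else in order.
fn extract_skim_flags(args: &[String]) -> SkimFlags {
    let mut out = SkimFlags::default();
    for arg in args {
        match arg.as_str() {
            "--show-stats" => out.show_stats = true,
            "--disable-analytics" => out.disable_analytics = true,
            _ => out.args.push(arg.clone()),
        }
    }
    out
}
```

Note that this strips the flags wherever they appear, including after a `--` separator; whether that is desirable for forwarded tool arguments is a design decision the sketch does not settle.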


dean0x commented Mar 25, 2026

Code Review Summary: Wave 5 (#93)

Overview

Review of 8 comprehensive reports (security, architecture, performance, complexity, consistency, regression, tests, documentation) spanning the analytics pipeline and stats subcommand.

Inline Comments Posted ✓

The following critical issues have been posted as inline comments:

  • CRITICAL: skim test --show-stats panic (index out of bounds)
  • HIGH (5 detailed comments):
    • SQLite database file permissions (0644 vs 0600)
    • --disable-analytics silently dropped for subcommands
    • std::env::set_var unsound in multi-threaded context
    • Duplicated --show-stats filtering pattern (6 lines in 3 places)
    • Duplicate user_has_flag in go.rs with divergent semantics

Additional HIGH Blocking Issues (60-79% confidence or documentation)

  1. Token counting unconditionally on hot path (process.rs:268,303 - 88% confidence)

    • Main-thread tiktoken BPE runs on EVERY file operation when analytics enabled (default)
    • Performance regression: adds 5-15ms per file against the <50ms target
    • Fix: Gate token counting to --show-stats only; defer to background thread for analytics
  2. Cache hit file re-read when analytics enabled (process.rs:114-115 - 88% confidence)

    • Stale cache entries re-read from disk when analytics enabled
    • Degrades documented 40-50x cache speedup
    • Fix: Skip recount unless show_stats explicitly requested
  3. Scope inconsistency: --disable-analytics vs --show-stats (83% confidence)

    • --show-stats is threaded through all subcommands; --disable-analytics only covers file operations
    • Users expect consistency; skim test cargo --disable-analytics silently ignored
    • Fix: Extract shared flag helper for both flags
  4. analytics/mod.rs 1048 lines with 6 duplicate query methods (90% confidence)

    • Exceeds warning threshold; identical pattern with triplicated WHERE composition
    • Maintainability risk as module evolves
    • Fix: Extract generic query_grouped<T> helper
  5. README: --cost misleadingly in Common options (README.md:189 - 90% confidence)

    • Flag listed alongside global options but only valid for skim stats
    • Users will try skim file.ts --cost and get unexpected behavior
    • Fix: Remove from Common options (already under Analytics section)
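
For item 4, the repeated WHERE composition could be centralized in one small function. A rough sketch, assuming a `timestamp` column and a single `?1` placeholder for the time bound; both are guesses about the schema, and `where_clause` is a hypothetical name:

```rust
/// Hypothetical helper: build a WHERE clause from an optional time bound
/// and an optional extra predicate. Values are never interpolated; the
/// `?1` placeholder is meant to be bound by the caller.
fn where_clause(since: Option<i64>, extra: Option<&str>) -> String {
    let mut parts: Vec<&str> = Vec::new();
    if since.is_some() {
        parts.push("timestamp >= ?1");
    }
    if let Some(pred) = extra {
        parts.push(pred);
    }
    if parts.is_empty() {
        String::new()
    } else {
        format!("WHERE {}", parts.join(" AND "))
    }
}
```

Each grouped query would then differ only in its SELECT and GROUP BY text, which is the part a generic helper would take as a parameter.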

Medium Priority (Should-Fix)

  • Stats dashboard tests assert only is_ok() -- no output validation (7 tests)
  • output.stdout.clone() before analytics enabled check (git.rs:211 - 85% confidence)
  • String cloning unconditionally in 6 analytics call sites before checking enabled (82% confidence)
  • Double token counting when show_stats + analytics both active (82% confidence)
  • Parameter bloat: run_parsed_command_with_mode (8 params), record_with_counts (8 params)
  • README duplicates research claim verbatim (README.md:16,20 - 82% confidence)
  • CLAUDE.md not updated for analytics/stats features (85% confidence)

Test Coverage Gaps

✅ Strong: is_analytics_enabled tests, PricingModel edge cases, skim stats integration tests, trait abstraction

⚠️ Gaps:

  • Stats dashboard rendering has no output validation
  • --since time filtering untested end-to-end
  • Fire-and-forget pipeline untested
  • --disable-analytics flag propagation untested
  • MockStore ignores since parameter

Recommendation

Status: CHANGES_REQUESTED

Blockers requiring resolution before merge:

  1. CRITICAL: skim test --show-stats panic
  2. HIGH: Database permissions, flag propagation, env var soundness
  3. HIGH: Token counting performance regression
  4. HIGH: Cache re-read performance regression
  5. HIGH: Duplicate user_has_flag in go.rs
  6. HIGH: Documentation clarity (README --cost placement)

All deferred findings catalogued in review reports for post-merge backlog.


Report sources: security.md, architecture.md, performance.md, complexity.md, consistency.md, regression.md, tests.md, documentation.md
Inline confidence threshold: ≥80%
Summary threshold: 60-79% + documentation issues

Dean Sharon added 3 commits March 25, 2026 16:03
CRITICAL:
- Fix `skim test --show-stats` index-out-of-bounds panic (split_first guard)

HIGH:
- Extract `extract_show_stats()` shared helper, replacing 3 copy-pasted blocks
- Rename go.rs local `user_has_flag` to `go_has_flag` with doc explaining
  Go's `-flag=false` semantics
- Gate main-thread token counting on `show_stats` only (was analytics-gated,
  adding 5-15ms to every file operation by default)

MEDIUM (security):
- Set SQLite DB permissions to 0600 on Unix
- Replace `std::env::set_var` with AtomicBool for thread-safe disable flag
- Propagate `--disable-analytics` to subcommands (was file-ops only)
- Truncate `original_cmd` to 500 chars before INSERT

MEDIUM (performance):
- Cache `needs_recount` gated on `show_stats` only (preserves 40-50x speedup)
- Guard all analytics call sites with `is_analytics_enabled()` before allocation

MEDIUM (code quality):
- `ParsedCommandConfig` struct replaces 8 positional params
- `record_with_counts` accepts `TokenSavingsRecord` directly
- `since_clause_with_extra()` deduplicates WHERE clause logic
- Add ANSI stripping to build's run_parsed_command
- Stats tests validate output content (JSON deserialize, dashboard substrings)

MEDIUM (documentation):
- Remove `--cost` from README common options (stats-only flag)
- Consolidate duplicate "Research consistently shows..." paragraph
- stats.rs help text specifies accepted values for SKIM_DISABLE_ANALYTICS
- Expand analytics module doc (WAL, threading, pruning, trait, migrations)
dean0x merged commit 1dda698 into main Mar 25, 2026
4 of 5 checks passed
dean0x deleted the wave/5 branch March 25, 2026 14:19