Conversation
- Add rusqlite (bundled) and colored workspace dependencies - Create analytics module with SQLite-backed token savings storage - Schema with migrations, WAL mode, and 5-second busy timeout - Query functions: summary, daily, by-command, by-language, by-mode, tier distribution - Fire-and-forget recording with background thread (non-blocking) - PricingModel with env var override (SKIM_INPUT_COST_PER_MTOK) - Auto-prune records older than 90 days (daily check) - SKIM_DISABLE_ANALYTICS env var to opt out - Fix clear_cache to skip analytics.db (only removes .json cache files) - 13 unit tests covering all query paths, pruning, and pricing Co-Authored-By: Claude <noreply@anthropic.com>
- Make count_token_pair pub(crate) for reuse across modules - Add show_stats parameter to run_parsed_command_with_mode - Strip --show-stats from args before forwarding to underlying tools - Wire token stats through all test runners: cargo, go, vitest, pytest - Wire token stats through build runners: cargo, clippy, tsc - Wire token stats through git subcommands: status, diff, log - Always count tokens in process_file/process_stdin (removes conditional) - Token counts now cached for analytics pipeline in later steps Co-Authored-By: Claude <noreply@anthropic.com>
- Create stats subcommand with terminal dashboard and JSON output - Wire analytics recording into single-file, stdin, and multi-file paths - Wire analytics into run_parsed_command_with_mode for test runners - Add --disable-analytics CLI flag to opt out per-invocation - Add --format as value-consuming flag to prevent subcommand mis-routing - Register stats in KNOWN_SUBCOMMANDS and dispatch - Dashboard shows: summary, efficiency meter, by-command, by-language, by-mode, parse tier distribution - JSON mode includes daily breakdown - Support --since filter using existing parse_duration_ago - Support --clear to delete all analytics data Co-Authored-By: Claude <noreply@anthropic.com>
- Cost estimates use PricingModel with claude-sonnet-4-6 default ($3/MTok) - SKIM_INPUT_COST_PER_MTOK env var overrides pricing for custom models - Terminal dashboard: model name, input cost, estimated savings in green - JSON output: cost_estimate section with model, rate, savings, tokens - Document environment variables in stats --help output: SKIM_INPUT_COST_PER_MTOK, SKIM_ANALYTICS_DB, SKIM_DISABLE_ANALYTICS Co-Authored-By: Claude <noreply@anthropic.com>
- Add missing analytics recording to go, vitest, pytest, build, and git subcommands so `skim stats` shows complete data across all commands - Replace .unwrap() with .unwrap_or_default() in prune_older_than to prevent panic if system clock is before epoch - Deduplicate user_has_flag across modules by importing from crate::cmd - Extract savings_percentage, now_unix_secs, persist_record helpers to reduce duplication in analytics recording functions - Simplify dispatch match to remove dead unimplemented-subcommand code - Use idiomatic Option<&str> instead of &Option<String> in stats dashboard - Simplify tsc run() by removing unnecessary intermediate vector
- Add command_type parameter to run_parsed_command_with_mode instead of hardcoding CommandType::Test - Add disable_analytics field to MultiFileOptions so the --disable-analytics flag is respected in multi-file (glob/directory) paths
Add DevFlow-generated file that protects against committing sensitive files and context pollution. Ensures consistent filtering across Claude Code sessions. Co-Authored-By: Claude <noreply@anthropic.com>
Wave 5 Review Summary: Token Analytics & Stats Dashboard🔴 Blocking Issues1. Unconditional Token Counting Performance Regression (CRITICAL)
2. Duplicated Analytics Recording Pattern (6 call sites)
3.
|
Add cli_stats.rs with 5 tests covering the stats subcommand: - help output verification - graceful empty database message - JSON format output validation - clear command success - cost flag with JSON output structure Uses SKIM_ANALYTICS_DB env var for test isolation with tempfiles. Co-Authored-By: Claude <noreply@anthropic.com>
Unconditional tiktoken BPE token counting in process_file(), process_stdin(), and try_cached_result() violated the <50ms performance target on every invocation -- even when neither --show-stats nor analytics recording was enabled. Gate all three count_token_pair() call sites behind `options.show_stats || crate::analytics::is_analytics_enabled()` so the BPE encoding is skipped in the common case. Co-Authored-By: Claude <noreply@anthropic.com>
- Fix test_db() NamedTempFile lifetime: return the handle alongside
AnalyticsDb so the temp file survives for the test duration
- Fix is_analytics_enabled() to check for truthy values ("1", "true",
"yes") instead of treating any env var presence as disabled
- Add 9 unit tests covering is_analytics_enabled() behavior
Co-Authored-By: Claude <noreply@anthropic.com>
- Record analytics in git passthrough path (run_passthrough) matching the existing pattern in run_parsed_command - Propagate --disable-analytics flag to SKIM_DISABLE_ANALYTICS env var in run_file_operation so multi-file workers respect it - Replace Duration::ZERO with actual elapsed timing in vitest parser Co-Authored-By: Claude <noreply@anthropic.com>
- Reject Infinity in PricingModel::from_env_or_default() with is_finite() guard - Add tests for inf and NaN pricing env var values
Refactor AnalyticsDb usage in stats command to use AnalyticsStore trait, enabling easier testing without database dependencies. Add comprehensive test suite for dashboard rendering logic. Also update README with analytics and cost estimation documentation. Co-Authored-By: Claude <noreply@anthropic.com>
crates/rskim/src/cmd/test/mod.rs
Outdated
| .cloned() | ||
| .collect(); | ||
|
|
||
| let runner = filtered_args[0].as_str(); |
There was a problem hiding this comment.
CRITICAL: Panic on skim test --show-stats without runner
When invoked as skim test --show-stats (without a runner argument), this crashes with index-out-of-bounds.
The --show-stats flag is filtered from args AFTER the emptiness check on line 21. When args contains only --show-stats, the guard passes. After filtering (lines 27-31), filtered_args becomes empty. Line 33 indexes filtered_args[0], panicking.
Fix: Move the filtering BEFORE the emptiness check:
let show_stats = args.iter().any(|a| a == "--show-stats");
let filtered_args: Vec<String> = args
.iter()
.filter(|a| a.as_str() != "--show-stats")
.cloned()
.collect();
if filtered_args.is_empty() || filtered_args.iter().any(|a| matches!(a.as_str(), "--help" | "-h")) {
print_help();
return Ok(ExitCode::SUCCESS);
}
let runner = filtered_args[0].as_str();This blocks merge. Reproduced: cargo run --bin skim -- test --show-stats panics.
| impl AnalyticsDb { | ||
| /// Open database at the given path, run migrations, enable WAL mode. | ||
| pub(crate) fn open(path: &Path) -> anyhow::Result<Self> { | ||
| let conn = Connection::open(path)?; |
There was a problem hiding this comment.
HIGH: SQLite database created without restrictive permissions
Connection::open(path) creates files with default umask permissions (typically 0644). This is a security issue when SKIM_ANALYTICS_DB points outside the protected ~/.cache/skim/ directory.
The database contains filesystem paths (project_path), command history (original_cmd), and usage patterns -- all world-readable if permissions are not explicitly set to 0600.
Fix: Restrict file permissions after database creation:
pub(crate) fn open(path: &Path) -> anyhow::Result<Self> {
let conn = Connection::open(path)?;
// Restrict database file permissions on Unix
#[cfg(unix)]
{
use std::os::unix::fs::PermissionsExt;
let perms = std::fs::Permissions::from_mode(0o600);
let _ = std::fs::set_permissions(path, perms);
}
conn.busy_timeout(Duration::from_millis(5000))?;
conn.execute_batch("PRAGMA journal_mode=WAL;")?;
schema::run_migrations(&conn)?;
Ok(Self { conn })
}
crates/rskim/src/main.rs
Outdated
| // Propagate --disable-analytics to env var so that all code paths | ||
| // (including multi-file workers) respect it via is_analytics_enabled(). | ||
| if args.disable_analytics { | ||
| std::env::set_var("SKIM_DISABLE_ANALYTICS", "1"); |
There was a problem hiding this comment.
HIGH: --disable-analytics flag silently dropped for subcommands
The flag is consumed in the pre-parser (lines 111-124) but NOT included in remaining_args, so subcommands never see it. The env var set_var("SKIM_DISABLE_ANALYTICS", "1") only runs in run_file_operation() (line 478), NOT reached for subcommands.
Result: skim --disable-analytics test cargo is silently ignored -- the test still records analytics. Users who explicitly opt out are still tracked.
Fix: Move the set_var call to main() BEFORE routing:
fn main() -> ExitCode {
// Check for --disable-analytics in raw args before routing
let raw_args: Vec<String> = std::env::args().collect();
if raw_args.iter().any(|a| a == "--disable-analytics") {
std::env::set_var("SKIM_DISABLE_ANALYTICS", "1");
}
let result: anyhow::Result<ExitCode> = match resolve_invocation() {
Invocation::FileOperation => run_file_operation().map(|()| ExitCode::SUCCESS),
Invocation::Subcommand { name, args } => cmd::dispatch(&name, &args),
};
// ...
}
crates/rskim/src/main.rs
Outdated
| // Propagate --disable-analytics to env var so that all code paths | ||
| // (including multi-file workers) respect it via is_analytics_enabled(). | ||
| if args.disable_analytics { | ||
| std::env::set_var("SKIM_DISABLE_ANALYTICS", "1"); |
There was a problem hiding this comment.
HIGH: std::env::set_var unsound in multi-threaded context
set_var("SKIM_DISABLE_ANALYTICS", "1") modifies shared process state without synchronization. While this call likely occurs early enough that threads have not started, the approach is technically unsound and may fail under future Rust editions or with MIRI.
Fix: Use an AtomicBool instead:
// In analytics/mod.rs
use std::sync::atomic::{AtomicBool, Ordering};
static ANALYTICS_FORCE_DISABLED: AtomicBool = AtomicBool::new(false);
pub(crate) fn force_disable_analytics() {
ANALYTICS_FORCE_DISABLED.store(true, Ordering::Release);
}
pub(crate) fn is_analytics_enabled() -> bool {
if ANALYTICS_FORCE_DISABLED.load(Ordering::Acquire) {
return false;
}
// ... rest of logic
}Then call analytics::force_disable_analytics() from main instead of using set_var.
crates/rskim/src/cmd/build/mod.rs
Outdated
|
|
||
| let sub = args.first().map(String::as_str); | ||
| let remaining = if args.len() > 1 { &args[1..] } else { &[] }; | ||
| let show_stats = args.iter().any(|a| a == "--show-stats"); |
There was a problem hiding this comment.
HIGH: Duplicated --show-stats filtering pattern across 3 subcommand dispatchers
The exact same 6-line pattern appears in build/mod.rs, git.rs, and test/mod.rs:
let show_stats = args.iter().any(|a| a == "--show-stats");
let filtered_args: Vec<String> = args
.iter()
.filter(|a| a.as_str() != "--show-stats")
.cloned()
.collect();This is the same class of problem as the analytics DRY violation that was already fixed with try_record_command.
Fix: Extract a shared helper in cmd/mod.rs:
pub(crate) struct ParsedSubcommandArgs {
pub args: Vec<String>,
pub show_stats: bool,
}
pub(crate) fn extract_skim_flags(args: &[String]) -> ParsedSubcommandArgs {
ParsedSubcommandArgs {
show_stats: args.iter().any(|a| a == "--show-stats"),
args: args.iter()
.filter(|a| !matches!(a.as_str(), "--show-stats"))
.cloned()
.collect(),
}
}This also provides the extension point for threading --disable-analytics through subcommands.
Code Review Summary: Wave 5 (#93)OverviewReview of 8 comprehensive reports (security, architecture, performance, complexity, consistency, regression, tests, documentation) spanning the analytics pipeline and stats subcommand. Inline Comments Posted ✓The following critical issues have been posted as inline comments:
Additional HIGH Blocking Issues (60-79% confidence or documentation)
Medium Priority (Should-Fix)
Test Coverage Gaps✅ Strong: is_analytics_enabled tests, PricingModel edge cases, skim stats integration tests, trait abstraction
RecommendationStatus: CHANGES_REQUESTED Blockers requiring resolution before merge:
All deferred findings catalogued in review reports for post-merge backlog. Report sources: security.md, architecture.md, performance.md, complexity.md, consistency.md, regression.md, tests.md, documentation.md |
CRITICAL: - Fix `skim test --show-stats` index-out-of-bounds panic (split_first guard) HIGH: - Extract `extract_show_stats()` shared helper, replacing 3 copy-pasted blocks - Rename go.rs local `user_has_flag` to `go_has_flag` with doc explaining Go's `-flag=false` semantics - Gate main-thread token counting on `show_stats` only (was analytics-gated, adding 5-15ms to every file operation by default) MEDIUM (security): - Set SQLite DB permissions to 0600 on Unix - Replace `std::env::set_var` with AtomicBool for thread-safe disable flag - Propagate `--disable-analytics` to subcommands (was file-ops only) - Truncate `original_cmd` to 500 chars before INSERT MEDIUM (performance): - Cache `needs_recount` gated on `show_stats` only (preserves 40-50x speedup) - Guard all analytics call sites with `is_analytics_enabled()` before allocation MEDIUM (code quality): - `ParsedCommandConfig` struct replaces 8 positional params - `record_with_counts` accepts `TokenSavingsRecord` directly - `since_clause_with_extra()` deduplicates WHERE clause logic - Add ANSI stripping to build's run_parsed_command - Stats tests validate output content (JSON deserialize, dashboard substrings) MEDIUM (documentation): - Remove `--cost` from README common options (stats-only flag) - Consolidate duplicate "Research consistently shows..." paragraph - stats.rs help text specifies accepted values for SKIM_DISABLE_ANALYTICS - Expand analytics module doc (WAL, threading, pruning, trait, migrations)
Summary
--show-statsflag through all subcommands (test, build, git) for unified token statsskim statssubcommand with terminal dashboard and JSON output--costflag with economics estimates (default: Claude Sonnet $3/MTok)Changes
Analytics Module (
analytics/mod.rs,analytics/schema.rs)SKIM_DISABLE_ANALYTICSenv var to opt outSKIM_ANALYTICS_DBenv var to override database path--show-statsfor subcommandsskim test cargo,skim test go,skim test vitest,skim test pytest,skim build cargo,skim build clippy,skim build tsc,skim git status,skim git diff,skim git log--show-statsis stripped from args before forwarding to underlying toolsprocess_file/process_stdin(no longer conditional on--show-stats)skim statssubcommand--format json: machine-readable output with all breakdowns--since <DURATION>: time filter (7d, 24h, 4w)--cost: economics section with configurable pricing model--clear: delete all analytics datacoloredcrate (respectsNO_COLOR)Infrastructure
clear_cachenow skipsanalytics.db(only removes.jsoncache files)--disable-analyticsCLI flag for per-invocation opt-out--formatregistered as value-consuming flag to prevent subcommand mis-routingrusqlite(bundled) andcoloredadded as workspace dependenciesTest plan
cargo clippy -- -D warningscleanskim stats,skim stats --help,skim stats --format json,skim stats --costskim src/main.rsthenskim statsto verify recording works end-to-endSKIM_DISABLE_ANALYTICS=1 skim file.rsdoes not record