ccutils

Claude utilities for session transcripts, star schema analytics, data exploration, and whatever else comes up in day-to-day use.

Origin: This project began as a fork of Simon Willison's claude-code-transcripts. It has since diverged significantly, adding star schema analytics, a visual data explorer, and a modular architecture, and growing into a broader Claude utility.

Installation

uv tool install -e .

Or run without installing:

uv run ccutils --help

Quick Start

# Interactive two-phase picker - select projects, then sessions
ccutils

# Convert a single session file (opens in browser)
ccutils session.jsonl

# Export to DuckDB for SQL analytics
ccutils --format duckdb -o ./archive

# Export with star schema (22 tables + 10 views)
ccutils --format duckdb-star -o ./analytics

# Launch visual data explorer
ccutils explore ./analytics/archive.duckdb

Commands

| Command | Description |
| --- | --- |
| `local` | Interactive picker + single-file conversion -- default (no subcommand needed) |
| `all` | Batch convert all sessions (HTML archive, DuckDB, or JSON) |
| `web` | Import from Claude API (auto-detects credentials from macOS keychain) |
| `explore` | Launch Data Explorer web UI for star schema databases |
| `import` | Import Claude.ai account exports (Settings > Privacy > Export) |
| `schema` | Inspect JSON structure without exposing content (safe to share publicly) |

local (default)

The default command. With no arguments, launches a two-phase interactive picker (projects then sessions). Pass a session file to convert it directly.

Thinking blocks and subagent sessions are included by default.

ccutils                                          # Interactive picker
ccutils session.jsonl                            # Convert file, open in browser
ccutils session.jsonl --format duckdb-star -o .  # Star schema from file
ccutils --format duckdb-star -o ./analytics      # Pick sessions, star schema
ccutils -p myproject                             # Filter by project name
ccutils --flat                                   # Legacy single-list mode
ccutils --no-thinking --no-subagents             # Exclude thinking/agents
ccutils --format duckdb-star --embed -o .        # With ColBERT embeddings

all

Batch convert every session. Agents and thinking blocks included by default.

ccutils all -o ./archive                         # HTML archive with search index
ccutils all --format duckdb-star -o ./analytics   # Star schema for all sessions
ccutils all --format duckdb-star --embed -o ./out # With ColBERT embeddings
ccutils all -j 4 --batch-size 20 -o ./archive    # Parallel processing
ccutils all --no-agents --no-thinking             # Exclude agents and thinking
ccutils all --dry-run                            # Preview without converting

import

Import Claude.ai web conversation exports (the ZIP/directory from Settings > Privacy).

ccutils import ./my-claude-export --open         # HTML, opens in browser
ccutils import ./export --format duckdb -o data.duckdb  # DuckDB
ccutils import ./export --interactive            # Pick conversations
ccutils import ./export --list                   # List without converting

explore

Visual query builder for star schema DuckDB databases. Runs a local web server.

ccutils explore ./analytics/archive.duckdb

schema

Inspect JSON file structure without exposing content. Output is safe to share publicly or paste into AI assistants.

ccutils schema conversations.json
ccutils schema ./my-claude-export/               # Inspect all files in directory
ccutils schema ./export --json > schema.json     # Machine-readable output
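The idea behind `schema` can be sketched in a few lines of Python: recursively replace every leaf value with its type name, so the structure survives but the content does not. This is an illustration of the concept only, not ccutils' actual implementation:

```python
import json

def infer_schema(value):
    """Replace every leaf value with its type name, keeping structure."""
    if isinstance(value, dict):
        return {key: infer_schema(v) for key, v in value.items()}
    if isinstance(value, list):
        # Summarize a list by the schema of its first element.
        return [infer_schema(value[0])] if value else []
    return type(value).__name__

doc = {"uuid": "abc-123", "message": {"role": "user",
       "content": [{"type": "text", "text": "hi"}]}}
# Values are replaced by type names; structure is preserved.
print(json.dumps(infer_schema(doc)))
```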

Export Formats

Schema type is auto-inferred from --format: duckdb-star and json-star use the star schema; plain duckdb and json use the simple schema.

HTML Transcripts

Clean, mobile-friendly HTML with pagination, commit timeline, tool stats, and full-text search.

ccutils -o ./transcript --open
ccutils all -o ./archive                    # Archive with master index and search

DuckDB Analytics

Simple Schema (4 tables)

ccutils --format duckdb -o ./archive

Tables: sessions, messages, tool_calls, thinking
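To illustrate the kind of query the simple schema supports, here is a sketch using Python's built-in sqlite3 against a toy copy of the tables. The real file is DuckDB and the column names (e.g. `session_id`) are assumptions for demonstration, not the documented schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Toy versions of two of the documented tables, with assumed columns.
    CREATE TABLE sessions   (session_id TEXT PRIMARY KEY, project TEXT);
    CREATE TABLE tool_calls (session_id TEXT, tool_name TEXT);
    INSERT INTO sessions VALUES ('s1', 'demo'), ('s2', 'demo');
    INSERT INTO tool_calls VALUES ('s1', 'Bash'), ('s1', 'Edit'), ('s2', 'Read');
""")

# Which sessions used the most tools?
rows = con.execute("""
    SELECT s.session_id, COUNT(*) AS calls
    FROM tool_calls tc JOIN sessions s ON tc.session_id = s.session_id
    GROUP BY s.session_id ORDER BY calls DESC
""").fetchall()
print(rows)  # [('s1', 2), ('s2', 1)]
```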

Star Schema (22 tables + 10 views)

ccutils --format duckdb-star -o ./analytics

Dimensional model designed for analytics:

  • 6 dimensions: sessions (with heuristic classifications), projects, tools (with categories), models (with families), dates, times
  • 6 core facts: messages, tool calls (with duration tracking), session summaries (with inclusive agent metric rollup), file operations, errors (with type classification), tool chain steps
  • 5 granular tables: files (with language detection), session chains, content blocks, code blocks, entity mentions
  • 3 agent/bridge tables: agent delegations (with denormalized metrics), cross-session file tracking, task-agent mapping
  • 2 optional: ColBERT embeddings, tool input parameters
  • 10 semantic views: pre-joined views for common queries (includes project context and file tracking)

Heuristic Classification

The star schema ETL runs heuristic classification during ingestion with zero external dependencies -- no LLM, no API key needed. Results are stored on dim_session:

| Classifier | Method | Values |
| --- | --- | --- |
| Intent | Score-based keyword matching on first user message | bug_fix, feature, refactor, debug, test, docs, review, explore |
| Complexity | Points-based scoring from session metrics | trivial, simple, moderate, complex |
| Outcome | Inferred from last assistant message + error rate | success, failure, unknown |
| Domain | Inferred from file extensions touched | web, backend, data, devops, docs, mixed, unknown |
| Error type | Classified from error message text (on fact_errors) | permission_denied, file_not_found, syntax_error, timeout, import_error, tool_error |
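The intent classifier's score-based keyword matching could be sketched like this. The keyword sets and scoring here are illustrative assumptions, not the heuristics ccutils actually ships:

```python
# Illustrative keyword sets per intent; the real lists are internal to ccutils.
INTENT_KEYWORDS = {
    "bug_fix":  {"fix", "bug", "broken", "error", "crash"},
    "feature":  {"add", "implement", "support", "create"},
    "refactor": {"refactor", "clean", "simplify", "rename"},
    "test":     {"test", "pytest", "coverage"},
    "docs":     {"readme", "document", "docstring"},
}

def classify_intent(first_user_message: str) -> str:
    """Score each intent by keyword hits; fall back to 'explore' on no match."""
    words = set(first_user_message.lower().split())
    scores = {intent: len(words & kw) for intent, kw in INTENT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "explore"

print(classify_intent("fix the broken error handling"))  # bug_fix
```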
-- What kinds of sessions do I have?
SELECT intent, complexity, COUNT(*) as sessions
FROM dim_session GROUP BY intent, complexity ORDER BY sessions DESC;

-- Tool usage by category
SELECT dt.tool_category, COUNT(*) as uses
FROM fact_tool_calls ftc
JOIN dim_tool dt ON ftc.tool_key = dt.tool_key
GROUP BY dt.tool_category ORDER BY uses DESC;

-- Am I more productive mornings or evenings?
SELECT dti.time_of_day, COUNT(*) as sessions, AVG(fss.total_messages) as avg_msgs
FROM fact_session_summary fss
JOIN dim_time dti ON fss.time_key = dti.time_key
GROUP BY dti.time_of_day;

-- Most-touched files across all sessions
SELECT df.file_path, SUM(bsf.write_count + bsf.edit_count) as modifications
FROM bridge_session_file bsf
JOIN dim_file df ON bsf.file_key = df.file_key
GROUP BY df.file_path ORDER BY modifications DESC LIMIT 20;

-- Catch up on a project (what was worked on recently)
SELECT first_user_message, last_assistant_message, intent, created_at
FROM semantic_project_context
WHERE project_name = 'my-project' LIMIT 5;

JSON Export

# Simple schema - single file
ccutils --format json -o ./sessions.json

# Star schema - directory structure (dimensions/ + facts/ + meta.json)
ccutils --format json-star -o ./star-export/
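A star export can be read back with the standard library. This sketch assumes only the layout documented above (dimensions/, facts/, meta.json) and invents the file contents purely for demonstration:

```python
import json, pathlib, tempfile

# Build a toy export matching the documented layout: dimensions/ + facts/ + meta.json.
root = pathlib.Path(tempfile.mkdtemp())
(root / "dimensions").mkdir()
(root / "facts").mkdir()
(root / "meta.json").write_text(json.dumps({"schema": "star"}))
(root / "dimensions" / "dim_session.json").write_text(json.dumps([{"session_key": 1}]))
(root / "facts" / "fact_messages.json").write_text(
    json.dumps([{"session_key": 1, "role": "user"}]))

def load_star_export(path):
    """Read meta.json plus every table under dimensions/ and facts/."""
    path = pathlib.Path(path)
    meta = json.loads((path / "meta.json").read_text())
    tables = {
        p.stem: json.loads(p.read_text())
        for group in ("dimensions", "facts")
        for p in sorted((path / group).glob("*.json"))
    }
    return meta, tables

meta, tables = load_star_export(root)
print(sorted(tables))  # ['dim_session', 'fact_messages']
```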

Common Options

# Output
-o, --output PATH          Output directory or file
--format FORMAT            html, duckdb, duckdb-star, json, json-star
--open                     Open result in browser

# Content (included by default -- use flags to exclude)
--no-thinking              Exclude thinking blocks
--no-subagents             Exclude related agent sessions (local)
--no-agents                Exclude agent-* session files (all)
--private                  Sanitize file paths for sharing

# Selection (local command)
--flat                     Flat single-list mode (skip project grouping)
--expand-chains            Show individual sessions in resumed chains
-p, --project TEXT         Filter by project name

# Embeddings (local and all commands, star schema only)
--embed [MODEL]            Run ColBERT embeddings (optionally specify model)

# Batch processing (all command)
-j, --jobs N               Parallel workers (default: 1)
--batch-size N             Sessions per transaction (default: 10)
--no-search-index          Skip search index generation

Documentation

Development

uv run pytest              # Run tests (~737 passing)
uv run ccutils --help      # Run development version
uv run pytest --cov=ccutils  # Coverage

License

Apache-2.0