Skip to content

Commit e4d203a

Browse files
Coldaineclaude
andauthored
feat: multi-crate workspace refactoring with CI and platform detection (#9)
* corrected documentation and prepared for CI review/update * feat: complete multi-crate workspace refactoring with enhanced CI This comprehensive refactor splits the ColdVox codebase into specialized crates: - coldvox-foundation: Core types, state management, and error handling - coldvox-audio: Audio capture, processing, and chunking pipeline - coldvox-vad: Voice activity detection with Level3 and Silero backends - coldvox-stt: Speech-to-text core abstractions - coldvox-stt-vosk: Vosk STT integration - coldvox-text-injection: Multi-backend text injection library - coldvox-telemetry: Pipeline metrics and performance tracking - coldvox-gui: GUI application components - app: Main application crate integrating all components Key improvements: - Enhanced CI/CD with feature matrix testing, cross-platform builds - Comprehensive test coverage across all crates - Security hardening and input validation - Performance optimizations and benchmarking - Modular architecture enabling selective feature compilation - Improved error handling and recovery mechanisms - Thread-safe concurrent processing - Platform-specific optimizations for Linux desktop environments Breaking changes: - Module restructuring requires import path updates - Configuration format changes for multi-crate setup - Some API signatures updated for better type safety 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * fix: timestamp calculation and test failures - Fix AudioChunker to calculate timestamps based on audio position instead of Instant::now() - Fix chunker_timing_tests to properly compare timestamp durations - Add feature gates for Level3 VAD tests to prevent build failures - Fix clippy warnings: remove unnecessary casts, unused variables, field reassignments - Update CLAUDE.md to clarify text injection is core functionality - Remove outdated vad_demo example that used deprecated APIs - Remove outdated CI documentation files All 47 tests now pass. Core audio pipeline and text injection are functional. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * feat: add automatic platform detection for text-injection backends - Add platform-specific dependencies that auto-enable appropriate backends - Linux: AT-SPI, Wayland clipboard, ydotool - Windows/macOS: Enigo - Create build.rs to detect Wayland vs X11 at compile time - Remove duplicate text_injection code from app crate - Delete 137KB of duplicated code in crates/app/src/text_injection/ - Replace with simple re-export from coldvox-text-injection crate - Fix architecture consistency: all functionality in coldvox-* crates - Add placeholder AT-SPI implementation (atspi 0.22 API differs from expected) BREAKING: Import paths for text_injection types may need updating 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * chore: update dependencies and clean up processor imports * fix: resolve GitHub Actions benchmark workflow issues - Add missing parameters to benchmark-action/github-action-benchmark - Feature-gate text_injection module to prevent compilation errors - Target specific benchmark with minimal features for CI stability - Configure gh-pages branch and benchmark data directory 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * CI: align workflows with shared router; add least-privilege permissions & concurrency; replace unresolved actions; gate benchmarks; fix YAML quoting/indentation * fix: address PR #9 review comments - import ordering and code fixes - Fix import ordering across multiple files (external before local, alphabetical) - Fix re-exports in coldvox-vad lib.rs (grouped and sorted) - Fix injection_shutdown_tx mutability in main.rs - Restore injection_rx variable in main.rs - Fix PipelineMetrics type conflict - Add missing trait imports in test files 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> * fix: address linter warnings in text-injection crate - Prefix unused variables with underscore (_duration, _config, _previous_clipboard) - Add #[allow(dead_code)] to unused helper methods that may be used in future - Remove needless return statement in focus.rs 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> --------- Co-authored-by: Claude <[email protected]>
1 parent 39d7210 commit e4d203a

File tree

139 files changed

+3265
-11668
lines changed

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

139 files changed

+3265
-11668
lines changed

.github/copilot-instructions.md

Lines changed: 102 additions & 104 deletions
Original file line numberDiff line numberDiff line change
@@ -1,105 +1,100 @@
11
# ColdVox – AI workspace instructions
22

3-
Use these notes to help AI agents work productively in this Rust repo. Main crate: `crates/app`. A vendored VAD library lives in `Forks/ColdVox-voice_activity_detector` (integrated via Silero V5).
4-
5-
## Architecture
6-
- `foundation/` (app scaffolding)
7-
- `state.rs`: `AppState` + `StateManager` with validated transitions.
8-
- `shutdown.rs`: Ctrl+C handler + panic hook via `ShutdownHandler`/`ShutdownGuard`.
9-
- `health.rs`: `HealthMonitor` with periodic checks (none registered yet).
10-
- `error.rs`: `AppError`/`AudioError`, `AudioConfig { silence_threshold }`, `recovery_strategy()` hints.
11-
- `audio/` (capture pipeline)
12-
- `device.rs`: CPAL host/device discovery; prefers 16 kHz mono when available.
13-
- `ring_buffer.rs`: rtrb SPSC ring buffer for i16 samples (producer/consumer split).
14-
- `capture.rs`: builds CPAL input stream; writes samples into the rtrb ring buffer (non-blocking, drop-on-full).
15-
- `watchdog.rs`: 5s no-data watchdog; `is_triggered()` used to drive recovery.
16-
- `detector.rs`: RMS-based silence detection using `AudioConfig.silence_threshold`.
17-
- `chunker.rs`: Converts variable-sized frames to fixed 512-sample chunks for VAD.
18-
- `vad_processor.rs`: VAD processing pipeline with broadcast channel distribution.
19-
- `vad/` (voice activity detection)
20-
- `silero_wrapper.rs`: Silero V5 model integration via ONNX runtime.
21-
- `processor.rs`: VAD state machine and event generation.
22-
- `config.rs`: Unified VAD configuration (Silero mode is default).
23-
- `stt/` (speech-to-text - behind the `vosk` feature)
24-
- `mod.rs`: TranscriptionEvent, WordInfo, TranscriptionConfig.
25-
- `processor.rs`: STT processor gated by VAD events; emits TranscriptionEvent.
26-
- `vosk.rs`: VoskTranscriber implementation (requires libvosk system library).
27-
- `persistence.rs`: Optional persistence of transcripts/audio.
28-
- `telemetry/`: in-process counters/gauges (`PipelineMetrics`).
29-
- Binaries: `src/main.rs` (app, STT when built with `--features vosk`), `bin/mic_probe.rs`, `bin/foundation_probe.rs`, `bin/tui_dashboard.rs`.
3+
Use these notes to help AI agents work effectively in this Rust workspace. Main application crate: `crates/app` (package `coldvox-app`). Core subsystems live in split crates and are re-exported by the app where convenient.
4+
5+
## Architecture (multi-crate)
6+
7+
- `crates/coldvox-foundation/` — App scaffolding
8+
- `state.rs`: `AppState` + `StateManager` with validated transitions
9+
- `shutdown.rs`: Ctrl+C handler + panic hook (`ShutdownHandler`/`ShutdownGuard`)
10+
- `health.rs`: `HealthMonitor`
11+
- `error.rs`: `AppError`/`AudioError`, `AudioConfig { silence_threshold }`
12+
13+
- `crates/coldvox-audio/` — Capture & chunking pipeline
14+
- `device.rs`: CPAL host/device discovery; PipeWire-aware candidates
15+
- `capture.rs`: `AudioCaptureThread::spawn(...)` input stream, watchdog, silence detection
16+
- `ring_buffer.rs`: `AudioRingBuffer` (rtrb SPSC for i16 samples)
17+
- `frame_reader.rs`: `FrameReader` to normalize device frames
18+
- `chunker.rs`: `AudioChunker` → fixed 512-sample frames (32 ms at 16 kHz)
19+
- `watchdog.rs`: 5s no-data watchdog used for auto-recovery
20+
- `detector.rs`: RMS-based `SilenceDetector` using `AudioConfig.silence_threshold`
21+
22+
- `crates/coldvox-vad/` — VAD traits, config, Level3 energy VAD (feature `level3`)
23+
- `config.rs`: `UnifiedVadConfig`, `VadMode`
24+
- `engine.rs`, `types.rs`, `constants.rs`, `VadProcessor` trait
25+
26+
- `crates/coldvox-vad-silero/` — Silero V5 ONNX VAD (feature `silero`)
27+
- `silero_wrapper.rs`: `SileroEngine` implementing `VadEngine`
28+
- Uses the external `voice_activity_detector` crate (Silero V5 backend)
29+
30+
- `crates/coldvox-stt/` — STT core abstractions
31+
32+
- `crates/coldvox-stt-vosk/` — Vosk integration (feature `vosk`)
33+
34+
- `crates/coldvox-telemetry/` — In-process metrics (`PipelineMetrics`, `FpsTracker`)
35+
36+
- `crates/coldvox-text-injection/` — Text injection backends (feature-gated)
37+
38+
- `crates/app/` — App glue, UI, re-exports
39+
- `src/audio/`:
40+
- `vad_adapter.rs`: Bridges `UnifiedVadConfig` to a concrete `VadEngine` (Silero or Level3)
41+
- `vad_processor.rs`: Async VAD pipeline task publishing `VadEvent`s
42+
- `mod.rs`: Re-exports from `coldvox-audio`
43+
- `src/vad/mod.rs`: Re-exports VAD types from `coldvox-vad` and `coldvox-vad-silero`
44+
- `src/stt/`: Processor/persistence wrappers and re-exports for Vosk
45+
- Binaries: `src/main.rs` (app), `src/bin/tui_dashboard.rs`, probes under `src/probes/`
3046

3147
## Build, run, debug
32-
- From `crates/app`:
33-
- App (basic): `cargo run`
34-
- App (with STT): `cargo run --features vosk` (requires libvosk system library)
48+
49+
- From `crates/app` (package `coldvox-app`):
50+
- App: `cargo run`
51+
- App + STT (Vosk): `cargo run --features vosk` (requires system libvosk and a model)
3552
- TUI Dashboard:
36-
- Without STT: `cargo run --bin tui_dashboard`
53+
- No STT: `cargo run --bin tui_dashboard`
3754
- With STT: `cargo run --features vosk --bin tui_dashboard`
3855
- Device selection: append `-- -D "<device name>"`
39-
- Probes:
56+
- Probes (examples live at repo root under `examples/`, wired via Cargo metadata):
4057
- `cargo run --bin mic_probe -- --duration 30 --device "<name>" --silence_threshold 120`
4158
- `cargo run --bin foundation_probe -- --duration 30 --simulate_errors --simulate_panics`
4259
- Release: `cargo build --release` or `cargo build --release --features vosk`
4360
- Logging: `tracing` with `RUST_LOG` or `--log-level` in TUI; daily-rotated file at `logs/coldvox.log`.
4461
- App: logs to stderr and file.
4562
- TUI Dashboard: logs to file only (to avoid corrupting the TUI). Default level is `debug`; override with `--log-level <level>`.
46-
- Tests: unit tests in source modules; VAD crate has extensive tests; run from its folder with optional `--features async`.
63+
- Tests: unit tests in source modules; integration tests under `crates/app/tests/`; VAD crates include unit tests.
4764

4865
## Audio data flow and contracts
49-
- Callback thread (CPAL) → i16 samples → rtrb ring buffer (SPSC) → FrameReader → AudioChunker → broadcast channel.
50-
- AudioChunker output: 512-sample frames (32ms) distributed via broadcast to VAD and STT processors.
51-
- VAD processing: Silero V5 model evaluates speech probability, generates VadEvent stream.
52-
- STT processing: Gated by VAD events, transcribes speech segments when detected (requires vosk feature).
53-
- TUI: when STT is enabled and a model is present, partial/final transcripts are logged; the Status panel shows the last final transcript.
54-
- Backpressure: if the consumer is slow, ring writes drop when full (warn logged); keep a reader draining via `FrameReader`.
55-
- Preferred format: 16 kHz mono if supported; otherwise first supported config with automatic conversion.
56-
- Watchdog: feed on each callback; after ~5s inactivity, `is_triggered()` becomes true; `AudioCapture::recover()` attempts up to 3 restarts.
57-
- Silence: RMS-based; >3s continuous silence logs a warning (hinting device issues).
66+
- CPAL callback → i16 samples → `AudioRingBuffer` (SPSC) → `FrameReader``AudioChunker` → broadcast channel
67+
- Chunker output: 512-sample frames (32 ms) at 16 kHz to VAD/STT subscribers
68+
- VAD: Silero V5 (default) or Level3 energy engine generates `VadEvent`s
69+
- STT: Gated by VAD events; transcribes segments when speech is active (feature `vosk`)
70+
- TUI: when STT is enabled and a model is present, partial/final transcripts are logged; Status shows last final transcript
71+
- Backpressure: if the consumer is slow, writes drop when full (warn logged); keep a reader draining via `FrameReader`
72+
- Preferred device format: choose 16 kHz mono when available; otherwise select best supported config and convert downstream
73+
- Watchdog: 5s no-data triggers restart logic in capture thread
74+
- Silence: RMS-based detector; >3s continuous silence logs a warning
5875

5976
## Tuning knobs (where to tweak)
6077

61-
- Chunker (`audio/chunker.rs``ChunkerConfig`)
62-
- `frame_size_samples` (default 512): output frame size; matches VAD window.
63-
- `sample_rate_hz` (default 16000): target internal rate.
64-
- `resampler_quality`: `Fast` | `Balanced` (default) | `Quality`.
65-
66-
- VAD (`vad/config.rs`, `vad/types.rs`)
67-
- Mode: `UnifiedVadConfig.mode``Silero` (default) | `Level3`.
68-
- Silero (`SileroConfig`)
69-
- `threshold` (default 0.3): speech probability cutoff.
70-
- `min_speech_duration_ms` (default 250): min speech length before start.
71-
- `min_silence_duration_ms` (default 100): min silence before end.
72-
- `window_size_samples` (default 512): analysis window; aligns with chunker.
73-
- Level3 energy VAD (`Level3Config`) [disabled by default]
74-
- `enabled` (default false): toggle fallback engine.
75-
- `onset_threshold_db` (default 9.0 over floor).
76-
- `offset_threshold_db` (default 6.0 over floor).
77-
- `ema_alpha` (default 0.02): noise floor smoothing.
78-
- `speech_debounce_ms` (default 200): frames to confirm start.
79-
- `silence_debounce_ms` (default 400): frames to confirm end.
80-
- `initial_floor_db` (default -50.0): starting noise floor.
81-
- Frame basics
82-
- `UnifiedVadConfig.frame_size_samples` (default 512) and `sample_rate_hz` (default 16000) control window duration.
83-
84-
- STT (`stt/mod.rs`, `stt/processor.rs`, `stt/vosk.rs`) [feature `vosk`]
85-
- `TranscriptionConfig`
86-
- `enabled` (bool): gate STT.
87-
- `model_path` (string): defaults via `VOSK_MODEL_PATH` or `models/vosk-model-small-en-us-0.15`.
88-
- `partial_results` (bool, default true): emit interim text.
89-
- `max_alternatives` (u32, default 1): candidate count.
90-
- `include_words` (bool, default false): word timings/confidence.
91-
- `buffer_size_ms` (u32, default 512): STT chunk size fed to Vosk.
92-
93-
- Text Injection (`text_injection/session.rs`, `text_injection/processor.rs`)
94-
- `SessionConfig`
95-
- `silence_timeout_ms` (default 1500): finalize after silence.
96-
- `buffer_pause_timeout_ms` (default 500): pause boundary between chunks.
97-
- `max_buffer_size` (default 5000 chars): cap transcript buffer.
98-
- `InjectionProcessorConfig`
99-
- `poll_interval_ms` (in code via comments, default 100ms).
100-
101-
- Audio foundation (`foundation/error.rs`)
102-
- `AudioConfig.silence_threshold` (default 100): RMS-based silence detector threshold.
78+
- Chunker (`crates/coldvox-audio/src/chunker.rs``ChunkerConfig`)
79+
- `frame_size_samples` (default 512), `sample_rate_hz` (default 16000)
80+
- `resampler_quality`: `Fast` | `Balanced` (default) | `Quality`
81+
82+
- VAD (`crates/coldvox-vad/src/config.rs`)
83+
- `UnifiedVadConfig.mode``Silero` (default) | `Level3`
84+
- Silero (`crates/coldvox-vad-silero/src/config.rs`)
85+
- `threshold` (default 0.3), `min_speech_duration_ms` (250), `min_silence_duration_ms` (100), `window_size_samples` (512)
86+
- Level3 (`feature = "level3"`, disabled by default)
87+
- `onset_threshold_db` (9.0), `offset_threshold_db` (6.0), `ema_alpha` (0.02)
88+
- `speech_debounce_ms` (200), `silence_debounce_ms` (400), `initial_floor_db` (-50.0)
89+
90+
- STT (`crates/app/src/stt/` wrappers; core types in `crates/coldvox-stt/`) [feature `vosk`]
91+
- `TranscriptionConfig`: `model_path`, `partial_results`, `max_alternatives`, `include_words`, `buffer_size_ms`
92+
93+
- Text Injection (`crates/coldvox-text-injection/`; app glue in `crates/app/src/text_injection/`)
94+
- `SessionConfig`, injector backends via features: `text-injection-*`
95+
96+
- Foundation (`crates/coldvox-foundation/src/error.rs`)
97+
- `AudioConfig.silence_threshold` (default 100)
10398

10499
## Logging for tuning
105100

@@ -114,23 +109,26 @@ Use these notes to help AI agents work productively in this Rust repo. Main crat
114109
- Logs to stderr and daily-rotated `logs/coldvox.log`.
115110

116111
## Usage patterns
117-
- Start capture: `AudioCaptureThread::spawn(config, ring_producer, device)`.
118-
- Create pipeline: `FrameReader``AudioChunker` → broadcast channel → VAD/STT processors.
119-
- VAD integration: `VadProcessor::spawn(config, audio_rx, event_tx, metrics)`.
120-
- STT integration: `SttProcessor::new(audio_rx, vad_rx, transcription_tx, config)` (requires vosk feature).
121-
- Metrics: pass `Arc<PipelineMetrics>` to all components for unified telemetry.
122-
- Enumerate devices: `DeviceManager::new()?.enumerate_devices()`; marks default device.
123-
124-
## VAD system (fully integrated)
125-
- `Forks/ColdVox-voice_activity_detector`: Silero V5 via ONNX Runtime. 16 kHz expects 512-sample windows per prediction.
126-
- Runtime binaries provided under `runtimes/` for major platforms; see its `README.md` for usage and feature flags (`async`, `load-dynamic`).
127-
- Integration: `vad/silero_wrapper.rs` provides `SileroEngine` implementation.
128-
- State machine: VAD events (SpeechStart, SpeechEnd) generated with configurable thresholds and debouncing.
129-
- Fallback: Energy-based VAD available as alternative (currently disabled by default).
130-
131-
## STT system (feature-gated, available when enabled)
132-
- Vosk-based transcription via `stt/vosk.rs` (requires libvosk system library).
133-
- Gated by VAD: transcribes during detected speech segments.
134-
- Event-driven: emits `TranscriptionEvent::{Partial,Final,Error}` via mpsc channel.
135-
- Configuration: model path via `VOSK_MODEL_PATH` env var; defaults to `models/vosk-model-small-en-us-0.15` if unset.
136-
- Build: enable with `--features vosk`. If the model path exists, STT runs; otherwise STT stays disabled.
112+
- Start capture (coldvox-audio):
113+
- `(capture, device_cfg, cfg_rx) = AudioCaptureThread::spawn(audio_cfg, ring_producer, device_name_opt)?`
114+
- Stop: `capture.stop()`
115+
- Create pipeline:
116+
- `FrameReader` (from consumer) → `AudioChunker``broadcast::Sender<AudioFrame>`
117+
- VAD (app glue): `VadProcessor::spawn(vad_cfg, audio_rx, event_tx, Some(metrics))?`
118+
- STT (feature `vosk`): construct processor under `crates/app/src/stt/processor.rs`
119+
- Metrics: use `Arc<PipelineMetrics>` across components
120+
- Devices: `DeviceManager::new()?.enumerate_devices()`; `candidate_device_names()` prefers PipeWire → default → others
121+
122+
## VAD system
123+
- Silero V5 via `crates/coldvox-vad-silero/` (feature `silero`, default enabled in app)
124+
- Depends on external `voice_activity_detector` crate for ONNX runtime integration
125+
- 16 kHz, 512-sample windows per prediction
126+
- Events: `VadEvent::{SpeechStart, SpeechEnd}` with debouncing and thresholds
127+
- Fallback: Level3 energy VAD available (feature `level3`, disabled by default)
128+
129+
## STT system (feature-gated)
130+
- Vosk-based transcription via `crates/coldvox-stt-vosk/` (re-exported in `crates/app/src/stt/vosk.rs`)
131+
- Gated by VAD: transcribes during detected speech segments
132+
- Events: `TranscriptionEvent::{Partial, Final, Error}` via mpsc
133+
- Model path via `VOSK_MODEL_PATH` or default `models/vosk-model-small-en-us-0.15`
134+
- Enable with `--features vosk`; if model path is missing, STT stays disabled

.github/dependabot.yml

Lines changed: 29 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,29 @@
1+
version: 2
2+
updates:
3+
- package-ecosystem: "github-actions"
4+
directory: "/"
5+
schedule:
6+
interval: "weekly"
7+
commit-message:
8+
prefix: "chore(deps)"
9+
labels:
10+
- "dependencies"
11+
- "github-actions"
12+
# Group updates to reduce PR noise
13+
groups:
14+
actions:
15+
patterns:
16+
- "*"
17+
18+
- package-ecosystem: "cargo"
19+
directory: "/"
20+
schedule:
21+
interval: "weekly"
22+
commit-message:
23+
prefix: "chore(deps)"
24+
labels:
25+
- "dependencies"
26+
- "rust"
27+
# Don't update workspace members
28+
ignore:
29+
- dependency-name: "coldvox-*"

.github/workflows/benchmarks.yml

Lines changed: 58 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,58 @@
1+
name: Benchmarks
2+
3+
on:
4+
push:
5+
branches: [main]
6+
pull_request:
7+
types: [opened, synchronize]
8+
paths:
9+
- 'crates/**'
10+
- 'Cargo.toml'
11+
- 'Cargo.lock'
12+
- '.github/workflows/benchmarks.yml'
13+
14+
permissions:
15+
contents: read
16+
pull-requests: read
17+
18+
concurrency:
19+
group: benchmarks-${{ github.ref }}
20+
cancel-in-progress: true
21+
22+
jobs:
23+
benchmark:
24+
runs-on: ubuntu-latest
25+
steps:
26+
- uses: actions/checkout@3df4ab11eba7bda6032a0b82a6bb43b11571feac # v4.1.7
27+
- uses: dtolnay/rust-toolchain@1482605bfc5719782e1267fd0c0cc350fe7646b8 # v1
28+
with:
29+
toolchain: stable
30+
- uses: Swatinem/rust-cache@23bce251a8cd2ffc3c1075eaa2367cf899916d84 # v2.7.3
31+
32+
- name: Install system dependencies
33+
run: |
34+
sudo apt-get update
35+
sudo apt-get install -y libasound2-dev
36+
37+
- name: Run benchmarks
38+
run: |
39+
# Run specific benchmark with minimal features to avoid compilation issues
40+
cargo bench --bench text_chunking_bench --no-default-features --features silero -- --output-format bencher | tee output.txt
41+
42+
- name: Store benchmark result
43+
uses: benchmark-action/github-action-benchmark@4de1bed97a47495fc4c5404952da0499e31f5c29 # v1.20.3
44+
with:
45+
name: 'Benchmark'
46+
tool: 'cargo'
47+
output-file-path: output.txt
48+
github-token: ${{ secrets.GITHUB_TOKEN }}
49+
auto-push: false
50+
comment-on-alert: true
51+
alert-threshold: '120%'
52+
fail-on-alert: false
53+
gh-pages-branch: gh-pages
54+
benchmark-data-dir-path: dev/bench
55+
skip-fetch-gh-pages: false
56+
comment-always: false
57+
summary-always: false
58+
save-data-file: true

0 commit comments

Comments
 (0)