Merged
118 changes: 0 additions & 118 deletions .kiro/specs/voice-pipeline-core/requirements.md

This file was deleted.

53 changes: 25 additions & 28 deletions CLAUDE.md
@@ -75,7 +75,7 @@ Multi-crate Cargo workspace:
### Building

```bash
# Main app with default features (Silero VAD + text injection, no STT by default)
# Main app with default features (Silero VAD + Vosk STT + text injection)
cargo build

# With Vosk STT
@@ -94,29 +94,24 @@ cargo build --release --features vosk,text-injection
### Running

```bash
# Main application (default features)
# Main application (uses config/default.toml + env overrides)
cargo run

# With specific device
cargo run -- --device "USB Microphone"
# Override input device for a single launch
COLDVOX_DEVICE="USB Microphone" cargo run

# With Vosk STT (for actual voice dictation)
# Ensure Vosk STT + text injection are compiled (defaults already include these)
cargo run --features vosk,text-injection

# With specific device and STT
cargo run --features vosk,text-injection -- --device "HyperX QuadCast"
# TUI Dashboard (shared runtime; keyboard shortcuts S/A/R/Q)
cargo run --bin tui_dashboard

# TUI Dashboard (shared runtime)
cargo run --bin tui_dashboard # S=Start, A=Toggle VAD/PTT, R=Reset, Q=Quit
# Optional explicit device or extra logging
cargo run --bin tui_dashboard -- --device "USB Microphone" --log-level "info,stt=debug,coldvox_audio=debug"

# Mic probe utility
# Mic probe utility (duration in seconds)
cargo run --bin mic_probe -- --duration 30

# Examples (must include required features)
# Examples (enable required features explicitly)
cargo run --example foundation_probe
cargo run --example record_10s
cargo run --example record_10s --features examples
cargo run --example vosk_test --features vosk,examples
cargo run --example inject_demo --features text-injection
cargo run --example test_silero_wav --features examples
@@ -185,30 +180,32 @@ Platform-specific text injection backends are automatically enabled at build tim
- **Events**: `TranscriptionEvent::{Partial, Final, Error}`

### Text Injection
- **Direct insertion**: AT-SPI (accessibility API for text insertion)
- **Composite strategy**: ClipboardPaste (sets clipboard + triggers paste via AT-SPI/ydotool)
  - Note: There is no "clipboard-only" injector - setting clipboard without pasting is useless for automation
  - ClipboardPaste is ONE strategy that: saves clipboard → sets new text → pastes via AT-SPI or ydotool → restores clipboard
- **Optional backends**: ydotool (Wayland), kdotool (X11), enigo (cross-platform)
- **Strategy management**: Runtime selection with per-app success caching and fallback chains
- **Clipboard preservation**: Clipboard-based strategies automatically save/restore user clipboard (default 500ms delay)
- **Direct insertion**: An AT-SPI injector exists, but the current `FocusTracker` path returns `FocusStatus::Unknown` until the AT-SPI API regression is resolved (`focus.rs` short-circuits the call).
- **Composite strategy**: `ClipboardPasteInjector` sets the clipboard then triggers paste (AT-SPI first, ydotool fallback) and schedules clipboard restoration.
- **Optional backends**: ydotool (Wayland), kdotool (X11), enigo (cross-platform); enable per-target platform features.
- **Strategy management**: `StrategyManager` keeps per-app success metrics and reorders fallbacks accordingly.
- **Clipboard preservation**: Clipboard-based strategies restore prior clipboard contents after `clipboard_restore_delay_ms` (defaults to 500 ms).
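
The composite paste strategy described in the list above can be sketched roughly as follows. This is a minimal illustration, not the crate's `ClipboardPasteInjector`: `MockClipboard`, `PasteVia`, and `clipboard_paste` are hypothetical names, the two paste mechanisms are modeled as closures, and the clipboard is restored immediately rather than after `clipboard_restore_delay_ms`. Restoring on failure is also an assumption.

```rust
/// Which paste mechanism succeeded (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum PasteVia {
    AtSpi,
    Ydotool,
}

/// In-memory stand-in for the system clipboard (hypothetical).
struct MockClipboard {
    contents: String,
}

/// Save -> set -> paste (AT-SPI first, ydotool as fallback) -> restore.
/// Each closure returns `true` when its paste mechanism worked.
fn clipboard_paste(
    clipboard: &mut MockClipboard,
    text: &str,
    atspi_paste: impl Fn(&str) -> bool,
    ydotool_paste: impl Fn(&str) -> bool,
) -> Result<PasteVia, String> {
    // 1. Save whatever the user currently has on the clipboard.
    let saved = clipboard.contents.clone();

    // 2. Put the text to inject on the clipboard.
    clipboard.contents = text.to_string();

    // 3. Trigger the paste, trying AT-SPI before ydotool.
    let outcome = if atspi_paste(&clipboard.contents) {
        Ok(PasteVia::AtSpi)
    } else if ydotool_paste(&clipboard.contents) {
        Ok(PasteVia::Ydotool)
    } else {
        Err("no paste mechanism succeeded".to_string())
    };

    // 4. Restore the saved contents. This sketch restores immediately and
    //    also on failure; both choices are assumptions.
    clipboard.contents = saved;
    outcome
}
```

Calling `clipboard_paste` with an AT-SPI closure that returns `false` and a ydotool closure that returns `true` yields `Ok(PasteVia::Ydotool)`, and the mock clipboard ends up holding its original contents again.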

## Configuration

- Primary defaults live in `config/default.toml`; `Settings::new()` loads this file and applies env overrides (`COLDVOX_*` with `__` for nested keys).
- `config/overrides.toml` is not loaded automatically; extend `Settings` construction if layered configs are required.
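
As a rough illustration of the `COLDVOX_*` naming rule with `__` for nested keys, the sketch below maps an environment variable name to a dotted key path. It does not reproduce `Settings::new()`: `env_to_key_path` is a hypothetical helper, and lowercasing the segments and joining them with `.` are assumptions made for readability.

```rust
/// Map an environment variable name to a dotted config key path.
/// Returns `None` when the variable lacks the `COLDVOX_` prefix.
/// Hypothetical helper: the real loader may normalize names differently.
fn env_to_key_path(var: &str) -> Option<String> {
    let rest = var.strip_prefix("COLDVOX_")?;
    if rest.is_empty() {
        return None;
    }
    // `__` separates nesting levels; a single `_` stays inside the key name.
    let path = rest
        .split("__")
        .map(|segment| segment.to_ascii_lowercase())
        .collect::<Vec<_>>()
        .join(".");
    Some(path)
}
```

Under these assumptions `COLDVOX_DEVICE` maps to `device`, and a nested name such as `COLDVOX_INJECTION__CLIPBOARD_RESTORE_DELAY_MS` (the `injection` section name is invented for the example) maps to `injection.clipboard_restore_delay_ms`.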

### Audio Pipeline
- Target: 16 kHz, 16-bit i16, mono
- Frame size: 512 samples (32 ms)
- Resampler quality: Fast/Balanced/Quality

### VAD Config
- Silero threshold: 0.3
- Min speech duration: 250ms
- Min silence duration: 100ms
- Silero threshold: 0.1
- Min speech duration: 100 ms
- Min silence duration: 500 ms (documented rationale in code/docs)
- Window size: 512 samples @ 16 kHz
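
To relate the millisecond thresholds above to the 512-sample window, the following sketch converts a duration into a count of VAD windows. The function names are hypothetical, and rounding up to whole windows is an assumption; the source does not state how the detector rounds.

```rust
const SAMPLE_RATE_HZ: u32 = 16_000;
const WINDOW_SAMPLES: u32 = 512;

/// Duration of one VAD window in milliseconds (512 samples at 16 kHz).
fn window_ms() -> u32 {
    WINDOW_SAMPLES * 1000 / SAMPLE_RATE_HZ
}

/// Number of whole windows needed to cover `duration_ms`, rounding up.
/// Rounding up is an assumption of this sketch.
fn windows_for(duration_ms: u32) -> u32 {
    let w = window_ms();
    (duration_ms + w - 1) / w
}
```

With these values one window lasts 32 ms, so the 100 ms minimum speech duration spans 4 windows (128 ms) and the 500 ms minimum silence duration spans 16 windows (512 ms).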

### Logging
- Main app: stderr + `logs/coldvox.log` (daily rotation)
- TUI: file-only to `logs/coldvox.log` (avoids display corruption)
- Control: `RUST_LOG` environment variable or `--log-level` flag
- Main app: stderr + daily rotated `logs/coldvox.log` via `tracing-appender`
- TUI: logs to file to avoid terminal conflicts
- Control: set `RUST_LOG=info,stt=debug` (no dedicated CLI flag in current branch)

## Platform Detection

6 changes: 6 additions & 0 deletions README.md
@@ -66,6 +66,12 @@ More detail: See [`CLAUDE.md`](CLAUDE.md) for full developer guide.

**Workaround for other devices**: Manually edit the device name in the probe source code if you need to test with a different microphone during this transition period.

### Documentation Review (Pending)
- [ ] Recent text injection changes consolidated paste behavior. A docs/diagram sweep is pending to reflect:
  - Clipboard-only injector is internal-only.
  - Single paste path (Clipboard+Paste with AT‑SPI→ydotool fallback) is last in order.
  - Updated diagrams exported in `diagrams/`.

## Slow / Environment-Sensitive Tests
Some end‑to‑end tests exercise real injection & STT. Gate them locally by setting an env variable (planned):
```bash
23 changes: 23 additions & 0 deletions agents.md
@@ -0,0 +1,23 @@
# Agents Guide

This branch (`anchor/oct-06-2025`) introduces a large documentation/configuration refactor but leaves many tracked issues unresolved. Use the notes below when collaborating with additional agents.

## Branch State Overview

- **Configuration:** Runtime settings now come from `config/default.toml`, layered with `COLDVOX_*` environment overrides. CLI flags have been reduced to `--list-devices`, `--tui`, and `--injection-fail-fast`.
- **Text Injection:** `StrategyManager` gained clipboard restoration logic, but AT-SPI focus detection currently returns `FocusStatus::Unknown` (`crates/coldvox-text-injection/src/focus.rs`). Clipboard restoration tests only assert behaviour when `wl_clipboard` is enabled.
- **Audio/STT:** No functional changes landed for callback allocations or STT pipeline improvements—most code matches `main`.
- **Documentation:** Several docs authored in this branch contained optimistic claims; updated copies in `docs/` now highlight the real status.

## Critical Caveats

1. **AT-SPI regression:** Restore accurate focus detection before shipping; current logic short-circuits to `Unknown`.
2. **Testing gaps:** Only `cargo test -p coldvox-text-injection -- --list` has been re-run. Workspace builds/tests will still fail locally without system ALSA headers.
3. **GUI features:** GuiBridge remains a stub (state toggles only); GUI integration issues (#58-#60, #62) stay open.

## Recommended Next Steps

- Revert or fix the AT-SPI focus tracker regression and add coverage that runs without Wayland-specific features.
- Add CI jobs that exercise clipboard restoration with `wl_clipboard` enabled, or provide mocks so the tests assert on all platforms.
- Reconcile CLI documentation with the new configuration approach (see `docs/user/runflags.md`) and ensure future docs avoid aspirational language.
- Re-evaluate outstanding issues (#100, #63, #36, #40, #38, #58-#62, STT backlog) before attempting to close anything in PR #121.
4 changes: 2 additions & 2 deletions crates/app/src/stt/tests/end_to_end_wav.rs
@@ -987,11 +987,11 @@ async fn test_clipboard_injection() {
    #[cfg(feature = "text-injection")]
    {
        use crate::text_injection::{
            clipboard_injector::ClipboardInjector, InjectionConfig, TextInjector,
            clipboard_paste_injector::ClipboardPasteInjector, InjectionConfig, TextInjector,
        };

        let config = InjectionConfig::default();
        let injector = ClipboardInjector::new(config);
        let injector = ClipboardPasteInjector::new(config);

        // Check availability
        if !injector.is_available().await {
22 changes: 12 additions & 10 deletions crates/app/tests/integration/mock_injection_tests.rs
@@ -249,18 +249,20 @@ mod mock_injection_tests {
        // Get the method order to verify AT-SPI is tried first
        let methods = manager.get_method_order_uncached();

        // Should include AT-SPI methods when available
        let has_atspi = methods.iter().any(|m| {
            matches!(m, coldvox_text_injection::types::InjectionMethod::AtspiInsert |
                coldvox_text_injection::types::InjectionMethod::AtspiPaste)
        });

        let has_ydotool = methods.iter().any(|m| {
            matches!(m, coldvox_text_injection::types::InjectionMethod::Ydotool)
        // Should include AT-SPI insert and the single ClipboardPasteFallback method
        let has_atspi = methods
            .iter()
            .any(|m| matches!(m, coldvox_text_injection::types::InjectionMethod::AtspiInsert));

        let has_clipboard_paste = methods.iter().any(|m| {
            matches!(
                m,
                coldvox_text_injection::types::InjectionMethod::ClipboardPasteFallback
            )
        });

        println!("Available methods: {:?}", methods);
        assert!(has_ydotool, "Should include ydotool method");
        assert!(has_clipboard_paste, "Should include ClipboardPasteFallback method");

        // AT-SPI might not be available in test environment, but ydotool should be
        if has_atspi {
@@ -269,7 +271,7 @@ mod mock_injection_tests {
            println!("⚠️ AT-SPI not available (expected in headless environment)");
        }

        assert!(has_ydotool, "Should have ydotool as fallback method");
        assert!(has_clipboard_paste, "Should have ClipboardPasteFallback as fallback method");
    }

    #[tokio::test]
@@ -102,7 +102,7 @@ mod tests {
        assert!(metrics_guard.attempts > 0, "Should attempt injection");

        // The specific number of attempts depends on available backends
        // but should be at least the base methods (AtspiInsert, ClipboardPaste)
        // but should be at least the base methods (AtspiInsert, ClipboardPasteFallback)
        assert!(metrics_guard.attempts >= 2, "Should try at least 2 methods");
    }

15 changes: 7 additions & 8 deletions crates/coldvox-text-injection/README.md
@@ -31,7 +31,7 @@ This crate provides text injection capabilities that automatically type transcri
- Automatically saves and restores user's clipboard after configurable delay (`clipboard_restore_delay_ms`, default 500ms)
- **Critical**: This is ONE unified strategy, not separate "clipboard" and "paste" methods
- **Requires**: Either AT-SPI paste support OR ydotool installed to actually trigger the paste
- **YDotool**: Direct uinput-based key simulation (opt-in, useful when AT-SPI unavailable)
- **Ydotool (fallback only)**: Used internally by ClipboardPaste to issue Ctrl+V when AT-SPI paste isn't available; not registered as a standalone strategy
- **KDotool Assist**: KDE/X11 window activation assistance (opt-in)
- **Enigo**: Cross-platform input simulation library (opt-in)

@@ -62,12 +62,11 @@ This crate provides text injection capabilities that automatically type transcri
The system tries backends in this order (skips unavailable methods):

1. **AT-SPI Insert** - Direct text insertion via accessibility API (most reliable when supported)
2. **ClipboardPaste** - Composite strategy: set clipboard → paste via AT-SPI or ydotool
   - Only registered if AT-SPI paste actions OR ydotool available
   - Fails if neither paste mechanism works
3. **YDotool** - Direct uinput key simulation (opt-in, requires ydotool daemon)
4. **KDotool Assist** - Window activation help (opt-in, X11 only)
5. **Enigo** - Cross-platform input simulation (opt-in)
2. **ClipboardPaste** - Composite strategy: set clipboard → paste via AT-SPI or ydotool (fallback)
   - Only registered if at least one paste mechanism works
   - Fails if neither paste mechanism works
3. **KDotool Assist** - Window activation help (opt-in, X11 only)
4. **Enigo** - Cross-platform input simulation (opt-in)

**Note**: There is NO "clipboard-only" backend. Setting clipboard without triggering paste is useless for automation.

@@ -99,7 +98,7 @@ sudo apt install libxtst-dev wmctrl
# For clipboard functionality
sudo apt install xclip wl-clipboard

# For ydotool-based paste (optional)
# For ydotool-based paste fallback (optional)
sudo apt install ydotool
```
