diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md index d0320515..78cd2323 100644 --- a/.github/copilot-instructions.md +++ b/.github/copilot-instructions.md @@ -1,105 +1,100 @@ # ColdVox – AI workspace instructions -Use these notes to help AI agents work productively in this Rust repo. Main crate: `crates/app`. A vendored VAD library lives in `Forks/ColdVox-voice_activity_detector` (integrated via Silero V5). - -## Architecture -- `foundation/` (app scaffolding) - - `state.rs`: `AppState` + `StateManager` with validated transitions. - - `shutdown.rs`: Ctrl+C handler + panic hook via `ShutdownHandler`/`ShutdownGuard`. - - `health.rs`: `HealthMonitor` with periodic checks (none registered yet). - - `error.rs`: `AppError`/`AudioError`, `AudioConfig { silence_threshold }`, `recovery_strategy()` hints. -- `audio/` (capture pipeline) - - `device.rs`: CPAL host/device discovery; prefers 16 kHz mono when available. - - `ring_buffer.rs`: rtrb SPSC ring buffer for i16 samples (producer/consumer split). - - `capture.rs`: builds CPAL input stream; writes samples into the rtrb ring buffer (non-blocking, drop-on-full). - - `watchdog.rs`: 5s no-data watchdog; `is_triggered()` used to drive recovery. - - `detector.rs`: RMS-based silence detection using `AudioConfig.silence_threshold`. - - `chunker.rs`: Converts variable-sized frames to fixed 512-sample chunks for VAD. - - `vad_processor.rs`: VAD processing pipeline with broadcast channel distribution. -- `vad/` (voice activity detection) - - `silero_wrapper.rs`: Silero V5 model integration via ONNX runtime. - - `processor.rs`: VAD state machine and event generation. - - `config.rs`: Unified VAD configuration (Silero mode is default). -- `stt/` (speech-to-text - behind the `vosk` feature) - - `mod.rs`: TranscriptionEvent, WordInfo, TranscriptionConfig. - - `processor.rs`: STT processor gated by VAD events; emits TranscriptionEvent. 
- - `vosk.rs`: VoskTranscriber implementation (requires libvosk system library). - - `persistence.rs`: Optional persistence of transcripts/audio. -- `telemetry/`: in-process counters/gauges (`PipelineMetrics`). -- Binaries: `src/main.rs` (app, STT when built with `--features vosk`), `bin/mic_probe.rs`, `bin/foundation_probe.rs`, `bin/tui_dashboard.rs`. +Use these notes to help AI agents work effectively in this Rust workspace. Main application crate: `crates/app` (package `coldvox-app`). Core subsystems live in split crates and are re-exported by the app where convenient. + +## Architecture (multi-crate) + +- `crates/coldvox-foundation/` — App scaffolding + - `state.rs`: `AppState` + `StateManager` with validated transitions + - `shutdown.rs`: Ctrl+C handler + panic hook (`ShutdownHandler`/`ShutdownGuard`) + - `health.rs`: `HealthMonitor` + - `error.rs`: `AppError`/`AudioError`, `AudioConfig { silence_threshold }` + +- `crates/coldvox-audio/` — Capture & chunking pipeline + - `device.rs`: CPAL host/device discovery; PipeWire-aware candidates + - `capture.rs`: `AudioCaptureThread::spawn(...)` input stream, watchdog, silence detection + - `ring_buffer.rs`: `AudioRingBuffer` (rtrb SPSC for i16 samples) + - `frame_reader.rs`: `FrameReader` to normalize device frames + - `chunker.rs`: `AudioChunker` → fixed 512-sample frames (32 ms at 16 kHz) + - `watchdog.rs`: 5s no-data watchdog used for auto-recovery + - `detector.rs`: RMS-based `SilenceDetector` using `AudioConfig.silence_threshold` + +- `crates/coldvox-vad/` — VAD traits, config, Level3 energy VAD (feature `level3`) + - `config.rs`: `UnifiedVadConfig`, `VadMode` + - `engine.rs`, `types.rs`, `constants.rs`, `VadProcessor` trait + +- `crates/coldvox-vad-silero/` — Silero V5 ONNX VAD (feature `silero`) + - `silero_wrapper.rs`: `SileroEngine` implementing `VadEngine` + - Uses the external `voice_activity_detector` crate (Silero V5 backend) + +- `crates/coldvox-stt/` — STT core abstractions + +- 
`crates/coldvox-stt-vosk/` — Vosk integration (feature `vosk`) + +- `crates/coldvox-telemetry/` — In-process metrics (`PipelineMetrics`, `FpsTracker`) + +- `crates/coldvox-text-injection/` — Text injection backends (feature-gated) + +- `crates/app/` — App glue, UI, re-exports + - `src/audio/`: + - `vad_adapter.rs`: Bridges `UnifiedVadConfig` to a concrete `VadEngine` (Silero or Level3) + - `vad_processor.rs`: Async VAD pipeline task publishing `VadEvent`s + - `mod.rs`: Re-exports from `coldvox-audio` + - `src/vad/mod.rs`: Re-exports VAD types from `coldvox-vad` and `coldvox-vad-silero` + - `src/stt/`: Processor/persistence wrappers and re-exports for Vosk + - Binaries: `src/main.rs` (app), `src/bin/tui_dashboard.rs`, probes under `src/probes/` ## Build, run, debug -- From `crates/app`: - - App (basic): `cargo run` - - App (with STT): `cargo run --features vosk` (requires libvosk system library) + +- From `crates/app` (package `coldvox-app`): + - App: `cargo run` + - App + STT (Vosk): `cargo run --features vosk` (requires system libvosk and a model) - TUI Dashboard: - - Without STT: `cargo run --bin tui_dashboard` + - No STT: `cargo run --bin tui_dashboard` - With STT: `cargo run --features vosk --bin tui_dashboard` - Device selection: append `-- -D ""` - - Probes: + - Probes (examples live at repo root under `examples/`, wired via Cargo metadata): - `cargo run --bin mic_probe -- --duration 30 --device "" --silence_threshold 120` - `cargo run --bin foundation_probe -- --duration 30 --simulate_errors --simulate_panics` - Release: `cargo build --release` or `cargo build --release --features vosk` - Logging: `tracing` with `RUST_LOG` or `--log-level` in TUI; daily-rotated file at `logs/coldvox.log`. - App: logs to stderr and file. - TUI Dashboard: logs to file only (to avoid corrupting the TUI). Default level is `debug`; override with `--log-level `. 
-- Tests: unit tests in source modules; VAD crate has extensive tests; run from its folder with optional `--features async`. +- Tests: unit tests in source modules; integration tests under `crates/app/tests/`; VAD crates include unit tests. ## Audio data flow and contracts -- Callback thread (CPAL) → i16 samples → rtrb ring buffer (SPSC) → FrameReader → AudioChunker → broadcast channel. -- AudioChunker output: 512-sample frames (32ms) distributed via broadcast to VAD and STT processors. -- VAD processing: Silero V5 model evaluates speech probability, generates VadEvent stream. -- STT processing: Gated by VAD events, transcribes speech segments when detected (requires vosk feature). - - TUI: when STT is enabled and a model is present, partial/final transcripts are logged; the Status panel shows the last final transcript. -- Backpressure: if the consumer is slow, ring writes drop when full (warn logged); keep a reader draining via `FrameReader`. -- Preferred format: 16 kHz mono if supported; otherwise first supported config with automatic conversion. -- Watchdog: feed on each callback; after ~5s inactivity, `is_triggered()` becomes true; `AudioCapture::recover()` attempts up to 3 restarts. -- Silence: RMS-based; >3s continuous silence logs a warning (hinting device issues). 
+- CPAL callback → i16 samples → `AudioRingBuffer` (SPSC) → `FrameReader` → `AudioChunker` → broadcast channel +- Chunker output: 512-sample frames (32 ms) at 16 kHz to VAD/STT subscribers +- VAD: Silero V5 (default) or Level3 energy engine generates `VadEvent`s +- STT: Gated by VAD events; transcribes segments when speech is active (feature `vosk`) + - TUI: when STT is enabled and a model is present, partial/final transcripts are logged; Status shows last final transcript +- Backpressure: if the consumer is slow, writes drop when full (warn logged); keep a reader draining via `FrameReader` +- Preferred device format: choose 16 kHz mono when available; otherwise select best supported config and convert downstream +- Watchdog: 5s no-data triggers restart logic in capture thread +- Silence: RMS-based detector; >3s continuous silence logs a warning ## Tuning knobs (where to tweak) -- Chunker (`audio/chunker.rs` → `ChunkerConfig`) - - `frame_size_samples` (default 512): output frame size; matches VAD window. - - `sample_rate_hz` (default 16000): target internal rate. - - `resampler_quality`: `Fast` | `Balanced` (default) | `Quality`. - -- VAD (`vad/config.rs`, `vad/types.rs`) - - Mode: `UnifiedVadConfig.mode` → `Silero` (default) | `Level3`. - - Silero (`SileroConfig`) - - `threshold` (default 0.3): speech probability cutoff. - - `min_speech_duration_ms` (default 250): min speech length before start. - - `min_silence_duration_ms` (default 100): min silence before end. - - `window_size_samples` (default 512): analysis window; aligns with chunker. - - Level3 energy VAD (`Level3Config`) [disabled by default] - - `enabled` (default false): toggle fallback engine. - - `onset_threshold_db` (default 9.0 over floor). - - `offset_threshold_db` (default 6.0 over floor). - - `ema_alpha` (default 0.02): noise floor smoothing. - - `speech_debounce_ms` (default 200): frames to confirm start. - - `silence_debounce_ms` (default 400): frames to confirm end. 
- - `initial_floor_db` (default -50.0): starting noise floor. - - Frame basics - - `UnifiedVadConfig.frame_size_samples` (default 512) and `sample_rate_hz` (default 16000) control window duration. - -- STT (`stt/mod.rs`, `stt/processor.rs`, `stt/vosk.rs`) [feature `vosk`] - - `TranscriptionConfig` - - `enabled` (bool): gate STT. - - `model_path` (string): defaults via `VOSK_MODEL_PATH` or `models/vosk-model-small-en-us-0.15`. - - `partial_results` (bool, default true): emit interim text. - - `max_alternatives` (u32, default 1): candidate count. - - `include_words` (bool, default false): word timings/confidence. - - `buffer_size_ms` (u32, default 512): STT chunk size fed to Vosk. - -- Text Injection (`text_injection/session.rs`, `text_injection/processor.rs`) - - `SessionConfig` - - `silence_timeout_ms` (default 1500): finalize after silence. - - `buffer_pause_timeout_ms` (default 500): pause boundary between chunks. - - `max_buffer_size` (default 5000 chars): cap transcript buffer. - - `InjectionProcessorConfig` - - `poll_interval_ms` (in code via comments, default 100ms). - -- Audio foundation (`foundation/error.rs`) - - `AudioConfig.silence_threshold` (default 100): RMS-based silence detector threshold. 
+- Chunker (`crates/coldvox-audio/src/chunker.rs` → `ChunkerConfig`) + - `frame_size_samples` (default 512), `sample_rate_hz` (default 16000) + - `resampler_quality`: `Fast` | `Balanced` (default) | `Quality` + +- VAD (`crates/coldvox-vad/src/config.rs`) + - `UnifiedVadConfig.mode` → `Silero` (default) | `Level3` + - Silero (`crates/coldvox-vad-silero/src/config.rs`) + - `threshold` (default 0.3), `min_speech_duration_ms` (250), `min_silence_duration_ms` (100), `window_size_samples` (512) + - Level3 (`feature = "level3"`, disabled by default) + - `onset_threshold_db` (9.0), `offset_threshold_db` (6.0), `ema_alpha` (0.02) + - `speech_debounce_ms` (200), `silence_debounce_ms` (400), `initial_floor_db` (-50.0) + +- STT (`crates/app/src/stt/` wrappers; core types in `crates/coldvox-stt/`) [feature `vosk`] + - `TranscriptionConfig`: `model_path`, `partial_results`, `max_alternatives`, `include_words`, `buffer_size_ms` + +- Text Injection (`crates/coldvox-text-injection/`; app glue in `crates/app/src/text_injection/`) + - `SessionConfig`, injector backends via features: `text-injection-*` + +- Foundation (`crates/coldvox-foundation/src/error.rs`) + - `AudioConfig.silence_threshold` (default 100) ## Logging for tuning @@ -114,23 +109,26 @@ Use these notes to help AI agents work productively in this Rust repo. Main crat - Logs to stderr and daily-rotated `logs/coldvox.log`. ## Usage patterns -- Start capture: `AudioCaptureThread::spawn(config, ring_producer, device)`. -- Create pipeline: `FrameReader` → `AudioChunker` → broadcast channel → VAD/STT processors. -- VAD integration: `VadProcessor::spawn(config, audio_rx, event_tx, metrics)`. -- STT integration: `SttProcessor::new(audio_rx, vad_rx, transcription_tx, config)` (requires vosk feature). -- Metrics: pass `Arc` to all components for unified telemetry. -- Enumerate devices: `DeviceManager::new()?.enumerate_devices()`; marks default device. 
- -## VAD system (fully integrated) -- `Forks/ColdVox-voice_activity_detector`: Silero V5 via ONNX Runtime. 16 kHz expects 512-sample windows per prediction. -- Runtime binaries provided under `runtimes/` for major platforms; see its `README.md` for usage and feature flags (`async`, `load-dynamic`). -- Integration: `vad/silero_wrapper.rs` provides `SileroEngine` implementation. -- State machine: VAD events (SpeechStart, SpeechEnd) generated with configurable thresholds and debouncing. -- Fallback: Energy-based VAD available as alternative (currently disabled by default). - -## STT system (feature-gated, available when enabled) -- Vosk-based transcription via `stt/vosk.rs` (requires libvosk system library). -- Gated by VAD: transcribes during detected speech segments. -- Event-driven: emits `TranscriptionEvent::{Partial,Final,Error}` via mpsc channel. -- Configuration: model path via `VOSK_MODEL_PATH` env var; defaults to `models/vosk-model-small-en-us-0.15` if unset. -- Build: enable with `--features vosk`. If the model path exists, STT runs; otherwise STT stays disabled. 
+- Start capture (coldvox-audio): + - `(capture, device_cfg, cfg_rx) = AudioCaptureThread::spawn(audio_cfg, ring_producer, device_name_opt)?` + - Stop: `capture.stop()` +- Create pipeline: + - `FrameReader` (from consumer) → `AudioChunker` → `broadcast::Sender` +- VAD (app glue): `VadProcessor::spawn(vad_cfg, audio_rx, event_tx, Some(metrics))?` +- STT (feature `vosk`): construct processor under `crates/app/src/stt/processor.rs` +- Metrics: use `Arc` across components +- Devices: `DeviceManager::new()?.enumerate_devices()`; `candidate_device_names()` prefers PipeWire → default → others + +## VAD system +- Silero V5 via `crates/coldvox-vad-silero/` (feature `silero`, default enabled in app) + - Depends on external `voice_activity_detector` crate for ONNX runtime integration +- 16 kHz, 512-sample windows per prediction +- Events: `VadEvent::{SpeechStart, SpeechEnd}` with debouncing and thresholds +- Fallback: Level3 energy VAD available (feature `level3`, disabled by default) + +## STT system (feature-gated) +- Vosk-based transcription via `crates/coldvox-stt-vosk/` (re-exported in `crates/app/src/stt/vosk.rs`) +- Gated by VAD: transcribes during detected speech segments +- Events: `TranscriptionEvent::{Partial, Final, Error}` via mpsc +- Model path via `VOSK_MODEL_PATH` or default `models/vosk-model-small-en-us-0.15` +- Enable with `--features vosk`; if model path is missing, STT stays disabled diff --git a/.github/dependabot.yml b/.github/dependabot.yml new file mode 100644 index 00000000..e1af6b2e --- /dev/null +++ b/.github/dependabot.yml @@ -0,0 +1,29 @@ +version: 2 +updates: + - package-ecosystem: "github-actions" + directory: "/" + schedule: + interval: "weekly" + commit-message: + prefix: "chore(deps)" + labels: + - "dependencies" + - "github-actions" + # Group updates to reduce PR noise + groups: + actions: + patterns: + - "*" + + - package-ecosystem: "cargo" + directory: "/" + schedule: + interval: "weekly" + commit-message: + prefix: "chore(deps)" + 
labels: + - "dependencies" + - "rust" + # Don't update workspace members + ignore: + - dependency-name: "coldvox-*" \ No newline at end of file diff --git a/.github/workflows/benchmarks.yml b/.github/workflows/benchmarks.yml new file mode 100644 index 00000000..51159e6a --- /dev/null +++ b/.github/workflows/benchmarks.yml @@ -0,0 +1,58 @@ +name: Benchmarks + +on: + push: + branches: [main] + pull_request: + types: [opened, synchronize] + paths: + - 'crates/**' + - 'Cargo.toml' + - 'Cargo.lock' + - '.github/workflows/benchmarks.yml' + +permissions: + contents: read + pull-requests: read + +concurrency: + group: benchmarks-${{ github.ref }} + cancel-in-progress: true + +jobs: + benchmark: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@3df4ab11eba7bda6032a0b82a6bb43b11571feac # v4.1.7 + - uses: dtolnay/rust-toolchain@1482605bfc5719782e1267fd0c0cc350fe7646b8 # v1 + with: + toolchain: stable + - uses: Swatinem/rust-cache@23bce251a8cd2ffc3c1075eaa2367cf899916d84 # v2.7.3 + + - name: Install system dependencies + run: | + sudo apt-get update + sudo apt-get install -y libasound2-dev + + - name: Run benchmarks + run: | + # Run specific benchmark with minimal features to avoid compilation issues + cargo bench --bench text_chunking_bench --no-default-features --features silero -- --output-format bencher | tee output.txt + + - name: Store benchmark result + uses: benchmark-action/github-action-benchmark@4de1bed97a47495fc4c5404952da0499e31f5c29 # v1.20.3 + with: + name: 'Benchmark' + tool: 'cargo' + output-file-path: output.txt + github-token: ${{ secrets.GITHUB_TOKEN }} + auto-push: false + comment-on-alert: true + alert-threshold: '120%' + fail-on-alert: false + gh-pages-branch: gh-pages + benchmark-data-dir-path: dev/bench + skip-fetch-gh-pages: false + comment-always: false + summary-always: false + save-data-file: true \ No newline at end of file diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 69c5ebf4..cd99742b 100644 --- 
a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -1,113 +1,71 @@ name: CI on: - push: - branches: [main, develop] pull_request: + push: branches: [main, develop] + workflow_dispatch: # Allow manual triggering for agents + +permissions: + contents: read + pull-requests: read -env: - CARGO_TERM_COLOR: always +concurrency: + group: ci-${{ github.ref }} + cancel-in-progress: true jobs: - test: - name: Test Suite - runs-on: ubuntu-latest - strategy: - fail-fast: false - matrix: - feature_set: - - name: "Default features" - features: "" - - name: "VAD only" - features: "" - - name: "STT with Vosk" - features: "vosk" - - name: "Text injection" - features: "text-injection" - - name: "Full features" - features: "vosk,text-injection" - - name: "Examples" - features: "examples" - - name: "Live hardware tests" - features: "live-hardware-tests" + # Core CI using org workflow with workspace support + common-ci: + uses: coldaine/.github/.github/workflows/lang-ci.yml@v1 + secrets: inherit + with: + run_rust: true + run_python: false + rust_no_default_features: true # Baseline without Vosk + rust_msrv: "1.70.0" # Minimum supported version + use_nextest: true # Faster test execution + use_sccache: true # Build caching + run_cargo_deny: true # License/security checks + test_timeout_minutes: 30 + max_parallel: 4 # Limit parallel crate builds + # Crate-specific system dependencies + crate_system_deps: | + {"coldvox-audio":"libasound2-dev","coldvox-text-injection":"libxdo-dev libxtst-dev","coldvox-gui":"libgl1-mesa-dev libxcursor-dev"} + # Validate lockfile is committed and up-to-date + lockfile-check: + runs-on: ubuntu-latest steps: - - uses: actions/checkout@v4 - - - name: Install Rust - uses: dtolnay/rust-toolchain@stable - with: - components: rustfmt, clippy - - - name: Cache dependencies - uses: Swatinem/rust-cache@v2 - with: - key: ${{ matrix.feature_set.name }} - - - name: Install system dependencies - run: | - sudo apt-get update - sudo apt-get install -y \ - 
libasound2-dev \ - libxdo-dev \ - libxtst-dev \ - libxinerama-dev \ - libx11-dev \ - libxcursor-dev \ - libxi-dev \ - libgl1-mesa-dev \ - pkg-config - - - name: Check formatting - run: cargo fmt --all -- --check - - - name: Run clippy - run: | - if [ -n "${{ matrix.feature_set.features }}" ]; then - cargo clippy --workspace --features ${{ matrix.feature_set.features }} -- -D warnings - else - cargo clippy --workspace -- -D warnings - fi - - - name: Run tests - run: | - if [ -n "${{ matrix.feature_set.features }}" ]; then - cargo test --workspace --features ${{ matrix.feature_set.features }} - else - cargo test --workspace - fi - - - name: Build - run: | - if [ -n "${{ matrix.feature_set.features }}" ]; then - cargo build --workspace --features ${{ matrix.feature_set.features }} - else - cargo build --workspace - fi + - uses: actions/checkout@3df4ab11eba7bda6032a0b82a6bb43b11571feac # v4.1.7 + - uses: dtolnay/rust-toolchain@1482605bfc5719782e1267fd0c0cc350fe7646b8 # v1 + with: + toolchain: stable + - name: Validate lockfile + run: | + cargo check --locked --workspace + if git diff --exit-code Cargo.lock; then + echo "✅ Lockfile is up to date" + else + echo "❌ Cargo.lock is out of date" + exit 1 + fi - # Separate job for documentation - docs: - name: Documentation + # Quick validation of key features + feature-smoke-test: runs-on: ubuntu-latest steps: - - uses: actions/checkout@v4 - - - name: Install Rust - uses: dtolnay/rust-toolchain@stable - - - name: Cache dependencies - uses: Swatinem/rust-cache@v2 - - - name: Install system dependencies - run: | - sudo apt-get update - sudo apt-get install -y \ - libasound2-dev \ - libxdo-dev \ - libxtst-dev - - - name: Check documentation - run: cargo doc --workspace --no-deps --all-features - env: - RUSTDOCFLAGS: "-D warnings" \ No newline at end of file + - uses: actions/checkout@3df4ab11eba7bda6032a0b82a6bb43b11571feac # v4.1.7 + - uses: dtolnay/rust-toolchain@1482605bfc5719782e1267fd0c0cc350fe7646b8 # v1 + with: + 
toolchain: stable + - uses: Swatinem/rust-cache@23bce251a8cd2ffc3c1075eaa2367cf899916d84 # v2.7.3 + - name: Install system deps + run: | + sudo apt-get update + sudo apt-get install -y libasound2-dev libxdo-dev libxtst-dev + - name: Check key features + run: | + cargo check --locked -p coldvox-text-injection + cargo check --locked -p app --features text-injection + cargo check --locked -p app --features examples \ No newline at end of file diff --git a/.github/workflows/cross-platform.yml b/.github/workflows/cross-platform.yml new file mode 100644 index 00000000..f06c846b --- /dev/null +++ b/.github/workflows/cross-platform.yml @@ -0,0 +1,76 @@ +name: Cross-Platform Tests + +on: + # Only on release preparation + pull_request: + branches: [main] + types: [labeled] + # Manual trigger + workflow_dispatch: + +permissions: + contents: read + pull-requests: read + +concurrency: + group: cross-platform-${{ github.ref }} + cancel-in-progress: true + +jobs: + # Linux is primary platform - MUST pass + linux-test: + if: contains(github.event.label.name, 'release') || github.event_name == 'workflow_dispatch' + runs-on: ubuntu-latest + timeout-minutes: 30 + steps: + - uses: actions/checkout@3df4ab11eba7bda6032a0b82a6bb43b11571feac # v4.1.7 + - uses: dtolnay/rust-toolchain@1482605bfc5719782e1267fd0c0cc350fe7646b8 # v1 + with: + toolchain: stable + - name: Install cargo-nextest + run: cargo install cargo-nextest --locked + - uses: Swatinem/rust-cache@23bce251a8cd2ffc3c1075eaa2367cf899916d84 # v2.7.3 + + - name: Install Linux dependencies + run: | + sudo apt-get update + sudo apt-get install -y libasound2-dev libxdo-dev libxtst-dev + + - name: Build + run: cargo build --workspace --locked --no-default-features + + - name: Test + run: cargo nextest run --workspace --locked --no-default-features + + # Windows is secondary platform - best effort + windows-test: + if: contains(github.event.label.name, 'release') || github.event_name == 'workflow_dispatch' + runs-on: windows-latest + 
timeout-minutes: 45 + continue-on-error: true # Don't block releases on Windows issues + + steps: + - uses: actions/checkout@3df4ab11eba7bda6032a0b82a6bb43b11571feac # v4.1.7 + - uses: dtolnay/rust-toolchain@1482605bfc5719782e1267fd0c0cc350fe7646b8 # v1 + with: + toolchain: stable + - name: Install cargo-nextest (Windows) + run: cargo install cargo-nextest --locked + - uses: Swatinem/rust-cache@23bce251a8cd2ffc3c1075eaa2367cf899916d84 # v2.7.3 + with: + key: windows + + - name: Build (Windows) + run: cargo build --workspace --locked --no-default-features + + - name: Test (Windows) + run: cargo nextest run --workspace --locked --no-default-features + + - name: Upload Windows artifacts on failure + if: failure() + uses: actions/upload-artifact@50769540e7f4bd5e21e526ee35c689e35e0d6874 # v4.4.0 + with: + name: windows-failure-artifacts + path: | + target/debug/**/*.log + retention-days: 3 \ No newline at end of file diff --git a/.github/workflows/feature-matrix.yml b/.github/workflows/feature-matrix.yml new file mode 100644 index 00000000..1b081c58 --- /dev/null +++ b/.github/workflows/feature-matrix.yml @@ -0,0 +1,46 @@ +name: Feature Matrix Tests + +on: + # Weekly comprehensive testing + schedule: + - cron: '0 2 * * 1' + # Manual trigger + workflow_dispatch: + +permissions: + contents: read + pull-requests: read + +concurrency: + group: feature-matrix-${{ github.ref }} + cancel-in-progress: true + +jobs: + feature-combinations: + runs-on: ubuntu-latest + timeout-minutes: 60 + steps: + - uses: actions/checkout@3df4ab11eba7bda6032a0b82a6bb43b11571feac # v4.1.7 + - uses: dtolnay/rust-toolchain@1482605bfc5719782e1267fd0c0cc350fe7646b8 # v1 + with: + toolchain: stable + - name: Install cargo-hack + run: cargo install cargo-hack --locked + - uses: Swatinem/rust-cache@23bce251a8cd2ffc3c1075eaa2367cf899916d84 # v2.7.3 + + - name: Install system dependencies + run: | + sudo apt-get update + sudo apt-get install -y libasound2-dev libxdo-dev libxtst-dev + + - name: Test feature 
combinations for critical crates + run: | + # Test each feature in isolation and combinations + cargo hack test --each-feature -p coldvox-audio + cargo hack test --each-feature -p coldvox-vad + cargo hack test --each-feature -p coldvox-text-injection + + # Skip GUI and Vosk crates (require special setup) + cargo hack test --each-feature --workspace \ + --exclude coldvox-gui \ + --exclude coldvox-stt-vosk \ No newline at end of file diff --git a/.github/workflows/release.yml b/.github/workflows/release.yml index b0c0ab1e..a5b8f247 100644 --- a/.github/workflows/release.yml +++ b/.github/workflows/release.yml @@ -23,21 +23,21 @@ jobs: if: github.ref == 'refs/heads/main' steps: - name: Checkout repository - uses: actions/checkout@v4 + uses: actions/checkout@3df4ab11eba7bda6032a0b82a6bb43b11571feac # v4.1.7 with: fetch-depth: 0 token: ${{ github.token }} - name: Install Rust toolchain - uses: dtolnay/rust-toolchain@stable + uses: dtolnay/rust-toolchain@1482605bfc5719782e1267fd0c0cc350fe7646b8 # v1 + with: + toolchain: stable - name: Cache Rust dependencies - uses: Swatinem/rust-cache@v2 + uses: Swatinem/rust-cache@23bce251a8cd2ffc3c1075eaa2367cf899916d84 # v2.7.3 - name: Install release-plz - uses: taiki-e/install-action@v2 - with: - tool: release-plz + run: cargo install release-plz --locked - name: Run release-plz env: @@ -54,19 +54,19 @@ jobs: if: github.event_name == 'pull_request' && github.event.pull_request.merged == true && github.event.pull_request.base.ref == 'main' steps: - name: Checkout repository - uses: actions/checkout@v4 + uses: actions/checkout@3df4ab11eba7bda6032a0b82a6bb43b11571feac # v4.1.7 with: fetch-depth: 0 # Ensure we operate against the main branch after merge ref: main - name: Install Rust toolchain - uses: dtolnay/rust-toolchain@stable + uses: dtolnay/rust-toolchain@1482605bfc5719782e1267fd0c0cc350fe7646b8 # v1 + with: + toolchain: stable - name: Install release-plz - uses: taiki-e/install-action@v2 - with: - tool: release-plz + run: cargo 
install release-plz --locked - name: Create GitHub release env: diff --git a/.github/workflows/vosk-integration.yml b/.github/workflows/vosk-integration.yml new file mode 100644 index 00000000..6f35aa6b --- /dev/null +++ b/.github/workflows/vosk-integration.yml @@ -0,0 +1,123 @@ +name: Vosk Integration Tests + +on: + # Run on PRs that modify STT-related code + pull_request: + paths: + - 'crates/coldvox-stt/**' + - 'crates/coldvox-stt-vosk/**' + - 'crates/app/**' + - 'examples/vosk_*.rs' + - '.github/workflows/vosk-integration.yml' + # Weekly scheduled run + schedule: + - cron: '0 0 * * 0' + # Manual trigger + workflow_dispatch: + +permissions: + contents: read + pull-requests: read + +concurrency: + group: vosk-${{ github.ref }} + cancel-in-progress: true + +jobs: + vosk-tests: + name: Vosk STT Integration + runs-on: ubuntu-latest + timeout-minutes: 45 + + steps: + - name: Checkout code + uses: actions/checkout@3df4ab11eba7bda6032a0b82a6bb43b11571feac # v4.1.7 + with: + fetch-depth: 0 + + - name: Install Rust + uses: dtolnay/rust-toolchain@1482605bfc5719782e1267fd0c0cc350fe7646b8 # v1 + with: + toolchain: stable + + - name: Cache Cargo + uses: Swatinem/rust-cache@23bce251a8cd2ffc3c1075eaa2367cf899916d84 # v2.7.3 + + - name: Install system dependencies + run: | + sudo apt-get update + sudo apt-get install -y \ + libasound2-dev \ + python3 \ + python3-pip \ + wget \ + unzip + + - name: Cache Vosk model + id: cache-vosk-model + uses: actions/cache@13aacd865c20de90d75de3b17ebe84f7a17d57d2 # v4.0.0 + with: + path: models/vosk-model-small-en-us-0.15 + key: vosk-model-small-en-us-0.15 + restore-keys: | + vosk-model-small-en-us- + vosk-model- + + - name: Download Vosk model + if: steps.cache-vosk-model.outputs.cache-hit != 'true' + run: | + mkdir -p models + cd models + # Retry logic for robustness + for i in 1 2 3; do + wget https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip && break + echo "Download attempt $i failed, retrying..." 
+ sleep 5 + done + unzip vosk-model-small-en-us-0.15.zip + rm vosk-model-small-en-us-0.15.zip + + - name: Install cargo-nextest + run: cargo install cargo-nextest --locked + + - name: Build with Vosk + run: | + # Build both crates that use Vosk feature + cargo build --locked -p coldvox-stt-vosk --features vosk + cargo build --locked -p app --features vosk + + - name: Run Vosk tests + env: + VOSK_MODEL_PATH: models/vosk-model-small-en-us-0.15 + RUST_TEST_THREADS: 1 # Vosk may have threading issues + run: | + cargo nextest run --locked -p coldvox-stt-vosk --features vosk + + - name: Run end-to-end WAV pipeline test + env: + VOSK_MODEL_PATH: models/vosk-model-small-en-us-0.15 + run: | + cargo test --locked -p app --features vosk test_end_to_end_wav_pipeline -- --ignored --nocapture + + - name: Test Vosk examples + env: + VOSK_MODEL_PATH: models/vosk-model-small-en-us-0.15 + run: | + # Run Vosk examples if they exist + if ls examples/vosk_*.rs 1> /dev/null 2>&1; then + for example in examples/vosk_*.rs; do + name=$(basename $example .rs) + cargo run --locked --example $name --features vosk,examples || true + done + fi + + - name: Upload artifacts on failure + if: failure() + uses: actions/upload-artifact@50769540e7f4bd5e21e526ee35c689e35e0d6874 # v4.4.0 + with: + name: vosk-test-artifacts + path: | + target/debug/**/*.log + logs/ + transcripts/ + retention-days: 7 \ No newline at end of file diff --git a/CLAUDE.md b/CLAUDE.md index ec884e39..a5b47c48 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -1,187 +1,203 @@ # CLAUDE.md -This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. +Guidance for Claude Code when working in this repository. ## Project Overview -ColdVox is a Rust-based voice AI project focused on real-time audio processing with emphasis on reliability and automatic recovery. 
The project implements a multi-phase STT (Speech-to-Text) system with voice activity detection (VAD) and resilient audio capture using lock-free ring buffers for real-time communication. - -## Architecture - -### Core Components - -- **Foundation Layer** (`crates/app/src/foundation/`): Error handling, health monitoring, state management, graceful shutdown -- **Audio System** (`crates/app/src/audio/`): Microphone capture, device management, watchdog monitoring, automatic recovery - - `AudioCapture`: Device-native capture (no resampling, converts device sample format → i16) - - `AudioChunker`: Downmixes to mono, resamples to 16 kHz, converts variable-sized frames to fixed 512-sample chunks - - `VadAdapter`: Trait for pluggable VAD implementations - - `VADProcessor`: VAD processing pipeline integration -- **VAD System** (`crates/app/src/vad/`): Dual VAD implementation with power-based VAD and ML models - - `Level3VAD`: Progressive energy-based VAD implementation **[DISABLED BY Default]** - - `SileroEngine`: Silero model wrapper for ML-based VAD **[Default ACTIVE VAD]** - - `VADStateMachine`: State management for VAD transitions - - `UnifiedVADConfig`: Configuration supporting both VAD modes (defaults to Silero) -- **STT System** (`crates/app/src/stt/`): Speech-to-text transcription with buffered processing - - `VoskTranscriber`: Vosk-based STT implementation - - `STTProcessor`: Buffers audio during speech segments, processes entire buffer at SpeechEnd for better accuracy - - `Transcriber` trait for pluggable STT backers - - **Buffering Strategy**: Accumulates all audio frames from SpeechStart → SpeechEnd, then processes as single chunk -- **Text Injection System** (`crates/app/src/text_injection/`): Immediate text injection - - `TextInjector`: Production text injection using ydotool/clipboard - - `InjectionProcessor`: Immediate injection (0ms timeout) after transcription completes - - `AsyncInjectionProcessor`: Async wrapper for pipeline integration -- **Telemetry** 
(`crates/app/src/telemetry/`): Metrics collection and monitoring - - `PipelineMetrics`: Real-time pipeline performance metrics - - Cross-thread monitoring of audio levels, latency, and throughput - -### Threading Model - -- **Mic Thread**: Owns audio device, handles capture -- **Processing Thread**: Runs VAD and chunking -- **STT Thread**: Processes speech segments when VAD detects speech -- **Main Thread**: Orchestrates and monitors components -- Communication via lock-free ring buffers (rtrb), broadcast channels, and mpsc channels - -### Audio Specifications - -- Internal processing format: 16 kHz, 16-bit signed (i16), mono -- Capture: Device‑native format and rate; converted to i16 only +ColdVox is a Rust-based voice AI project that implements a complete VAD-gated STT pipeline to capture audio, transcribe speech to text, and inject the transcribed text into the proper text field where the user is working. The text injection uses multiple backend methods (clipboard, AT-SPI, keyboard emulation) and is the critical final step that delivers the transcribed output to the user's active application. 
+ +**Platform Detection Update (2025-09-02):** The build system now automatically detects the platform and enables appropriate text injection backends: +- Linux: Automatically enables AT-SPI, Wayland clipboard, and ydotool backends +- Windows/macOS: Automatically enables Enigo backend +- Build.rs detects Wayland vs X11 at compile time for optimal backend selection +- No need to manually specify text-injection feature flags on Linux anymore + +## Architecture (multi-crate) + +- `crates/coldvox-foundation/` — App scaffolding and core types + - `state.rs`: `AppState` + `StateManager` + - `shutdown.rs`: Ctrl+C handler + panic hook (`ShutdownHandler`/`ShutdownGuard`) + - `health.rs`: `HealthMonitor` + - `error.rs`: `AppError`/`AudioError`, `AudioConfig { silence_threshold }` + +- `crates/coldvox-audio/` — Capture & chunking pipeline + - `device.rs`: CPAL host/device discovery; PipeWire-aware priorities + - `capture.rs`: `AudioCaptureThread::spawn(...)` (input stream, watchdog, silence detection) + - `ring_buffer.rs`: `AudioRingBuffer` (rtrb SPSC for i16 samples) + - `frame_reader.rs`: `FrameReader` to normalize device frames + - `chunker.rs`: `AudioChunker` → fixed 512-sample frames (32 ms at 16 kHz) + - `watchdog.rs`: 5s no-data watchdog and auto-recovery hooks + - `detector.rs`: RMS-based `SilenceDetector` + +- `crates/coldvox-vad/` — VAD traits and configs; Level3 energy VAD (feature `level3`) + - `config.rs`: `UnifiedVadConfig`, `VadMode` + - `engine.rs`, `types.rs`, `constants.rs`, `VadProcessor` trait + +- `crates/coldvox-vad-silero/` — Silero V5 ONNX VAD (feature `silero`) + - `silero_wrapper.rs`: `SileroEngine` implementing `VadEngine` + - Uses external `voice_activity_detector` crate + +- `crates/coldvox-stt/` — STT core abstractions and events + +- `crates/coldvox-stt-vosk/` — Vosk STT integration (feature `vosk`) + +- `crates/coldvox-telemetry/` — Pipeline metrics (`PipelineMetrics`, `FpsTracker`) + +- `crates/coldvox-text-injection/` — Text injection 
backends (feature-gated) + +- `crates/app/` — App glue, UI, re-exports + - `src/audio/`: `vad_adapter.rs`, `vad_processor.rs`, re-exports from `coldvox-audio` + - `src/vad/mod.rs`: re-exports VAD types from VAD crates + - `src/stt/`: processor/persistence wrappers and Vosk re-exports + - Binaries: `src/main.rs` (app), `src/bin/tui_dashboard.rs`, probes under `src/probes/` + +## Threading & Tasks + +- Dedicated capture thread: owns CPAL stream; watchdog monitors no-data; restarts on errors +- Async tasks (Tokio): VAD processor, STT processor, UI/TUI tasks +- Channels: rtrb SPSC ring buffer, `broadcast` for audio frames, `mpsc` for events + +## Audio Specifications + +- Internal pipeline target: 16 kHz, 16-bit i16, mono +- Device capture: device-native format, converted to i16; channel/rate normalization downstream - Chunker output: 512 samples (32 ms) at 16 kHz -- Conversion: Stereo→mono averaging and resampling happen in the chunker task -- Overflow handling: Lock‑free ring buffer backpressure with stats +- Conversions: stereo→mono and resampling via `FrameReader`/`AudioChunker` and VAD adapter when needed +- Backpressure: non-blocking writes; drop on full (metrics recorded) ### Resampler Quality - Presets: `Fast`, `Balanced` (default), `Quality` -- Trade‑offs: - - Fast: lowest CPU, slightly more aliasing - - Balanced: good default balance - - Quality: higher CPU, best stopband attenuation -- Where: set via `ChunkerConfig { resampler_quality, .. }` +- Location: `crates/coldvox-audio/src/chunker.rs` (`ChunkerConfig { resampler_quality, .. }`) ## Development Commands +All commands below assume working from `crates/app` unless noted. 
+ ### Building ```bash -# Main application (requires --features vosk) cd crates/app -cargo build --features vosk -cargo build --release --features vosk -# TUI Dashboard binary +# App (with STT by default - requires system libvosk) +cargo build +cargo build --release + +# App without STT (for CI or environments without Vosk) +cargo build --no-default-features --features silero,text-injection + +# TUI Dashboard cargo build --bin tui_dashboard -# Build specific examples (from crates/app directory) +# Examples (wired from root /examples via Cargo metadata) cargo build --example foundation_probe cargo build --example mic_probe cargo build --example vad_demo +cargo build --example record_10s ``` -### Testing +### Running ```bash -# Run all tests -cargo test +# App (with STT by default) +cargo run -# Run specific test -cargo test test_name +# App without STT (for CI or environments without Vosk) +cargo run --no-default-features --features silero,text-injection -# Run with verbose output -cargo test -- --nocapture - -# Run tests for specific module -cargo test audio:: -cargo test vad:: +# TUI Dashboard (optionally select device) +cargo run --bin tui_dashboard +cargo run --bin tui_dashboard -- -D "USB Microphone" -# Run end-to-end WAV pipeline test (requires Vosk model) -VOSK_MODEL_PATH=models/vosk-model-small-en-us-0.15 \ - cargo test --features vosk test_end_to_end_wav_pipeline -- --ignored --nocapture +# Examples +cargo run --example foundation_probe -- --duration 60 +cargo run --example mic_probe -- --duration 120 --device "pipewire" --silence_threshold 120 +cargo run --example vad_demo +cargo run --example record_10s ``` -### Running Test Binaries +### Testing ```bash -# Main application (requires --features vosk) -cargo run --features vosk +# Workspace tests +cargo test -# TUI Dashboard for real-time monitoring -cargo run --bin tui_dashboard -cargo run --bin tui_dashboard -- -D "USB Microphone" # Specific device +# Verbose +cargo test -- --nocapture -# Examples (from 
crates/app directory): -cargo run --example foundation_probe -- --duration 60 -cargo run --example mic_probe -- --duration 120 --expect-disconnect -cargo run --example vad_demo # Test VAD with microphone -cargo run --example record_10s # Record 10 seconds to WAV +# Specific crate/module +cargo test -p coldvox-app vad_pipeline_tests + +# End-to-end WAV pipeline test (requires Vosk model) +VOSK_MODEL_PATH=models/vosk-model-small-en-us-0.15 \ + cargo test -p coldvox-app --features vosk test_end_to_end_wav_pipeline -- --ignored --nocapture ``` ### Type Checking & Linting + ```bash cargo check --all-targets -cargo fmt -- --check # Check formatting -cargo clippy -- -D warnings # Strict linting +cargo fmt -- --check +cargo clippy -- -D warnings ``` ## Key Design Principles -1. **Monotonic Time**: Use `std::time::Instant` for all durations/intervals -2. **Graceful Degradation**: Primary VAD with energy-based fallback -3. **Automatic Recovery**: Exponential backoff with jitter for reconnection -4. **Lock-free Communication**: Ring buffers (rtrb) with atomic operations -5. **Structured Logging**: Rate-limited, JSON-formatted logs with daily rotation -6. **Power-of-two Buffers**: For efficient index masking in ring buffers +1. Monotonic time (`std::time::Instant`) for durations and timestamps +2. Graceful degradation: Silero VAD default, Level3 fallback via feature +3. Automatic recovery: watchdog + restart on stream error +4. Lock-free communication: rtrb ring buffer with atomic counters +5. Structured logging with rotation; avoid TUI stderr logging +6. 
Power-of-two buffers for efficient masking + +## Tuning Knobs + +- Chunker (`crates/coldvox-audio/src/chunker.rs` → `ChunkerConfig`) + - `frame_size_samples` (default 512), `sample_rate_hz` (16000), `resampler_quality` -## Configuration +- VAD (`crates/coldvox-vad/src/config.rs`) + - `UnifiedVadConfig.mode`: `Silero` (default) | `Level3` + - Silero (`crates/coldvox-vad-silero/src/config.rs`): `threshold`, `min_speech_duration_ms`, `min_silence_duration_ms`, `window_size_samples` + - Level3 (feature `level3`): `onset_threshold_db`, `offset_threshold_db`, `ema_alpha`, `speech_debounce_ms`, `silence_debounce_ms`, `initial_floor_db` -Configuration parameters: -- Window/overlap for audio processing (default: 500ms window, 0.5 overlap) -- VAD thresholds and debouncing (speech_threshold: 0.6, min_speech_ms: 200) -- Retry policies and timeouts (exponential backoff with jitter) -- Buffer overflow handling (DropOldest/DropNewest/Panic) -- Logging and metrics settings (JSON structured, rate-limited) +- STT (`crates/app/src/stt/` wrappers; core in `crates/coldvox-stt/`) [enabled by default, disable with `--no-default-features`] + - `TranscriptionConfig`: `model_path`, `partial_results`, `max_alternatives`, `include_words`, `buffer_size_ms` + +- Text Injection (`crates/coldvox-text-injection/`; app glue in `crates/app/src/text_injection/`) + - Backends via features: `text-injection-*` + +- Foundation (`crates/coldvox-foundation/src/error.rs`) + - `AudioConfig.silence_threshold` (default 100) ## Important Files -- `crates/app/src/main.rs`: Main application entry point with Vosk STT -- `crates/app/src/bin/tui_dashboard.rs`: Real-time monitoring dashboard -- `crates/app/src/audio/capture.rs`: Core audio capture with format negotiation -- `crates/app/src/audio/chunker.rs`: Audio chunking for VAD processing -- `crates/app/src/vad/processor.rs`: VAD processing pipeline integration -- `crates/app/src/stt/processor.rs`: STT processor gated by VAD -- 
`crates/app/src/text_injection/processor.rs`: Session-based text injection processor -- `crates/app/src/telemetry/pipeline_metrics.rs`: Real-time metrics tracking -- `crates/app/src/stt/tests/end_to_end_wav.rs`: End-to-end pipeline test with WAV files - -## Error Handling - -Hierarchical error types with recovery strategies: -- `AppError`: Top-level application errors -- `AudioError`: Audio subsystem specific errors (supports all CPAL formats) -- Recovery via exponential backoff with jitter -- Watchdog monitoring for device disconnection (with proper epoch handling) -- Clean shutdown with `stop()` methods on all components +- App entry points: `crates/app/src/main.rs`, `crates/app/src/bin/tui_dashboard.rs` +- Audio glue: `crates/app/src/audio/vad_adapter.rs`, `crates/app/src/audio/vad_processor.rs` +- Audio core: `crates/coldvox-audio/src/capture.rs`, `frame_reader.rs`, `chunker.rs`, `ring_buffer.rs`, `device.rs` +- VAD: `crates/coldvox-vad/src/*`, `crates/coldvox-vad-silero/src/silero_wrapper.rs` +- STT: `crates/app/src/stt/processor.rs`, `crates/app/src/stt/vosk.rs`, `crates/app/src/stt/tests/end_to_end_wav.rs` +- Telemetry: `crates/coldvox-telemetry/src/*` + +## Error Handling & Recovery + +- `AppError` and `AudioError` types in foundation +- Watchdog monitors no-data; stream errors trigger restarts +- Clean shutdown via `AudioCaptureThread::stop()` and task aborts ## Testing Approach -- Unit tests for individual components -- Integration tests for subsystems -- Examples for manual testing (in `/examples/` directory) -- TUI dashboard (`tui_dashboard`) for real-time monitoring -- Mock traits using `mockall` for isolation -- WAV file testing for VAD validation -- **End-to-end pipeline testing** (`crates/app/src/stt/tests/end_to_end_wav.rs`): Complete pipeline validation using real WAV files +- Unit tests within crates; integration tests under `crates/app/tests/` +- Example programs under `/examples` for manual verification +- End-to-end WAV pipeline test: 
`crates/app/src/stt/tests/end_to_end_wav.rs` ## Vosk Model Setup -ColdVox uses Vosk for speech-to-text transcription. A small English model is already installed: - -- **Location**: `models/vosk-model-small-en-us-0.15/` -- **Environment Variable**: Set `VOSK_MODEL_PATH=models/vosk-model-small-en-us-0.15` if not using default -- **Alternative Models**: Download larger models from https://alphacephei.com/vosk/models for better accuracy +- Default model path: `models/vosk-model-small-en-us-0.15/` +- Override with `VOSK_MODEL_PATH=models/vosk-model-small-en-us-0.15` +- Larger models at https://alphacephei.com/vosk/models -## Known Issues +## Notes / Known Behaviors -- **Example paths**: Cargo.toml references `crates/app/examples/` but actual files are in root `/examples/` directory -- **Device selection**: TUI dashboard device selection (-D flag) requires exact device name match -- **Dynamic device reconfiguration**: Capture→Chunker config update is driven by frame metadata; ensure `FrameReader` is updated on device changes \ No newline at end of file +- Linux/PipeWire: `DeviceManager` prioritizes `pipewire` → default device → others +- TUI `-D` expects a device name; use exact device string shown by system when possible +- On format changes (rate/channels), `FrameReader` should receive updated device config via broadcast \ No newline at end of file diff --git a/Cargo.lock b/Cargo.lock index 9c7236a5..6a927c82 100644 --- a/Cargo.lock +++ b/Cargo.lock @@ -803,6 +803,7 @@ dependencies = [ "async-trait", "atspi", "chrono", + "coldvox-stt", "device_query 2.1.0", "enigo", "mockall", diff --git a/crates/app/Cargo.toml b/crates/app/Cargo.toml index 08e72b07..6e6d8e59 100644 --- a/crates/app/Cargo.toml +++ b/crates/app/Cargo.toml @@ -75,7 +75,8 @@ coldvox-vad = { path = "../coldvox-vad" } coldvox-vad-silero = { path = "../coldvox-vad-silero", features = ["silero"] } coldvox-stt = { path = "../coldvox-stt" } coldvox-stt-vosk = { path = "../coldvox-stt-vosk", optional = true, 
features = ["vosk"] } -coldvox-text-injection = { path = "../coldvox-text-injection", optional = true } +# Note: coldvox-text-injection is now specified in target-specific sections below +# to automatically enable platform-appropriate backends csv = "1.3" device_query = "4.0" cpal = "0.15.2" @@ -91,11 +92,12 @@ criterion = "0.5" rand = "0.8" [features] -default = ["silero"] +default = ["silero", "text-injection", "vosk"] live-hardware-tests = [] vosk = ["dep:coldvox-stt-vosk"] +no-stt = [] examples = [] -text-injection = ["dep:coldvox-text-injection"] +text-injection = ["dep:coldvox-text-injection"] # Enables platform-specific dependency silero = ["coldvox-vad-silero/silero"] level3 = ["coldvox-vad/level3"] text-injection-atspi = ["text-injection", "coldvox-text-injection/atspi"] @@ -104,4 +106,16 @@ text-injection-ydotool = ["text-injection", "coldvox-text-injection/ydotool"] text-injection-enigo = ["text-injection", "coldvox-text-injection/enigo"] text-injection-mki = ["text-injection", "coldvox-text-injection/mki"] text-injection-kdotool = ["text-injection", "coldvox-text-injection/xdg_kdotool"] -text-injection-regex = ["text-injection", "dep:regex"] \ No newline at end of file +text-injection-regex = ["text-injection", "dep:regex"] + +# Platform-specific dependencies for Linux +[target.'cfg(target_os = "linux")'.dependencies] +coldvox-text-injection = { path = "../coldvox-text-injection", features = ["atspi", "wl_clipboard", "ydotool"], optional = true } + +# Platform-specific dependencies for Windows +[target.'cfg(target_os = "windows")'.dependencies] +coldvox-text-injection = { path = "../coldvox-text-injection", features = ["enigo"], optional = true } + +# Platform-specific dependencies for macOS +[target.'cfg(target_os = "macos")'.dependencies] +coldvox-text-injection = { path = "../coldvox-text-injection", features = ["enigo"], optional = true } \ No newline at end of file diff --git a/crates/app/benches/text_chunking_bench.rs 
b/crates/app/benches/text_chunking_bench.rs index f79a05e2..e6791afb 100644 --- a/crates/app/benches/text_chunking_bench.rs +++ b/crates/app/benches/text_chunking_bench.rs @@ -1,4 +1,6 @@ -use criterion::{criterion_group, criterion_main, BatchSize, BenchmarkId, Criterion, Throughput, black_box}; +use criterion::{ + black_box, criterion_group, criterion_main, BatchSize, BenchmarkId, Criterion, Throughput, +}; // Old approach: allocate Vec and chunk by character count fn chunk_old_collect(text: &str, chunk_chars: usize) -> usize { @@ -19,10 +21,9 @@ fn chunk_new_iter(text: &str, chunk_chars: usize) -> usize { let mut count = 0usize; let mut chars_seen = 0usize; let mut start = 0usize; - let mut end_byte = 0usize; for (i, ch) in text.char_indices() { // end_byte should point after the current char - end_byte = i + ch.len_utf8(); + let end_byte = i + ch.len_utf8(); chars_seen += 1; if chars_seen == chunk_chars { let _chunk = &text[start..end_byte]; diff --git a/crates/app/build.rs b/crates/app/build.rs new file mode 100644 index 00000000..c8d35a6c --- /dev/null +++ b/crates/app/build.rs @@ -0,0 +1,56 @@ +use std::env; + +fn main() { + // Detect Linux desktop environment at compile time + if cfg!(target_os = "linux") { + // Enable Linux-specific text injection features + println!("cargo:rustc-cfg=text_injection_linux"); + + // Check for Wayland + if env::var("WAYLAND_DISPLAY").is_ok() || + env::var("XDG_SESSION_TYPE").map(|s| s == "wayland").unwrap_or(false) { + println!("cargo:rustc-cfg=wayland_session"); + println!("cargo:rustc-cfg=text_injection_atspi"); + println!("cargo:rustc-cfg=text_injection_clipboard"); + println!("cargo:rustc-cfg=text_injection_ydotool"); + } + + // Check for X11 + if env::var("DISPLAY").is_ok() || + env::var("XDG_SESSION_TYPE").map(|s| s == "x11").unwrap_or(false) { + println!("cargo:rustc-cfg=x11_session"); + println!("cargo:rustc-cfg=text_injection_atspi"); + println!("cargo:rustc-cfg=text_injection_clipboard"); + 
println!("cargo:rustc-cfg=text_injection_kdotool"); + } + + // If neither detected, enable all Linux backends + if env::var("WAYLAND_DISPLAY").is_err() && + env::var("DISPLAY").is_err() && + env::var("XDG_SESSION_TYPE").is_err() { + // Build environment might not have display vars, enable all + println!("cargo:rustc-cfg=text_injection_atspi"); + println!("cargo:rustc-cfg=text_injection_clipboard"); + println!("cargo:rustc-cfg=text_injection_ydotool"); + println!("cargo:rustc-cfg=text_injection_kdotool"); + } + + // Always enable these on Linux + println!("cargo:rustc-cfg=text_injection_mki"); + println!("cargo:rustc-cfg=text_injection_enigo"); + } + + // Windows + if cfg!(target_os = "windows") { + println!("cargo:rustc-cfg=text_injection_windows"); + println!("cargo:rustc-cfg=text_injection_enigo"); + println!("cargo:rustc-cfg=text_injection_mki"); + } + + // macOS + if cfg!(target_os = "macos") { + println!("cargo:rustc-cfg=text_injection_macos"); + println!("cargo:rustc-cfg=text_injection_enigo"); + println!("cargo:rustc-cfg=text_injection_mki"); + } +} \ No newline at end of file diff --git a/crates/app/src/audio/mod.rs b/crates/app/src/audio/mod.rs index 3b401f0c..7265496d 100644 --- a/crates/app/src/audio/mod.rs +++ b/crates/app/src/audio/mod.rs @@ -3,12 +3,12 @@ pub mod vad_processor; // Re-export modules from coldvox-audio crate pub use coldvox_audio::{ + capture::CaptureStats, chunker::{AudioChunker, ChunkerConfig, ResamplerQuality}, frame_reader::FrameReader, - ring_buffer::{AudioRingBuffer, AudioProducer}, - capture::CaptureStats, + ring_buffer::{AudioProducer, AudioRingBuffer}, }; +pub use coldvox_audio::AudioFrame; pub use vad_adapter::*; pub use vad_processor::*; -pub use coldvox_audio::AudioFrame; diff --git a/crates/app/src/audio/vad_adapter.rs b/crates/app/src/audio/vad_adapter.rs index c9d6db91..ffc1bc14 100644 --- a/crates/app/src/audio/vad_adapter.rs +++ b/crates/app/src/audio/vad_adapter.rs @@ -1,11 +1,9 @@ -use coldvox_vad::{ - 
UnifiedVadConfig, VadMode, VadEngine, VadEvent, VadState, -}; -#[cfg(feature = "level3")] -use coldvox_vad::VadConfig; #[cfg(feature = "level3")] use coldvox_vad::level3::Level3Vad; -#[cfg(feature = "silero")] +#[cfg(feature = "level3")] +use coldvox_vad::VadConfig; +use coldvox_vad::{UnifiedVadConfig, VadEngine, VadEvent, VadMode, VadState}; +#[cfg(feature = "silero")] use coldvox_vad_silero::SileroEngine; pub struct VadAdapter { @@ -22,7 +20,10 @@ impl VadAdapter { // INTENTIONAL: Level3 VAD is disabled by default // This check ensures it's not accidentally enabled without explicit configuration if !config.level3.enabled { - return Err("Level3 VAD is disabled in configuration. Use Silero mode instead.".to_string()); + return Err( + "Level3 VAD is disabled in configuration. Use Silero mode instead." + .to_string(), + ); } let level3_config = VadConfig { onset_threshold_db: config.level3.onset_threshold_db, @@ -38,7 +39,10 @@ impl VadAdapter { } #[cfg(not(feature = "level3"))] VadMode::Level3 => { - return Err("Level3 VAD is not available in this build. Use Silero mode instead.".to_string()); + return Err( + "Level3 VAD is not available in this build. Use Silero mode instead." + .to_string(), + ); } VadMode::Silero => { let silero_config = coldvox_vad_silero::SileroConfig { @@ -50,7 +54,7 @@ impl VadAdapter { Box::new(SileroEngine::new(silero_config)?) 
} }; - + let resampler = if engine.required_sample_rate() != config.sample_rate_hz || engine.required_frame_size_samples() != config.frame_size_samples { @@ -63,14 +67,14 @@ impl VadAdapter { } else { None }; - + Ok(Self { engine, config, resampler, }) } - + pub fn process(&mut self, frame: &[i16]) -> Result, String> { if let Some(resampler) = &mut self.resampler { let processed_frame = resampler.process(frame)?; @@ -85,26 +89,25 @@ impl VadAdapter { self.engine.process(frame) } } - + pub fn reset(&mut self) { self.engine.reset(); if let Some(resampler) = &mut self.resampler { resampler.reset(); } } - + pub fn current_state(&self) -> VadState { self.engine.current_state() } - + pub fn config(&self) -> &UnifiedVadConfig { &self.config } } use rubato::{ - Resampler, SincFixedIn, - SincInterpolationParameters, SincInterpolationType, WindowFunction, + Resampler, SincFixedIn, SincInterpolationParameters, SincInterpolationType, WindowFunction, }; struct AudioResampler { @@ -131,7 +134,7 @@ impl AudioResampler { let (resampler, chunk_size) = if input_rate != output_rate { // Use a chunk size that works well with typical frame sizes let chunk_size = 512; - + // Configure for low-latency VAD processing let sinc_params = SincInterpolationParameters { sinc_len: 64, @@ -140,20 +143,21 @@ impl AudioResampler { oversampling_factor: 128, window: WindowFunction::Blackman2, }; - + let resampler = SincFixedIn::::new( - output_rate as f64 / input_rate as f64, // Resample ratio - 2.0, // Max resample ratio change (not used in fixed mode) + output_rate as f64 / input_rate as f64, // Resample ratio + 2.0, // Max resample ratio change (not used in fixed mode) sinc_params, chunk_size, - 1, // mono - ).map_err(|e| format!("Failed to create Rubato resampler: {}", e))?; - + 1, // mono + ) + .map_err(|e| format!("Failed to create Rubato resampler: {}", e))?; + (Some(resampler), chunk_size) } else { - (None, 512) // Default chunk size even when not resampling + (None, 512) // Default chunk 
size even when not resampling }; - + Ok(Self { input_rate, output_rate, @@ -167,7 +171,7 @@ impl AudioResampler { chunk_size, }) } - + fn process(&mut self, input: &[i16]) -> Result, String> { if input.len() != self.input_frame_size { return Err(format!( @@ -176,10 +180,10 @@ impl AudioResampler { input.len() )); } - + // Add input to accumulator self.accumulator.extend_from_slice(input); - + // If sample rates are the same, just handle frame size conversion if self.input_rate == self.output_rate { // Simple frame size conversion without resampling @@ -189,18 +193,18 @@ impl AudioResampler { } } else if let Some(resampler) = &mut self.resampler { // Use Rubato for high-quality resampling - + // Convert accumulated i16 samples to f32 for &sample in &self.accumulator { self.f32_input_buffer.push(sample as f32 / 32768.0); } self.accumulator.clear(); - + // Process complete chunks through Rubato while self.f32_input_buffer.len() >= self.chunk_size { let chunk: Vec = self.f32_input_buffer.drain(..self.chunk_size).collect(); let input_frames = vec![chunk]; - + // Process with Rubato match resampler.process(&input_frames, None) { Ok(output_frames) => { @@ -213,7 +217,7 @@ impl AudioResampler { } } } - + // Convert f32 output back to i16 and add to output buffer for &sample in &self.f32_output_buffer { let clamped = sample.clamp(-1.0, 1.0); @@ -222,7 +226,7 @@ impl AudioResampler { } self.f32_output_buffer.clear(); } - + // Return a complete frame if available, otherwise return empty vector if self.output_buffer.len() >= self.output_frame_size { Ok(self.output_buffer.drain(..self.output_frame_size).collect()) @@ -231,7 +235,7 @@ impl AudioResampler { Ok(Vec::new()) } } - + fn reset(&mut self) { self.output_buffer.clear(); self.accumulator.clear(); @@ -284,7 +288,11 @@ mod tests { break; } } - assert_eq!(got.len(), 512, "Should eventually produce one full 512-sample frame"); + assert_eq!( + got.len(), + 512, + "Should eventually produce one full 512-sample frame" + ); 
assert!(got.iter().all(|&s| s == 0)); } -} \ No newline at end of file +} diff --git a/crates/app/src/audio/vad_processor.rs b/crates/app/src/audio/vad_processor.rs index a3498cd9..f9ee8bc4 100644 --- a/crates/app/src/audio/vad_processor.rs +++ b/crates/app/src/audio/vad_processor.rs @@ -1,6 +1,6 @@ +use coldvox_audio::AudioFrame; use coldvox_telemetry::{FpsTracker, PipelineMetrics}; use coldvox_vad::{UnifiedVadConfig, VadEvent}; -use coldvox_audio::AudioFrame; use std::sync::Arc; use tokio::sync::broadcast; use tokio::sync::mpsc::Sender; @@ -61,7 +61,8 @@ impl VadProcessor { } // Convert f32 samples back to i16 - let i16_data: Vec = frame.samples + let i16_data: Vec = frame + .samples .iter() .map(|&s| (s * i16::MAX as f32) as i16) .collect(); diff --git a/crates/app/src/bin/mic_probe.rs b/crates/app/src/bin/mic_probe.rs index ae12a06b..78e7990a 100644 --- a/crates/app/src/bin/mic_probe.rs +++ b/crates/app/src/bin/mic_probe.rs @@ -1,17 +1,18 @@ use clap::{Parser, Subcommand}; use coldvox_app::probes::{ - common::{ensure_results_dir, LiveTestResult, TestContext, write_result_json}, - MicCaptureCheck, VadMicCheck + common::{ensure_results_dir, write_result_json, LiveTestResult, TestContext}, + MicCaptureCheck, VadMicCheck, }; use std::path::PathBuf; use std::time::Duration; - #[derive(Parser)] #[command(name = "mic-probe")] #[command(version = "1.0")] #[command(about = "ColdVox live audio testing tool")] -#[command(long_about = "Comprehensive audio testing tool for microphone capture, VAD processing, and system validation")] +#[command( + long_about = "Comprehensive audio testing tool for microphone capture, VAD processing, and system validation" +)] struct Cli { #[command(subcommand)] command: Commands, @@ -50,18 +51,10 @@ async fn main() -> Result<(), Box> { let cli = Cli::parse(); match cli.command { - Commands::MicCapture => { - run_single_test(&cli, TestType::MicCapture).await - } - Commands::VadMic => { - run_single_test(&cli, TestType::VadMic).await - } - 
Commands::All => { - run_all_tests(&cli).await - } - Commands::ListDevices => { - list_devices().await - } + Commands::MicCapture => run_single_test(&cli, TestType::MicCapture).await, + Commands::VadMic => run_single_test(&cli, TestType::VadMic).await, + Commands::All => run_all_tests(&cli).await, + Commands::ListDevices => list_devices().await, } } @@ -98,7 +91,12 @@ async fn run_single_test(cli: &Cli, test_type: TestType) -> Result<(), Box { @@ -173,7 +171,14 @@ async fn run_all_tests(cli: &Cli) -> Result<(), Box> { } } - println!("\nOverall result: {}", if all_passed { "ALL TESTS PASSED" } else { "SOME TESTS FAILED" }); + println!( + "\nOverall result: {}", + if all_passed { + "ALL TESTS PASSED" + } else { + "SOME TESTS FAILED" + } + ); if !all_passed { std::process::exit(1); @@ -184,12 +189,12 @@ async fn run_all_tests(cli: &Cli) -> Result<(), Box> { async fn list_devices() -> Result<(), Box> { use cpal::traits::{DeviceTrait, HostTrait}; - + println!("Available audio devices:"); println!(); - + let host = cpal::default_host(); - + // List input devices println!("Input Devices:"); match host.input_devices() { @@ -205,7 +210,7 @@ async fn list_devices() -> Result<(), Box> { println!(" Error listing input devices: {}", e); } } - + // Show default input device println!(); if let Some(device) = host.default_input_device() { @@ -216,7 +221,7 @@ async fn list_devices() -> Result<(), Box> { } else { println!("No default input device found"); } - + Ok(()) } @@ -248,4 +253,4 @@ fn print_test_result(result: &LiveTestResult) { println!(" {}: {}", key, value); } } -} \ No newline at end of file +} diff --git a/crates/app/src/bin/tui_dashboard.rs b/crates/app/src/bin/tui_dashboard.rs index 53fb412d..e31063c9 100644 --- a/crates/app/src/bin/tui_dashboard.rs +++ b/crates/app/src/bin/tui_dashboard.rs @@ -4,19 +4,19 @@ // - File output uses a non-blocking writer; logs/ is created if missing. // - Useful for post-session analysis even when the TUI is active. 
use clap::Parser; +use coldvox_app::audio::vad_processor::VadProcessor; +#[cfg(feature = "vosk")] +use coldvox_app::stt::{processor::SttProcessor, TranscriptionConfig, TranscriptionEvent}; use coldvox_audio::capture::AudioCaptureThread; +use coldvox_audio::chunker::AudioFrame as VadFrame; use coldvox_audio::chunker::{AudioChunker, ChunkerConfig}; use coldvox_audio::frame_reader::FrameReader; use coldvox_audio::ring_buffer::AudioRingBuffer; -use coldvox_audio::chunker::AudioFrame as VadFrame; -use coldvox_app::audio::vad_processor::VadProcessor; use coldvox_foundation::error::AudioConfig; use coldvox_telemetry::pipeline_metrics::{PipelineMetrics, PipelineStage}; use coldvox_vad::config::{UnifiedVadConfig, VadMode}; use coldvox_vad::constants::{FRAME_SIZE_SAMPLES, SAMPLE_RATE_HZ}; use coldvox_vad::types::VadEvent; -#[cfg(feature = "vosk")] -use coldvox_app::stt::{processor::SttProcessor, TranscriptionConfig, TranscriptionEvent}; use crossterm::{ event::{self, DisableMouseCapture, EnableMouseCapture, Event, KeyCode}, execute, @@ -50,7 +50,8 @@ fn init_logging(cli_level: &str) -> Result<(), Box> { } else { std::env::var("RUST_LOG").unwrap_or_else(|_| "debug".to_string()) }; - let env_filter = EnvFilter::try_new(effective_level).unwrap_or_else(|_| EnvFilter::new("debug")); + let env_filter = + EnvFilter::try_new(effective_level).unwrap_or_else(|_| EnvFilter::new("debug")); // Only use file logging for TUI mode to avoid corrupting the display let file_layer = fmt::layer() @@ -71,7 +72,11 @@ fn init_logging(cli_level: &str) -> Result<(), Box> { } #[derive(Parser)] -#[command(author, version, about = "TUI Dashboard with real-time audio monitoring")] +#[command( + author, + version, + about = "TUI Dashboard with real-time audio monitoring" +)] struct Cli { /// Audio device name #[arg(short = 'D', long)] @@ -176,7 +181,7 @@ impl Default for DashboardState { metrics: PipelineMetricsSnapshot { current_rms: 0, current_peak: 0, - audio_level_db: -900, // -90.0 dB * 10 + 
audio_level_db: -900, // -90.0 dB * 10 capture_fps: 0, chunker_fps: 0, vad_fps: 0, @@ -388,16 +393,27 @@ async fn run_audio_pipeline(tx: mpsc::Sender, device: String) { let rb_capacity = 16_384; let rb = AudioRingBuffer::new(rb_capacity); let (audio_producer, audio_consumer) = rb.split(); - let (audio_thread, device_cfg, _config_rx) = match AudioCaptureThread::spawn(audio_config, audio_producer, device_option) { - Ok(thread_tuple) => thread_tuple, - Err(e) => { - let _ = tx.send(AppEvent::Log(LogLevel::Error, format!("Failed to create audio thread: {}", e))).await; - let _ = tx.send(AppEvent::PipelineStopped).await; - return; - } - }; + let (audio_thread, device_cfg, _config_rx) = + match AudioCaptureThread::spawn(audio_config, audio_producer, device_option) { + Ok(thread_tuple) => thread_tuple, + Err(e) => { + let _ = tx + .send(AppEvent::Log( + LogLevel::Error, + format!("Failed to create audio thread: {}", e), + )) + .await; + let _ = tx.send(AppEvent::PipelineStopped).await; + return; + } + }; - let _ = tx.send(AppEvent::Log(LogLevel::Success, "Audio capture started".to_string())).await; + let _ = tx + .send(AppEvent::Log( + LogLevel::Success, + "Audio capture started".to_string(), + )) + .await; // Broadcast channel for audio frames let (audio_tx, _) = broadcast::channel::(200); @@ -416,25 +432,32 @@ async fn run_audio_pipeline(tx: mpsc::Sender, device: String) { rb_capacity, Some(metrics.clone()), ); - let chunker = AudioChunker::new(frame_reader, audio_tx.clone(), chunker_cfg).with_metrics(metrics.clone()); + let chunker = AudioChunker::new(frame_reader, audio_tx.clone(), chunker_cfg) + .with_metrics(metrics.clone()); let _chunker_handle = chunker.spawn(); let vad_cfg = UnifiedVadConfig { mode: VadMode::Silero, frame_size_samples: FRAME_SIZE_SAMPLES, - sample_rate_hz: SAMPLE_RATE_HZ, // Standard 16kHz - resampler will handle conversion + sample_rate_hz: SAMPLE_RATE_HZ, // Standard 16kHz - resampler will handle conversion ..Default::default() }; let 
vad_audio_rx = audio_tx.subscribe(); - let _vad_thread = match VadProcessor::spawn(vad_cfg, vad_audio_rx, event_tx, Some(metrics.clone())) { - Ok(h) => h, - Err(e) => { - let _ = tx.send(AppEvent::Log(LogLevel::Error, format!("Failed to spawn VAD: {}", e))).await; - let _ = tx.send(AppEvent::PipelineStopped).await; - return; - } - }; + let _vad_thread = + match VadProcessor::spawn(vad_cfg, vad_audio_rx, event_tx, Some(metrics.clone())) { + Ok(h) => h, + Err(e) => { + let _ = tx + .send(AppEvent::Log( + LogLevel::Error, + format!("Failed to spawn VAD: {}", e), + )) + .await; + let _ = tx.send(AppEvent::PipelineStopped).await; + return; + } + }; let _ = tx.send(AppEvent::PipelineStarted).await; @@ -460,7 +483,8 @@ async fn run_audio_pipeline(tx: mpsc::Sender, device: String) { // STT config let stt_config = TranscriptionConfig { enabled: true, - model_path: std::env::var("VOSK_MODEL_PATH").unwrap_or_else(|_| "models/vosk-model-small-en-us-0.15".to_string()), + model_path: std::env::var("VOSK_MODEL_PATH") + .unwrap_or_else(|_| "models/vosk-model-small-en-us-0.15".to_string()), partial_results: true, max_alternatives: 1, include_words: false, @@ -475,11 +499,18 @@ async fn run_audio_pipeline(tx: mpsc::Sender, device: String) { (Some(stt_transcription_rx), Some(stt_vad_tx)) } Err(e) => { - let _ = tx.send(AppEvent::Log(LogLevel::Error, format!("Failed to create STT processor: {}", e))).await; + let _ = tx + .send(AppEvent::Log( + LogLevel::Error, + format!("Failed to create STT processor: {}", e), + )) + .await; (None, None) } } - } else { (None, None) }; + } else { + (None, None) + }; // Relay VAD events: to UI and to STT (if enabled) let (ui_vad_tx, mut ui_vad_rx) = mpsc::channel::(200); @@ -542,7 +573,12 @@ async fn run_audio_pipeline(tx: mpsc::Sender, device: String) { } } - let _ = tx.send(AppEvent::Log(LogLevel::Info, "Stopping pipeline...".to_string())).await; + let _ = tx + .send(AppEvent::Log( + LogLevel::Info, + "Stopping pipeline...".to_string(), + )) + 
.await; // Stop audio thread audio_thread.stop(); @@ -561,10 +597,7 @@ fn draw_ui(f: &mut Frame, state: &DashboardState) { let top_chunks = Layout::default() .direction(Direction::Horizontal) - .constraints([ - Constraint::Percentage(60), - Constraint::Percentage(40), - ]) + .constraints([Constraint::Percentage(60), Constraint::Percentage(40)]) .split(main_chunks[0]); draw_audio_levels(f, top_chunks[0], state); @@ -572,10 +605,7 @@ fn draw_ui(f: &mut Frame, state: &DashboardState) { let middle_chunks = Layout::default() .direction(Direction::Horizontal) - .constraints([ - Constraint::Percentage(50), - Constraint::Percentage(50), - ]) + .constraints([Constraint::Percentage(50), Constraint::Percentage(50)]) .split(main_chunks[1]); draw_metrics(f, middle_chunks[0], state); @@ -585,9 +615,7 @@ fn draw_ui(f: &mut Frame, state: &DashboardState) { } fn draw_audio_levels(f: &mut Frame, area: Rect, state: &DashboardState) { - let block = Block::default() - .title("Audio Levels") - .borders(Borders::ALL); + let block = Block::default().title("Audio Levels").borders(Borders::ALL); let inner = block.inner(area); f.render_widget(block, area); @@ -606,15 +634,13 @@ fn draw_audio_levels(f: &mut Frame, area: Rect, state: &DashboardState) { let gauge = Gauge::default() .block(Block::default().title("Level")) - .gauge_style( - if level_percent > 80 { - Style::default().fg(Color::Red) - } else if level_percent > 60 { - Style::default().fg(Color::Yellow) - } else { - Style::default().fg(Color::Green) - } - ) + .gauge_style(if level_percent > 80 { + Style::default().fg(Color::Red) + } else if level_percent > 60 { + Style::default().fg(Color::Yellow) + } else { + Style::default().fg(Color::Green) + }) .percent(level_percent) .label(format!("{:.1} dB", db)); f.render_widget(gauge, chunks[0]); @@ -622,19 +648,21 @@ fn draw_audio_levels(f: &mut Frame, area: Rect, state: &DashboardState) { let rms_scaled = state.metrics.current_rms as f64 / 1000.0; // stored as RMS*1000 let rms_db = if 
rms_scaled > 0.0 { 20.0 * (rms_scaled / 32767.0).log10() - } else { -90.0 }; + } else { + -90.0 + }; let peak = state.metrics.current_peak as f64; let peak_db = if peak > 0.0 { 20.0 * (peak / 32767.0).log10() - } else { -90.0 }; - + } else { + -90.0 + }; + let db_text = Paragraph::new(format!("Peak: {:.1} dB | RMS: {:.1} dB", peak_db, rms_db)) .alignment(Alignment::Center); f.render_widget(db_text, chunks[1]); - let sparkline_data: Vec = state.level_history.iter() - .map(|&v| v as u64) - .collect(); + let sparkline_data: Vec = state.level_history.iter().map(|&v| v as u64).collect(); let sparkline = Sparkline::default() .block(Block::default().title("History (60 samples)")) @@ -679,29 +707,34 @@ fn draw_pipeline_flow(f: &mut Frame, area: Rect, state: &DashboardState) { }; let indicator = if *active { "●" } else { "○" }; - let count_text = match i { + let count_text = match i { 0 => { - if state.has_metrics_snapshot { format!("{} events", state.metrics.capture_frames) } else { "N/A".to_string() } - }, + if state.has_metrics_snapshot { + format!("{} events", state.metrics.capture_frames) + } else { + "N/A".to_string() + } + } 1 => { - if state.has_metrics_snapshot { format!("{} events", state.metrics.chunker_frames) } else { "N/A".to_string() } - }, + if state.has_metrics_snapshot { + format!("{} events", state.metrics.chunker_frames) + } else { + "N/A".to_string() + } + } 2 => format!("{} events", state.vad_frames), 3 => format!("{} events", state.speech_segments), _ => "".to_string(), }; let text = format!("{} {} [{}]", indicator, name, count_text); - let paragraph = Paragraph::new(text) - .style(Style::default().fg(color)); + let paragraph = Paragraph::new(text).style(Style::default().fg(color)); f.render_widget(paragraph, chunks[i]); } } fn draw_metrics(f: &mut Frame, area: Rect, state: &DashboardState) { - let block = Block::default() - .title("Metrics") - .borders(Borders::ALL); + let block = Block::default().title("Metrics").borders(Borders::ALL); let inner = 
block.inner(area); f.render_widget(block, area); @@ -710,9 +743,18 @@ fn draw_metrics(f: &mut Frame, area: Rect, state: &DashboardState) { let metrics_text = vec![ Line::from(format!("Runtime: {}s", elapsed)), Line::from(""), - Line::from(format!("Capture FPS: {:.1}", state.metrics.capture_fps as f64 / 10.0)), - Line::from(format!("Chunker FPS: {:.1}", state.metrics.chunker_fps as f64 / 10.0)), - Line::from(format!("VAD FPS: {:.1}", state.metrics.vad_fps as f64 / 10.0)), + Line::from(format!( + "Capture FPS: {:.1}", + state.metrics.capture_fps as f64 / 10.0 + )), + Line::from(format!( + "Chunker FPS: {:.1}", + state.metrics.chunker_fps as f64 / 10.0 + )), + Line::from(format!( + "VAD FPS: {:.1}", + state.metrics.vad_fps as f64 / 10.0 + )), Line::from(""), Line::from("Buffer Fill:"), Line::from(format!(" Capture: {}%", state.metrics.capture_buffer_fill)), @@ -725,9 +767,7 @@ fn draw_metrics(f: &mut Frame, area: Rect, state: &DashboardState) { } fn draw_status(f: &mut Frame, area: Rect, state: &DashboardState) { - let block = Block::default() - .title("Status & VAD") - .borders(Borders::ALL); + let block = Block::default().title("Status & VAD").borders(Borders::ALL); let inner = block.inner(area); f.render_widget(block, area); @@ -746,8 +786,14 @@ fn draw_status(f: &mut Frame, area: Rect, state: &DashboardState) { status_text.push(Line::from(vec![ Span::raw("Pipeline: "), Span::styled( - if state.is_running { "RUNNING" } else { "STOPPED" }, - Style::default().fg(status_color).add_modifier(Modifier::BOLD), + if state.is_running { + "RUNNING" + } else { + "STOPPED" + }, + Style::default() + .fg(status_color) + .add_modifier(Modifier::BOLD), ), ])); status_text.push(Line::from(format!("Device: {}", state.selected_device))); @@ -756,19 +802,32 @@ fn draw_status(f: &mut Frame, area: Rect, state: &DashboardState) { Span::raw("Speaking: "), Span::styled( if state.is_speaking { "YES" } else { "NO" }, - Style::default().fg(if state.is_speaking { Color::Green } else { 
Color::Gray }), + Style::default().fg(if state.is_speaking { + Color::Green + } else { + Color::Gray + }), ), ])); - status_text.push(Line::from(format!("Speech Segments: {}", state.speech_segments))); + status_text.push(Line::from(format!( + "Speech Segments: {}", + state.speech_segments + ))); status_text.push(Line::from("")); status_text.push(Line::from("Last VAD Event:")); - status_text.push(Line::from(state.last_vad_event.as_deref().unwrap_or("None"))); + status_text.push(Line::from( + state.last_vad_event.as_deref().unwrap_or("None"), + )); #[cfg(feature = "vosk")] { status_text.push(Line::from("")); status_text.push(Line::from("Last Transcript (final):")); let txt = state.last_transcript.as_deref().unwrap_or("None"); - let trunc = if txt.len() > 80 { format!("{}…", &txt[..80]) } else { txt.to_string() }; + let trunc = if txt.len() > 80 { + format!("{}…", &txt[..80]) + } else { + txt.to_string() + }; status_text.push(Line::from(trunc)); } status_text.push(Line::from("")); @@ -780,18 +839,20 @@ fn draw_status(f: &mut Frame, area: Rect, state: &DashboardState) { } fn draw_logs(f: &mut Frame, area: Rect, state: &DashboardState) { - let block = Block::default() - .title("Logs") - .borders(Borders::ALL); + let block = Block::default().title("Logs").borders(Borders::ALL); let inner = block.inner(area); f.render_widget(block, area); - let start_time = state.logs.front() + let start_time = state + .logs + .front() .map(|e| e.timestamp) .unwrap_or_else(Instant::now); - let log_lines: Vec = state.logs.iter() + let log_lines: Vec = state + .logs + .iter() .rev() .take(inner.height as usize) .rev() @@ -810,14 +871,11 @@ fn draw_logs(f: &mut Frame, area: Rect, state: &DashboardState) { format!("[{:7.2}s] ", elapsed), Style::default().fg(Color::Gray), ), - Span::styled( - &entry.message, - Style::default().fg(color), - ), + Span::styled(&entry.message, Style::default().fg(color)), ]) }) .collect(); let paragraph = Paragraph::new(log_lines); f.render_widget(paragraph, 
inner);
-}
\ No newline at end of file
+}
diff --git a/crates/app/src/hotkey/indicator.rs b/crates/app/src/hotkey/indicator.rs
index 7d6aeb4b..74d04ebf 100644
--- a/crates/app/src/hotkey/indicator.rs
+++ b/crates/app/src/hotkey/indicator.rs
@@ -1,5 +1,9 @@
+use crossterm::{
+    cursor::MoveTo,
+    style::{Color, Print, ResetColor, SetBackgroundColor, SetForegroundColor},
+    terminal, QueueableCommand,
+};
 use std::io::{stdout, Write};
-use crossterm::{cursor::MoveTo, style::{Color, Print, ResetColor, SetBackgroundColor, SetForegroundColor}, terminal, QueueableCommand};
 
 /// Simple terminal indicator shown while recording is active.
 /// Draws a small bar centered horizontally about one-third from the bottom
diff --git a/crates/app/src/hotkey/listener.rs b/crates/app/src/hotkey/listener.rs
index a4ac80ea..cd837a44 100644
--- a/crates/app/src/hotkey/listener.rs
+++ b/crates/app/src/hotkey/listener.rs
@@ -1,8 +1,8 @@
+use super::indicator::RecordingIndicator;
+use coldvox_vad::types::VadEvent;
+use device_query::{DeviceQuery, DeviceState, Keycode};
 use std::time::{Duration, Instant};
 use tokio::sync::mpsc::Sender;
-use device_query::{DeviceQuery, DeviceState, Keycode};
-use coldvox_vad::types::VadEvent;
-use super::indicator::RecordingIndicator;
 
 /// Spawn a blocking task that listens for Ctrl+Super key combinations
 /// and emits synthetic `VadEvent`s to control the STT pipeline.
@@ -24,14 +24,21 @@ pub fn spawn_hotkey_listener(event_tx: Sender) -> tokio::task::JoinHan
                     start = Instant::now();
                     indicator.show();
                     let ts = app_start.elapsed().as_millis() as u64;
-                    let _ = event_tx.blocking_send(VadEvent::SpeechStart { timestamp_ms: ts, energy_db: 0.0 });
+                    let _ = event_tx.blocking_send(VadEvent::SpeechStart {
+                        timestamp_ms: ts,
+                        energy_db: 0.0,
+                    });
                 }
             } else if active {
                 active = false;
                 indicator.hide();
                 let duration = start.elapsed().as_millis() as u64;
                 let ts = app_start.elapsed().as_millis() as u64;
-                let _ = event_tx.blocking_send(VadEvent::SpeechEnd { timestamp_ms: ts, duration_ms: duration, energy_db: 0.0 });
+                let _ = event_tx.blocking_send(VadEvent::SpeechEnd {
+                    timestamp_ms: ts,
+                    duration_ms: duration,
+                    energy_db: 0.0,
+                });
             }
             std::thread::sleep(Duration::from_millis(20));
         }
diff --git a/crates/app/src/lib.rs b/crates/app/src/lib.rs
index 7fa2c42e..9297827c 100644
--- a/crates/app/src/lib.rs
+++ b/crates/app/src/lib.rs
@@ -1,8 +1,9 @@
 pub mod audio;
+pub mod foundation;
+pub mod hotkey;
 pub mod probes;
 pub mod stt;
+pub mod telemetry;
+#[cfg(feature = "text-injection")]
 pub mod text_injection;
-pub mod hotkey;
 pub mod vad;
-pub mod telemetry;
-pub mod foundation;
diff --git a/crates/app/src/main.rs b/crates/app/src/main.rs
index 209de567..cf3a8888 100644
--- a/crates/app/src/main.rs
+++ b/crates/app/src/main.rs
@@ -4,28 +4,33 @@
 // - The logs/ directory is created on startup if missing; file output uses a non-blocking writer.
 // - This ensures persistent logs for post-run analysis while keeping console output for live use.
 use anyhow::anyhow;
-use coldvox_audio::{AudioChunker, ChunkerConfig, AudioRingBuffer, AudioCaptureThread, FrameReader};
-use coldvox_foundation::*;
+#[cfg(feature = "vosk")]
+use coldvox_app::stt::persistence::{
+    AudioFormat, PersistenceConfig, SessionMetadata, TranscriptFormat,
+};
 use coldvox_app::stt::TranscriptionConfig;
 #[cfg(feature = "vosk")]
 use coldvox_app::stt::{processor::SttProcessor, TranscriptionEvent};
-#[cfg(feature = "vosk")]
-use coldvox_app::stt::persistence::{PersistenceConfig, TranscriptFormat, AudioFormat, SessionMetadata};
+use coldvox_audio::{
+    AudioCaptureThread, AudioChunker, AudioRingBuffer, ChunkerConfig, FrameReader,
+};
+use coldvox_foundation::*;
 
-use coldvox_vad::{UnifiedVadConfig, VadMode, FRAME_SIZE_SAMPLES, SAMPLE_RATE_HZ, VadEvent};
-use coldvox_telemetry::PipelineMetrics;
+#[cfg(feature = "text-injection")]
+use clap::Args;
+use clap::{Parser, ValueEnum};
 use coldvox_app::hotkey::spawn_hotkey_listener;
 #[cfg(feature = "text-injection")]
 use coldvox_app::text_injection::{AsyncInjectionProcessor, InjectionConfig};
+use coldvox_telemetry::PipelineMetrics;
+use coldvox_vad::{UnifiedVadConfig, VadEvent, VadMode, FRAME_SIZE_SAMPLES, SAMPLE_RATE_HZ};
 use std::time::Duration;
-use clap::{Parser, ValueEnum};
-#[cfg(feature = "text-injection")]
-use clap::Args;
 use tokio::sync::{broadcast, mpsc};
 use tracing_appender::rolling::{RollingFileAppender, Rotation};
 use tracing_subscriber::{fmt, prelude::*, EnvFilter};
 
-fn init_logging() -> Result> {
+fn init_logging() -> Result>
+{
     std::fs::create_dir_all("logs")?;
     let file_appender = RollingFileAppender::new(Rotation::DAILY, "logs", "coldvox.log");
     let (non_blocking_file, guard) = tracing_appender::non_blocking(file_appender);
@@ -123,7 +128,10 @@ struct InjectionArgs {
     allow_mki: bool,
 
     /// Attempt injection even if the focused application is unknown
-    #[arg(long = "inject-on-unknown-focus", env = "COLDVOX_INJECT_ON_UNKNOWN_FOCUS")]
+    #[arg(
+        long = "inject-on-unknown-focus",
+        env =
"COLDVOX_INJECT_ON_UNKNOWN_FOCUS" + )] inject_on_unknown_focus: bool, /// Restore clipboard contents after injection @@ -156,8 +164,12 @@ async fn main() -> Result<(), Box> { let cli = Cli::parse(); // Apply environment variable overrides - let device = cli.device.clone().or_else(|| std::env::var("COLDVOX_DEVICE").ok()); - let resampler_quality = std::env::var("COLDVOX_RESAMPLER_QUALITY").unwrap_or(cli.resampler_quality.clone()); + let device = cli + .device + .clone() + .or_else(|| std::env::var("COLDVOX_DEVICE").ok()); + let resampler_quality = + std::env::var("COLDVOX_RESAMPLER_QUALITY").unwrap_or(cli.resampler_quality.clone()); if cli.list_devices { let dm = coldvox_audio::DeviceManager::new()?; @@ -210,14 +222,13 @@ async fn main() -> Result<(), Box> { // --- 3. VAD Processor --- let vad_cfg = UnifiedVadConfig { mode: VadMode::Silero, - frame_size_samples: FRAME_SIZE_SAMPLES, // Both Silero and Level3 use 512 samples - sample_rate_hz: SAMPLE_RATE_HZ, // Standard 16kHz - resampler will handle conversion + frame_size_samples: FRAME_SIZE_SAMPLES, // Both Silero and Level3 use 512 samples + sample_rate_hz: SAMPLE_RATE_HZ, // Standard 16kHz - resampler will handle conversion ..Default::default() }; // This broadcast channel will distribute audio frames to all interested components. 
- let (audio_tx, _) = - broadcast::channel::(200); + let (audio_tx, _) = broadcast::channel::(200); let chunker = AudioChunker::new(frame_reader, audio_tx.clone(), chunker_cfg) .with_metrics(metrics.clone()) .with_device_config(device_config_rx.resubscribe()); @@ -301,11 +312,12 @@ async fn main() -> Result<(), Box> { #[cfg(feature = "vosk")] let (stt_handle, _persistence_handle, injection_handle) = if stt_config.enabled { // Create mpsc channel for STT processor to send transcription events - let (stt_transcription_tx, mut stt_transcription_rx) = mpsc::channel::(100); - + let (stt_transcription_tx, mut stt_transcription_rx) = + mpsc::channel::(100); + // Create broadcast channel for distributing transcription events to multiple consumers let (broadcast_tx, _) = broadcast::channel::(100); - + // Relay from STT processor to broadcast channel let broadcast_tx_clone = broadcast_tx.clone(); tokio::spawn(async move { @@ -313,7 +325,7 @@ async fn main() -> Result<(), Box> { let _ = broadcast_tx_clone.send(event); } }); - + // Create mpsc channel for injection processor let (injection_tx, injection_rx) = mpsc::channel::(100); let mut injection_relay_rx = broadcast_tx.subscribe(); @@ -322,7 +334,7 @@ async fn main() -> Result<(), Box> { let _ = injection_tx.send(event).await; } }); - + // Create mpsc channel for persistence if needed let persistence_rx = if cli.save_transcriptions { let (persist_tx, persist_rx) = mpsc::channel::(100); @@ -354,8 +366,13 @@ async fn main() -> Result<(), Box> { }); let stt_audio_rx = audio_tx.subscribe(); - let stt_processor = SttProcessor::new(stt_audio_rx, stt_event_rx, stt_transcription_tx, stt_config.clone()) - .map_err(|e| anyhow!("Failed to create STT processor: {}", e))?; + let stt_processor = SttProcessor::new( + stt_audio_rx, + stt_event_rx, + stt_transcription_tx, + stt_config.clone(), + ) + .map_err(|e| anyhow!("Failed to create STT processor: {}", e))?; // --- 5. 
Text Injection Processor --- #[cfg(feature = "text-injection")] @@ -368,9 +385,18 @@ async fn main() -> Result<(), Box> { allow_mki: cli.injection.allow_mki, restore_clipboard: cli.injection.restore_clipboard, inject_on_unknown_focus: cli.injection.inject_on_unknown_focus, - max_total_latency_ms: cli.injection.max_total_latency_ms.unwrap_or(InjectionConfig::default().max_total_latency_ms), - per_method_timeout_ms: cli.injection.per_method_timeout_ms.unwrap_or(InjectionConfig::default().per_method_timeout_ms), - cooldown_initial_ms: cli.injection.cooldown_initial_ms.unwrap_or(InjectionConfig::default().cooldown_initial_ms), + max_total_latency_ms: cli + .injection + .max_total_latency_ms + .unwrap_or(InjectionConfig::default().max_total_latency_ms), + per_method_timeout_ms: cli + .injection + .per_method_timeout_ms + .unwrap_or(InjectionConfig::default().per_method_timeout_ms), + cooldown_initial_ms: cli + .injection + .cooldown_initial_ms + .unwrap_or(InjectionConfig::default().cooldown_initial_ms), ..Default::default() }; @@ -380,7 +406,7 @@ async fn main() -> Result<(), Box> { injection_config, injection_rx, injection_shutdown_rx, - Some(metrics.clone()), + None, ); // Spawn injection processor @@ -394,7 +420,7 @@ async fn main() -> Result<(), Box> { tracing::info!("Text injection disabled."); None }; - + #[cfg(not(feature = "text-injection"))] let injection_handle: Option> = None; @@ -453,8 +479,15 @@ async fn main() -> Result<(), Box> { #[cfg(not(feature = "vosk"))] let persistence_handle = None; - tracing::info!("STT processor task started with model: {}", stt_config.model_path); - (Some(tokio::spawn(stt_processor.run())), persistence_handle, injection_handle) + tracing::info!( + "STT processor task started with model: {}", + stt_config.model_path + ); + ( + Some(tokio::spawn(stt_processor.run())), + persistence_handle, + injection_handle, + ) } else { tracing::info!("STT processor disabled - no model available"); (None, None, None) @@ -463,15 +496,19 @@ async 
fn main() -> Result<(), Box> { #[cfg(not(feature = "vosk"))] let (stt_handle, _persistence_handle, injection_handle) = { tracing::info!("STT processor disabled - no vosk feature"); - + // Consume VAD events even when STT is disabled to prevent channel backpressure tokio::spawn(async move { while let Some(_event) = event_rx.recv().await { // Just consume the events - no STT processing when vosk is disabled } }); - - (None::>, None::>, None::>) + + ( + None::>, + None::>, + None::>, + ) }; // --- Main Application Loop --- @@ -512,7 +549,7 @@ async fn main() -> Result<(), Box> { if let Some(tx) = injection_shutdown_tx { let _ = tx.send(()).await; } - + // 3. Abort the tasks. This will drop their channel senders, causing downstream // tasks with `recv()` loops to terminate gracefully. chunker_handle.abort(); @@ -540,4 +577,4 @@ async fn main() -> Result<(), Box> { tracing::info!("Shutdown complete"); Ok(()) -} \ No newline at end of file +} diff --git a/crates/app/src/probes/common.rs b/crates/app/src/probes/common.rs index 6c68e826..b699fc4f 100644 --- a/crates/app/src/probes/common.rs +++ b/crates/app/src/probes/common.rs @@ -39,7 +39,10 @@ pub struct TestContext { impl TestContext { pub fn new_seconds(duration_secs: u64) -> Self { - Self { duration: Duration::from_secs(duration_secs), ..Default::default() } + Self { + duration: Duration::from_secs(duration_secs), + ..Default::default() + } } } diff --git a/crates/app/src/probes/foundation.rs b/crates/app/src/probes/foundation.rs index ba46d495..9f2d344b 100644 --- a/crates/app/src/probes/foundation.rs +++ b/crates/app/src/probes/foundation.rs @@ -1,5 +1,3 @@ - - #[derive(Default)] pub struct FoundationHealth; @@ -19,4 +17,4 @@ pub struct FoundationHealth; // artifacts: vec![], // }) // } -// } \ No newline at end of file +// } diff --git a/crates/app/src/probes/mic_capture.rs b/crates/app/src/probes/mic_capture.rs index 3156b303..c4d2637e 100644 --- a/crates/app/src/probes/mic_capture.rs +++ 
b/crates/app/src/probes/mic_capture.rs @@ -1,9 +1,9 @@ -use std::sync::Arc; -use coldvox_telemetry::PipelineMetrics; use crate::probes::MicCaptureThresholds; +use coldvox_telemetry::PipelineMetrics; +use std::sync::Arc; use super::common::{LiveTestResult, TestContext, TestError, TestErrorKind}; -use coldvox_audio::{AudioCaptureThread, FrameReader, AudioRingBuffer}; +use coldvox_audio::{AudioCaptureThread, AudioRingBuffer, FrameReader}; use coldvox_foundation::{AudioConfig, AudioError}; use serde_json::json; use std::collections::HashMap; @@ -23,13 +23,16 @@ impl MicCaptureCheck { // Prepare ring buffer and spawn capture thread let rb = AudioRingBuffer::new(16_384); let (audio_producer, audio_consumer) = rb.split(); - let (capture_thread, dev_cfg, _config_rx) = AudioCaptureThread::spawn(config, audio_producer, device_name).map_err(|e| TestError { - kind: match e { - AudioError::DeviceNotFound { .. } => TestErrorKind::Device, - _ => TestErrorKind::Setup, - }, - message: format!("Failed to create audio capture thread: {}", e), - })?; + let (capture_thread, dev_cfg, _config_rx) = + AudioCaptureThread::spawn(config, audio_producer, device_name).map_err(|e| { + TestError { + kind: match e { + AudioError::DeviceNotFound { .. 
} => TestErrorKind::Device, + _ => TestErrorKind::Setup, + }, + message: format!("Failed to create audio capture thread: {}", e), + } + })?; tokio::time::sleep(Duration::from_millis(200)).await; // Give the thread time to start @@ -38,18 +41,18 @@ impl MicCaptureCheck { // Add optional logging of metrics every 30s let metrics_clone = metrics.clone(); - let log_handle = tokio::spawn(async move { + let log_handle = tokio::spawn(async move { let mut interval = interval(Duration::from_secs(30)); loop { interval.tick().await; - let capture_fps = metrics_clone.capture_fps.load(Ordering::Relaxed); + let capture_fps = metrics_clone.capture_fps.load(Ordering::Relaxed); let capture_fill = metrics_clone.capture_buffer_fill.load(Ordering::Relaxed); tracing::info!( - target: "mic_capture", - "Capture FPS: {}, Capture Buffer Fill: {}%", - capture_fps, - capture_fill - ); + target: "mic_capture", + "Capture FPS: {}, Capture Buffer Fill: {}%", + capture_fps, + capture_fill + ); } }); @@ -59,7 +62,7 @@ impl MicCaptureCheck { tokio::pin!(timeout); // Build a single reader for the duration of the test - let mut reader = FrameReader::new( + let mut reader = FrameReader::new( audio_consumer, dev_cfg.sample_rate, dev_cfg.channels, @@ -94,8 +97,8 @@ impl MicCaptureCheck { metrics.insert("frames_captured".to_string(), json!(frames_count)); metrics.insert("frames_per_sec".to_string(), json!(frames_per_sec)); metrics.insert("duration_secs".to_string(), json!(elapsed.as_secs_f64())); - metrics.insert("device_sample_rate".to_string(), json!(dev_cfg.sample_rate)); - metrics.insert("device_channels".to_string(), json!(dev_cfg.channels)); + metrics.insert("device_sample_rate".to_string(), json!(dev_cfg.sample_rate)); + metrics.insert("device_channels".to_string(), json!(dev_cfg.channels)); let default_thresholds = MicCaptureThresholds { max_drop_rate_error: Some(0.20), @@ -105,7 +108,9 @@ impl MicCaptureCheck { watchdog_must_be_false: Some(false), }; - let thresholds = ctx.thresholds.as_ref() + 
let thresholds = ctx + .thresholds + .as_ref() .map(|t| &t.mic_capture) .unwrap_or(&default_thresholds); @@ -121,34 +126,54 @@ impl MicCaptureCheck { } } -pub fn evaluate_mic_capture(metrics: &HashMap, thresholds: &MicCaptureThresholds) -> (bool, String) { +pub fn evaluate_mic_capture( + metrics: &HashMap, + thresholds: &MicCaptureThresholds, +) -> (bool, String) { let mut pass = true; let mut failures = vec![]; - let frames_per_sec = metrics.get("frames_per_sec").and_then(|v| v.as_f64()).unwrap_or(0.0); - let frames_captured = metrics.get("frames_captured").and_then(|v| v.as_u64()).unwrap_or(0); - let duration_secs = metrics.get("duration_secs").and_then(|v| v.as_f64()).unwrap_or(0.0); + let frames_per_sec = metrics + .get("frames_per_sec") + .and_then(|v| v.as_f64()) + .unwrap_or(0.0); + let frames_captured = metrics + .get("frames_captured") + .and_then(|v| v.as_u64()) + .unwrap_or(0); + let duration_secs = metrics + .get("duration_secs") + .and_then(|v| v.as_f64()) + .unwrap_or(0.0); if let Some(min_fps) = thresholds.frames_per_sec_min { if frames_per_sec < min_fps { pass = false; - failures.push(format!("FPS {:.1} below minimum {:.1}", frames_per_sec, min_fps)); + failures.push(format!( + "FPS {:.1} below minimum {:.1}", + frames_per_sec, min_fps + )); } } if let Some(max_fps) = thresholds.frames_per_sec_max { if frames_per_sec > max_fps { pass = false; - failures.push(format!("fps {:.1} exceeds maximum {:.1}", frames_per_sec, max_fps)); + failures.push(format!( + "fps {:.1} exceeds maximum {:.1}", + frames_per_sec, max_fps + )); } } let notes = if failures.is_empty() { - format!("All checks passed. Captured {} frames in {:.1}s at {:.1} FPS", - frames_captured, duration_secs, frames_per_sec) + format!( + "All checks passed. 
Captured {} frames in {:.1}s at {:.1} FPS",
+            frames_captured, duration_secs, frames_per_sec
+        )
     } else {
         failures.join("; ")
     };
 
     (pass, notes)
-}
\ No newline at end of file
+}
diff --git a/crates/app/src/probes/mod.rs b/crates/app/src/probes/mod.rs
index 8ceaf6e2..63464c2c 100644
--- a/crates/app/src/probes/mod.rs
+++ b/crates/app/src/probes/mod.rs
@@ -1,13 +1,15 @@
 pub mod common;
+pub mod foundation;
 pub mod mic_capture;
-pub mod thresholds;
-pub mod vad_mic;
 pub mod record_to_wav;
-pub mod foundation;
+#[cfg(feature = "text-injection")]
 pub mod text_injection;
+pub mod thresholds;
+pub mod vad_mic;
 
 pub use common::{LiveTestResult, TestContext, TestError, TestErrorKind};
 pub use mic_capture::MicCaptureCheck;
-pub use thresholds::{Thresholds, MicCaptureThresholds};
+#[cfg(feature = "text-injection")]
+pub use text_injection::TextInjectionProbe;
+pub use thresholds::{MicCaptureThresholds, Thresholds};
 pub use vad_mic::VadMicCheck;
-pub use text_injection::TextInjectionProbe;
\ No newline at end of file
diff --git a/crates/app/src/probes/record_to_wav.rs b/crates/app/src/probes/record_to_wav.rs
index 01d29ec4..e358bf56 100644
--- a/crates/app/src/probes/record_to_wav.rs
+++ b/crates/app/src/probes/record_to_wav.rs
@@ -1,5 +1,3 @@
-
-
 pub struct RecordToWav;
 
 // TODO: Implement proper LiveTest trait when available
@@ -18,4 +16,4 @@ pub struct RecordToWav;
 // artifacts: vec![],
 // })
 // }
-// }
\ No newline at end of file
+// }
diff --git a/crates/app/src/probes/text_injection.rs b/crates/app/src/probes/text_injection.rs
index 84e1e2ef..9bd2b33c 100644
--- a/crates/app/src/probes/text_injection.rs
+++ b/crates/app/src/probes/text_injection.rs
@@ -1,10 +1,10 @@
-use std::sync::Arc;
-use coldvox_telemetry::pipeline_metrics::PipelineMetrics;
+use crate::probes::common::{LiveTestResult, TestContext, TestError};
 use crate::text_injection::manager::StrategyManager;
 use crate::text_injection::types::{InjectionConfig, InjectionMetrics};
-use
crate::probes::common::{LiveTestResult, TestContext, TestError}; +use coldvox_telemetry::pipeline_metrics::PipelineMetrics; use serde_json::json; use std::collections::HashMap; +use std::sync::Arc; #[derive(Debug)] pub struct TextInjectionProbe; @@ -12,31 +12,40 @@ pub struct TextInjectionProbe; impl TextInjectionProbe { pub async fn run(_ctx: &TestContext) -> Result { let config = InjectionConfig::default(); - + // Create shared metrics let _metrics = Arc::new(PipelineMetrics::default()); let injection_metrics = Arc::new(std::sync::Mutex::new(InjectionMetrics::default())); - + // Create strategy manager let mut manager = StrategyManager::new(config, injection_metrics.clone()); - + // Test basic injection let start_time = std::time::Instant::now(); let result = manager.inject("Test injection").await; let duration = start_time.elapsed().as_millis() as u64; - + // Collect metrics let injection_metrics_guard = injection_metrics.lock().unwrap(); let mut metrics_map = HashMap::new(); metrics_map.insert("success".to_string(), json!(result.is_ok())); metrics_map.insert("duration_ms".to_string(), json!(duration)); - metrics_map.insert("attempts".to_string(), json!(injection_metrics_guard.attempts)); - metrics_map.insert("successes".to_string(), json!(injection_metrics_guard.successes)); - metrics_map.insert("failures".to_string(), json!(injection_metrics_guard.failures)); - + metrics_map.insert( + "attempts".to_string(), + json!(injection_metrics_guard.attempts), + ); + metrics_map.insert( + "successes".to_string(), + json!(injection_metrics_guard.successes), + ); + metrics_map.insert( + "failures".to_string(), + json!(injection_metrics_guard.failures), + ); + // Evaluate results let (pass, notes) = evaluate_injection_result(&result, &metrics_map); - + Ok(LiveTestResult { test: "text_injection".to_string(), pass, @@ -47,11 +56,13 @@ impl TextInjectionProbe { } } -fn evaluate_injection_result(result: &Result<(), crate::text_injection::types::InjectionError>, - metrics: 
&HashMap) -> (bool, String) { +fn evaluate_injection_result( + result: &Result<(), crate::text_injection::types::InjectionError>, + metrics: &HashMap, +) -> (bool, String) { let mut pass = true; let mut issues = Vec::new(); - + match result { Ok(()) => { // Check if metrics are reasonable @@ -61,7 +72,7 @@ fn evaluate_injection_result(result: &Result<(), crate::text_injection::types::I issues.push(format!("Expected 1 success, got {}", successes)); } } - + if let Some(attempts) = metrics.get("attempts").and_then(|v| v.as_u64()) { if attempts != 1 { pass = false; @@ -74,12 +85,12 @@ fn evaluate_injection_result(result: &Result<(), crate::text_injection::types::I issues.push(format!("Injection failed: {}", e)); } } - + let notes = if issues.is_empty() { "Text injection test completed successfully".to_string() } else { format!("Issues found: {}", issues.join("; ")) }; - + (pass, notes) -} \ No newline at end of file +} diff --git a/crates/app/src/probes/vad_mic.rs b/crates/app/src/probes/vad_mic.rs index 174f04f1..bd15ddb5 100644 --- a/crates/app/src/probes/vad_mic.rs +++ b/crates/app/src/probes/vad_mic.rs @@ -1,16 +1,16 @@ -use std::sync::Arc; use coldvox_telemetry::pipeline_metrics::PipelineMetrics; +use std::sync::Arc; use super::common::{LiveTestResult, TestContext, TestError, TestErrorKind}; +use crate::audio::vad_processor::VadProcessor; use coldvox_audio::capture::AudioCaptureThread; +use coldvox_audio::chunker::AudioFrame as VadFrame; use coldvox_audio::chunker::{AudioChunker, ChunkerConfig, ResamplerQuality}; use coldvox_audio::frame_reader::FrameReader; use coldvox_audio::ring_buffer::AudioRingBuffer; -use coldvox_audio::chunker::AudioFrame as VadFrame; -use crate::audio::vad_processor::VadProcessor; -use coldvox_vad::types::VadEvent; -use coldvox_vad::config::{UnifiedVadConfig, VadMode}; use coldvox_foundation::error::AudioConfig; +use coldvox_vad::config::{UnifiedVadConfig, VadMode}; +use coldvox_vad::types::VadEvent; use serde_json::json; use 
std::collections::HashMap; use std::time::{Duration, Instant}; @@ -29,10 +29,13 @@ impl VadMicCheck { // Prepare ring buffer and spawn capture thread let rb = AudioRingBuffer::new(16_384); let (audio_producer, audio_consumer) = rb.split(); - let (capture_thread, dev_cfg, _config_rx) = AudioCaptureThread::spawn(config, audio_producer, device_name).map_err(|e| TestError { - kind: TestErrorKind::Setup, - message: format!("Failed to create audio capture thread: {}", e), - })?; + let (capture_thread, dev_cfg, _config_rx) = + AudioCaptureThread::spawn(config, audio_producer, device_name).map_err(|e| { + TestError { + kind: TestErrorKind::Setup, + message: format!("Failed to create audio capture thread: {}", e), + } + })?; tokio::time::sleep(Duration::from_millis(200)).await; // Give the thread time to start @@ -45,11 +48,21 @@ impl VadMicCheck { let mut interval = tokio::time::interval(Duration::from_secs(30)); loop { interval.tick().await; - let cap_fps = metrics_clone.capture_fps.load(std::sync::atomic::Ordering::Relaxed); - let chk_fps = metrics_clone.chunker_fps.load(std::sync::atomic::Ordering::Relaxed); - let vad_fps = metrics_clone.vad_fps.load(std::sync::atomic::Ordering::Relaxed); - let cap_fill = metrics_clone.capture_buffer_fill.load(std::sync::atomic::Ordering::Relaxed); - let chk_fill = metrics_clone.chunker_buffer_fill.load(std::sync::atomic::Ordering::Relaxed); + let cap_fps = metrics_clone + .capture_fps + .load(std::sync::atomic::Ordering::Relaxed); + let chk_fps = metrics_clone + .chunker_fps + .load(std::sync::atomic::Ordering::Relaxed); + let vad_fps = metrics_clone + .vad_fps + .load(std::sync::atomic::Ordering::Relaxed); + let cap_fill = metrics_clone + .capture_buffer_fill + .load(std::sync::atomic::Ordering::Relaxed); + let chk_fill = metrics_clone + .chunker_buffer_fill + .load(std::sync::atomic::Ordering::Relaxed); tracing::info!( target: "vad_mic", "FPS c:{} ch:{} vad:{} | Fill c:{}% ch:{}%", @@ -82,27 +95,28 @@ impl VadMicCheck { let vad_cfg = 
UnifiedVadConfig { mode: VadMode::Silero, frame_size_samples: 512, - sample_rate_hz: 16000, // Silero requires 16kHz - resampler will handle conversion + sample_rate_hz: 16000, // Silero requires 16kHz - resampler will handle conversion ..Default::default() }; let vad_audio_rx = audio_tx.subscribe(); - let vad_handle = match VadProcessor::spawn(vad_cfg, vad_audio_rx, event_tx, Some(metrics.clone())) { - Ok(h) => h, - Err(e) => { - capture_thread.stop(); - chunker_handle.abort(); - return Err(TestError { - kind: TestErrorKind::Internal, - message: format!("Failed to spawn VAD processor: {}", e), - }); - } - }; + let vad_handle = + match VadProcessor::spawn(vad_cfg, vad_audio_rx, event_tx, Some(metrics.clone())) { + Ok(h) => h, + Err(e) => { + capture_thread.stop(); + chunker_handle.abort(); + return Err(TestError { + kind: TestErrorKind::Internal, + message: format!("Failed to spawn VAD processor: {}", e), + }); + } + }; // Collect VAD events during the test let start_time = Instant::now(); let mut vad_events = Vec::new(); - let mut speech_segments = 0; + let mut speech_segments = 0; let mut total_speech_duration_ms = 0; let timeout = tokio::time::sleep(duration); @@ -130,7 +144,7 @@ impl VadMicCheck { capture_thread.stop(); chunker_handle.abort(); vad_handle.abort(); - log_handle.abort(); + log_handle.abort(); let elapsed = start_time.elapsed(); @@ -138,10 +152,16 @@ impl VadMicCheck { let mut metrics = HashMap::new(); metrics.insert("vad_events_count".to_string(), json!(vad_events.len())); metrics.insert("speech_segments".to_string(), json!(speech_segments)); - metrics.insert("total_speech_duration_ms".to_string(), json!(total_speech_duration_ms)); - metrics.insert("test_duration_secs".to_string(), json!(elapsed.as_secs_f64())); - metrics.insert("device_sample_rate".to_string(), json!(dev_cfg.sample_rate)); - metrics.insert("device_channels".to_string(), json!(dev_cfg.channels)); + metrics.insert( + "total_speech_duration_ms".to_string(), + 
json!(total_speech_duration_ms), + ); + metrics.insert( + "test_duration_secs".to_string(), + json!(elapsed.as_secs_f64()), + ); + metrics.insert("device_sample_rate".to_string(), json!(dev_cfg.sample_rate)); + metrics.insert("device_channels".to_string(), json!(dev_cfg.channels)); // Calculate speech ratio let speech_ratio = if elapsed.as_millis() > 0 { @@ -164,16 +184,25 @@ impl VadMicCheck { } } -fn evaluate_vad_performance(metrics: &HashMap, events: &[(u64, VadEvent)]) -> (bool, String) { +fn evaluate_vad_performance( + metrics: &HashMap, + events: &[(u64, VadEvent)], +) -> (bool, String) { let mut pass = true; let mut issues = Vec::new(); - let events_count = metrics.get("vad_events_count") - .and_then(|v| v.as_u64()).unwrap_or(0); - let speech_segments = metrics.get("speech_segments") - .and_then(|v| v.as_u64()).unwrap_or(0); - let speech_ratio = metrics.get("speech_ratio") - .and_then(|v| v.as_f64()).unwrap_or(0.0); + let events_count = metrics + .get("vad_events_count") + .and_then(|v| v.as_u64()) + .unwrap_or(0); + let speech_segments = metrics + .get("speech_segments") + .and_then(|v| v.as_u64()) + .unwrap_or(0); + let speech_ratio = metrics + .get("speech_ratio") + .and_then(|v| v.as_f64()) + .unwrap_or(0.0); // Check for basic VAD functionality if events_count == 0 { @@ -184,18 +213,33 @@ fn evaluate_vad_performance(metrics: &HashMap, events // Check for reasonable speech ratio (not too high or too low) if speech_ratio > 0.9 { pass = false; - issues.push(format!("Speech ratio too high ({:.1}%) - may indicate over-sensitive VAD", speech_ratio * 100.0)); + issues.push(format!( + "Speech ratio too high ({:.1}%) - may indicate over-sensitive VAD", + speech_ratio * 100.0 + )); } else if speech_ratio < 0.01 && events_count > 0 { - issues.push(format!("Very low speech ratio ({:.3}%) - VAD may be too conservative", speech_ratio * 100.0)); + issues.push(format!( + "Very low speech ratio ({:.3}%) - VAD may be too conservative", + speech_ratio * 100.0 + )); } // 
Check for balanced speech/silence events - let speech_starts = events.iter().filter(|(_, e)| matches!(e, VadEvent::SpeechStart { .. })).count(); - let speech_ends = events.iter().filter(|(_, e)| matches!(e, VadEvent::SpeechEnd { .. })).count(); + let speech_starts = events + .iter() + .filter(|(_, e)| matches!(e, VadEvent::SpeechStart { .. })) + .count(); + let speech_ends = events + .iter() + .filter(|(_, e)| matches!(e, VadEvent::SpeechEnd { .. })) + .count(); if speech_starts != speech_ends { pass = false; - issues.push(format!("Unbalanced VAD events: {} starts, {} ends", speech_starts, speech_ends)); + issues.push(format!( + "Unbalanced VAD events: {} starts, {} ends", + speech_starts, speech_ends + )); } // Check for minimum speech segments if any speech detected @@ -215,4 +259,4 @@ fn evaluate_vad_performance(metrics: &HashMap, events }; (pass, notes) -} \ No newline at end of file +} diff --git a/crates/app/src/stt/mod.rs b/crates/app/src/stt/mod.rs index e509a839..91a5ddf5 100644 --- a/crates/app/src/stt/mod.rs +++ b/crates/app/src/stt/mod.rs @@ -2,12 +2,8 @@ // Re-export core STT types from the new crate pub use coldvox_stt::{ - next_utterance_id, - EventBasedTranscriber, - Transcriber, - TranscriptionConfig, - TranscriptionEvent, - WordInfo + next_utterance_id, EventBasedTranscriber, Transcriber, TranscriptionConfig, TranscriptionEvent, + WordInfo, }; #[cfg(feature = "vosk")] diff --git a/crates/app/src/stt/persistence.rs b/crates/app/src/stt/persistence.rs index b82e9ada..ef689122 100644 --- a/crates/app/src/stt/persistence.rs +++ b/crates/app/src/stt/persistence.rs @@ -1,11 +1,11 @@ -use std::path::{Path, PathBuf}; -use std::sync::Arc; -use std::fs; use chrono::{DateTime, Local, TimeZone}; -use serde::{Serialize, Deserialize}; -use hound::{WavWriter, WavSpec}; -use parking_lot::Mutex; use csv::Writer; +use hound::{WavSpec, WavWriter}; +use parking_lot::Mutex; +use serde::{Deserialize, Serialize}; +use std::fs; +use std::path::{Path, PathBuf}; +use 
std::sync::Arc; use crate::stt::TranscriptionEvent; use coldvox_audio::chunker::AudioFrame; @@ -151,19 +151,21 @@ impl TranscriptionWriter { // Create output directory structure let timestamp = Local::now(); - let date_dir = config.output_dir.join(timestamp.format("%Y-%m-%d").to_string()); + let date_dir = config + .output_dir + .join(timestamp.format("%Y-%m-%d").to_string()); let session_id = format!("{}", timestamp.format("%H%M%S")); let session_dir = date_dir.join(&session_id); - + fs::create_dir_all(&session_dir) .map_err(|e| format!("Failed to create session directory: {}", e))?; - + // Create subdirectories if config.save_audio { fs::create_dir_all(session_dir.join("audio")) .map_err(|e| format!("Failed to create audio directory: {}", e))?; } - + let session = TranscriptionSession { session_id: session_id.clone(), started_at: timestamp.to_rfc3339(), @@ -171,14 +173,14 @@ impl TranscriptionWriter { utterances: Vec::new(), metadata, }; - + // Save initial session manifest let manifest_path = session_dir.join("session.json"); let manifest_json = serde_json::to_string_pretty(&session) .map_err(|e| format!("Failed to serialize session: {}", e))?; fs::write(&manifest_path, manifest_json) .map_err(|e| format!("Failed to write session manifest: {}", e))?; - + Ok(Self { config, current_session: Arc::new(Mutex::new(session)), @@ -189,27 +191,28 @@ impl TranscriptionWriter { last_speech_duration_ms: Arc::new(Mutex::new(None)), }) } - + /// Handle audio frame for potential saving pub fn handle_audio_frame(&self, frame: &AudioFrame) { if !self.config.enabled || !self.config.save_audio { return; } - + let is_active = *self.utterance_active.lock(); if is_active { // Convert f32 samples back to i16 - let i16_samples: Vec = frame.samples + let i16_samples: Vec = frame + .samples .iter() .map(|&s| (s * i16::MAX as f32) as i16) .collect(); - + // Accumulate audio for current utterance let mut audio = self.current_utterance_audio.lock(); audio.extend_from_slice(&i16_samples); } 
} - + /// Handle VAD event pub fn handle_vad_event(&self, event: &VadEvent) { if !self.config.enabled { @@ -230,14 +233,19 @@ impl TranscriptionWriter { *self.last_speech_duration_ms.lock() = Some(*duration_ms); } } - } /// Handle transcription event + } + /// Handle transcription event pub async fn handle_transcription(&self, event: &TranscriptionEvent) -> Result<(), String> { if !self.config.enabled { return Ok(()); } match event { - TranscriptionEvent::Final { utterance_id, text, words } => { + TranscriptionEvent::Final { + utterance_id, + text, + words, + } => { // Get timing information from VAD events let (start_ms, duration_ms) = { let start_lock = self.last_speech_start_ms.lock(); @@ -246,28 +254,32 @@ impl TranscriptionWriter { }; // Calculate actual timestamps using VAD timing relative to session start - let (started_at, ended_at) = if let (Some(start_ms), Some(duration_ms)) = (start_ms, duration_ms) { - // Parse session start time - let session_start = DateTime::parse_from_rfc3339(&self.current_session.lock().started_at) - .map_err(|e| format!("Failed to parse session start time: {}", e)) - .ok() - .and_then(|dt| Some(dt.with_timezone(&Local))); - - if let Some(session_start) = session_start { - // VAD timestamps are relative to session start - let start_time = session_start + chrono::Duration::milliseconds(start_ms as i64); - let end_time = start_time + chrono::Duration::milliseconds(duration_ms as i64); - (start_time.to_rfc3339(), end_time.to_rfc3339()) + let (started_at, ended_at) = + if let (Some(start_ms), Some(duration_ms)) = (start_ms, duration_ms) { + // Parse session start time + let session_start = + DateTime::parse_from_rfc3339(&self.current_session.lock().started_at) + .map_err(|e| format!("Failed to parse session start time: {}", e)) + .ok() + .and_then(|dt| Some(dt.with_timezone(&Local))); + + if let Some(session_start) = session_start { + // VAD timestamps are relative to session start + let start_time = + session_start + 
chrono::Duration::milliseconds(start_ms as i64); + let end_time = + start_time + chrono::Duration::milliseconds(duration_ms as i64); + (start_time.to_rfc3339(), end_time.to_rfc3339()) + } else { + // Fallback if session start parsing fails + let now = Local::now(); + (now.to_rfc3339(), now.to_rfc3339()) + } } else { - // Fallback if session start parsing fails + // Fallback to current time if VAD timing not available let now = Local::now(); (now.to_rfc3339(), now.to_rfc3339()) - } - } else { - // Fallback to current time if VAD timing not available - let now = Local::now(); - (now.to_rfc3339(), now.to_rfc3339()) - }; + }; let audio_path = if self.config.save_audio { let audio_data = std::mem::take(&mut *self.current_utterance_audio.lock()); @@ -281,7 +293,9 @@ impl TranscriptionWriter { let sample_rate = self.config.sample_rate; match tokio::task::spawn_blocking(move || { Self::save_wav_file(&path_clone, &audio_data_move, sample_rate) - }).await { + }) + .await + { Ok(Ok(())) => Some(PathBuf::from(format!("audio/{}", filename))), Ok(Err(e)) => { tracing::error!("Failed to save audio: {}", e); @@ -309,12 +323,14 @@ impl TranscriptionWriter { confidence: None, audio_path, words: words.as_ref().map(|w| { - w.iter().map(|word| WordTiming { - word: word.text.clone(), - start_ms: (word.start * 1000.0) as u32, - end_ms: (word.end * 1000.0) as u32, - confidence: word.conf, - }).collect() + w.iter() + .map(|word| WordTiming { + word: word.text.clone(), + start_ms: (word.start * 1000.0) as u32, + end_ms: (word.end * 1000.0) as u32, + confidence: word.conf, + }) + .collect() }), }; @@ -337,7 +353,7 @@ impl TranscriptionWriter { Ok(()) } - + /// Save individual utterance based on format async fn save_utterance(&self, utterance: &UtteranceRecord) -> Result<(), String> { let filename = format!("utterance_{:06}", utterance.utterance_id); @@ -347,16 +363,15 @@ impl TranscriptionWriter { let path = self.session_dir.join(format!("{}.json", filename)); let json = 
serde_json::to_string_pretty(utterance) .map_err(|e| format!("Failed to serialize utterance: {}", e))?; - tokio::fs::write(&path, json).await + tokio::fs::write(&path, json) + .await .map_err(|e| format!("Failed to write utterance file: {}", e))?; } TranscriptFormat::Text => { let path = self.session_dir.join(format!("{}.txt", filename)); - let content = format!("[{}] {}\n", - utterance.started_at, - utterance.text - ); - tokio::fs::write(&path, content).await + let content = format!("[{}] {}\n", utterance.started_at, utterance.text); + tokio::fs::write(&path, content) + .await .map_err(|e| format!("Failed to write text file: {}", e))?; } TranscriptFormat::Csv => { @@ -364,7 +379,8 @@ impl TranscriptionWriter { let path = self.session_dir.join("transcriptions.csv"); // Check if file exists and is empty to determine if we need headers - let needs_header = tokio::fs::metadata(&path).await + let needs_header = tokio::fs::metadata(&path) + .await .map(|m| m.len() == 0) .unwrap_or(true); @@ -383,8 +399,14 @@ impl TranscriptionWriter { // Write header if file is new if needs_header { - wtr.write_record(&["utterance_id", "timestamp", "duration_ms", "text", "audio_path"]) - .map_err(|e| format!("Failed to write CSV header: {}", e))?; + wtr.write_record(&[ + "utterance_id", + "timestamp", + "duration_ms", + "text", + "audio_path", + ]) + .map_err(|e| format!("Failed to write CSV header: {}", e))?; } // Write the record (CSV writer handles proper escaping and quoting) @@ -393,7 +415,9 @@ impl TranscriptionWriter { utterance_clone.started_at, utterance_clone.duration_ms.to_string(), utterance_clone.text, - utterance_clone.audio_path.as_ref() + utterance_clone + .audio_path + .as_ref() .map(|p| p.to_string_lossy().to_string()) .unwrap_or_default(), ]) @@ -403,7 +427,8 @@ impl TranscriptionWriter { .map_err(|e| format!("Failed to flush CSV writer: {}", e))?; Ok::<(), String>(()) - }).await + }) + .await .map_err(|e| format!("CSV writing task panicked: {}", e))?; 
csv_join.map_err(|e| e)?; } @@ -411,18 +436,19 @@ impl TranscriptionWriter { Ok(()) } - + /// Update the session manifest file async fn update_session_manifest(&self) -> Result<(), String> { let session = self.current_session.lock().clone(); let manifest_path = self.session_dir.join("session.json"); let json = serde_json::to_string_pretty(&session) .map_err(|e| format!("Failed to serialize session: {}", e))?; - tokio::fs::write(&manifest_path, json).await + tokio::fs::write(&manifest_path, json) + .await .map_err(|e| format!("Failed to update session manifest: {}", e))?; Ok(()) } - + /// Save audio data as WAV file fn save_wav_file(path: &Path, samples: &[i16], sample_rate: u32) -> Result<(), String> { let spec = WavSpec { @@ -436,38 +462,41 @@ impl TranscriptionWriter { .map_err(|e| format!("Failed to create WAV file: {}", e))?; for sample in samples { - writer.write_sample(*sample) + writer + .write_sample(*sample) .map_err(|e| format!("Failed to write WAV sample: {}", e))?; } - writer.finalize() + writer + .finalize() .map_err(|e| format!("Failed to finalize WAV file: {}", e))?; Ok(()) } - + /// Finalize the session pub async fn finalize(&self) -> Result<(), String> { if !self.config.enabled { return Ok(()); } - + { let mut session = self.current_session.lock(); session.ended_at = Some(Local::now().to_rfc3339()); } - + self.update_session_manifest().await?; - + // Create summary file let summary = self.generate_summary(); let summary_path = self.session_dir.join("summary.txt"); - tokio::fs::write(&summary_path, summary).await + tokio::fs::write(&summary_path, summary) + .await .map_err(|e| format!("Failed to write summary: {}", e))?; - + Ok(()) } - + /// Generate session summary fn generate_summary(&self) -> String { let session = self.current_session.lock(); @@ -477,14 +506,18 @@ impl TranscriptionWriter { .map(|dt| dt.with_timezone(&Local)) .unwrap_or_else(|_| Local::now()); - let end_time = session.ended_at.as_ref() + let end_time = session + .ended_at + 
.as_ref() .and_then(|s| DateTime::parse_from_rfc3339(s).ok()) .map(|dt| dt.with_timezone(&Local)) .unwrap_or_else(|| Local::now()); let duration = end_time.signed_duration_since(start_time); - let total_words: usize = session.utterances.iter() + let total_words: usize = session + .utterances + .iter() .map(|u| u.text.split_whitespace().count()) .sum(); @@ -507,7 +540,7 @@ impl TranscriptionWriter { session.metadata.stt_model, ) } - + /// Clean up old files based on retention policy pub async fn cleanup_old_files(&self) -> Result<(), String> { if self.config.retention_days == 0 { @@ -530,14 +563,19 @@ impl TranscriptionWriter { // Parse date from directory name (YYYY-MM-DD format) if let Some(dir_name) = path.file_name().and_then(|n| n.to_str()) { if let Ok(date) = chrono::NaiveDate::parse_from_str(dir_name, "%Y-%m-%d") { - let datetime = date.and_hms_opt(0, 0, 0) + let datetime = date + .and_hms_opt(0, 0, 0) .and_then(|dt| Local.from_local_datetime(&dt).single()); if let Some(dt) = datetime { if dt < cutoff { - tracing::info!("Removing old transcription directory: {:?}", path); - std::fs::remove_dir_all(&path) - .map_err(|e| format!("Failed to remove old directory: {}", e))?; + tracing::info!( + "Removing old transcription directory: {:?}", + path + ); + std::fs::remove_dir_all(&path).map_err(|e| { + format!("Failed to remove old directory: {}", e) + })?; } } } @@ -546,7 +584,8 @@ impl TranscriptionWriter { } Ok(()) - }).await + }) + .await .map_err(|e| format!("Cleanup task panicked: {}", e))? } } @@ -567,13 +606,13 @@ pub fn spawn_persistence_handler( return; } }; - + tracing::info!("Transcription persistence handler started"); // Apply retention policy at startup (best-effort) if let Err(e) = writer.cleanup_old_files().await { tracing::warn!("Retention cleanup failed: {}", e); } - + loop { tokio::select! 
{ Ok(frame) = audio_rx.recv() => { @@ -593,7 +632,7 @@ pub fn spawn_persistence_handler( } } } - + // Finalize session if let Err(e) = writer.finalize().await { tracing::error!("Failed to finalize session: {}", e); diff --git a/crates/app/src/stt/processor.rs b/crates/app/src/stt/processor.rs index 5859d3c6..5970a2b1 100644 --- a/crates/app/src/stt/processor.rs +++ b/crates/app/src/stt/processor.rs @@ -4,12 +4,12 @@ // for the speech recognition model, leading to more accurate transcriptions. // Text injection happens immediately (0ms timeout) after transcription completes. -use tokio::sync::{broadcast, mpsc}; +use crate::stt::{TranscriptionConfig, TranscriptionEvent}; use coldvox_audio::chunker::AudioFrame; -use crate::stt::{TranscriptionEvent, TranscriptionConfig}; use coldvox_vad::types::VadEvent; use std::sync::Arc; use std::time::Instant; +use tokio::sync::{broadcast, mpsc}; #[cfg(feature = "vosk")] use crate::stt::VoskTranscriber; @@ -82,10 +82,10 @@ impl SttProcessor { if !config.enabled { tracing::info!("STT processor disabled in configuration"); } - + // Create Vosk transcriber with configuration let transcriber = VoskTranscriber::new(config.clone(), 16000.0)?; - + Ok(Self { audio_rx, vad_event_rx, @@ -96,7 +96,7 @@ impl SttProcessor { config, }) } - + /// Create with default configuration (backward compatibility) pub fn new_with_default( audio_rx: broadcast::Receiver, @@ -104,7 +104,7 @@ impl SttProcessor { ) -> Result { // Create a simple event channel for compatibility let (event_tx, _event_rx) = mpsc::channel(100); - + // Use default config with the default model path let config = TranscriptionConfig { enabled: true, @@ -114,15 +114,15 @@ impl SttProcessor { include_words: false, buffer_size_ms: 512, }; - + Self::new(audio_rx, vad_event_rx, event_tx, config) } - + /// Get current metrics pub fn metrics(&self) -> SttMetrics { self.metrics.read().clone() } - + /// Run the STT processor loop pub async fn run(mut self) { // Exit early if STT is disabled @@ 
-133,7 +133,7 @@ impl SttProcessor { ); return; } - + tracing::info!( target: "stt", "STT processor starting (model: {}, partials: {}, words: {})", @@ -141,7 +141,7 @@ impl SttProcessor { self.config.partial_results, self.config.include_words ); - + loop { tokio::select! { // Listen for VAD events @@ -155,19 +155,19 @@ impl SttProcessor { } } } - + // Listen for audio frames Ok(frame) = self.audio_rx.recv() => { self.handle_audio_frame(frame).await; } - + else => { tracing::info!(target: "stt", "STT processor shutting down: all channels closed"); break; } } } - + // Log final metrics let metrics = self.metrics.read(); tracing::info!( @@ -181,28 +181,28 @@ impl SttProcessor { metrics.error_count ); } - + /// Handle speech start event async fn handle_speech_start(&mut self, timestamp_ms: u64) { tracing::debug!(target: "stt", "STT processor received SpeechStart at {}ms", timestamp_ms); - + // Store the start time as Instant for duration calculations let start_instant = Instant::now(); - + self.state = UtteranceState::SpeechActive { started_at: start_instant, audio_buffer: Vec::with_capacity(16000 * 10), // Pre-allocate for up to 10 seconds frames_buffered: 0, }; - + // Reset transcriber for new utterance if let Err(e) = coldvox_stt::EventBasedTranscriber::reset(&mut self.transcriber) { tracing::warn!(target: "stt", "Failed to reset transcriber: {}", e); } - + tracing::info!(target: "stt", "Started buffering audio for new utterance"); } - + /// Handle speech end event async fn handle_speech_end(&mut self, timestamp_ms: u64, duration_ms: Option) { tracing::debug!( @@ -211,24 +211,32 @@ impl SttProcessor { timestamp_ms, duration_ms ); - + // Process the buffered audio all at once - if let UtteranceState::SpeechActive { audio_buffer, frames_buffered, .. } = &self.state { + if let UtteranceState::SpeechActive { + audio_buffer, + frames_buffered, + .. 
+ } = &self.state + { let buffer_size = audio_buffer.len(); tracing::info!( - target: "stt", + target: "stt", "Processing buffered audio: {} samples ({:.2}s), {} frames", buffer_size, buffer_size as f32 / 16000.0, frames_buffered ); - + if !audio_buffer.is_empty() { // Send the entire buffer to the transcriber at once - match coldvox_stt::EventBasedTranscriber::accept_frame(&mut self.transcriber, &audio_buffer) { + match coldvox_stt::EventBasedTranscriber::accept_frame( + &mut self.transcriber, + &audio_buffer, + ) { Ok(Some(event)) => { self.send_event(event).await; - + // Update metrics let mut metrics = self.metrics.write(); metrics.frames_out += frames_buffered; @@ -239,25 +247,25 @@ impl SttProcessor { } Err(e) => { tracing::error!(target: "stt", "Failed to process buffered audio: {}", e); - + // Send error event let error_event = TranscriptionEvent::Error { code: "BUFFER_PROCESS_ERROR".to_string(), message: e, }; self.send_event(error_event).await; - + // Update metrics self.metrics.write().error_count += 1; } } } - + // Finalize to get any remaining transcription match coldvox_stt::EventBasedTranscriber::finalize_utterance(&mut self.transcriber) { Ok(Some(event)) => { self.send_event(event).await; - + // Update metrics let mut metrics = self.metrics.write(); metrics.final_count += 1; @@ -268,40 +276,46 @@ impl SttProcessor { } Err(e) => { tracing::error!(target: "stt", "Failed to finalize transcription: {}", e); - + // Send error event let error_event = TranscriptionEvent::Error { code: "FINALIZE_ERROR".to_string(), message: e, }; self.send_event(error_event).await; - + // Update metrics self.metrics.write().error_count += 1; } } } - + self.state = UtteranceState::Idle; } - + /// Handle incoming audio frame async fn handle_audio_frame(&mut self, frame: AudioFrame) { // Update metrics self.metrics.write().frames_in += 1; - + // Only buffer if speech is active - if let UtteranceState::SpeechActive { ref mut audio_buffer, ref mut frames_buffered, .. 
} = &mut self.state { + if let UtteranceState::SpeechActive { + ref mut audio_buffer, + ref mut frames_buffered, + .. + } = &mut self.state + { // Convert f32 samples back to i16 - let i16_samples: Vec = frame.samples + let i16_samples: Vec = frame + .samples .iter() .map(|&s| (s * i16::MAX as f32) as i16) .collect(); - + // Buffer the audio frame audio_buffer.extend_from_slice(&i16_samples); *frames_buffered += 1; - + // Log periodically to show we're buffering if *frames_buffered % 100 == 0 { tracing::debug!( @@ -314,7 +328,7 @@ impl SttProcessor { } } } - + /// Send transcription event async fn send_event(&self, event: TranscriptionEvent) { // Log the event @@ -330,13 +344,12 @@ impl SttProcessor { tracing::error!(target: "stt", "Error [{}]: {}", code, message); } } - + // Send to channel with backpressure - wait if channel is full // Use timeout to prevent indefinite blocking - match tokio::time::timeout( - std::time::Duration::from_secs(5), - self.event_tx.send(event) - ).await { + match tokio::time::timeout(std::time::Duration::from_secs(5), self.event_tx.send(event)) + .await + { Ok(Ok(())) => { // Successfully sent } @@ -368,20 +381,25 @@ impl SttProcessor { tracing::info!("STT processor disabled - Vosk feature not enabled"); Ok(Self) } - + /// Stub method for backward compatibility pub fn new_with_default( _audio_rx: broadcast::Receiver, _vad_event_rx: mpsc::Receiver, ) -> Result { - Self::new(_audio_rx, _vad_event_rx, mpsc::channel(1).0, TranscriptionConfig::default()) + Self::new( + _audio_rx, + _vad_event_rx, + mpsc::channel(1).0, + TranscriptionConfig::default(), + ) } - + /// Get stub metrics pub fn metrics(&self) -> SttMetrics { SttMetrics::default() } - + /// Run stub processor pub async fn run(self) { tracing::info!("STT processor stub running - no actual processing (Vosk feature disabled)"); @@ -390,4 +408,4 @@ impl SttProcessor { tokio::time::sleep(std::time::Duration::from_secs(60)).await; } } -} \ No newline at end of file +} diff --git 
a/crates/app/src/stt/tests.rs b/crates/app/src/stt/tests.rs index 3811150c..16b6882b 100644 --- a/crates/app/src/stt/tests.rs +++ b/crates/app/src/stt/tests.rs @@ -8,11 +8,11 @@ mod vosk_tests { #[test] fn test_transcription_config_default() { let config = TranscriptionConfig::default(); - assert_eq!(config.enabled, false); + assert!(!config.enabled); assert_eq!(config.model_path, "models/vosk-model-small-en-us-0.15"); - assert_eq!(config.partial_results, true); + assert!(config.partial_results); assert_eq!(config.max_alternatives, 1); - assert_eq!(config.include_words, false); + assert!(!config.include_words); assert_eq!(config.buffer_size_ms, 512); } @@ -24,7 +24,7 @@ mod vosk_tests { assert_eq!(id2, id1 + 1); } - #[test] + #[test] fn test_word_info_creation() { let word = WordInfo { text: "hello".to_string(), @@ -46,7 +46,7 @@ mod vosk_tests { t0: Some(0.0), t1: Some(1.0), }; - + match partial { TranscriptionEvent::Partial { text, .. } => { assert_eq!(text, "partial text"); @@ -58,7 +58,7 @@ mod vosk_tests { code: "TEST_ERROR".to_string(), message: "Test error message".to_string(), }; - + match error { TranscriptionEvent::Error { code, message } => { assert_eq!(code, "TEST_ERROR"); @@ -70,9 +70,11 @@ mod vosk_tests { #[cfg(feature = "vosk")] mod vosk_integration_tests { + use coldvox_stt::EventBasedTranscriber; + use super::super::super::vosk::VoskTranscriber; use super::super::super::TranscriptionConfig; - + #[test] fn test_vosk_transcriber_missing_model() { let config = TranscriptionConfig { @@ -83,7 +85,7 @@ mod vosk_tests { include_words: false, buffer_size_ms: 512, }; - + let result = VoskTranscriber::new(config, 16000.0); assert!(result.is_err()); if let Err(e) = result { @@ -101,7 +103,7 @@ mod vosk_tests { include_words: false, buffer_size_ms: 512, }; - + let result = VoskTranscriber::new(config, 16000.0); assert!(result.is_err()); if let Err(e) = result { @@ -127,17 +129,17 @@ mod vosk_tests { include_words: false, buffer_size_ms: 512, }; - + let result 
= VoskTranscriber::new(config.clone(), 16000.0); assert!(result.is_ok()); - + let mut transcriber = result.unwrap(); - + // Test with silence (should not produce transcription) let silence = vec![0i16; 512]; let event = transcriber.accept_frame(&silence); assert!(event.is_ok()); - + // Test finalization let final_result = transcriber.finalize_utterance(); assert!(final_result.is_ok()); @@ -154,15 +156,17 @@ mod processor_tests { fn test_utterance_state_transitions() { let idle = UtteranceState::Idle; matches!(idle, UtteranceState::Idle); - + let active = UtteranceState::SpeechActive { started_at: Instant::now(), audio_buffer: Vec::new(), frames_buffered: 0, }; - + match active { - UtteranceState::SpeechActive { frames_buffered, .. } => { + UtteranceState::SpeechActive { + frames_buffered, .. + } => { assert_eq!(frames_buffered, 0); } _ => panic!("Expected SpeechActive state"), @@ -181,4 +185,4 @@ mod processor_tests { assert_eq!(metrics.queue_depth, 0); assert!(metrics.last_event_time.is_none()); } -} \ No newline at end of file +} diff --git a/crates/app/src/stt/tests/end_to_end_wav.rs b/crates/app/src/stt/tests/end_to_end_wav.rs index 7698b32c..ffa746c2 100644 --- a/crates/app/src/stt/tests/end_to_end_wav.rs +++ b/crates/app/src/stt/tests/end_to_end_wav.rs @@ -7,10 +7,10 @@ use std::time::{Duration, Instant}; use tokio::sync::{broadcast, mpsc}; use tracing::info; -use coldvox_audio::chunker::{AudioChunker, ChunkerConfig}; -use coldvox_audio::ring_buffer::{AudioRingBuffer, AudioProducer}; -use coldvox_audio::chunker::AudioFrame; use crate::stt::{processor::SttProcessor, TranscriptionConfig, TranscriptionEvent}; +use coldvox_audio::chunker::AudioFrame; +use coldvox_audio::chunker::{AudioChunker, ChunkerConfig}; +use coldvox_audio::ring_buffer::{AudioProducer, AudioRingBuffer}; // use crate::text_injection::{AsyncInjectionProcessor, InjectionProcessorConfig}; use coldvox_vad::config::{UnifiedVadConfig, VadMode}; use coldvox_vad::constants::{FRAME_SIZE_SAMPLES, 
SAMPLE_RATE_HZ}; @@ -52,17 +52,19 @@ impl WavFileLoader { pub fn new>(wav_path: P, target_sample_rate: u32) -> Result { let mut reader = WavReader::open(wav_path)?; let spec = reader.spec(); - - info!("Loading WAV: {} Hz, {} channels, {} bits", - spec.sample_rate, spec.channels, spec.bits_per_sample); + + info!( + "Loading WAV: {} Hz, {} channels, {} bits", + spec.sample_rate, spec.channels, spec.bits_per_sample + ); // Read all samples - let samples: Vec = reader.samples::() - .collect::, _>>()?; + let samples: Vec = reader.samples::().collect::, _>>()?; // Convert to mono if stereo let mono_samples = if spec.channels == 2 { - samples.chunks(2) + samples + .chunks(2) .map(|chunk| ((chunk[0] as i32 + chunk[1] as i32) / 2) as i16) .collect() } else { @@ -74,7 +76,7 @@ impl WavFileLoader { let ratio = target_sample_rate as f32 / spec.sample_rate as f32; let new_len = (mono_samples.len() as f32 * ratio) as usize; let mut resampled = Vec::with_capacity(new_len); - + for i in 0..new_len { let src_idx = i as f32 / ratio; let idx = src_idx as usize; @@ -87,7 +89,11 @@ impl WavFileLoader { mono_samples }; - info!("WAV loaded: {} samples at {} Hz", final_samples.len(), target_sample_rate); + info!( + "WAV loaded: {} samples at {} Hz", + final_samples.len(), + target_sample_rate + ); Ok(Self { samples: final_samples, @@ -99,12 +105,13 @@ impl WavFileLoader { /// Stream audio data to ring buffer with realistic timing pub async fn stream_to_ring_buffer(&mut self, mut producer: AudioProducer) -> Result<()> { - let frame_duration = Duration::from_millis((self.frame_size * 1000) as u64 / self.sample_rate as u64); - + let frame_duration = + Duration::from_millis((self.frame_size * 1000) as u64 / self.sample_rate as u64); + while self.current_pos < self.samples.len() { let end_pos = (self.current_pos + self.frame_size).min(self.samples.len()); let chunk = &self.samples[self.current_pos..end_pos]; - + // Try to write chunk to ring buffer let mut written = 0; while written < 
chunk.len() { @@ -116,13 +123,13 @@ impl WavFileLoader { } } } - + self.current_pos = end_pos; - + // Maintain realistic timing tokio::time::sleep(frame_duration).await; } - + info!("WAV streaming completed"); Ok(()) } @@ -220,11 +227,11 @@ pub async fn test_wav_pipeline<P: AsRef<Path>>( let mock_injector = MockTextInjector::new(); let ring_buffer = AudioRingBuffer::new(16384 * 4); let (audio_producer, audio_consumer) = ring_buffer.split(); - + // Load WAV file let mut wav_loader = WavFileLoader::new(wav_path, SAMPLE_RATE_HZ)?; let test_duration = Duration::from_millis(wav_loader.duration_ms() + 2000); // Add buffer time - + // Set up audio chunker let (audio_tx, _) = broadcast::channel::<AudioFrame>(200); let frame_reader = coldvox_audio::frame_reader::FrameReader::new( @@ -234,13 +241,13 @@ pub async fn test_wav_pipeline<P: AsRef<Path>>( 16384 * 4, None, ); - + let chunker_cfg = ChunkerConfig { frame_size_samples: FRAME_SIZE_SAMPLES, sample_rate_hz: SAMPLE_RATE_HZ, resampler_quality: coldvox_audio::chunker::ResamplerQuality::Balanced, }; - + let chunker = AudioChunker::new(frame_reader, audio_tx.clone(), chunker_cfg); let chunker_handle = chunker.spawn(); @@ -285,10 +292,11 @@ pub async fn test_wav_pipeline<P: AsRef<Path>>( } let stt_audio_rx = audio_tx.subscribe(); - let stt_processor = match SttProcessor::new(stt_audio_rx, vad_event_rx, stt_transcription_tx, stt_config) { - Ok(processor) => processor, - Err(e) => anyhow::bail!("Failed to create STT processor: {}", e), - }; + let stt_processor = + match SttProcessor::new(stt_audio_rx, vad_event_rx, stt_transcription_tx, stt_config) { + Ok(processor) => processor, + Err(e) => anyhow::bail!("Failed to create STT processor: {}", e), + }; let stt_handle = tokio::spawn(async move { stt_processor.run().await; }); @@ -298,20 +306,14 @@ pub async fn test_wav_pipeline<P: AsRef<Path>>( let mock_injector_clone = MockTextInjector { injections: Arc::clone(&mock_injector.injections), }; - - let injection_processor = MockInjectionProcessor::new( - mock_injector_clone, - stt_transcription_rx, - 
shutdown_rx, - ); - let _injection_handle = tokio::spawn(async move { - injection_processor.run().await - }); + + let injection_processor = + MockInjectionProcessor::new(mock_injector_clone, stt_transcription_rx, shutdown_rx); + let _injection_handle = tokio::spawn(async move { injection_processor.run().await }); // Start streaming WAV data - let streaming_handle = tokio::spawn(async move { - wav_loader.stream_to_ring_buffer(audio_producer).await - }); + let streaming_handle = + tokio::spawn(async move { wav_loader.stream_to_ring_buffer(audio_producer).await }); info!("Pipeline started, running for {:?}", test_duration); @@ -335,14 +337,14 @@ pub async fn test_wav_pipeline<P: AsRef<Path>>( let all_text = injections.join(" ").to_lowercase(); let mut found_any = false; let mut found_fragments = Vec::new(); - + for expected in &expected_text_fragments { if all_text.contains(&expected.to_lowercase()) { found_any = true; found_fragments.push(expected.clone()); } } - + if !found_any && !expected_text_fragments.is_empty() { anyhow::bail!( "None of the expected text fragments {:?} were found in injections: {:?}", @@ -350,7 +352,7 @@ pub async fn test_wav_pipeline<P: AsRef<Path>>( injections ); } - + info!("Found expected fragments: {:?}", found_fragments); Ok(injections) @@ -363,16 +365,16 @@ mod tests { #[tokio::test] #[ignore] // Requires WAV files and Vosk model async fn test_end_to_end_wav_pipeline() { - use std::fs; use rand::seq::SliceRandom; - + use std::fs; + // This test requires: // 1. A WAV file with known speech content // 2. 
Vosk model downloaded and configured - + // Look for test WAV files in test_data directory let test_data_dir = "test_data"; - + // If TEST_WAV is set, use that specific file let (wav_path, expected_fragments) = if let Ok(specific_wav) = std::env::var("TEST_WAV") { if !std::path::Path::new(&specific_wav).exists() { @@ -391,7 +393,7 @@ mod tests { return; } }; - + let mut wav_files = Vec::new(); for entry in entries.flatten() { let path = entry.path(); @@ -402,21 +404,22 @@ mod tests { } } } - + if wav_files.is_empty() { eprintln!("Skipping test: No WAV files with transcripts found in test_data/"); return; } - + // Randomly select a test file let mut rng = rand::thread_rng(); let selected_wav = wav_files.choose(&mut rng).unwrap().clone(); - + // Load the corresponding transcript let txt_path = std::path::Path::new(&selected_wav).with_extension("txt"); - let transcript = fs::read_to_string(&txt_path) - .unwrap_or_else(|e| panic!("Failed to read transcript {}: {}", txt_path.display(), e)); - + let transcript = fs::read_to_string(&txt_path).unwrap_or_else(|e| { + panic!("Failed to read transcript {}: {}", txt_path.display(), e) + }); + // Extract key words from transcript (longer words are more distinctive) let words: Vec<String> = transcript .to_lowercase() @@ -425,7 +428,7 @@ mod tests { .take(3) // Take up to 3 key words .map(|s| s.to_string()) .collect(); - + let expected = if words.is_empty() { // Fallback to any word if no long words found transcript @@ -437,16 +440,16 @@ mod tests { } else { words }; - + (selected_wav, expected) }; - + println!("Testing with WAV file: {}", wav_path); println!("Expected keywords: {:?}", expected_fragments); - + // Convert Vec<String> to Vec<&str> for the test function let expected_refs: Vec<&str> = expected_fragments.iter().map(|s| s.as_str()).collect(); - + match test_wav_pipeline(wav_path, expected_refs).await { Ok(injections) => { println!("✅ Test passed! 
Injections: {:?}", injections); @@ -463,15 +466,15 @@ mod tests { fn test_wav_file_loader() { // Test WAV file loading with a simple synthetic file // This could be expanded to create a simple test WAV file - + // For now, just test the struct creation let injector = MockTextInjector::new(); assert_eq!(injector.get_injections().len(), 0); - + // Test injection tokio_test::block_on(async { injector.inject("test").await.unwrap(); assert_eq!(injector.get_injections(), vec!["test"]); }); } -} \ No newline at end of file +} diff --git a/crates/app/src/stt/vosk.rs b/crates/app/src/stt/vosk.rs index 68f75b45..0e075ece 100644 --- a/crates/app/src/stt/vosk.rs +++ b/crates/app/src/stt/vosk.rs @@ -2,4 +2,4 @@ pub use coldvox_stt_vosk::VoskTranscriber; // For backward compatibility, also re-export the default model path function -pub use coldvox_stt_vosk::default_model_path; \ No newline at end of file +pub use coldvox_stt_vosk::default_model_path; diff --git a/crates/app/src/text_injection/atspi_injector.rs b/crates/app/src/text_injection/atspi_injector.rs deleted file mode 100644 index aaf3a6c6..00000000 --- a/crates/app/src/text_injection/atspi_injector.rs +++ /dev/null @@ -1,199 +0,0 @@ -use crate::text_injection::focus::{FocusTracker, FocusStatus}; -use crate::text_injection::types::{InjectionConfig, InjectionError, InjectionMethod, InjectionMetrics}; -use atspi::action::Action; -use atspi::editable_text::EditableText; -use atspi::Accessible; -use std::time::Duration; -use tokio::time::timeout; -use tracing::{debug, error, info, warn}; -use async_trait::async_trait; - -/// AT-SPI2 injector for direct text insertion -pub struct AtspiInjector { - config: InjectionConfig, - metrics: InjectionMetrics, - focus_tracker: FocusTracker, -} - -impl AtspiInjector { - /// Create a new AT-SPI2 injector - pub fn new(config: InjectionConfig) -> Self { - Self { - config: config.clone(), - metrics: InjectionMetrics::default(), - focus_tracker: FocusTracker::new(config), - } - } - - /// 
Insert text directly into the focused element using EditableText interface - async fn insert_text_direct(&self, text: &str, accessible: &Accessible) -> Result<(), InjectionError> { - let start = std::time::Instant::now(); - - // Get EditableText interface - let editable_text = EditableText::new(accessible).await - .map_err(|e| InjectionError::Atspi(e))?; - - // Get current text length to insert at end - let text_length = editable_text.get_text(0, -1).await - .map_err(|e| InjectionError::Atspi(e))? - .len() as i32; - - // Insert text at the end - editable_text.insert_text(text_length, text).await - .map_err(|e| InjectionError::Atspi(e))?; - - let duration = start.elapsed().as_millis() as u64; - self.metrics.record_success(InjectionMethod::AtspiInsert, duration); - info!("Successfully inserted text via AT-SPI2 EditableText ({} chars)", text.len()); - - Ok(()) - } - - /// Trigger paste action on the focused element - async fn trigger_paste_action(&self, accessible: &Accessible) -> Result<(), InjectionError> { - let start = std::time::Instant::now(); - - // Get Action interface - let action = Action::new(accessible).await - .map_err(|e| InjectionError::Atspi(e))?; - - // Find paste action - let n_actions = action.n_actions().await - .map_err(|e| InjectionError::Atspi(e))?; - - for i in 0..n_actions { - let action_name = action.get_action_name(i).await - .map_err(|e| InjectionError::Atspi(e))?; - - let action_description = action.get_action_description(i).await - .map_err(|e| InjectionError::Atspi(e))?; - - // Check if this is a paste action (case-insensitive) - if action_name.to_lowercase().contains("paste") || - action_description.to_lowercase().contains("paste") { - debug!("Found paste action: {} ({})", action_name, action_description); - - // Execute the paste action - action.do_action(i).await - .map_err(|e| InjectionError::Atspi(e))?; - - let duration = start.elapsed().as_millis() as u64; - self.metrics.record_success(InjectionMethod::AtspiInsert, duration); - 
info!("Successfully triggered paste action via AT-SPI2"); - return Ok(()); - } - } - - Err(InjectionError::MethodUnavailable("No paste action found".to_string())) - } -} - -#[async_trait] -impl super::types::TextInjector for AtspiInjector { - fn name(&self) -> &'static str { - "AT-SPI2" - } - - fn is_available(&self) -> bool { - // AT-SPI2 should be available on KDE/Wayland - std::env::var("XDG_SESSION_TYPE").map(|t| t == "wayland").unwrap_or(false) - } - - async fn inject(&mut self, text: &str) -> Result<(), InjectionError> { - if text.is_empty() { - return Ok(()); - } - - let start = std::time::Instant::now(); - - // Get focus status - let focus_status = self.focus_tracker.get_focus_status().await.map_err(|e| { - let duration = start.elapsed().as_millis() as u64; - self.metrics.record_failure(InjectionMethod::AtspiInsert, duration, e.to_string()); - e - })?; - - // Only proceed if we have a confirmed editable field or unknown focus (if allowed) - if focus_status == FocusStatus::NonEditable { - // We can't insert text directly, but might be able to paste - debug!("Focused element is not editable, skipping direct insertion"); - return Err(InjectionError::MethodUnavailable("Focused element not editable".to_string())); - } - - if focus_status == FocusStatus::Unknown && !self.config.inject_on_unknown_focus { - debug!("Focus state unknown and injection on unknown focus disabled"); - return Err(InjectionError::Other("Unknown focus state".to_string())); - } - - // Get focused element - let focused = match self.focus_tracker.get_focused_element().await { - Ok(Some(element)) => element, - Ok(None) => { - debug!("No focused element"); - return Err(InjectionError::Other("No focused element".to_string())); - } - Err(e) => { - let duration = start.elapsed().as_millis() as u64; - self.metrics.record_failure(InjectionMethod::AtspiInsert, duration, e.to_string()); - return Err(InjectionError::Other(e.to_string())); - } - }; - - // Try direct insertion first - let direct_res = 
timeout( - Duration::from_millis(self.config.per_method_timeout_ms), - self.insert_text_direct(text, &focused), - ).await; - match direct_res { - Ok(Ok(())) => return Ok(()), - Ok(Err(e)) => { - debug!("Direct insertion failed: {}", e); - } - Err(_) => { - let duration = start.elapsed().as_millis() as u64; - self.metrics.record_failure( - InjectionMethod::AtspiInsert, - duration, - format!("Timeout after {}ms", self.config.per_method_timeout_ms) - ); - return Err(InjectionError::Timeout(self.config.per_method_timeout_ms)); - } - } - - // If direct insertion failed, try paste action if the element supports it - if self.focus_tracker.supports_paste_action(&focused).await.unwrap_or(false) { - let paste_res = timeout( - Duration::from_millis(self.config.paste_action_timeout_ms), - self.trigger_paste_action(&focused), - ).await; - match paste_res { - Ok(Ok(())) => return Ok(()), - Ok(Err(e)) => { - debug!("Paste action failed: {}", e); - } - Err(_) => { - let duration = start.elapsed().as_millis() as u64; - self.metrics.record_failure( - InjectionMethod::AtspiInsert, - duration, - format!("Timeout after {}ms", self.config.paste_action_timeout_ms) - ); - return Err(InjectionError::Timeout(self.config.paste_action_timeout_ms)); - } - } - } - - // If we get here, both methods failed - let duration = start.elapsed().as_millis() as u64; - self.metrics.record_failure( - InjectionMethod::AtspiInsert, - duration, - "Both direct insertion and paste action failed".to_string() - ); - Err(InjectionError::MethodFailed("AT-SPI2 injection failed".to_string())) - } - - fn metrics(&self) -> &InjectionMetrics { - &self.metrics - } -} \ No newline at end of file diff --git a/crates/app/src/text_injection/backend.rs b/crates/app/src/text_injection/backend.rs deleted file mode 100644 index 6f1115f8..00000000 --- a/crates/app/src/text_injection/backend.rs +++ /dev/null @@ -1,192 +0,0 @@ -use crate::text_injection::types::InjectionConfig; -use std::env; - -/// Available text injection 
backends -#[derive(Debug, Clone, Copy, PartialEq, Eq)] -pub enum Backend { - /// Wayland with virtual keyboard (wlroots/wlr-virtual-keyboard) - WaylandVirtualKeyboard, - /// Wayland with xdg-desktop-portal's RemoteDesktop/VirtualKeyboard - WaylandXdgDesktopPortal, - /// X11 with xdotool/xtest - X11Xdotool, - /// X11 with native Rust wrapper - X11Native, - /// macOS with CGEvent/AX API - MacCgEvent, - /// macOS with NSPasteboard - MacPasteboard, - /// Windows with SendInput - WindowsSendInput, - /// Windows with clipboard - WindowsClipboard, -} - -/// Backend capability detector -pub struct BackendDetector { - _config: InjectionConfig, -} - -impl BackendDetector { - /// Create a new backend detector - pub fn new(config: InjectionConfig) -> Self { - Self { _config: config } - } - - /// Detect available backends on the current system - pub fn detect_available_backends(&self) -> Vec<Backend> { - let mut available = Vec::new(); - - // Detect Wayland backends - if self.is_wayland() { - // Check for xdg-desktop-portal VirtualKeyboard - if self.has_xdg_desktop_portal_virtual_keyboard() { - available.push(Backend::WaylandXdgDesktopPortal); - } - - // Check for wlr-virtual-keyboard (requires compositor support) - if self.has_wlr_virtual_keyboard() { - available.push(Backend::WaylandVirtualKeyboard); - } - } - - // Detect X11 backends - if self.is_x11() { - // Check for xdotool - if self.has_xdotool() { - available.push(Backend::X11Xdotool); - } - - // Native X11 wrapper is always available if on X11 - available.push(Backend::X11Native); - } - - // Detect macOS backends - if self.is_macos() { - available.push(Backend::MacCgEvent); - available.push(Backend::MacPasteboard); - } - - // Detect Windows backends - if self.is_windows() { - available.push(Backend::WindowsSendInput); - available.push(Backend::WindowsClipboard); - } - - available - } - - /// Get the preferred backend based on availability and configuration - pub fn get_preferred_backend(&self) -> Option<Backend> { - let available = 
self.detect_available_backends(); - - // Return the most preferred available backend - Self::preferred_order().into_iter().find(|&preferred| available.contains(&preferred)) - } - - /// Get the preferred order of backends - fn preferred_order() -> Vec<Backend> { - vec![ - Backend::WaylandXdgDesktopPortal, // Preferred on Wayland - Backend::WaylandVirtualKeyboard, // Fallback on Wayland - Backend::X11Xdotool, // Preferred on X11 - Backend::X11Native, // Fallback on X11 - Backend::MacCgEvent, // Preferred on macOS - Backend::MacPasteboard, // Fallback on macOS - Backend::WindowsSendInput, // Preferred on Windows - Backend::WindowsClipboard, // Fallback on Windows - ] - } - - /// Check if running on Wayland - fn is_wayland(&self) -> bool { - env::var("XDG_SESSION_TYPE") - .map(|s| s == "wayland") - .unwrap_or(false) - || env::var("WAYLAND_DISPLAY").is_ok() - } - - /// Check if running on X11 - fn is_x11(&self) -> bool { - env::var("XDG_SESSION_TYPE") - .map(|s| s == "x11") - .unwrap_or(false) - || env::var("DISPLAY").is_ok() - } - - /// Check if running on macOS - fn is_macos(&self) -> bool { - cfg!(target_os = "macos") - } - - /// Check if running on Windows - fn is_windows(&self) -> bool { - cfg!(target_os = "windows") - } - - /// Check if xdg-desktop-portal VirtualKeyboard is available - fn has_xdg_desktop_portal_virtual_keyboard(&self) -> bool { - // Check if xdg-desktop-portal is running and supports VirtualKeyboard - // This would typically involve D-Bus communication - // For now, we'll check if the portal is available - std::process::Command::new("pgrep") - .arg("xdg-desktop-portal") - .output() - .map(|o| o.status.success()) - .unwrap_or(false) - } - - /// Check if wlr-virtual-keyboard is available - fn has_wlr_virtual_keyboard(&self) -> bool { - // This would require checking if the compositor supports wlr-virtual-keyboard - // For now, we'll check if the binary is available - std::process::Command::new("which") - .arg("wlr-virtual-keyboard") - .output() - .map(|o| 
o.status.success()) - .unwrap_or(false) - } - - /// Check if xdotool is available - fn has_xdotool(&self) -> bool { - std::process::Command::new("which") - .arg("xdotool") - .output() - .map(|o| o.status.success()) - .unwrap_or(false) - } -} - -#[cfg(test)] -mod tests { - use super::*; - - #[test] - fn test_backend_detection() { - let config = InjectionConfig::default(); - let detector = BackendDetector::new(config); - - let backends = detector.detect_available_backends(); - - // At least one backend should be available - assert!(!backends.is_empty()); - - // Check that the preferred backend is in the list - if let Some(preferred) = detector.get_preferred_backend() { - assert!(backends.contains(&preferred)); - } - } - - #[test] - fn test_preferred_order() { - let order = BackendDetector::preferred_order(); - - // Check that Wayland backends are preferred first - assert_eq!(order[0], Backend::WaylandXdgDesktopPortal); - assert_eq!(order[1], Backend::WaylandVirtualKeyboard); - - // Check that X11 backends come next - assert_eq!(order[2], Backend::X11Xdotool); - assert_eq!(order[3], Backend::X11Native); - } -} \ No newline at end of file diff --git a/crates/app/src/text_injection/clipboard_injector.rs b/crates/app/src/text_injection/clipboard_injector.rs deleted file mode 100644 index db6f38d8..00000000 --- a/crates/app/src/text_injection/clipboard_injector.rs +++ /dev/null @@ -1,343 +0,0 @@ -use crate::text_injection::types::{InjectionConfig, InjectionError, InjectionMethod, InjectionMetrics, TextInjector}; -use std::time::{Duration, Instant}; -use tracing::{debug, info, warn}; -use wl_clipboard_rs::copy::{Options, Source, MimeType}; -use wl_clipboard_rs::paste::{MimeType as PasteMimeType}; -use async_trait::async_trait; - -/// Clipboard injector using Wayland-native API -pub struct ClipboardInjector { - config: InjectionConfig, - metrics: InjectionMetrics, - /// Previous clipboard content if we're restoring - previous_clipboard: Option<String>, -} - -impl ClipboardInjector 
{ - /// Create a new clipboard injector - pub fn new(config: InjectionConfig) -> Self { - Self { - config, - metrics: InjectionMetrics::default(), - previous_clipboard: None, - } - } -} - -#[async_trait] -impl TextInjector for ClipboardInjector { - fn name(&self) -> &'static str { - "Clipboard" - } - - fn is_available(&self) -> bool { - // Check if we can access the Wayland display - std::env::var("WAYLAND_DISPLAY").is_ok() - } - - async fn inject(&mut self, text: &str) -> Result<(), InjectionError> { - if text.is_empty() { - return Ok(()); - } - - let start = Instant::now(); - - // Save current clipboard if configured - // Note: Clipboard saving would require async context or separate thread - // Pattern note: TextInjector is synchronous by design; for async-capable - // backends, we offload to a blocking thread and communicate via channels. - // This keeps the trait simple while still allowing async operations under the hood. - - // Set new clipboard content with timeout - let text_clone = text.to_string(); - let timeout_ms = self.config.per_method_timeout_ms; - - let result = tokio::task::spawn_blocking(move || { - let source = Source::Bytes(text_clone.into_bytes().into()); - let options = Options::new(); - - wl_clipboard_rs::copy::copy(options, source, MimeType::Text) - }).await; - - match result { - Ok(Ok(_)) => { - let duration = start.elapsed().as_millis() as u64; - self.metrics.record_success(InjectionMethod::Clipboard, duration); - info!("Clipboard set successfully ({} chars)", text.len()); - Ok(()) - } - Ok(Err(e)) => { - let duration = start.elapsed().as_millis() as u64; - self.metrics.record_failure( - InjectionMethod::Clipboard, - duration, - e.to_string() - ); - Err(InjectionError::Clipboard(e.to_string())) - } - Err(_) => { - let duration = start.elapsed().as_millis() as u64; - self.metrics.record_failure( - InjectionMethod::Clipboard, - duration, - format!("Timeout after {}ms", timeout_ms) - ); - Err(InjectionError::Timeout(timeout_ms)) - } - } - } 
- - fn metrics(&self) -> &InjectionMetrics { - &self.metrics - } -} - -impl ClipboardInjector { - /// Save current clipboard content for restoration - async fn save_clipboard(&mut self) -> Result<Option<String>, InjectionError> { - if !self.config.restore_clipboard { - return Ok(None); - } - - #[cfg(feature = "text-injection-clipboard")] - { - - use std::io::Read; - - // Try to get current clipboard content - match wl_clipboard_rs::paste::get_contents(wl_clipboard_rs::paste::ClipboardType::Regular, wl_clipboard_rs::paste::Seat::Unspecified, PasteMimeType::Text) { - Ok((mut pipe, _mime)) => { - let mut contents = String::new(); - if pipe.read_to_string(&mut contents).is_ok() { - debug!("Saved clipboard content ({} chars)", contents.len()); - return Ok(Some(contents)); - } - } - Err(e) => { - debug!("Could not save clipboard: {}", e); - } - } - } - - Ok(None) - } - - /// Restore previously saved clipboard content - async fn restore_clipboard(&mut self, content: Option<String>) -> Result<(), InjectionError> { - if let Some(content) = content { - if !self.config.restore_clipboard { - return Ok(()); - } - - #[cfg(feature = "text-injection-clipboard")] - { - use wl_clipboard_rs::copy::{MimeType, Options, Source}; - - let opts = Options::new(); - match opts.copy(Source::Bytes(content.as_bytes().into()), MimeType::Text) { - Ok(_) => { - debug!("Restored clipboard content ({} chars)", content.len()); - } - Err(e) => { - warn!("Failed to restore clipboard: {}", e); - } - } - } - } - - Ok(()) - } - - /// Enhanced clipboard operation with automatic save/restore - async fn clipboard_with_restore(&mut self, text: &str) -> Result<(), InjectionError> { - // Save current clipboard - let saved = self.save_clipboard().await?; - - // Set new clipboard content - let result = self.set_clipboard(text).await; - - // Schedule restoration after a delay (to allow paste to complete) - if saved.is_some() && self.config.restore_clipboard { - let delay_ms = self.config.clipboard_restore_delay_ms.unwrap_or(500); - 
tokio::spawn(async move { - tokio::time::sleep(Duration::from_millis(delay_ms)).await; - // Note: In production, this would need access to self to call restore_clipboard - // For now, we'll rely on the Drop implementation - }); - } - - result - } - - /// Set clipboard content (internal helper) - async fn set_clipboard(&self, text: &str) -> Result<(), InjectionError> { - #[cfg(feature = "text-injection-clipboard")] - { - use wl_clipboard_rs::copy::{MimeType, Options, Source}; - - let source = Source::Bytes(text.as_bytes().to_vec().into()); - let opts = Options::new(); - - match opts.copy(source, MimeType::Text) { - Ok(_) => { - debug!("Set clipboard content ({} chars)", text.len()); - Ok(()) - } - Err(e) => { - Err(InjectionError::Clipboard(e.to_string())) - } - } - } - - #[cfg(not(feature = "text-injection-clipboard"))] - { - Err(InjectionError::MethodUnavailable("Clipboard feature not enabled".to_string())) - } - } -} - -// No Drop impl: restore is async and should be handled by caller scheduling - -#[cfg(test)] -mod tests { - use super::*; - use std::env; - use std::sync::Mutex; - use std::time::Duration; - - - // Mock for wl_clipboard_rs to avoid actual system calls - struct MockClipboard { - content: Mutex<Option<String>>, - } - - impl MockClipboard { - fn new() -> Self { - Self { - content: Mutex::new(None), - } - } - - fn set(&self, text: String) -> Result<(), String> { - let mut content = self.content.lock().unwrap(); - *content = Some(text); - Ok(()) - } - - fn get(&self) -> Result<String, String> { - let content = self.content.lock().unwrap(); - content.clone().ok_or("No content".to_string()) - } - } - - // Test that clipboard injector can be created - #[test] - fn test_clipboard_injector_creation() { - let config = InjectionConfig::default(); - let injector = ClipboardInjector::new(config); - - assert_eq!(injector.name(), "Clipboard"); - assert!(injector.metrics.attempts == 0); - } - - // Test that inject works with valid text - #[test] - fn test_clipboard_inject_valid_text() { - // 
Set WAYLAND_DISPLAY to simulate Wayland environment - env::set_var("WAYLAND_DISPLAY", "wayland-0"); - - let config = InjectionConfig::default(); - let mut injector = ClipboardInjector::new(config); - - // Mock clipboard - let clipboard = MockClipboard::new(); - - // Override the actual clipboard operations with our mock - // This is a simplified test - in real code we'd use proper mocking - // Simulate successful clipboard operation and metrics update - let text = "test text"; - let _ = clipboard.set(text.to_string()); - let duration = 100; - injector.metrics.record_success(InjectionMethod::Clipboard, duration); - assert_eq!(injector.metrics.successes, 1); - assert_eq!(injector.metrics.attempts, 1); - - env::remove_var("WAYLAND_DISPLAY"); - assert_eq!(injector.metrics.successes, 1); - } - - // Test that inject fails with empty text - #[tokio::test] - async fn test_clipboard_inject_empty_text() { - let config = InjectionConfig::default(); - let mut injector = ClipboardInjector::new(config); - - let result = injector.inject("").await; - assert!(result.is_ok()); - assert_eq!(injector.metrics.attempts, 0); // Should not record attempt for empty text - } - - // Test that inject fails when clipboard is not available - #[test] - fn test_clipboard_inject_no_wayland() { - // Don't set WAYLAND_DISPLAY to simulate non-Wayland environment - let config = InjectionConfig::default(); - let mut injector = ClipboardInjector::new(config); - - // Availability depends on environment; just ensure calling inject doesn't panic - let _ = injector.inject("test"); - } - - // Test clipboard restoration - #[test] - fn test_clipboard_restore() { - env::set_var("WAYLAND_DISPLAY", "wayland-0"); - - let mut config = InjectionConfig::default(); - config.restore_clipboard = true; - - let mut injector = ClipboardInjector::new(config); - - // Simulate previous clipboard content - injector.previous_clipboard = Some("previous content".to_string()); - - // Mock clipboard - let clipboard = 
MockClipboard::new(); - let _ = clipboard.set("new content".to_string()); - - // Restore should work - let _ = clipboard.get(); - - env::remove_var("WAYLAND_DISPLAY"); - assert!(true); - } - - // Test timeout handling - #[test] - fn test_clipboard_inject_timeout() { - env::set_var("WAYLAND_DISPLAY", "wayland-0"); - - let mut config = InjectionConfig::default(); - config.per_method_timeout_ms = 1; // Very short timeout - let to_ms = config.per_method_timeout_ms; - - let mut injector = ClipboardInjector::new(config.clone()); - - // Test with a text that would cause timeout in real implementation - // In our mock, we'll simulate timeout by using a long-running operation - // Simulate timeout metrics - let start = Instant::now(); - while start.elapsed() < Duration::from_millis(10) {} - let duration = start.elapsed().as_millis() as u64; - injector.metrics.record_failure( - InjectionMethod::Clipboard, - duration, - format!("Timeout after {}ms", to_ms) - ); - assert_eq!(injector.metrics.failures, 1); - assert_eq!(injector.metrics.attempts, 1); - - env::remove_var("WAYLAND_DISPLAY"); - assert_eq!(injector.metrics.failures, 1); - } -} \ No newline at end of file diff --git a/crates/app/src/text_injection/combo_clip_atspi.rs b/crates/app/src/text_injection/combo_clip_atspi.rs deleted file mode 100644 index 11689dff..00000000 --- a/crates/app/src/text_injection/combo_clip_atspi.rs +++ /dev/null @@ -1,162 +0,0 @@ -use crate::text_injection::clipboard_injector::ClipboardInjector; -use crate::text_injection::focus::{FocusTracker, FocusStatus}; -use crate::text_injection::types::{InjectionConfig, InjectionError, InjectionMethod, InjectionMetrics, TextInjector}; -use atspi::action::Action; -use atspi::Accessible; -use std::time::Duration; -use tokio::time::{timeout, error::Elapsed}; -use tracing::{debug, error, info, warn}; -use async_trait::async_trait; - -/// Combo injector that sets clipboard and then triggers AT-SPI paste action -pub struct ComboClipboardAtspiInjector { - 
config: InjectionConfig, - metrics: InjectionMetrics, - clipboard_injector: ClipboardInjector, - focus_tracker: FocusTracker, -} - -impl ComboClipboardAtspiInjector { - /// Create a new combo clipboard+AT-SPI injector - pub fn new(config: InjectionConfig) -> Self { - Self { - config: config.clone(), - metrics: InjectionMetrics::default(), - clipboard_injector: ClipboardInjector::new(config.clone()), - focus_tracker: FocusTracker::new(config), - } - } - - /// Trigger paste action on the focused element via AT-SPI2 - async fn trigger_paste_action(&self, accessible: &Accessible) -> Result<(), InjectionError> { - let start = std::time::Instant::now(); - - // Get Action interface - let action = Action::new(accessible).await - .map_err(|e| InjectionError::Atspi(e))?; - - // Find paste action - let n_actions = action.n_actions().await - .map_err(|e| InjectionError::Atspi(e))?; - - for i in 0..n_actions { - let action_name = action.get_action_name(i).await - .map_err(|e| InjectionError::Atspi(e))?; - - let action_description = action.get_action_description(i).await - .map_err(|e| InjectionError::Atspi(e))?; - - // Check if this is a paste action (case-insensitive) - if action_name.to_lowercase().contains("paste") || - action_description.to_lowercase().contains("paste") { - debug!("Found paste action: {} ({})", action_name, action_description); - - // Execute the paste action - action.do_action(i).await - .map_err(|e| InjectionError::Atspi(e))?; - - let duration = start.elapsed().as_millis() as u64; - self.metrics.record_success(InjectionMethod::ClipboardAndPaste, duration); - info!("Successfully triggered paste action via AT-SPI2"); - return Ok(()); - } - } - - Err(InjectionError::MethodUnavailable("No paste action found".to_string())) - } -} - -#[async_trait] -impl TextInjector for ComboClipboardAtspiInjector { - fn name(&self) -> &'static str { - "Clipboard+AT-SPI Paste" - } - - fn is_available(&self) -> bool { - // Available if both clipboard and AT-SPI are available - 
self.clipboard_injector.is_available() && - std::env::var("XDG_SESSION_TYPE").map(|t| t == "wayland").unwrap_or(false) - } - - async fn inject(&mut self, text: &str) -> Result<(), InjectionError> { - if text.is_empty() { - return Ok(()); - } - - let start = std::time::Instant::now(); - - // First, set the clipboard - match self.clipboard_injector.inject(text) { - Ok(()) => { - debug!("Clipboard set successfully, proceeding to trigger paste action"); - } - Err(e) => { - let duration = start.elapsed().as_millis() as u64; - self.metrics.record_failure(InjectionMethod::ClipboardAndPaste, duration, e.to_string()); - return Err(InjectionError::MethodFailed("Failed to set clipboard".to_string())); - } - } - - // Small delay for clipboard to settle - tokio::time::sleep(Duration::from_millis(50)).await; - - // Get focus status - let focus_status = match self.focus_tracker.get_focus_status().await { - Ok(status) => status, - Err(e) => { - let duration = start.elapsed().as_millis() as u64; - self.metrics.record_failure(InjectionMethod::ClipboardAndPaste, duration, e.to_string()); - return Err(InjectionError::Other(e.to_string())); - } - }; - - // Only proceed if we have a focused element - if focus_status == FocusStatus::Unknown { - debug!("Focus state unknown"); - return Err(InjectionError::Other("Unknown focus state".to_string())); - } - - // Get focused element - let focused = match self.focus_tracker.get_focused_element().await { - Ok(Some(element)) => element, - Ok(None) => { - debug!("No focused element"); - return Err(InjectionError::Other("No focused element".to_string())); - } - Err(e) => { - let duration = start.elapsed().as_millis() as u64; - self.metrics.record_failure(InjectionMethod::ClipboardAndPaste, duration, e.to_string()); - return Err(InjectionError::Other(e.to_string())); - } - }; - - // Check if the element supports paste action - if !self.focus_tracker.supports_paste_action(&focused).await.unwrap_or(false) { - debug!("Focused element does not support 
paste action"); - return Err(InjectionError::MethodUnavailable("Focused element does not support paste action".to_string())); - } - - // Trigger paste action - let res = timeout( - Duration::from_millis(self.config.paste_action_timeout_ms), - self.trigger_paste_action(&focused), - ).await; - match res { - Ok(Ok(())) => Ok(()), - Ok(Err(e)) => Err(e), - Err(_) => { - let duration = start.elapsed().as_millis() as u64; - self.metrics.record_failure( - InjectionMethod::ClipboardAndPaste, - duration, - format!("Timeout after {}ms", self.config.paste_action_timeout_ms) - ); - Err(InjectionError::Timeout(self.config.paste_action_timeout_ms)) - } - } - } - - fn metrics(&self) -> &InjectionMetrics { - &self.metrics - } -} \ No newline at end of file diff --git a/crates/app/src/text_injection/enigo_injector.rs b/crates/app/src/text_injection/enigo_injector.rs deleted file mode 100644 index 6e89880c..00000000 --- a/crates/app/src/text_injection/enigo_injector.rs +++ /dev/null @@ -1,134 +0,0 @@ -use crate::text_injection::types::{InjectionConfig, InjectionError, InjectionMethod, InjectionMetrics, TextInjector}; -use enigo::{Enigo, KeyboardControllable, Key}; -use std::time::Duration; -use tokio::time::{timeout, error::Elapsed}; -use tracing::{debug, error, info, warn}; -use async_trait::async_trait; - -/// Enigo injector for synthetic input -pub struct EnigoInjector { - config: InjectionConfig, - metrics: InjectionMetrics, - /// Whether enigo is available and can be used - is_available: bool, -} - -impl EnigoInjector { - /// Create a new enigo injector - pub fn new(config: InjectionConfig) -> Self { - let is_available = Self::check_availability(); - - Self { - config, - metrics: InjectionMetrics::default(), - is_available, - } - } - - /// Check if enigo can be used (permissions, backend availability) - fn check_availability() -> bool { - // Check if we can create an Enigo instance - // This will fail if we don't have the necessary permissions - Enigo::new().is_ok() - } - - /// 
Type text using enigo - async fn type_text(&mut self, text: &str) -> Result<(), InjectionError> { - let start = std::time::Instant::now(); - let text_clone = text.to_string(); - - let result = tokio::task::spawn_blocking(move || { - let mut enigo = Enigo::new(); - - // Type each character with a small delay - for c in text_clone.chars() { - match c { - ' ' => enigo.key_click(Key::Space), - '\n' => enigo.key_click(Key::Return), - '\t' => enigo.key_click(Key::Tab), - _ => { - if c.is_ascii() { - enigo.key_sequence(&c.to_string()); - } else { - // For non-ASCII characters, we might need to use clipboard - return Err(InjectionError::MethodFailed("Enigo doesn't support non-ASCII characters directly".to_string())); - } - } - } - } - - Ok(()) - }).await; - - match result { - Ok(Ok(())) => { - let duration = start.elapsed().as_millis() as u64; - self.metrics.record_success(InjectionMethod::EnigoText, duration); - info!("Successfully typed text via enigo ({} chars)", text.len()); - Ok(()) - } - Ok(Err(e)) => Err(e), - Err(_) => Err(InjectionError::Timeout(0)), // Spawn failed - } - } - - /// Trigger paste action using enigo (Ctrl+V) - async fn trigger_paste(&mut self) -> Result<(), InjectionError> { - let start = std::time::Instant::now(); - - let result = tokio::task::spawn_blocking(|| { - let mut enigo = Enigo::new(); - - // Press Ctrl+V - enigo.key_down(Key::Control); - enigo.key_click(Key::Layout('v')); - enigo.key_up(Key::Control); - - Ok(()) - }).await; - - match result { - Ok(Ok(())) => { - let duration = start.elapsed().as_millis() as u64; - self.metrics.record_success(InjectionMethod::EnigoText, duration); - info!("Successfully triggered paste action via enigo"); - Ok(()) - } - Ok(Err(e)) => Err(e), - Err(_) => Err(InjectionError::Timeout(0)), // Spawn failed - } - } -} - -#[async_trait] -impl TextInjector for EnigoInjector { - fn name(&self) -> &'static str { - "Enigo" - } - - fn is_available(&self) -> bool { - self.is_available && self.config.allow_enigo - } - - 
async fn inject(&mut self, text: &str) -> Result<(), InjectionError> { - if text.is_empty() { - return Ok(()); - } - - // First try paste action (more reliable for batch text) - // We need to set the clipboard first, but that's handled by the strategy manager - // So we just trigger the paste - match self.trigger_paste().await { - Ok(()) => Ok(()), - Err(e) => { - debug!("Paste action failed: {}", e); - // Fall back to direct typing - self.type_text(text).await - } - } - } - - fn metrics(&self) -> &InjectionMetrics { - &self.metrics - } -} \ No newline at end of file diff --git a/crates/app/src/text_injection/focus.rs b/crates/app/src/text_injection/focus.rs deleted file mode 100644 index 20898f1a..00000000 --- a/crates/app/src/text_injection/focus.rs +++ /dev/null @@ -1,129 +0,0 @@ -use crate::text_injection::types::{InjectionConfig, InjectionError}; -use std::time::{Duration, Instant}; -use tracing::debug; - -/// Status of current focus in the system -#[derive(Debug, Clone, Copy, PartialEq, Eq)] -pub enum FocusStatus { - /// Focus is on an editable text element - EditableText, - /// Focus is on a non-editable element - NonEditable, - /// Focus status is unknown or could not be determined - Unknown, -} - -/// Tracks the current focused element for text injection targeting -pub struct FocusTracker { - _config: InjectionConfig, - last_check: Option<Instant>, - cached_status: Option<FocusStatus>, - cache_duration: Duration, -} - -impl FocusTracker { - /// Create a new focus tracker - pub fn new(config: InjectionConfig) -> Self { - let cache_duration = Duration::from_millis(config.focus_cache_duration_ms); - Self { - _config: config, - last_check: None, - cached_status: None, - cache_duration, - } - } - - /// Get the current focus status - pub async fn get_focus_status(&mut self) -> Result<FocusStatus, InjectionError> { - // Check if we have a valid cached result - if let (Some(last_check), Some(status)) = (self.last_check, self.cached_status) { - if last_check.elapsed() < self.cache_duration { - debug!("Using cached
focus status: {:?}", status); - return Ok(status); - } - } - - // Get fresh focus status - let status = self.check_focus_status().await?; - - // Cache the result - self.last_check = Some(Instant::now()); - self.cached_status = Some(status); - - debug!("Focus status determined: {:?}", status); - Ok(status) - } - - /// Check the actual focus status - async fn check_focus_status(&self) -> Result<FocusStatus, InjectionError> { - #[cfg(feature = "text-injection-atspi")] - { - // TODO: Implement real AT-SPI focus detection once API is stable - // For now, return a reasonable default - debug!("AT-SPI focus detection placeholder - returning Unknown"); - return Ok(FocusStatus::Unknown); - } - - #[cfg(not(feature = "text-injection-atspi"))] - { - // Fallback: Without AT-SPI, we can't reliably determine focus - debug!("AT-SPI not available, returning unknown focus status"); - Ok(FocusStatus::Unknown) - } - } - - /// Clear the focus cache (useful when window focus changes) - pub fn clear_cache(&mut self) { - self.last_check = None; - self.cached_status = None; - debug!("Focus cache cleared"); - } - - /// Get the cached focus status without checking - pub fn cached_focus_status(&self) -> Option<FocusStatus> { - self.cached_status - } -} - -#[cfg(test)] -mod tests { - use super::*; - - #[tokio::test] - async fn test_focus_tracker_creation() { - let config = InjectionConfig::default(); - let tracker = FocusTracker::new(config); - - assert!(tracker.cached_focus_status().is_none()); - } - - #[tokio::test] - async fn test_focus_status_caching() { - let config = InjectionConfig::default(); - let mut tracker = FocusTracker::new(config); - - // First check should not use cache - let status1 = tracker.get_focus_status().await.unwrap(); - assert!(tracker.cached_focus_status().is_some()); - - // Second check should use cache - let status2 = tracker.get_focus_status().await.unwrap(); - assert_eq!(status1, status2); - } - - #[test] - fn test_cache_clearing() { - let config = InjectionConfig::default(); - let mut tracker =
FocusTracker::new(config); - - // Manually set cache - tracker.cached_status = Some(FocusStatus::EditableText); - tracker.last_check = Some(Instant::now()); - - assert!(tracker.cached_focus_status().is_some()); - - // Clear cache - tracker.clear_cache(); - assert!(tracker.cached_focus_status().is_none()); - } -} \ No newline at end of file diff --git a/crates/app/src/text_injection/kdotool_injector.rs b/crates/app/src/text_injection/kdotool_injector.rs deleted file mode 100644 index 0ff5f86b..00000000 --- a/crates/app/src/text_injection/kdotool_injector.rs +++ /dev/null @@ -1,150 +0,0 @@ -use crate::text_injection::types::{InjectionConfig, InjectionError, InjectionMethod, InjectionMetrics, TextInjector}; -use anyhow::Result; -use std::process::Command; -use std::time::Duration; -use tokio::time::{timeout, error::Elapsed}; -use tracing::{debug, error, info, warn}; -use async_trait::async_trait; - -/// Kdotool injector for KDE window activation/focus assistance -pub struct KdotoolInjector { - config: InjectionConfig, - metrics: InjectionMetrics, - /// Whether kdotool is available on the system - is_available: bool, -} - -impl KdotoolInjector { - /// Create a new kdotool injector - pub fn new(config: InjectionConfig) -> Self { - let is_available = Self::check_kdotool(); - - Self { - config, - metrics: InjectionMetrics::default(), - is_available, - } - } - - /// Check if kdotool is available on the system - fn check_kdotool() -> bool { - Command::new("which") - .arg("kdotool") - .output() - .map(|o| o.status.success()) - .unwrap_or(false) - } - - /// Get the currently active window ID - async fn get_active_window(&self) -> Result<String, InjectionError> { - let output = timeout( - Duration::from_millis(self.config.discovery_timeout_ms), - tokio::process::Command::new("kdotool") - .arg("getactivewindow") - .output(), - ) - .await - .map_err(|_| InjectionError::Timeout(self.config.discovery_timeout_ms))?
- .map_err(|e| InjectionError::Process(e))?; - - if !output.status.success() { - let stderr = String::from_utf8_lossy(&output.stderr); - return Err(InjectionError::MethodFailed(format!("kdotool getactivewindow failed: {}", stderr))); - } - - let window_id = String::from_utf8_lossy(&output.stdout).trim().to_string(); - Ok(window_id) - } - - /// Activate a window by ID - async fn activate_window(&self, window_id: &str) -> Result<(), InjectionError> { - let start = std::time::Instant::now(); - - let output = timeout( - Duration::from_millis(self.config.per_method_timeout_ms), - tokio::process::Command::new("kdotool") - .args(&["windowactivate", window_id]) - .output(), - ) - .await - .map_err(|_| InjectionError::Timeout(self.config.per_method_timeout_ms))? - .map_err(|e| InjectionError::Process(e))?; - - if !output.status.success() { - let stderr = String::from_utf8_lossy(&output.stderr); - return Err(InjectionError::MethodFailed(format!("kdotool windowactivate failed: {}", stderr))); - } - - let duration = start.elapsed().as_millis() as u64; - self.metrics.record_success(InjectionMethod::KdoToolAssist, duration); - info!("Successfully activated window {}", window_id); - - Ok(()) - } - - /// Focus a window by ID - async fn focus_window(&self, window_id: &str) -> Result<(), InjectionError> { - let start = std::time::Instant::now(); - - let output = timeout( - Duration::from_millis(self.config.per_method_timeout_ms), - tokio::process::Command::new("kdotool") - .args(&["windowfocus", window_id]) - .output(), - ) - .await - .map_err(|_| InjectionError::Timeout(self.config.per_method_timeout_ms))? 
- .map_err(|e| InjectionError::Process(e))?; - - if !output.status.success() { - let stderr = String::from_utf8_lossy(&output.stderr); - return Err(InjectionError::MethodFailed(format!("kdotool windowfocus failed: {}", stderr))); - } - - let duration = start.elapsed().as_millis() as u64; - self.metrics.record_success(InjectionMethod::KdoToolAssist, duration); - info!("Successfully focused window {}", window_id); - - Ok(()) - } -} - -#[async_trait] -impl TextInjector for KdotoolInjector { - fn name(&self) -> &'static str { - "Kdotool" - } - - fn is_available(&self) -> bool { - self.is_available && self.config.allow_kdotool - } - - async fn inject(&mut self, _text: &str) -> Result<(), InjectionError> { - // Kdotool is only used for window activation/focus assistance - // It doesn't actually inject text, so this method should not be called - // directly for text injection - Err(InjectionError::MethodUnavailable("Kdotool is only for window activation/focus assistance".to_string())) - } - - fn metrics(&self) -> &InjectionMetrics { - &self.metrics - } -} - -impl KdotoolInjector { - /// Ensure the target window is active and focused - pub async fn ensure_focus(&self, window_id: Option<&str>) -> Result<(), InjectionError> { - let target_window = match window_id { - Some(id) => id.to_string(), - None => self.get_active_window().await?, - }; - - // First focus the window - self.focus_window(&target_window).await?; - - // Then activate it - self.activate_window(&target_window).await?; - - Ok(()) - } -} \ No newline at end of file diff --git a/crates/app/src/text_injection/manager.rs b/crates/app/src/text_injection/manager.rs deleted file mode 100644 index f239f5f6..00000000 --- a/crates/app/src/text_injection/manager.rs +++ /dev/null @@ -1,987 +0,0 @@ -use crate::text_injection::backend::{Backend, BackendDetector}; -use crate::text_injection::focus::{FocusTracker, FocusStatus}; -use crate::text_injection::types::{InjectionConfig, InjectionError, InjectionMethod, 
InjectionMetrics, TextInjector}; - -// Import injectors -#[cfg(feature = "text-injection-atspi")] -use crate::text_injection::atspi_injector::AtspiInjector; -#[cfg(feature = "text-injection-clipboard")] -use crate::text_injection::clipboard_injector::ClipboardInjector; -#[cfg(all(feature = "text-injection-clipboard", feature = "text-injection-atspi"))] -use crate::text_injection::combo_clip_atspi::ComboClipboardAtspi; -#[cfg(feature = "text-injection-ydotool")] -use crate::text_injection::ydotool_injector::YdotoolInjector; -#[cfg(feature = "text-injection-enigo")] -use crate::text_injection::enigo_injector::EnigoInjector; -#[cfg(feature = "text-injection-mki")] -use crate::text_injection::mki_injector::MkiInjector; -use crate::text_injection::noop_injector::NoOpInjector; -#[cfg(feature = "text-injection-kdotool")] -use crate::text_injection::kdotool_injector::KdotoolInjector; -use crate::text_injection::window_manager; -use std::collections::HashMap; -use std::sync::{Arc, Mutex}; -use std::time::{Duration, Instant}; -use tracing::{debug, error, info, trace, warn}; -use std::collections::hash_map::DefaultHasher; -use std::hash::{Hash, Hasher}; - -/// Key for identifying a specific app-method combination -type AppMethodKey = (String, InjectionMethod); - -/// Redact text content for privacy-first logging -fn redact_text(text: &str, redact: bool) -> String { - if redact { - // Use a fast, stable std hasher to avoid allocating or logging raw text - let mut hasher = DefaultHasher::new(); - text.hash(&mut hasher); - let hash = hasher.finish(); - format!("len={} hash={:08x}", text.len(), (hash & 0xFFFFFFFF)) - } else { - text.to_string() - } -} - -/// Record of success/failure for a specific app-method combination -#[derive(Debug, Clone)] -struct SuccessRecord { - success_count: u32, - fail_count: u32, - last_success: Option<Instant>, - last_failure: Option<Instant>, - /// Success rate (0.0 to 1.0) - success_rate: f64, -} - -/// State of cooldown for a specific app-method combination
-#[derive(Debug, Clone)] -struct CooldownState { - until: Instant, - backoff_level: u32, - last_error: String, -} - -/// Registry of available text injectors -struct InjectorRegistry { - injectors: HashMap<InjectionMethod, Box<dyn TextInjector>>, -} - -impl InjectorRegistry { - fn build(config: &InjectionConfig, backend_detector: &BackendDetector) -> Self { - let mut injectors: HashMap<InjectionMethod, Box<dyn TextInjector>> = HashMap::new(); - - // Check backend availability - let backends = backend_detector.detect_available_backends(); - let _has_wayland = backends.iter().any(|b| matches!(b, Backend::WaylandXdgDesktopPortal | Backend::WaylandVirtualKeyboard)); - let _has_x11 = backends.iter().any(|b| matches!(b, Backend::X11Xdotool | Backend::X11Native)); - - // Add AT-SPI injector if available - #[cfg(feature = "text-injection-atspi")] - { - let injector = AtspiInjector::new(config.clone()); - if injector.is_available() { - injectors.insert(InjectionMethod::AtspiInsert, Box::new(injector)); - } - } - - // Add clipboard injectors if available - #[cfg(feature = "text-injection-clipboard")] - { - if has_wayland || has_x11 { - let clipboard_injector = ClipboardInjector::new(config.clone()); - if clipboard_injector.is_available() { - injectors.insert(InjectionMethod::Clipboard, Box::new(clipboard_injector)); - } - - // Add combo clipboard+AT-SPI if both are available - #[cfg(feature = "text-injection-atspi")] - { - let combo_injector = ComboClipboardAtspi::new(config.clone()); - if combo_injector.is_available() { - injectors.insert(InjectionMethod::ClipboardAndPaste, Box::new(combo_injector)); - } - } - } - } - - // Add optional injectors based on config - #[cfg(feature = "text-injection-ydotool")] - if config.allow_ydotool { - let ydotool = YdotoolInjector::new(config.clone()); - if ydotool.is_available() { - injectors.insert(InjectionMethod::YdoToolPaste, Box::new(ydotool)); - } - } - - #[cfg(feature = "text-injection-enigo")] - if config.allow_enigo { - let enigo = EnigoInjector::new(config.clone()); - if enigo.is_available() { -
injectors.insert(InjectionMethod::EnigoText, Box::new(enigo)); - } - } - - #[cfg(feature = "text-injection-mki")] - if config.allow_mki { - let mki = MkiInjector::new(config.clone()); - if mki.is_available() { - injectors.insert(InjectionMethod::UinputKeys, Box::new(mki)); - } - } - - #[cfg(feature = "text-injection-kdotool")] - if config.allow_kdotool { - let kdotool = KdotoolInjector::new(config.clone()); - if kdotool.is_available() { - injectors.insert(InjectionMethod::KdoToolAssist, Box::new(kdotool)); - } - } - - // Add NoOpInjector as final fallback if no other injectors are available - if injectors.is_empty() { - injectors.insert(InjectionMethod::NoOp, Box::new(NoOpInjector::new(config.clone()))); - } - - Self { injectors } - } - - fn get_mut(&mut self, method: InjectionMethod) -> Option<&mut Box<dyn TextInjector>> { - self.injectors.get_mut(&method) - } - - fn contains(&self, method: InjectionMethod) -> bool { - self.injectors.contains_key(&method) - } -} - -/// Strategy manager for adaptive text injection -pub struct StrategyManager { - /// Configuration for injection - config: InjectionConfig, - /// Focus tracker for determining target context - focus_tracker: FocusTracker, - /// Cache of success records per app-method combination - success_cache: HashMap<AppMethodKey, SuccessRecord>, - /// Cooldown states per app-method combination - cooldowns: HashMap<AppMethodKey, CooldownState>, - /// Global start time for budget tracking - global_start: Option<Instant>, - /// Metrics for the strategy manager - metrics: Arc<Mutex<InjectionMetrics>>, - /// Backend detector for platform-specific capabilities - backend_detector: BackendDetector, - /// Registry of available injectors - injectors: InjectorRegistry, - /// Cached method ordering for the current app_id - cached_method_order: Option<(String, Vec<InjectionMethod>)>, - /// Cached compiled allowlist regex patterns - #[cfg(feature = "text-injection-regex")] - allowlist_regexes: Vec<regex::Regex>, - /// Cached compiled blocklist regex patterns - #[cfg(feature = "text-injection-regex")] - blocklist_regexes: Vec<regex::Regex>, -} - -impl StrategyManager { - /// Create
a new strategy manager - pub fn new(config: InjectionConfig, metrics: Arc<Mutex<InjectionMetrics>>) -> Self { - let backend_detector = BackendDetector::new(config.clone()); - if let Some(backend) = backend_detector.get_preferred_backend() { - info!("Selected backend: {:?}", backend); - } else { - warn!("No suitable backend found for text injection"); - if let Ok(mut m) = metrics.lock() { m.record_backend_denied(); } - } - - // Build injector registry - let injectors = InjectorRegistry::build(&config, &backend_detector); - - // Compile regex patterns once for performance - #[cfg(feature = "text-injection-regex")] - let allowlist_regexes = config - .allowlist - .iter() - .filter_map(|pattern| match regex::Regex::new(pattern) { - Ok(re) => Some(re), - Err(e) => { - warn!("Invalid allowlist regex pattern '{}': {}, skipping", pattern, e); - None - } - }) - .collect(); - - #[cfg(feature = "text-injection-regex")] - let blocklist_regexes = config - .blocklist - .iter() - .filter_map(|pattern| match regex::Regex::new(pattern) { - Ok(re) => Some(re), - Err(e) => { - warn!("Invalid blocklist regex pattern '{}': {}, skipping", pattern, e); - None - } - }) - .collect(); - - Self { - config: config.clone(), - focus_tracker: FocusTracker::new(config.clone()), - success_cache: HashMap::new(), - cooldowns: HashMap::new(), - global_start: None, - metrics, - backend_detector, - injectors, - cached_method_order: None, - #[cfg(feature = "text-injection-regex")] - allowlist_regexes, - #[cfg(feature = "text-injection-regex")] - blocklist_regexes, - } - } - - /// Public wrapper for tests and external callers to obtain method priority - pub fn get_method_priority(&mut self, app_id: &str) -> Vec<InjectionMethod> { - self.get_method_order_cached(app_id) - } - - /// Get the current application identifier (e.g., window class) - pub(crate) async fn get_current_app_id(&self) -> Result<String, InjectionError> { - // Use the robust window manager utility to get the active window class. - // This supports Wayland and X11 through various methods.
- window_manager::get_active_window_class().await - } - - /// Check if injection is currently paused - fn is_paused(&self) -> bool { - // In a real implementation, this would check a global state - // For now, we'll always return false - false - } - -/// Check if the current application is allowed for injection -/// When feature text-injection-regex is enabled, compile patterns once at StrategyManager construction -/// and store Regex objects; else fallback to substring match. -/// Note: invalid regex should log and skip that pattern. -/// TODO: Store compiled regexes in the manager state for performance. -/// Performance consideration: Regex compilation is expensive, so cache compiled patterns. -/// Invalid patterns should be logged as warnings and skipped, not crash the system. -pub(crate) fn is_app_allowed(&self, app_id: &str) -> bool { - // If allowlist is not empty, only allow apps in the allowlist - if !self.config.allowlist.is_empty() { - #[cfg(feature = "text-injection-regex")] - return self.allowlist_regexes.iter().any(|re| re.is_match(app_id)); - #[cfg(not(feature = "text-injection-regex"))] - return self.config.allowlist.iter().any(|pattern| app_id.contains(pattern)); - } - - // If blocklist is not empty, block apps in the blocklist - if !self.config.blocklist.is_empty() { - #[cfg(feature = "text-injection-regex")] - return !self.blocklist_regexes.iter().any(|re| re.is_match(app_id)); - #[cfg(not(feature = "text-injection-regex"))] - return !self.config.blocklist.iter().any(|pattern| app_id.contains(pattern)); - } - - // If neither allowlist nor blocklist is set, allow all apps - true -} - - /// Check if a method is in cooldown for the current app - pub(crate) fn is_in_cooldown(&self, method: InjectionMethod) -> bool { - let now = Instant::now(); - self.cooldowns.iter().any(|((_, m), cd)| *m == method && now < cd.until) - } - - /// Update success record with time-based decay for old records - pub(crate) fn update_success_record(&mut self, app_id: &str, 
method: InjectionMethod, success: bool) { - let key = (app_id.to_string(), method); - - let record = self.success_cache.entry(key.clone()).or_insert_with(|| SuccessRecord { - success_count: 0, - fail_count: 0, - last_success: None, - last_failure: None, - success_rate: 0.5, // Start with neutral 50% - }); - - // No decay to keep counts deterministic for tests - - // Update counts - if success { - record.success_count += 1; - record.last_success = Some(Instant::now()); - } else { - record.fail_count += 1; - record.last_failure = Some(Instant::now()); - } - - // Recalculate success rate with minimum sample size - let total = record.success_count + record.fail_count; - if total > 0 { - record.success_rate = record.success_count as f64 / total as f64; - } else { - record.success_rate = 0.5; // Default to 50% - } - - // Apply cooldown for repeated failures - let should_cooldown = !success && record.fail_count > 2; - - debug!( - "Updated success record for {}/{:?}: {:.1}% ({}/{})", - app_id, method, record.success_rate * 100.0, - record.success_count, total - ); - - if should_cooldown { - self.apply_cooldown(app_id, method, "Multiple consecutive failures"); - } - } - - /// Apply exponential backoff cooldown for a failed method - pub(crate) fn apply_cooldown(&mut self, app_id: &str, method: InjectionMethod, error: &str) { - let key = (app_id.to_string(), method); - - let cooldown = self.cooldowns.entry(key).or_insert_with(|| CooldownState { - until: Instant::now(), - backoff_level: 0, - last_error: String::new(), - }); - - // Calculate cooldown duration with exponential backoff - let base_ms = self.config.cooldown_initial_ms; - let factor = self.config.cooldown_backoff_factor; - let max_ms = self.config.cooldown_max_ms; - - let cooldown_ms = (base_ms as f64 * (factor as f64).powi(cooldown.backoff_level as i32)) - .min(max_ms as f64) as u64; - - cooldown.until = Instant::now() + Duration::from_millis(cooldown_ms); - cooldown.backoff_level += 1; - cooldown.last_error = 
error.to_string(); - - warn!( - "Applied cooldown for {}/{:?}: {}ms (level {})", - app_id, method, cooldown_ms, cooldown.backoff_level - ); - } - - /// Update cooldown state for a failed method (legacy method for compatibility) - fn update_cooldown(&mut self, method: InjectionMethod, error: &str) { - // TODO: This should use actual app_id from get_current_app_id() - let app_id = "unknown_app"; - self.apply_cooldown(app_id, method, error); - } - - /// Clear cooldown for a method (e.g., after successful use) - fn clear_cooldown(&mut self, method: InjectionMethod) { - let app_id = "unknown_app"; // Placeholder - would be from get_current_app_id - let key = (app_id.to_string(), method); - self.cooldowns.remove(&key); - } - - - - /// Get the preferred method order based on current context and history (cached per app) - pub(crate) fn get_method_order_cached(&mut self, app_id: &str) -> Vec<InjectionMethod> { - // Use cached order when app_id unchanged - if let Some((cached_app, cached_order)) = &self.cached_method_order { - if cached_app == app_id { - return cached_order.clone(); - } - } - - // Build the base order of methods.
- let mut base_order = self.build_base_method_order(); - - // Deduplicate while preserving order - use std::collections::HashSet; - let mut seen = HashSet::new(); - base_order.retain(|m| seen.insert(*m)); - - // Sort by preference: methods with higher success rate first, then by base order - - - // Create a copy of base order for position lookup - let base_order_copy = base_order.clone(); - - base_order.sort_by(|a, b| { - let key_a = (app_id.to_string(), *a); - let key_b = (app_id.to_string(), *b); - - let success_a = self.success_cache.get(&key_a).map(|r| r.success_rate).unwrap_or(0.5); - let success_b = self.success_cache.get(&key_b).map(|r| r.success_rate).unwrap_or(0.5); - - // Sort by success rate (descending), then by base order - success_b.partial_cmp(&success_a).unwrap().then_with(|| { - // Preserve base order for equal success rates - let pos_a = base_order_copy.iter().position(|m| m == a).unwrap_or(0); - let pos_b = base_order_copy.iter().position(|m| m == b).unwrap_or(0); - pos_a.cmp(&pos_b) - }) - }); - - // Ensure NoOp is always available as a last resort - base_order.push(InjectionMethod::NoOp); - - // Cache and return - self.cached_method_order = Some((app_id.to_string(), base_order.clone())); - base_order - } - - /// Back-compat: previous tests may call no-arg version; compute without caching - #[allow(dead_code)] - pub fn get_method_order_uncached(&self) -> Vec<InjectionMethod> { - // Compute using a placeholder app id without affecting cache - let mut base_order = self.build_base_method_order(); - // Sort by success rate for placeholder app id - let app_id = "unknown_app"; - let base_order_copy = base_order.clone(); - base_order.sort_by(|a, b| { - let key_a = (app_id.to_string(), *a); - let key_b = (app_id.to_string(), *b); - let success_a = self.success_cache.get(&key_a).map(|r| r.success_rate).unwrap_or(0.5); - let success_b = self.success_cache.get(&key_b).map(|r| r.success_rate).unwrap_or(0.5); - success_b.partial_cmp(&success_a).unwrap().then_with(|| { - let
pos_a = base_order_copy.iter().position(|m| m == a).unwrap_or(0); - let pos_b = base_order_copy.iter().position(|m| m == b).unwrap_or(0); - pos_a.cmp(&pos_b) - }) - }); - base_order.push(InjectionMethod::NoOp); - base_order - } - - /// Builds the base, unsorted list of available injection methods. - fn build_base_method_order(&self) -> Vec<InjectionMethod> { - let available_backends = self.backend_detector.detect_available_backends(); - let mut base_order = Vec::new(); - - for backend in available_backends { - match backend { - Backend::WaylandXdgDesktopPortal - | Backend::WaylandVirtualKeyboard - | Backend::X11Xdotool - | Backend::X11Native - | Backend::MacCgEvent - | Backend::WindowsSendInput => { - base_order.push(InjectionMethod::AtspiInsert); - base_order.push(InjectionMethod::ClipboardAndPaste); - base_order.push(InjectionMethod::Clipboard); - } - _ => {} - } - } - - // Add optional, opt-in fallbacks - if self.config.allow_kdotool { - base_order.push(InjectionMethod::KdoToolAssist); - } - if self.config.allow_enigo { - base_order.push(InjectionMethod::EnigoText); - } - if self.config.allow_mki { - base_order.push(InjectionMethod::UinputKeys); - } - if self.config.allow_ydotool { - base_order.push(InjectionMethod::YdoToolPaste); - } - - base_order - } - - /// Check if we've exceeded the global time budget - fn has_budget_remaining(&self) -> bool { - if let Some(start) = self.global_start { - let elapsed = start.elapsed(); - let budget = self.config.max_total_latency(); - elapsed < budget - } else { - true - } - } - - /// Chunk text and paste with delays between chunks - #[allow(dead_code)] - async fn chunk_and_paste(&mut self, injector: &mut Box<dyn TextInjector>, text: &str) -> Result<(), InjectionError> { - let chunk_size = self.config.paste_chunk_chars as usize; - - // Use iterator-based chunking without collecting - let mut start = 0; - - // Record paste operation - if let Ok(mut m) = self.metrics.lock() { - m.record_paste(); - } - - while start < text.len() { - // Check budget before each
chunk - if !self.has_budget_remaining() { - return Err(InjectionError::BudgetExhausted); - } - - // Find chunk boundary at character boundary - let mut end = (start + chunk_size).min(text.len()); - while !text.is_char_boundary(end) && end < text.len() { - end += 1; - } - - let chunk = &text[start..end]; - injector.paste(chunk).await?; - - start = end; - - // Delay between chunks (except after last) - if start < text.len() { - tokio::time::sleep(Duration::from_millis(self.config.chunk_delay_ms)).await; - } - } - - // Record metrics - if let Ok(mut m) = self.metrics.lock() { - m.record_injected_chars(text.len() as u64); - m.record_flush(text.len() as u64); - } - - Ok(()) - } - - /// Type text with pacing based on keystroke rate - #[allow(dead_code)] - async fn pace_type_text(&mut self, injector: &mut Box<dyn TextInjector>, text: &str) -> Result<(), InjectionError> { - let rate_cps = self.config.keystroke_rate_cps; - let max_burst = self.config.max_burst_chars as usize; - - // Record keystroke operation - if let Ok(mut m) = self.metrics.lock() { - m.record_keystroke(); - } - - // Use iterator-based chunking without collecting - let mut start = 0; - - while start < text.len() { - // Check budget before each burst - if !self.has_budget_remaining() { - return Err(InjectionError::BudgetExhausted); - } - - // Find burst boundary at character boundary - let mut end = (start + max_burst).min(text.len()); - while !text.is_char_boundary(end) && end < text.len() { - end += 1; - } - - let burst = &text[start..end]; - injector.type_text(burst, rate_cps).await?; - - // Calculate delay based on burst size and rate - let delay_ms = (burst.len() as f64 / rate_cps as f64 * 1000.0) as u64; - if delay_ms > 0 { - tokio::time::sleep(Duration::from_millis(delay_ms)).await; - } - - start = end; - } - - // Record metrics - if let Ok(mut m) = self.metrics.lock() { - m.record_injected_chars(text.len() as u64); - } - - Ok(()) - } - - /// Try to inject text using the best available method - pub async fn
inject(&mut self, text: &str) -> Result<(), InjectionError> { - if text.is_empty() { - return Ok(()); - } - - // Log the injection request with redaction - let redacted = redact_text(text, self.config.redact_logs); - debug!("Injection requested for text: {}", redacted); - if !self.config.redact_logs { - trace!("Full text to inject: {}", text); - } - - // Check if injection is paused - if self.is_paused() { - return Err(InjectionError::Other("Injection is currently paused".to_string())); - } - - // Start global timer - self.global_start = Some(Instant::now()); - - // Get current focus status - let focus_status = match self.focus_tracker.get_focus_status().await { - Ok(status) => status, - Err(e) => { - warn!("Failed to get focus status: {}", e); - // Continue with injection attempt - FocusStatus::Unknown - } - }; - - // Check if we should inject on unknown focus - if focus_status == FocusStatus::Unknown && !self.config.inject_on_unknown_focus { - if let Ok(mut metrics) = self.metrics.lock() { - metrics.record_focus_missing(); - } - return Err(InjectionError::Other("Unknown focus state and injection disabled".to_string())); - } - - // Check if focus is required - if self.config.require_focus && focus_status == FocusStatus::NonEditable { - if let Ok(mut metrics) = self.metrics.lock() { - metrics.record_focus_missing(); - } - return Err(InjectionError::NoEditableFocus); - } - - // Get current application ID - let app_id = self.get_current_app_id().await?; - - // Check allowlist/blocklist - if !self.is_app_allowed(&app_id) { - return Err(InjectionError::Other(format!("Application {} is not allowed for injection", app_id))); - } - - // Determine injection method based on config - let use_paste = match self.config.injection_mode.as_str() { - "paste" => true, - "keystroke" => false, - "auto" => text.len() > self.config.paste_chunk_chars as usize, - _ => text.len() > self.config.paste_chunk_chars as usize, // Default to auto - }; - - // Get ordered list of methods to try - 
let method_order = self.get_method_order_cached(&app_id); - - // Try each method in order - for method in method_order { - // Skip if in cooldown - if self.is_in_cooldown(method) { - debug!("Skipping method {:?} - in cooldown", method); - continue; - } - - // Check budget - if !self.has_budget_remaining() { - if let Ok(mut metrics) = self.metrics.lock() { - metrics.record_rate_limited(); - } - return Err(InjectionError::BudgetExhausted); - } - - // Skip if injector not available - if !self.injectors.contains(method) { - debug!("Skipping method {:?} - injector not available", method); - continue; - } - - // Try injection with the real injector - let start = Instant::now(); - // Perform the injector call in a narrow scope to avoid borrowing self across updates - let result = { - if let Some(injector) = self.injectors.get_mut(method) { - if use_paste { - // For now, perform a single paste operation; chunking is optional - injector.paste(text).await - } else { - injector.type_text(text, self.config.keystroke_rate_cps).await - } - } else { - continue; - } - }; - - match result { - Ok(()) => { - let duration = start.elapsed().as_millis() as u64; - if let Ok(mut m) = self.metrics.lock() { - m.record_success(method, duration); - } - self.update_success_record(&app_id, method, true); - self.clear_cooldown(method); - let redacted = redact_text(text, self.config.redact_logs); - info!("Successfully injected text {} using method {:?} with mode {:?}", - redacted, method, if use_paste { "paste" } else { "keystroke" }); - // Log full text only at trace level when not redacting - if !self.config.redact_logs { - trace!("Full text injected: {}", text); - } - return Ok(()); - } - Err(e) => { - let duration = start.elapsed().as_millis() as u64; - let error_string = e.to_string(); - if let Ok(mut m) = self.metrics.lock() { - m.record_failure(method, duration, error_string.clone()); - } - self.update_success_record(&app_id, method, false); - self.update_cooldown(method, &error_string); - 
debug!("Method {:?} failed: {}", method, error_string); - // Continue to next method - } - } - } - - // If we get here, all methods failed - error!("All injection methods failed"); - Err(InjectionError::MethodFailed("All injection methods failed".to_string())) - } - - - /// Get metrics for the strategy manager - pub fn metrics(&self) -> Arc> { - self.metrics.clone() - } - - /// Print injection statistics for debugging - pub fn print_stats(&self) { - if let Ok(metrics) = self.metrics.lock() { - info!("Injection Statistics:"); - info!(" Total attempts: {}", metrics.attempts); - info!(" Successes: {}", metrics.successes); - info!(" Failures: {}", metrics.failures); - info!(" Success rate: {:.1}%", - if metrics.attempts > 0 { - metrics.successes as f64 / metrics.attempts as f64 * 100.0 - } else { - 0.0 - }); - - // Print method-specific stats - for (method, m) in &metrics.method_metrics { - info!(" Method {:?}: {} attempts, {} successes, {} failures", - method, m.attempts, m.successes, m.failures); - } - } - } -} - -#[cfg(test)] -mod tests { - use super::*; - use std::time::Duration; - use async_trait::async_trait; - - - /// Mock injector for testing - #[allow(dead_code)] - struct MockInjector { - name: &'static str, - available: bool, - success_rate: f64, - metrics: InjectionMetrics, - } - - #[allow(dead_code)] - impl MockInjector { - fn new(name: &'static str, available: bool, success_rate: f64) -> Self { - Self { - name, - available, - success_rate, - metrics: InjectionMetrics::default(), - } - } - } - - #[async_trait] - impl TextInjector for MockInjector { - fn name(&self) -> &'static str { - self.name - } - - fn is_available(&self) -> bool { - self.available - } - - async fn inject(&mut self, _text: &str) -> Result<(), InjectionError> { - use std::time::SystemTime; - - // Simple pseudo-random based on system time - let pseudo_rand = (SystemTime::now().duration_since(SystemTime::UNIX_EPOCH) - .unwrap().as_nanos() % 100) as f64 / 100.0; - - if pseudo_rand < 
self.success_rate { - Ok(()) - } else { - Err(InjectionError::MethodFailed("Mock injection failed".to_string())) - } - } - - fn metrics(&self) -> &InjectionMetrics { - &self.metrics - } - } - - // Test that strategy manager can be created - #[test] - fn test_strategy_manager_creation() { - let config = InjectionConfig::default(); - let metrics = Arc::new(Mutex::new(InjectionMetrics::default())); - let manager = StrategyManager::new(config, metrics); - - { - let metrics = manager.metrics.lock().unwrap(); - assert_eq!(metrics.attempts, 0); - assert_eq!(metrics.successes, 0); - assert_eq!(metrics.failures, 0); - } - } - - // Test method ordering - #[test] - fn test_method_ordering() { - let config = InjectionConfig::default(); - let metrics = Arc::new(Mutex::new(InjectionMetrics::default())); - let manager = StrategyManager::new(config, metrics); - - let order = manager.get_method_order_uncached(); - - // Verify core methods are present - assert!(order.contains(&InjectionMethod::AtspiInsert)); - assert!(order.contains(&InjectionMethod::ClipboardAndPaste)); - assert!(order.contains(&InjectionMethod::Clipboard)); - - // Verify optional methods are included if enabled - let mut config = InjectionConfig::default(); - config.allow_ydotool = true; - config.allow_kdotool = true; - config.allow_enigo = true; - config.allow_mki = true; - - let metrics = Arc::new(Mutex::new(InjectionMetrics::default())); - let manager = StrategyManager::new(config, metrics); - let order = manager.get_method_order_uncached(); - - // All methods should be present - assert!(order.contains(&InjectionMethod::AtspiInsert)); - assert!(order.contains(&InjectionMethod::ClipboardAndPaste)); - assert!(order.contains(&InjectionMethod::Clipboard)); - assert!(order.contains(&InjectionMethod::YdoToolPaste)); - assert!(order.contains(&InjectionMethod::KdoToolAssist)); - assert!(order.contains(&InjectionMethod::EnigoText)); - assert!(order.contains(&InjectionMethod::UinputKeys)); - } - - // Test success record 
updates - #[test] - fn test_success_record_update() { - let config = InjectionConfig::default(); - let metrics = Arc::new(Mutex::new(InjectionMetrics::default())); - let mut manager = StrategyManager::new(config.clone(), metrics); - - // Test success - manager.update_success_record("unknown_app", InjectionMethod::AtspiInsert, true); - let key = ("unknown_app".to_string(), InjectionMethod::AtspiInsert); - let record = manager.success_cache.get(&key).unwrap(); - assert_eq!(record.success_count, 1); - assert_eq!(record.fail_count, 0); - assert!(record.success_rate > 0.4); - - // Test failure - manager.update_success_record("unknown_app", InjectionMethod::AtspiInsert, false); - let record = manager.success_cache.get(&key).unwrap(); - assert_eq!(record.success_count, 1); - assert_eq!(record.fail_count, 1); - assert!(record.success_rate > 0.3 && record.success_rate < 0.8); - } - - // Test cooldown updates - #[test] - fn test_cooldown_update() { - let config = InjectionConfig::default(); - let metrics = Arc::new(Mutex::new(InjectionMetrics::default())); - let mut manager = StrategyManager::new(config.clone(), metrics); - - // First failure - manager.update_cooldown(InjectionMethod::AtspiInsert, "test error"); - let key = ("unknown_app".to_string(), InjectionMethod::AtspiInsert); - let cooldown = manager.cooldowns.get(&key).unwrap(); - assert_eq!(cooldown.backoff_level, 1); - - // Second failure - backoff level should increase - manager.update_cooldown(InjectionMethod::AtspiInsert, "test error"); - let cooldown = manager.cooldowns.get(&key).unwrap(); - assert_eq!(cooldown.backoff_level, 2); - - // Duration should be longer - let base_duration = Duration::from_millis(config.cooldown_initial_ms); - let expected_duration = base_duration * 2u32.pow(1); // 2^1 = 2 - let actual_duration = cooldown.until.duration_since(Instant::now()); - // Allow some tolerance for timing - assert!(actual_duration >= expected_duration - Duration::from_millis(10)); - } - - // Test budget checking 
- #[test] - fn test_budget_checking() { - let mut config = InjectionConfig::default(); - config.max_total_latency_ms = 100; // 100ms budget - - let metrics = Arc::new(Mutex::new(InjectionMetrics::default())); - let mut manager = StrategyManager::new(config, metrics); - - // No start time - budget should be available - assert!(manager.has_budget_remaining()); - - // Set start time - manager.global_start = Some(Instant::now() - Duration::from_millis(50)); - assert!(manager.has_budget_remaining()); - - // Exceed budget - manager.global_start = Some(Instant::now() - Duration::from_millis(150)); - assert!(!manager.has_budget_remaining()); - } - - // Test injection with success - #[tokio::test] - async fn test_inject_success() { - let config = InjectionConfig::default(); - let metrics = Arc::new(Mutex::new(InjectionMetrics::default())); - let mut manager = StrategyManager::new(config, metrics); - - // Test with text - let result = manager.inject("test text").await; - // Don't require success in headless test env; just ensure it returns without panicking - assert!(result.is_ok() || result.is_err()); - - // Metrics are environment-dependent; just ensure call did not panic - } - - // Test injection with failure - #[tokio::test] - async fn test_inject_failure() { - let mut config = InjectionConfig::default(); - // Set very short budget to force failure - config.max_total_latency_ms = 1; - - let metrics = Arc::new(Mutex::new(InjectionMetrics::default())); - let mut manager = StrategyManager::new(config, metrics); - - // This should fail due to budget exhaustion - let result = manager.inject("test text").await; - assert!(result.is_err()); - - // Metrics should reflect failure - // Note: Due to budget exhaustion, might not record metrics - // Just verify no panic - } - - // Test empty text handling - #[test] - fn test_empty_text() { - let config = InjectionConfig::default(); - let metrics = Arc::new(Mutex::new(InjectionMetrics::default())); - let mut manager = 
StrategyManager::new(config, metrics);
-
-        // Inject empty text
-        // Should handle empty string gracefully
-        // Note: inject is async; here we simply ensure calling path compiles
-        let _ = manager.inject("");
-    }
-}
\ No newline at end of file
diff --git a/crates/app/src/text_injection/mki_injector.rs b/crates/app/src/text_injection/mki_injector.rs
deleted file mode 100644
index 6543a682..00000000
--- a/crates/app/src/text_injection/mki_injector.rs
+++ /dev/null
@@ -1,155 +0,0 @@
-use crate::text_injection::types::{InjectionConfig, InjectionError, InjectionMethod, InjectionMetrics, TextInjector};
-use mouse_keyboard_input::{Keyboard, Key, KeyboardControllable};
-use std::time::Duration;
-use tokio::time::{timeout, error::Elapsed};
-use tracing::{debug, error, info, warn};
-use async_trait::async_trait;
-
-/// Mouse-keyboard-input (MKI) injector for synthetic key events
-pub struct MkiInjector {
-    config: InjectionConfig,
-    metrics: InjectionMetrics,
-    /// Whether MKI is available and can be used
-    is_available: bool,
-}
-
-impl MkiInjector {
-    /// Create a new MKI injector
-    pub fn new(config: InjectionConfig) -> Self {
-        let is_available = Self::check_availability();
-
-        Self {
-            config,
-            metrics: InjectionMetrics::default(),
-            is_available,
-        }
-    }
-
-    /// Check if MKI can be used (permissions, backend availability)
-    fn check_availability() -> bool {
-        // Check if user is in input group
-        let in_input_group = std::process::Command::new("groups")
-            .output()
-            .map(|o| {
-                String::from_utf8_lossy(&o.stdout).contains("input")
-            })
-            .unwrap_or(false);
-
-        // Check if /dev/uinput is accessible
-        let uinput_accessible = std::fs::metadata("/dev/uinput")
-            .map(|metadata| {
-                let mode = metadata.permissions().mode();
-                (mode & 0o060) == 0o060 || (mode & 0o006) == 0o006
-            })
-            .unwrap_or(false);
-
-        in_input_group && uinput_accessible
-    }
-
-    /// Type text using MKI
-    async fn type_text(&mut self, text: &str) -> Result<(), InjectionError> {
-        let start =
std::time::Instant::now(); - let text_clone = text.to_string(); - - let result = tokio::task::spawn_blocking(move || { - let mut keyboard = Keyboard::new().map_err(|e| { - InjectionError::MethodFailed(format!("Failed to create keyboard: {}", e)) - })?; - - // Type each character with a small delay - for c in text_clone.chars() { - match c { - ' ' => keyboard.key_click(Key::Space).map_err(|e| InjectionError::MethodFailed(e.to_string()))?, - '\n' => keyboard.key_click(Key::Enter).map_err(|e| InjectionError::MethodFailed(e.to_string()))?, - '\t' => keyboard.key_click(Key::Tab).map_err(|e| InjectionError::MethodFailed(e.to_string()))?, - _ => { - if c.is_ascii() { - keyboard.key_sequence(&c.to_string()).map_err(|e| InjectionError::MethodFailed(e.to_string()))?; - } else { - // For non-ASCII characters, we might need to use clipboard - return Err(InjectionError::MethodFailed("MKI doesn't support non-ASCII characters directly".to_string())); - } - } - } - - // Small delay between characters - std::thread::sleep(Duration::from_millis(10)); - } - - Ok(()) - }).await; - - match result { - Ok(Ok(())) => { - let duration = start.elapsed().as_millis() as u64; - self.metrics.record_success(InjectionMethod::UinputKeys, duration); - info!("Successfully typed text via MKI ({} chars)", text.len()); - Ok(()) - } - Ok(Err(e)) => Err(e), - Err(_) => Err(InjectionError::Timeout(0)), // Spawn failed - } - } - - /// Trigger paste action using MKI (Ctrl+V) - async fn trigger_paste(&mut self) -> Result<(), InjectionError> { - let start = std::time::Instant::now(); - - let result = tokio::task::spawn_blocking(|| { - let mut keyboard = Keyboard::new().map_err(|e| { - InjectionError::MethodFailed(format!("Failed to create keyboard: {}", e)) - })?; - - // Press Ctrl+V - keyboard.key_down(Key::Control).map_err(|e| InjectionError::MethodFailed(e.to_string()))?; - keyboard.key_click(Key::V).map_err(|e| InjectionError::MethodFailed(e.to_string()))?; - keyboard.key_up(Key::Control).map_err(|e| 
InjectionError::MethodFailed(e.to_string()))?; - - Ok(()) - }).await; - - match result { - Ok(Ok(())) => { - let duration = start.elapsed().as_millis() as u64; - self.metrics.record_success(InjectionMethod::UinputKeys, duration); - info!("Successfully triggered paste action via MKI"); - Ok(()) - } - Ok(Err(e)) => Err(e), - Err(_) => Err(InjectionError::Timeout(0)), // Spawn failed - } - } -} - -#[async_trait] -impl TextInjector for MkiInjector { - fn name(&self) -> &'static str { - "MKI" - } - - fn is_available(&self) -> bool { - self.is_available && self.config.allow_mki - } - - async fn inject(&mut self, text: &str) -> Result<(), InjectionError> { - if text.is_empty() { - return Ok(()); - } - - // First try paste action (more reliable for batch text) - // We need to set the clipboard first, but that's handled by the strategy manager - // So we just trigger the paste - match self.trigger_paste().await { - Ok(()) => Ok(()), - Err(e) => { - debug!("Paste action failed: {}", e); - // Fall back to direct typing - self.type_text(text).await - } - } - } - - fn metrics(&self) -> &InjectionMetrics { - &self.metrics - } -} \ No newline at end of file diff --git a/crates/app/src/text_injection/mod.rs b/crates/app/src/text_injection/mod.rs index ede2ba0f..e70b7689 100644 --- a/crates/app/src/text_injection/mod.rs +++ b/crates/app/src/text_injection/mod.rs @@ -1,40 +1,2 @@ -pub mod backend; -pub mod focus; -pub mod manager; -pub mod processor; -pub mod session; -pub mod types; -pub mod window_manager; - -// Individual injector modules -#[cfg(feature = "text-injection-atspi")] -pub mod atspi_injector; -#[cfg(feature = "text-injection-clipboard")] -pub mod clipboard_injector; -#[cfg(all(feature = "text-injection-clipboard", feature = "text-injection-atspi"))] -pub mod combo_clip_atspi; -#[cfg(feature = "text-injection-ydotool")] -pub mod ydotool_injector; -#[cfg(feature = "text-injection-enigo")] -pub mod enigo_injector; -#[cfg(feature = "text-injection-mki")] -pub mod 
mki_injector;
-// NoOp fallback is always available
-pub mod noop_injector;
-#[cfg(feature = "text-injection-kdotool")]
-pub mod kdotool_injector;
-
-#[cfg(test)]
-mod tests;
-
-#[cfg(feature = "text-injection")]
-pub mod probes;
-
-// Re-export key components
-pub use processor::{AsyncInjectionProcessor, ProcessorMetrics, InjectionProcessor};
-pub use session::{InjectionSession, SessionConfig, SessionState};
-pub use types::{InjectionConfig, InjectionError, InjectionMethod, InjectionResult};
-pub use backend::Backend;
-
-#[cfg(feature = "text-injection")]
-pub use manager::StrategyManager;
\ No newline at end of file
+// Re-export everything from the coldvox-text-injection crate
+pub use coldvox_text_injection::*;
\ No newline at end of file
diff --git a/crates/app/src/text_injection/noop_injector.rs b/crates/app/src/text_injection/noop_injector.rs
deleted file mode 100644
index 823f1ab9..00000000
--- a/crates/app/src/text_injection/noop_injector.rs
+++ /dev/null
@@ -1,91 +0,0 @@
-use crate::text_injection::types::{InjectionConfig, InjectionError, InjectionMethod, InjectionMetrics, TextInjector};
-use async_trait::async_trait;
-
-/// NoOp injector that always succeeds but does nothing
-/// Used as a fallback when no other injectors are available
-pub struct NoOpInjector {
-    metrics: InjectionMetrics,
-}
-
-impl NoOpInjector {
-    /// Create a new NoOp injector
-    pub fn new(_config: InjectionConfig) -> Self {
-        Self {
-            metrics: InjectionMetrics::default(),
-        }
-    }
-}
-
-#[async_trait]
-impl TextInjector for NoOpInjector {
-    fn name(&self) -> &'static str {
-        "NoOp"
-    }
-
-    fn is_available(&self) -> bool {
-        true // Always available as fallback
-    }
-
-    async fn inject(&mut self, text: &str) -> Result<(), InjectionError> {
-        if text.is_empty() {
-            return Ok(());
-        }
-
-        let start = std::time::Instant::now();
-
-        // Record the operation but do nothing
-        let duration = start.elapsed().as_millis() as u64;
-        self.metrics.record_success(InjectionMethod::NoOp, duration);
-
-        tracing::debug!("NoOp injector: would inject {} characters", text.len());
-
-        Ok(())
-    }
-
-    fn metrics(&self) -> &InjectionMetrics {
-        &self.metrics
-    }
-}
-
-#[cfg(test)]
-mod tests {
-    use super::*;
-
-    #[test]
-    fn test_noop_injector_creation() {
-        let config = InjectionConfig::default();
-        let injector = NoOpInjector::new(config);
-
-        assert_eq!(injector.name(), "NoOp");
-        assert!(injector.is_available());
-        assert_eq!(injector.metrics().attempts, 0);
-    }
-
-    #[tokio::test]
-    async fn test_noop_inject_success() {
-        let config = InjectionConfig::default();
-        let mut injector = NoOpInjector::new(config);
-
-        let result = injector.inject("test text").await;
-        assert!(result.is_ok());
-
-        // Check metrics
-        let metrics = injector.metrics();
-        assert_eq!(metrics.successes, 1);
-        assert_eq!(metrics.attempts, 1);
-        assert_eq!(metrics.failures, 0);
-    }
-
-    #[tokio::test]
-    async fn test_noop_inject_empty_text() {
-        let config = InjectionConfig::default();
-        let mut injector = NoOpInjector::new(config);
-
-        let result = injector.inject("").await;
-        assert!(result.is_ok());
-
-        // Should not record metrics for empty text
-        let metrics = injector.metrics();
-        assert_eq!(metrics.attempts, 0);
-    }
-}
\ No newline at end of file
diff --git a/crates/app/src/text_injection/probes.rs b/crates/app/src/text_injection/probes.rs
deleted file mode 100644
index 8b650d56..00000000
--- a/crates/app/src/text_injection/probes.rs
+++ /dev/null
@@ -1,130 +0,0 @@
-use std::process::Command;
-use tracing::{debug, warn};
-
-/// A collection of checks to determine which injection methods are available.
-#[derive(Debug, Clone, Default)]
-pub struct CapabilityReport {
-    pub is_wayland: bool,
-    pub is_atspi_available: bool,
-    pub is_wl_clipboard_available: bool,
-    pub is_ydotool_available: bool,
-    pub is_kdotool_available: bool,
-    pub has_uinput_access: bool,
-}
-
-impl CapabilityReport {
-    /// Run all probes and generate a report.
- pub fn new() -> Self { - Self { - is_wayland: is_wayland(), - is_atspi_available: is_atspi_available(), - is_wl_clipboard_available: is_wl_clipboard_available(), - is_ydotool_available: is_ydotool_available(), - is_kdotool_available: is_kdotool_available(), - has_uinput_access: has_uinput_access(), - } - } - - pub fn log(&self) { - debug!("Capability Report:"); - debug!(" Wayland session: {}", self.is_wayland); - debug!(" AT-SPI bus: {}", self.is_atspi_available); - debug!(" wl-clipboard: {}", self.is_wl_clipboard_available); - debug!(" ydotool: {}", self.is_ydotool_available); - debug!(" kdotool: {}", self.is_kdotool_available); - debug!(" uinput access: {}", self.has_uinput_access); - } -} - -/// Check if running in a Wayland session. -pub fn is_wayland() -> bool { - std::env::var("WAYLAND_DISPLAY").is_ok() -} - -/// Check if the AT-SPI bus is available. -/// This is a basic check; a full check would involve trying to connect. -pub fn is_atspi_available() -> bool { - // A proper check would be to try to connect to the bus. - // For now, we'll check for the accessibility environment variable. - // This is not foolproof, but it's a good hint. - let atspi_bus_addr = std::env::var("AT_SPI_BUS_ADDRESS"); - if atspi_bus_addr.is_err() { - warn!("AT_SPI_BUS_ADDRESS not set, assuming accessibility is disabled."); - return false; - let is_available = std::env::var("AT_SPI_BUS_ADDRESS").is_ok(); - if !is_available { - warn!("AT_SPI_BUS_ADDRESS environment variable not set, assuming AT-SPI accessibility is disabled."); - } - true - is_available -} - -/// Check if `wl-copy` binary is in the PATH. -pub fn is_wl_clipboard_available() -> bool { - Command::new("which") - .arg("wl-copy") - .output() - .map(|o| o.status.success()) - .unwrap_or(false) -} - -/// Check for `ydotool` binary and daemon socket. 
-pub fn is_ydotool_available() -> bool { - let binary_exists = Command::new("which") - .arg("ydotool") - .output() - .map(|o| o.status.success()) - .unwrap_or(false); - - if !binary_exists { - return false; - binary_exists && { - // Check for the socket, which is more reliable than just the binary. - // Use `id -u` as it's more reliable than the $UID env var. - let user_id = Command::new("id") - .arg("-u") - .output() - .ok() - .and_then(|o| String::from_utf8(o.stdout).ok()) - .map(|s| s.trim().to_string()) - .unwrap_or_else(|| "1000".to_string()); - - let socket_path = format!("/run/user/{}/.ydotool_socket", user_id); - std::path::Path::new(&socket_path).exists() - } - - // Check for the socket - let user_id = std::env::var("UID").unwrap_or_else(|_| "1000".to_string()); - let socket_path = format!("/run/user/{}/.ydotool_socket", user_id); - std::path::Path::new(&socket_path).exists() -} - -/// Check for `kdotool` binary. -pub fn is_kdotool_available() -> bool { - Command::new("which") - .arg("kdotool") - .output() - .map(|o| o.status.success()) - .unwrap_or(false) -} - -/// Check for write access to `/dev/uinput`. -pub fn has_uinput_access() -> bool { - use std::fs; - use std::os::unix::fs::PermissionsExt; - - if let Ok(metadata) = fs::metadata("/dev/uinput") { - let perms = metadata.permissions(); - // Check if the file is writable by the current user. - // This is a simplified check. A more robust check would involve checking group membership. - return perms.mode() & 0o002 != 0; // Writable by "other" - } - false - // The most reliable way to check for write access is to try to open the file. - // This avoids race conditions and complex permission-checking logic (e.g., - // checking user/group ownership and modes). 
-    std::fs::OpenOptions::new()
-        .write(true)
-        .open("/dev/uinput")
-        .is_ok()
-}
diff --git a/crates/app/src/text_injection/processor.rs b/crates/app/src/text_injection/processor.rs
deleted file mode 100644
index 656b1758..00000000
--- a/crates/app/src/text_injection/processor.rs
+++ /dev/null
@@ -1,469 +0,0 @@
-use crate::stt::TranscriptionEvent;
-use coldvox_telemetry::pipeline_metrics::PipelineMetrics;
-use std::sync::{Arc, Mutex}; use tokio::sync::Mutex as TokioMutex;
-use tokio::sync::mpsc;
-use tokio::time::{self, Duration, Instant};
-use tracing::{debug, error, info, warn};
-
-use super::session::{InjectionSession, SessionConfig, SessionState};
-use super::{InjectionConfig};
-use super::manager::StrategyManager;
-use crate::text_injection::types::InjectionMetrics;
-
-/// Local metrics for the injection processor (UI/state), distinct from types::InjectionMetrics
-#[derive(Debug, Clone, Default)]
-pub struct ProcessorMetrics {
-    /// Current session state
-    pub session_state: SessionState,
-    /// Number of transcriptions in current buffer
-    pub buffer_size: usize,
-    /// Total characters in buffer
-    pub buffer_chars: usize,
-    /// Time since last transcription (ms)
-    pub time_since_last_transcription_ms: Option<u64>,
-    /// Total successful injections
-    pub successful_injections: u64,
-    /// Total failed injections
-    pub failed_injections: u64,
-    /// Last injection timestamp
-    pub last_injection_time: Option<Instant>,
-}
-
-impl ProcessorMetrics {
-    /// Update metrics from current session state
-    pub fn update_from_session(&mut self, session: &InjectionSession) {
-        self.session_state = session.state();
-        self.buffer_size = session.buffer_len();
-        self.buffer_chars = session.total_chars();
-        self.time_since_last_transcription_ms = session
-            .time_since_last_transcription()
-            .map(|d| d.as_millis() as u64);
-    }
-}
-
-/// Processor that manages session-based text injection
-pub struct InjectionProcessor {
-    /// The injection session
-    session: InjectionSession,
-    /// Text injector
for performing the actual injection - injector: StrategyManager, - /// Configuration - config: InjectionConfig, - /// Metrics for telemetry - metrics: Arc>, - /// Shared injection metrics for all components - injection_metrics: Arc>, - /// Pipeline metrics for integration - _pipeline_metrics: Option>, -} - -impl InjectionProcessor { - /// Create a new injection processor - pub fn new( - config: InjectionConfig, - pipeline_metrics: Option>, - injection_metrics: Arc>, - ) -> Self { - // Create session with shared metrics - let session_config = SessionConfig::default(); // TODO: Expose this if needed - let session = InjectionSession::new(session_config, injection_metrics.clone()); - - let injector = StrategyManager::new(config.clone(), injection_metrics.clone()); - - let metrics = Arc::new(Mutex::new(ProcessorMetrics { - session_state: SessionState::Idle, - ..Default::default() - })); - - Self { - session, - injector, - config, - metrics, - injection_metrics, - _pipeline_metrics: pipeline_metrics, - } - } - - /// Prepare an injection by checking session state and extracting buffered text if ready. - /// Returns Some(text) when there is content to inject, otherwise None. - pub fn prepare_injection(&mut self) -> Option { - if self.session.should_inject() { - let text = self.session.take_buffer(); - if !text.is_empty() { - info!("Injecting {} characters from session", text.len()); - return Some(text); - } - } - None - } - - /// Record the result of an injection attempt and refresh metrics. 
- pub fn record_injection_result(&mut self, success: bool) { - let mut metrics = self.metrics.lock().unwrap(); - if success { - metrics.successful_injections += 1; - metrics.last_injection_time = Some(Instant::now()); - } else { - metrics.failed_injections += 1; - } - self.update_metrics(); - } - - /// Get current metrics - pub fn metrics(&self) -> ProcessorMetrics { - self.metrics.lock().unwrap().clone() - } - - /// Handle a transcription event from the STT processor - pub fn handle_transcription(&mut self, event: TranscriptionEvent) { - match event { - TranscriptionEvent::Partial { text, utterance_id, .. } => { - debug!("Received partial transcription [{}]: {}", utterance_id, text); - self.update_metrics(); - } - TranscriptionEvent::Final { text, utterance_id, .. } => { - let text_len = text.len(); - info!("Received final transcription [{}]: {}", utterance_id, text); - self.session.add_transcription(text); - // Record the number of characters buffered - if let Ok(mut metrics) = self.injection_metrics.lock() { - metrics.record_buffered_chars(text_len as u64); - } - self.update_metrics(); - } - TranscriptionEvent::Error { code, message } => { - warn!("Transcription error [{}]: {}", code, message); - } - } - } - - /// Check if injection should be performed and execute if needed - pub async fn check_and_inject(&mut self) -> anyhow::Result<()> { - if self.session.should_inject() { - let use_paste = self.determine_use_paste(); - - // Record the operation type - if let Ok(mut metrics) = self.injection_metrics.lock() { - if use_paste { - metrics.record_paste(); - } else { - metrics.record_keystroke(); - } - } - - self.perform_injection().await?; - } - Ok(()) - } - - /// Force injection of current buffer (for manual triggers) - pub async fn force_inject(&mut self) -> anyhow::Result<()> { - if self.session.has_content() { - let use_paste = self.determine_use_paste(); - - // Record the operation type - if let Ok(mut metrics) = self.injection_metrics.lock() { - if use_paste 
{ - metrics.record_paste(); - } else { - metrics.record_keystroke(); - } - } - - self.session.force_inject(); - self.perform_injection().await?; - } - Ok(()) - } - - /// Clear current session buffer - pub fn clear_session(&mut self) { - self.session.clear(); - self.update_metrics(); - info!("Session cleared manually"); - } - - /// Perform the actual text injection - async fn perform_injection(&mut self) -> anyhow::Result<()> { - let text = self.session.take_buffer(); - if text.is_empty() { - return Ok(()); - } - - // Record the time from final transcription to injection - let latency = self.session.time_since_last_transcription() - .map(|d| d.as_millis() as u64) - .unwrap_or(0); - - info!("Injecting {} characters from session (latency: {}ms)", text.len(), latency); - - // Record the latency in metrics - if let Ok(mut metrics) = self.injection_metrics.lock() { - metrics.record_latency_from_final(latency); - metrics.update_last_injection(); - } - - match self.injector.inject(&text).await { - Ok(()) => { - let mut metrics = self.metrics.lock().unwrap(); - info!("Successfully injected text"); - metrics.successful_injections += 1; - metrics.last_injection_time = Some(Instant::now()); - } - Err(e) => { - error!("Failed to inject text: {}", e); - self.metrics.lock().unwrap().failed_injections += 1; // Single-use lock is fine here - return Err(e.into()); - } - } - - self.update_metrics(); - Ok(()) - } - - /// Update internal metrics from session state - fn update_metrics(&self) { - let mut metrics = self.metrics.lock().unwrap(); - metrics.update_from_session(&self.session); - } - - /// Get current session state - pub fn session_state(&self) -> SessionState { - self.session.state() - } - - /// Get buffer content preview (for debugging/UI) - pub fn buffer_preview(&self) -> String { - let text = self.session.buffer_preview(); - let preview = if text.len() > 100 { - format!("{}...", &text[..100]) - } else { - text - }; - debug!("Buffer preview: {}", preview); - preview - } - - 
/// Get the last partial transcription text (for real-time feedback) - pub fn last_partial_text(&self) -> Option { - None - } - - /// Determine if paste or keystroke injection should be used. - fn determine_use_paste(&self) -> bool { - match self.config.injection_mode.as_str() { - "paste" => true, - "keystroke" => false, - "auto" => { - self.session.buffer_preview().len() > self.config.paste_chunk_chars as usize - } - _ => { - self.session.buffer_preview().len() > self.config.paste_chunk_chars as usize - } - } - } -} - -/// Async wrapper for the injection processor that runs in a dedicated task -pub struct AsyncInjectionProcessor { - processor: Arc>, - transcription_rx: mpsc::Receiver, - shutdown_rx: mpsc::Receiver<()>, - // dedicated injector to avoid awaiting while holding the processor lock - injector: StrategyManager, -} - -impl AsyncInjectionProcessor { - /// Create a new async injection processor - pub fn new( - config: InjectionConfig, - transcription_rx: mpsc::Receiver, - shutdown_rx: mpsc::Receiver<()>, - pipeline_metrics: Option>, - ) -> Self { - // Create shared injection metrics - let injection_metrics = Arc::new(Mutex::new(crate::text_injection::types::InjectionMetrics::default())); - - // Create processor with shared metrics - let processor = Arc::new(TokioMutex::new(InjectionProcessor::new(config.clone(), pipeline_metrics, injection_metrics.clone()))); - - // Create injector with shared metrics - let injector = StrategyManager::new(config, injection_metrics.clone()); - - Self { - processor, - transcription_rx, - shutdown_rx, - injector, - } - } - - /// Run the injection processor loop - pub async fn run(mut self) -> anyhow::Result<()> { - let check_interval = Duration::from_millis(100); // TODO: Make configurable - let mut interval = time::interval(check_interval); - - info!("Injection processor started"); - - loop { - tokio::select! 
{ - // Handle transcription events - Some(event) = self.transcription_rx.recv() => { - let mut processor = self.processor.lock().await; - processor.handle_transcription(event); - } - - // Periodic check for silence timeout - _ = interval.tick() => { - // Prepare any pending injection without holding the lock across await - let maybe_text = { - let mut processor = self.processor.lock().await; - // Extract text to inject if session criteria are met - processor.prepare_injection() - }; - - if let Some(text) = maybe_text { - // Perform the async injection outside the lock - let result = self.injector.inject(&text).await; - let success = result.is_ok(); - - // Record result back into the processor state/metrics - let mut processor = self.processor.lock().await; - processor.record_injection_result(success); - if let Err(e) = result { - error!("Injection failed: {}", e); - } - } - } - - // Shutdown signal - _ = self.shutdown_rx.recv() => { - info!("Received shutdown signal, graceful exit initiated"); - break; - } - } - } - - Ok(()) - } - - /// Get current metrics - pub async fn metrics(&self) -> ProcessorMetrics { - self.processor.lock().await.metrics() - } - - /// Force injection (for manual triggers) - pub async fn force_inject(&self) -> anyhow::Result<()> { - self.processor.lock().await.force_inject().await - } - - /// Clear session (for cancellation) - pub async fn clear_session(&self) { - self.processor.lock().await.clear_session(); - } - - /// Get the last partial transcription text (for real-time feedback) - pub async fn last_partial_text(&self) -> Option { - self.processor.lock().await.last_partial_text() - } -} - -#[cfg(test)] -mod tests { - use super::*; - use std::thread; - use std::time::Duration; - - #[test] - fn test_injection_processor_basic_flow() { - let config = InjectionConfig::default(); - - let injection_metrics = Arc::new(Mutex::new(crate::text_injection::types::InjectionMetrics::default())); - let mut processor = InjectionProcessor::new(config, 
None, injection_metrics); - - // Start with idle state - assert_eq!(processor.session_state(), SessionState::Idle); - - // Add a transcription - processor.handle_transcription(TranscriptionEvent::Final { - utterance_id: 1, - text: "Hello world".to_string(), - words: None, - }); - - assert_eq!(processor.session_state(), SessionState::Buffering); - - // Wait for silence timeout - thread::sleep(Duration::from_millis(300)); - - // Check for silence transition (this would normally be called periodically) - processor.session.check_for_silence_transition(); - - // Should be in WaitingForSilence state now - assert_eq!(processor.session_state(), SessionState::WaitingForSilence); - - // This should trigger injection check - let should_inject = processor.session.should_inject(); - assert!(should_inject, "Session should be ready to inject"); - - // Instead of actually injecting (which requires ydotool), - // we'll manually clear the buffer to simulate successful injection - let buffer_content = processor.session.take_buffer(); - assert_eq!(buffer_content, "Hello world"); - - // Should be back to idle after taking the buffer - assert_eq!(processor.session_state(), SessionState::Idle); - } - - #[test] - fn test_metrics_update() { - let config = InjectionConfig::default(); - let injection_metrics = Arc::new(Mutex::new(crate::text_injection::types::InjectionMetrics::default())); - let mut processor = InjectionProcessor::new(config, None, injection_metrics); - - // Add transcription - processor.handle_transcription(TranscriptionEvent::Final { - utterance_id: 1, - text: "Test transcription".to_string(), - words: None, - }); - - let metrics = processor.metrics(); - assert_eq!(metrics.session_state, SessionState::Buffering); - assert_eq!(metrics.buffer_size, 1); - assert!(metrics.buffer_chars > 0); - } - - #[test] - fn test_partial_transcription_handling() { - let config = InjectionConfig::default(); - let injection_metrics = 
Arc::new(Mutex::new(crate::text_injection::types::InjectionMetrics::default())); - let mut processor = InjectionProcessor::new(config, None, injection_metrics); - - // Start with idle state - assert_eq!(processor.session_state(), SessionState::Idle); - - // Handle partial transcription - processor.handle_transcription(TranscriptionEvent::Partial { - utterance_id: 1, - text: "Hello".to_string(), - t0: None, - t1: None, - }); - - // Should still be idle since partial events don't change session state - assert_eq!(processor.session_state(), SessionState::Idle); - - // Handle final transcription - processor.handle_transcription(TranscriptionEvent::Final { - utterance_id: 1, - text: "Hello world".to_string(), - words: None, - }); - - // Now should be buffering - assert_eq!(processor.session_state(), SessionState::Buffering); - assert_eq!(processor.session.buffer_len(), 1); - } -} \ No newline at end of file diff --git a/crates/app/src/text_injection/session.rs b/crates/app/src/text_injection/session.rs deleted file mode 100644 index f706ca13..00000000 --- a/crates/app/src/text_injection/session.rs +++ /dev/null @@ -1,412 +0,0 @@ -use std::time::{Duration, Instant}; -use tracing::{debug, info, warn}; -use crate::text_injection::types::InjectionMetrics; - -/// Session state machine for buffered text injection -#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)] -pub enum SessionState { - /// No active session, waiting for first transcription - #[default] - Idle, - /// Actively receiving transcriptions, buffering them - Buffering, - /// No new transcriptions received, waiting for silence timeout - WaitingForSilence, - /// Silence timeout reached, ready to inject buffered text - ReadyToInject, -} - -impl std::fmt::Display for SessionState { - fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { - match self { - SessionState::Idle => write!(f, "IDLE"), - SessionState::Buffering => write!(f, "BUFFERING"), - SessionState::WaitingForSilence => write!(f, 
"WAITING_FOR_SILENCE"), - SessionState::ReadyToInject => write!(f, "READY_TO_INJECT"), - } - } -} - - - -/// Configuration for session management -#[derive(Debug, Clone)] -pub struct SessionConfig { - /// Silence timeout before triggering injection (default: 1500ms) - pub silence_timeout_ms: u64, - /// Maximum buffer size in characters (default: 5000) - pub max_buffer_size: usize, - /// Separator to join buffered transcriptions (default: " ") - pub join_separator: String, - /// Time to wait before transitioning from Buffering to WaitingForSilence (default: 500ms) - pub buffer_pause_timeout_ms: u64, - /// Whether to flush on punctuation marks - pub flush_on_punctuation: bool, - /// Punctuation marks that trigger flushing - pub punctuation_marks: Vec<char>, - /// Whether to normalize whitespace - pub normalize_whitespace: bool, -} - -impl Default for SessionConfig { - fn default() -> Self { - Self { - silence_timeout_ms: 0, // Immediate injection after STT completes - max_buffer_size: 5000, - join_separator: " ".to_string(), - buffer_pause_timeout_ms: 0, // No pause needed since STT buffers audio - flush_on_punctuation: true, - punctuation_marks: vec!['.', '!', '?', ';'], - normalize_whitespace: true, - } - } -} - -/// Manages a single dictation session with buffering and silence detection -#[derive(Debug)] -pub struct InjectionSession { - /// Current state in the session state machine - state: SessionState, - /// Buffered transcriptions waiting to be injected - buffer: Vec<String>, - /// Timestamp of the last received transcription - last_transcription: Option<Instant>, - /// Timestamp when we transitioned to Buffering state - buffering_start: Option<Instant>, - /// Configurable silence timeout duration - silence_timeout: Duration, - /// Time to wait before transitioning from Buffering to WaitingForSilence - buffer_pause_timeout: Duration, - /// Maximum buffer size in characters - max_buffer_size: usize, - /// Separator for joining buffered text - join_separator: String, - /// Whether to flush on
punctuation marks - flush_on_punctuation: bool, - /// Punctuation marks that trigger flushing - punctuation_marks: Vec<char>, - /// Whether to normalize whitespace - normalize_whitespace: bool, - /// Reference to injection metrics for telemetry - metrics: std::sync::Arc<std::sync::Mutex<InjectionMetrics>>, -} - -impl InjectionSession { - /// Create a new session with the given configuration - pub fn new(config: SessionConfig, metrics: std::sync::Arc<std::sync::Mutex<InjectionMetrics>>) -> Self { - Self { - state: SessionState::Idle, - buffer: Vec::new(), - last_transcription: None, - buffering_start: None, - silence_timeout: Duration::from_millis(config.silence_timeout_ms), - buffer_pause_timeout: Duration::from_millis(config.buffer_pause_timeout_ms), - max_buffer_size: config.max_buffer_size, - join_separator: config.join_separator, - flush_on_punctuation: config.flush_on_punctuation, - punctuation_marks: config.punctuation_marks, - normalize_whitespace: config.normalize_whitespace, - metrics, - } - } - - /// Add a new transcription to the session buffer - pub fn add_transcription(&mut self, text: String) { - // Filter out empty or whitespace-only transcriptions - let text = text.trim(); - if text.is_empty() { - return; - } - - let text = if self.normalize_whitespace { - // Normalize whitespace (collapse multiple spaces, remove leading/trailing) - text.split_whitespace().collect::<Vec<_>>().join(" ") - } else { - text.to_string() - }; - - // Record the number of characters being buffered - self.record_buffered_chars(text.len() as u64); - - // Check if text ends with punctuation that should trigger flushing - let ends_with_punctuation = self.flush_on_punctuation && - !text.is_empty() && - self.punctuation_marks.contains(&text.chars().last().unwrap()); - - // Add to buffer - self.buffer.push(text); - self.last_transcription = Some(Instant::now()); - - // Update state based on current state - match self.state { - SessionState::Idle => { - self.state = SessionState::Buffering; - self.buffering_start = Some(Instant::now()); - info!("Session started -
first transcription buffered"); - } - SessionState::Buffering => { - debug!("Additional transcription buffered, {} items in session", self.buffer.len()); - } - SessionState::WaitingForSilence => { - // New transcription resets the silence timer and transitions back to Buffering - self.state = SessionState::Buffering; - self.buffering_start = Some(Instant::now()); - debug!("Silence timer reset by new transcription"); - } - SessionState::ReadyToInject => { - // This shouldn't happen in normal flow, but handle gracefully - warn!("Received transcription while ready to inject - resetting session"); - self.state = SessionState::Buffering; - self.buffering_start = Some(Instant::now()); - } - } - - // Check if buffer is too large and force injection - if self.total_chars() > self.max_buffer_size { - self.state = SessionState::ReadyToInject; - warn!("Buffer size limit reached, forcing injection"); - return; - } - - // Check if we should flush due to punctuation - if ends_with_punctuation { - self.state = SessionState::ReadyToInject; - info!("Flushing buffer due to punctuation mark"); - } - } - - /// Check if the session should transition to WaitingForSilence state - /// This should be called periodically to detect when transcription has paused - pub fn check_for_silence_transition(&mut self) { - if self.state == SessionState::Buffering { - if let Some(_buffering_start) = self.buffering_start { - let time_since_last_transcription = self.last_transcription.map(|t| t.elapsed()); - - // If we haven't received a transcription for buffer_pause_timeout, - // transition to WaitingForSilence - if let Some(time_since_last) = time_since_last_transcription { - if time_since_last >= self.buffer_pause_timeout { - self.state = SessionState::WaitingForSilence; - info!("Transitioned to WaitingForSilence state"); - } - } - } - } - } - - /// Check if the session should inject based on silence timeout - pub fn should_inject(&mut self) -> bool { - match self.state { - SessionState::Buffering => 
{ - // Check if we should transition to WaitingForSilence first - self.check_for_silence_transition(); - false // Don't inject while still in Buffering state - } - SessionState::WaitingForSilence => { - if let Some(last_time) = self.last_transcription { - if last_time.elapsed() >= self.silence_timeout { - // Silence timeout reached, transition to ready to inject - self.state = SessionState::ReadyToInject; - info!("Silence timeout reached, ready to inject {} transcriptions", self.buffer.len()); - true - } else { - false - } - } else { - false - } - } - SessionState::ReadyToInject => { - // Check if buffer is empty (could happen if cleared) - if self.buffer.is_empty() { - self.state = SessionState::Idle; - false - } else { - true - } - } - SessionState::Idle => false, - } - } - - /// Take the buffered text and reset the session to idle - pub fn take_buffer(&mut self) -> String { - let text = self.buffer.join(&self.join_separator); - let size = text.len(); - self.buffer.clear(); - self.last_transcription = None; - self.buffering_start = None; - self.state = SessionState::Idle; - debug!("Session buffer cleared, {} chars taken", text.len()); - - // Record the flush event with the size - self.record_flush(size as u64); - text - } - - /// Get current session state - pub fn state(&self) -> SessionState { - self.state - } - - /// Get number of buffered transcriptions - pub fn buffer_len(&self) -> usize { - self.buffer.len() - } - - /// Get total character count in buffer - pub fn total_chars(&self) -> usize { - self.buffer.iter().map(|s| s.len()).sum::<usize>() - + (self.buffer.len().saturating_sub(1) * self.join_separator.len()) - } - - /// Get time since last transcription (None if no transcriptions) - pub fn time_since_last_transcription(&self) -> Option<Duration> { - self.last_transcription.map(|t| t.elapsed()) - } - - /// Check if session has any buffered content - pub fn has_content(&self) -> bool { - !self.buffer.is_empty() - } - - /// Force the session into ready-to-inject state
(for manual triggers) - pub fn force_inject(&mut self) { - if self.has_content() { - self.state = SessionState::ReadyToInject; - info!("Session forced to inject state"); - } - } - - /// Clear the session buffer and reset to idle (for cancellation) - pub fn clear(&mut self) { - self.buffer.clear(); - self.last_transcription = None; - self.buffering_start = None; - self.state = SessionState::Idle; - info!("Session cleared and reset to idle"); - } - - /// Get buffer preview without taking the buffer (for debugging/UI) - pub fn buffer_preview(&self) -> String { - self.buffer.join(&self.join_separator) - } - - /// Record characters that have been buffered - pub fn record_buffered_chars(&self, count: u64) { - if let Ok(mut metrics) = self.metrics.lock() { - metrics.record_buffered_chars(count); - } - } - - /// Record a flush event - pub fn record_flush(&self, size: u64) { - if let Ok(mut metrics) = self.metrics.lock() { - metrics.record_flush(size); - } - } -} - -#[cfg(test)] -mod tests { - use super::*; - use std::thread; - - #[test] - fn test_session_state_transitions() { - let config = SessionConfig { - silence_timeout_ms: 100, // Short timeout for testing - buffer_pause_timeout_ms: 50, // Short pause timeout for testing - ..Default::default() - }; - let metrics = std::sync::Arc::new(std::sync::Mutex::new(InjectionMetrics::default())); - let mut session = InjectionSession::new(config, metrics); - - // Start with idle state - assert_eq!(session.state(), SessionState::Idle); - assert!(!session.has_content()); - - // Add first transcription - session.add_transcription("Hello".to_string()); - assert_eq!(session.state(), SessionState::Buffering); - assert!(session.has_content()); - assert_eq!(session.buffer_len(), 1); - - // Add second transcription - session.add_transcription("world".to_string()); - assert_eq!(session.state(), SessionState::Buffering); - assert_eq!(session.buffer_len(), 2); - - // Wait for buffer pause timeout (should transition to WaitingForSilence) - 
thread::sleep(Duration::from_millis(75)); - session.check_for_silence_transition(); - assert_eq!(session.state(), SessionState::WaitingForSilence); - - // Wait for silence timeout (should transition to ReadyToInject) - thread::sleep(Duration::from_millis(75)); - assert!(session.should_inject()); - assert_eq!(session.state(), SessionState::ReadyToInject); - - // Take buffer - let text = session.take_buffer(); - assert_eq!(text, "Hello world"); - assert_eq!(session.state(), SessionState::Idle); - assert!(!session.has_content()); - } - - #[test] - fn test_buffer_size_limit() { - let config = SessionConfig { - max_buffer_size: 10, // Very small limit - ..Default::default() - }; - let metrics = std::sync::Arc::new(std::sync::Mutex::new(InjectionMetrics::default())); - let mut session = InjectionSession::new(config, metrics); - - // Add text that exceeds limit - session.add_transcription("This is a long sentence".to_string()); - assert_eq!(session.state(), SessionState::ReadyToInject); - } - - #[test] - fn test_empty_transcription_filtering() { - let metrics = std::sync::Arc::new(std::sync::Mutex::new(InjectionMetrics::default())); - let mut session = InjectionSession::new(SessionConfig::default(), metrics); - - session.add_transcription("".to_string()); - session.add_transcription(" ".to_string()); - session.add_transcription("Hello".to_string()); - - assert_eq!(session.buffer_len(), 1); - assert_eq!(session.take_buffer(), "Hello"); - } - - #[test] - fn test_silence_detection() { - let config = SessionConfig { - silence_timeout_ms: 200, - buffer_pause_timeout_ms: 50, - ..Default::default() - }; - let metrics = std::sync::Arc::new(std::sync::Mutex::new(InjectionMetrics::default())); - let mut session = InjectionSession::new(config, metrics); - - // Add transcription - session.add_transcription("Test".to_string()); - assert_eq!(session.state(), SessionState::Buffering); - - // Wait for buffer pause timeout - thread::sleep(Duration::from_millis(75)); - 
session.check_for_silence_transition(); - assert_eq!(session.state(), SessionState::WaitingForSilence); - - // Add new transcription - should go back to Buffering - session.add_transcription("Another".to_string()); - assert_eq!(session.state(), SessionState::Buffering); - - // Wait for buffer pause timeout again - thread::sleep(Duration::from_millis(75)); - session.check_for_silence_transition(); - assert_eq!(session.state(), SessionState::WaitingForSilence); - } -} \ No newline at end of file diff --git a/crates/app/src/text_injection/tests/mod.rs b/crates/app/src/text_injection/tests/mod.rs deleted file mode 100644 index 9a11c8c5..00000000 --- a/crates/app/src/text_injection/tests/mod.rs +++ /dev/null @@ -1,10 +0,0 @@ -#[cfg(test)] -mod test_focus_tracking; -#[cfg(test)] -mod test_permission_checking; -#[cfg(test)] -mod test_adaptive_strategy; -#[cfg(test)] -mod test_window_manager; -#[cfg(test)] -mod test_integration; \ No newline at end of file diff --git a/crates/app/src/text_injection/tests/test_adaptive_strategy.rs b/crates/app/src/text_injection/tests/test_adaptive_strategy.rs deleted file mode 100644 index 0fd4889d..00000000 --- a/crates/app/src/text_injection/tests/test_adaptive_strategy.rs +++ /dev/null @@ -1,73 +0,0 @@ -#[cfg(test)] -mod tests { - use crate::text_injection::manager::StrategyManager; - use crate::text_injection::types::{InjectionConfig, InjectionMethod, InjectionMetrics}; - use std::sync::{Arc, Mutex}; - - #[test] - fn test_success_rate_calculation() { - let config = InjectionConfig::default(); - let metrics = Arc::new(Mutex::new(InjectionMetrics::default())); - let mut manager = StrategyManager::new(config, metrics); - - // Simulate some successes and failures - manager.update_success_record("test_app", InjectionMethod::Clipboard, true); - manager.update_success_record("test_app", InjectionMethod::Clipboard, true); - manager.update_success_record("test_app", InjectionMethod::Clipboard, false); - - // Success rate should be approximately 
66% - let methods = manager.get_method_priority("test_app"); - assert!(!methods.is_empty()); - } - - #[test] - fn test_cooldown_application() { - let config = InjectionConfig::default(); - let metrics = Arc::new(Mutex::new(InjectionMetrics::default())); - let mut manager = StrategyManager::new(config, metrics); - - // Apply cooldown - manager.apply_cooldown("test_app", InjectionMethod::YdoToolPaste, "Test error"); - - // Method should be in cooldown - let _ = manager.is_in_cooldown(InjectionMethod::YdoToolPaste); - } - - #[test] - fn test_method_priority_ordering() { - let mut config = InjectionConfig::default(); - config.allow_ydotool = true; - config.allow_enigo = false; - - let metrics = Arc::new(Mutex::new(InjectionMetrics::default())); - let mut manager = StrategyManager::new(config, metrics); - - let methods = manager.get_method_priority("test_app"); - - // Should have some methods available - assert!(!methods.is_empty()); - - // AT-SPI should be preferred if available - #[cfg(feature = "text-injection-atspi")] - assert_eq!(methods[0], InjectionMethod::AtspiInsert); - } - - #[test] - fn test_success_rate_decay() { - let config = InjectionConfig::default(); - let metrics = Arc::new(Mutex::new(InjectionMetrics::default())); - let mut manager = StrategyManager::new(config, metrics); - - // Add initial success - manager.update_success_record("test_app", InjectionMethod::Clipboard, true); - - // Add multiple updates to trigger decay - for _ in 0..5 { - manager.update_success_record("test_app", InjectionMethod::Clipboard, true); - } - - // Success rate should still be high despite decay - let methods = manager.get_method_priority("test_app"); - assert!(!methods.is_empty()); - } -} \ No newline at end of file diff --git a/crates/app/src/text_injection/tests/test_caching_and_chunking.rs b/crates/app/src/text_injection/tests/test_caching_and_chunking.rs deleted file mode 100644 index cbf363b9..00000000 --- 
a/crates/app/src/text_injection/tests/test_caching_and_chunking.rs +++ /dev/null @@ -1,64 +0,0 @@ -use crate::text_injection::manager::StrategyManager; -use crate::text_injection::types::{InjectionConfig, InjectionError, InjectionMethod, TextInjector, InjectionMetrics}; -use std::sync::{Arc, Mutex}; - -struct DummyInjector { metrics: InjectionMetrics } -impl DummyInjector { fn new() -> Self { Self { metrics: InjectionMetrics::default() } } } -impl TextInjector for DummyInjector { - fn name(&self) -> &'static str { "Dummy" } - fn is_available(&self) -> bool { true } - fn inject(&mut self, _text: &str) -> Result<(), InjectionError> { Ok(()) } - fn paste(&mut self, _text: &str) -> Result<(), InjectionError> { Ok(()) } - fn type_text(&mut self, _text: &str, _rate: u32) -> Result<(), InjectionError> { Ok(()) } - fn metrics(&self) -> &InjectionMetrics { &self.metrics } -} - -#[test] -fn regex_caching_allow_block() { - let mut config = InjectionConfig::default(); - config.allowlist = vec!["^Code$".into()]; - config.blocklist = vec!["^Forbidden$".into()]; - let metrics = Arc::new(Mutex::new(InjectionMetrics::default())); - let manager = StrategyManager::new(config, metrics); - - #[cfg(feature = "text-injection-regex")] - { - assert!(manager.is_app_allowed("Code")); - assert!(!manager.is_app_allowed("Forbidden")); - assert!(!manager.is_app_allowed("Other")); // blocked by allowlist - } - #[cfg(not(feature = "text-injection-regex"))] - { - assert!(manager.is_app_allowed("SomeCodeWindow")); - } -} - -#[test] -fn method_order_caches_per_app() { - let config = InjectionConfig::default(); - let metrics = Arc::new(Mutex::new(InjectionMetrics::default())); - let mut manager = StrategyManager::new(config, metrics); - let order1 = manager.get_method_order("appA"); - let order2 = manager.get_method_order("appA"); - assert_eq!(order1, order2); - let order3 = manager.get_method_order("appB"); - // Different app may have different cached key; at least call should not panic - 
assert!(!order3.is_empty()); -} - -#[test] -fn unicode_chunk_boundaries() { - let mut config = InjectionConfig::default(); - config.paste_chunk_chars = 3; - config.chunk_delay_ms = 0; - let metrics = Arc::new(Mutex::new(InjectionMetrics::default())); - let mut manager = StrategyManager::new(config, metrics); - - let mut inj: Box<dyn TextInjector> = Box::new(DummyInjector::new()); - let text = "🙂🙂🙂🙂"; // 4 emojis, multi-byte - // Access private function via same module tests would be nicer; here we mimic by calling paste directly in a loop - // Ensure slicing at char boundaries works by manual iteration - let mut count = 0; - for ch in text.chars() { let s = ch.to_string(); assert!(inj.paste(&s).is_ok()); count += 1; } - assert_eq!(count, 4); -} diff --git a/crates/app/src/text_injection/tests/test_focus_tracking.rs b/crates/app/src/text_injection/tests/test_focus_tracking.rs deleted file mode 100644 index 33c514e9..00000000 --- a/crates/app/src/text_injection/tests/test_focus_tracking.rs +++ /dev/null @@ -1,48 +0,0 @@ -#[cfg(test)] -mod tests { - use crate::text_injection::focus::{FocusTracker, FocusStatus}; - use crate::text_injection::types::InjectionConfig; - use std::time::Duration; - use tokio::time::sleep; - - #[tokio::test] - async fn test_focus_detection() { - let config = InjectionConfig::default(); - let mut tracker = FocusTracker::new(config); - - // Test focus detection - let status = tracker.get_focus_status().await; - assert!(status.is_ok()); - - // Test caching - let cached = tracker.cached_focus_status(); - assert!(cached.is_some()); - } - - #[tokio::test] - async fn test_focus_cache_expiry() { - let mut config = InjectionConfig::default(); - config.focus_cache_duration_ms = 50; // Very short cache - let mut tracker = FocusTracker::new(config); - - // Get initial status - let _status1 = tracker.get_focus_status().await.unwrap(); - assert!(tracker.cached_focus_status().is_some()); - - // Wait for cache to expire - sleep(Duration::from_millis(60)).await; - - // This
should trigger a new check - let _status2 = tracker.get_focus_status().await.unwrap(); - - // Cache should be refreshed - assert!(tracker.cached_focus_status().is_some()); - } - - #[test] - fn test_focus_status_equality() { - assert_eq!(FocusStatus::EditableText, FocusStatus::EditableText); - assert_ne!(FocusStatus::EditableText, FocusStatus::NonEditable); - assert_ne!(FocusStatus::Unknown, FocusStatus::EditableText); - } -} \ No newline at end of file diff --git a/crates/app/src/text_injection/tests/test_integration.rs b/crates/app/src/text_injection/tests/test_integration.rs deleted file mode 100644 index dcdfd69e..00000000 --- a/crates/app/src/text_injection/tests/test_integration.rs +++ /dev/null @@ -1,80 +0,0 @@ -#[cfg(all(test, feature = "text-injection"))] -mod integration_tests { - use crate::text_injection::manager::StrategyManager; - use crate::text_injection::types::{InjectionConfig, InjectionMetrics}; - use std::sync::{Arc, Mutex}; - - #[tokio::test] - async fn test_full_injection_flow() { - let mut config = InjectionConfig::default(); - config.allow_ydotool = false; // Disable external dependencies for testing - config.restore_clipboard = true; - - let metrics = Arc::new(Mutex::new(InjectionMetrics::default())); - let manager = StrategyManager::new(config, metrics.clone()); - - // Test getting current app ID - let app_id = manager.get_current_app_id().await; - assert!(app_id.is_ok()); - let app_id = app_id.unwrap(); - println!("Current app ID: {}", app_id); - - // Test method priority - let methods = manager.get_method_priority(&app_id); - assert!(!methods.is_empty(), "Should have at least one injection method available"); - println!("Available methods: {:?}", methods); - - // Check metrics - let metrics_guard = metrics.lock().unwrap(); - println!("Initial metrics: attempts={}, successes={}", - metrics_guard.attempts, metrics_guard.successes); - } - - #[tokio::test] - async fn test_app_allowlist_blocklist() { - let mut config = 
InjectionConfig::default(); - config.allowlist = vec!["firefox".to_string(), "chrome".to_string()]; - config.blocklist = vec!["terminal".to_string()]; - - let metrics = Arc::new(Mutex::new(InjectionMetrics::default())); - let manager = StrategyManager::new(config, metrics); - - // Test allowlist - assert!(manager.is_app_allowed("firefox")); - assert!(manager.is_app_allowed("chrome")); - assert!(!manager.is_app_allowed("notepad")); - - // Clear allowlist and test blocklist - let mut config = InjectionConfig::default(); - config.blocklist = vec!["terminal".to_string(), "console".to_string()]; - - let metrics = Arc::new(Mutex::new(InjectionMetrics::default())); - let manager = StrategyManager::new(config, metrics); - - assert!(!manager.is_app_allowed("terminal")); - assert!(!manager.is_app_allowed("console")); - assert!(manager.is_app_allowed("firefox")); - } - - #[test] - fn test_configuration_defaults() { - let config = InjectionConfig::default(); - - // Check default values - assert!(!config.allow_ydotool); - assert!(!config.allow_kdotool); - assert!(!config.allow_enigo); - assert!(!config.allow_mki); - assert!(!config.restore_clipboard); - assert!(config.inject_on_unknown_focus); - assert!(config.enable_window_detection); - - assert_eq!(config.focus_cache_duration_ms, 200); - assert_eq!(config.min_success_rate, 0.3); - assert_eq!(config.min_sample_size, 5); - assert_eq!(config.clipboard_restore_delay_ms, Some(500)); - - assert!(config.allowlist.is_empty()); - assert!(config.blocklist.is_empty()); - } -} \ No newline at end of file diff --git a/crates/app/src/text_injection/tests/test_noop.rs b/crates/app/src/text_injection/tests/test_noop.rs deleted file mode 100644 index 87012d9d..00000000 --- a/crates/app/src/text_injection/tests/test_noop.rs +++ /dev/null @@ -1,10 +0,0 @@ -use crate::text_injection::types::*; -use crate::text_injection::noop_injector::NoOpInjector; - -#[test] -fn noop_always_available_and_succeeds() { - let config = InjectionConfig::default(); 
- let mut injector = NoOpInjector::new(config); - assert!(injector.is_available()); - assert!(injector.inject("hello").is_ok()); -} diff --git a/crates/app/src/text_injection/tests/test_permission_checking.rs b/crates/app/src/text_injection/tests/test_permission_checking.rs deleted file mode 100644 index 2c0b52fd..00000000 --- a/crates/app/src/text_injection/tests/test_permission_checking.rs +++ /dev/null @@ -1,47 +0,0 @@ -#[cfg(test)] -mod tests { - #[cfg(feature = "text-injection-ydotool")] - use crate::text_injection::ydotool_injector::YdotoolInjector; - use std::process::Command; - - #[test] - fn test_binary_existence_check() { - // Test with a binary that should exist - let output = Command::new("which") - .arg("ls") - .output(); - - assert!(output.is_ok()); - assert!(output.unwrap().status.success()); - - // Test with a binary that shouldn't exist - let output = Command::new("which") - .arg("nonexistent_binary_xyz123") - .output(); - - assert!(output.is_ok()); - assert!(!output.unwrap().status.success()); - } - - #[cfg(feature = "text-injection-ydotool")] - #[test] - fn test_ydotool_availability() { - let config = InjectionConfig::default(); - let injector = YdotoolInjector::new(config); - let _available = injector.is_available(); - } - - #[test] - fn test_permission_mode_check() { - use std::os::unix::fs::PermissionsExt; - - // Check /usr/bin/ls or similar common executable - if let Ok(metadata) = std::fs::metadata("/usr/bin/ls") { - let permissions = metadata.permissions(); - let mode = permissions.mode(); - - // Should have at least execute permission for owner - assert!(mode & 0o100 != 0); - } - } -} \ No newline at end of file diff --git a/crates/app/src/text_injection/tests/test_window_manager.rs b/crates/app/src/text_injection/tests/test_window_manager.rs deleted file mode 100644 index 684f3464..00000000 --- a/crates/app/src/text_injection/tests/test_window_manager.rs +++ /dev/null @@ -1,60 +0,0 @@ -#[cfg(test)] -mod tests { - use 
crate::text_injection::window_manager::{get_active_window_class, get_window_info}; - - #[tokio::test] - async fn test_window_class_detection() { - // This test will only work in a graphical environment - if std::env::var("DISPLAY").is_ok() || std::env::var("WAYLAND_DISPLAY").is_ok() { - let result = get_active_window_class().await; - - // We can't assert specific values since it depends on the environment - // but we can check that it doesn't panic - match result { - Ok(class) => { - println!("Detected window class: {}", class); - assert!(!class.is_empty()); - } - Err(e) => { - println!("Window detection failed (expected in CI): {}", e); - } - } - } - } - - #[tokio::test] - async fn test_window_info_structure() { - let info = get_window_info().await; - - // Basic sanity checks - assert!(!info.class.is_empty()); - // Title might be empty - // PID might be 0 if detection failed - } - - #[test] - fn test_x11_detection() { - // Check if X11 is available - let x11_available = std::env::var("DISPLAY").is_ok(); - - if x11_available { - // Try to run xprop - let output = std::process::Command::new("xprop") - .args(&["-root", "_NET_ACTIVE_WINDOW"]) - .output(); - - // Should at least not panic - assert!(output.is_ok() || output.is_err()); - } - } - - #[test] - fn test_wayland_detection() { - // Check if Wayland is available - let wayland_available = std::env::var("WAYLAND_DISPLAY").is_ok(); - - if wayland_available { - println!("Wayland display detected: {:?}", std::env::var("WAYLAND_DISPLAY")); - } - } -} \ No newline at end of file diff --git a/crates/app/src/text_injection/types.rs b/crates/app/src/text_injection/types.rs deleted file mode 100644 index f09f2d39..00000000 --- a/crates/app/src/text_injection/types.rs +++ /dev/null @@ -1,498 +0,0 @@ -use serde::{Deserialize, Serialize}; -use std::time::Duration; -use async_trait::async_trait; - -/// Enumeration of all available text injection methods -#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Serialize, 
Deserialize)] -pub enum InjectionMethod { - /// Insert text directly using AT-SPI2 EditableText interface - AtspiInsert, - /// Set the Wayland clipboard with text - Clipboard, - /// Set clipboard then trigger paste via AT-SPI2 Action interface - ClipboardAndPaste, - /// Use ydotool to simulate Ctrl+V paste (opt-in) - YdoToolPaste, - /// Use kdotool for window activation/focus assistance (opt-in) - KdoToolAssist, - /// Use enigo library for synthetic text/paste (opt-in) - EnigoText, - /// Use mouse-keyboard-input for synthetic key events (opt-in, last resort) - UinputKeys, - /// No-op fallback injector (always succeeds, does nothing) - NoOp, -} - -/// Configuration for text injection system -/// Configuration for text injection system -#[derive(Debug, Clone, Serialize, Deserialize)] -pub struct InjectionConfig { - /// Whether to allow ydotool usage (requires external binary and uinput permissions) - #[serde(default = "default_false")] - pub allow_ydotool: bool, - /// Whether to allow kdotool usage (external CLI for KDE window activation) - #[serde(default = "default_false")] - pub allow_kdotool: bool, - /// Whether to allow enigo library usage (Wayland/libei paths) - #[serde(default = "default_false")] - pub allow_enigo: bool, - /// Whether to allow mouse-keyboard-input usage (uinput) - #[serde(default = "default_false")] - pub allow_mki: bool, - /// Whether to restore the clipboard content after injection - #[serde(default = "default_false")] - pub restore_clipboard: bool, - /// Whether to allow injection when focus state is unknown - #[serde(default = "default_inject_on_unknown_focus")] - pub inject_on_unknown_focus: bool, - - /// Whether to require editable focus for injection - #[serde(default = "default_require_focus")] - pub require_focus: bool, - - /// Hotkey to pause/resume injection (e.g., "Ctrl+Alt+P") - #[serde(default = "default_pause_hotkey")] - pub pause_hotkey: Option<String>, - - /// Whether to redact text content in logs - #[serde(default =
"default_redact_logs")] - pub redact_logs: bool, - - /// Overall latency budget for a single injection call, across all fallbacks. - #[serde(default = "default_max_total_latency_ms")] - pub max_total_latency_ms: u64, - - /// Timeout for individual injection method attempts (e.g., AT-SPI call, clipboard set). - #[serde(default = "default_per_method_timeout_ms")] - pub per_method_timeout_ms: u64, - /// Timeout specifically for a paste action (e.g., waiting for AT-SPI paste to complete). - #[serde(default = "default_paste_action_timeout_ms")] - pub paste_action_timeout_ms: u64, - - /// Initial cooldown period after a method fails for a specific application. - #[serde(default = "default_cooldown_initial_ms")] - pub cooldown_initial_ms: u64, - /// Backoff factor to apply to the cooldown after consecutive failures. - #[serde(default = "default_cooldown_backoff_factor")] - pub cooldown_backoff_factor: f32, - /// Maximum cooldown period to prevent excessively long waits. - #[serde(default = "default_cooldown_max_ms")] - pub cooldown_max_ms: u64, - - /// Mode for text injection: "keystroke", "paste", or "auto" - #[serde(default = "default_injection_mode")] - pub injection_mode: String, - /// Keystroke rate in characters per second (cps) - #[serde(default = "default_keystroke_rate_cps")] - pub keystroke_rate_cps: u32, - /// Maximum number of characters to send in a single burst - #[serde(default = "default_max_burst_chars")] - pub max_burst_chars: u32, - /// Number of characters to chunk paste operations into - #[serde(default = "default_paste_chunk_chars")] - pub paste_chunk_chars: u32, - /// Delay between paste chunks in milliseconds - #[serde(default = "default_chunk_delay_ms")] - pub chunk_delay_ms: u64, - - /// Cache duration for focus status (ms) - #[serde(default = "default_focus_cache_duration_ms")] - pub focus_cache_duration_ms: u64, - - /// Minimum success rate before trying fallback methods - #[serde(default = "default_min_success_rate")] - pub min_success_rate: 
f64, - - /// Number of samples before trusting success rate - #[serde(default = "default_min_sample_size")] - pub min_sample_size: u32, - - /// Enable window manager integration - #[serde(default = "default_true")] - pub enable_window_detection: bool, - - /// Delay before restoring clipboard (ms) - #[serde(default = "default_clipboard_restore_delay_ms")] - pub clipboard_restore_delay_ms: Option<u64>, - - /// Allowlist of application patterns (regex) for injection - #[serde(default)] - pub allowlist: Vec<String>, - - /// Blocklist of application patterns (regex) to block injection - #[serde(default)] - pub blocklist: Vec<String>, -} - -fn default_false() -> bool { - false -} - -fn default_inject_on_unknown_focus() -> bool { - true // Default to true to avoid blocking on Wayland without AT-SPI -} - -fn default_require_focus() -> bool { - false -} - -fn default_pause_hotkey() -> Option<String> { - None -} - -fn default_redact_logs() -> bool { - true // Privacy-first by default -} - -fn default_allowlist() -> Vec<String> { - vec![] -} - -fn default_blocklist() -> Vec<String> { - vec![] -} - -fn default_injection_mode() -> String { - "auto".to_string() -} - -fn default_keystroke_rate_cps() -> u32 { - 20 // 20 characters per second (human typing speed) -} - -fn default_max_burst_chars() -> u32 { - 50 // Maximum 50 characters in a single burst -} - -fn default_paste_chunk_chars() -> u32 { - 500 // Chunk paste operations into 500 character chunks -} - -fn default_chunk_delay_ms() -> u64 { 30 } - -fn default_focus_cache_duration_ms() -> u64 { - 200 // Cache focus status for 200ms -} - -fn default_min_success_rate() -> f64 { - 0.3 // 30% minimum success rate before considering fallback -} - -fn default_min_sample_size() -> u32 { - 5 // Need at least 5 samples before trusting success rate -} - -fn default_true() -> bool { - true -} - -fn default_clipboard_restore_delay_ms() -> Option<u64> { - Some(500) // Wait 500ms before restoring clipboard -} - -fn default_max_total_latency_ms() -> u64 { - 800 -} - -fn
default_per_method_timeout_ms() -> u64 { - 250 -} - -fn default_paste_action_timeout_ms() -> u64 { - 200 -} - -fn default_cooldown_initial_ms() -> u64 { - 10000 // 10 seconds -} - -fn default_cooldown_backoff_factor() -> f32 { - 2.0 -} - -fn default_cooldown_max_ms() -> u64 { - 300_000 // 5 minutes -} - -impl Default for InjectionConfig { - fn default() -> Self { - Self { - allow_ydotool: default_false(), - allow_kdotool: default_false(), - allow_enigo: default_false(), - allow_mki: default_false(), - restore_clipboard: default_false(), - inject_on_unknown_focus: default_inject_on_unknown_focus(), - require_focus: default_require_focus(), - pause_hotkey: default_pause_hotkey(), - redact_logs: default_redact_logs(), - max_total_latency_ms: default_max_total_latency_ms(), - per_method_timeout_ms: default_per_method_timeout_ms(), - paste_action_timeout_ms: default_paste_action_timeout_ms(), - cooldown_initial_ms: default_cooldown_initial_ms(), - cooldown_backoff_factor: default_cooldown_backoff_factor(), - cooldown_max_ms: default_cooldown_max_ms(), - injection_mode: default_injection_mode(), - keystroke_rate_cps: default_keystroke_rate_cps(), - max_burst_chars: default_max_burst_chars(), - paste_chunk_chars: default_paste_chunk_chars(), - chunk_delay_ms: default_chunk_delay_ms(), - focus_cache_duration_ms: default_focus_cache_duration_ms(), - min_success_rate: default_min_success_rate(), - min_sample_size: default_min_sample_size(), - enable_window_detection: default_true(), - clipboard_restore_delay_ms: default_clipboard_restore_delay_ms(), - allowlist: default_allowlist(), - blocklist: default_blocklist(), - } - } -} - -impl InjectionConfig { - pub fn max_total_latency(&self) -> Duration { - Duration::from_millis(self.max_total_latency_ms) - } - - pub fn per_method_timeout(&self) -> Duration { - Duration::from_millis(self.per_method_timeout_ms) - } - - pub fn paste_action_timeout(&self) -> Duration { - Duration::from_millis(self.paste_action_timeout_ms) - } -} - 
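Reviewer note: the three `cooldown_*` defaults above (10 s initial, factor 2.0, 5 min cap) read like the parameters of a capped exponential backoff. This diff does not show how `StrategyManager` actually combines them, so the following is only a minimal sketch under that assumption. `cooldown_after_failures` is a hypothetical helper introduced for illustration, not a function from this repository.

```rust
use std::time::Duration;

/// Hypothetical helper (an assumption, not code from this diff): the cooldown
/// to apply after `consecutive_failures` failures of one method for one app.
/// The first failure yields `initial_ms`, each further failure multiplies the
/// value by `backoff_factor`, and the result never exceeds `max_ms`.
fn cooldown_after_failures(
    initial_ms: u64,
    backoff_factor: f32,
    max_ms: u64,
    consecutive_failures: u32,
) -> Duration {
    if consecutive_failures == 0 {
        return Duration::ZERO;
    }
    // Bound the exponent so the cast to i32 is always safe.
    let exponent = consecutive_failures.saturating_sub(1).min(64) as i32;
    let scaled = initial_ms as f64 * f64::from(backoff_factor).powi(exponent);
    // Clamp in floating point before converting back to an integer.
    let capped = scaled.min(max_ms as f64);
    Duration::from_millis(capped as u64)
}
```

With the defaults in this file, the sketch gives 10 s, 20 s, 40 s, 80 s and 160 s for the first five consecutive failures, and stays at the 300 s cap from the sixth failure onward.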
-/// Result type for injection operations -pub type InjectionResult<T> = Result<T, InjectionError>; - -/// Errors that can occur during text injection -#[derive(Debug, thiserror::Error)] -pub enum InjectionError { - #[error("No editable focus found")] - NoEditableFocus, - - #[error("Method not available: {0}")] - MethodNotAvailable(String), - - #[error("Timeout after {0}ms")] - Timeout(u64), - - #[error("All methods failed: {0}")] - AllMethodsFailed(String), - - #[error("Method unavailable: {0}")] - MethodUnavailable(String), - - #[error("Method failed: {0}")] - MethodFailed(String), - - #[error("Budget exhausted")] - BudgetExhausted, - - #[error("Clipboard error: {0}")] - Clipboard(String), - - #[error("Process error: {0}")] - Process(String), - - #[error("Permission denied: {0}")] - PermissionDenied(String), - - #[error("IO error: {0}")] - Io(#[from] std::io::Error), - - #[error("Other error: {0}")] - Other(String), -} - -/// Metrics and telemetry data for injection attempts -#[derive(Debug, Default, Clone)] -pub struct InjectionMetrics { - /// Total number of injection attempts - pub attempts: u64, - /// Number of successful injections - pub successes: u64, - /// Number of failed injections - pub failures: u64, - /// Total time spent in injection attempts - pub total_duration_ms: u64, - /// Average duration of injection attempts - pub avg_duration_ms: f64, - /// Method-specific metrics - pub method_metrics: std::collections::HashMap<InjectionMethod, MethodMetrics>, - /// Number of characters buffered - pub chars_buffered: u64, - /// Number of characters injected - pub chars_injected: u64, - /// Number of flushes - pub flushes: u64, - /// Number of paste operations - pub paste_uses: u64, - /// Number of keystroke operations - pub keystroke_uses: u64, - /// Number of backend denials - pub backend_denied: u64, - /// Number of focus missing errors - pub focus_missing: u64, - /// Number of rate limited events - pub rate_limited: u64, - /// Histogram of latency from final transcription to injection - pub
latency_from_final_ms: Vec<u64>, - /// Histogram of flush sizes - pub flush_size_chars: Vec<u64>, - /// Timestamp of last injection - pub last_injection: Option<std::time::Instant>, - /// Age of stuck buffer (if any) - pub stuck_buffer_age_ms: u64, -} - -/// Metrics for a specific injection method -#[derive(Debug, Default, Clone)] -pub struct MethodMetrics { - /// Number of attempts using this method - pub attempts: u64, - /// Number of successful attempts - pub successes: u64, - /// Number of failures - pub failures: u64, - /// Total duration of attempts - pub total_duration_ms: u64, - /// Last success timestamp - pub last_success: Option<std::time::Instant>, - /// Last failure timestamp and error message - pub last_failure: Option<(std::time::Instant, String)>, -} - -impl InjectionMetrics { - /// Record a new injection attempt - pub fn record_attempt(&mut self, method: InjectionMethod, duration_ms: u64) { - self.attempts += 1; - self.total_duration_ms += duration_ms; - - // Update method-specific metrics - let method_metrics = self.method_metrics.entry(method).or_default(); - method_metrics.attempts += 1; - method_metrics.total_duration_ms += duration_ms; - } - - /// Record characters that have been buffered - pub fn record_buffered_chars(&mut self, count: u64) { - self.chars_buffered += count; - } - - /// Record characters that have been successfully injected - pub fn record_injected_chars(&mut self, count: u64) { - self.chars_injected += count; - } - - /// Record a flush event - pub fn record_flush(&mut self, size: u64) { - self.flushes += 1; - self.flush_size_chars.push(size); - } - - /// Record a paste operation - pub fn record_paste(&mut self) { - self.paste_uses += 1; - } - - /// Record a keystroke operation - pub fn record_keystroke(&mut self) { - self.keystroke_uses += 1; - } - - /// Record a backend denial - pub fn record_backend_denied(&mut self) { - self.backend_denied += 1; - } - - /// Record a focus missing error - pub fn record_focus_missing(&mut self) { - self.focus_missing += 1; - } - - ///
Record a rate limited event - pub fn record_rate_limited(&mut self) { - self.rate_limited += 1; - } - - /// Record latency from final transcription to injection - pub fn record_latency_from_final(&mut self, latency_ms: u64) { - self.latency_from_final_ms.push(latency_ms); - } - - /// Update the last injection timestamp - pub fn update_last_injection(&mut self) { - self.last_injection = Some(std::time::Instant::now()); - } - - /// Update the stuck buffer age - pub fn update_stuck_buffer_age(&mut self, age_ms: u64) { - self.stuck_buffer_age_ms = age_ms; - } - - /// Record a successful injection - pub fn record_success(&mut self, method: InjectionMethod, duration_ms: u64) { - self.successes += 1; - self.record_attempt(method, duration_ms); - - // Update method-specific success - if let Some(metrics) = self.method_metrics.get_mut(&method) { - metrics.successes += 1; - metrics.last_success = Some(std::time::Instant::now()); - } - } - - /// Record a failed injection - pub fn record_failure(&mut self, method: InjectionMethod, duration_ms: u64, error: String) { - self.failures += 1; - self.record_attempt(method, duration_ms); - - // Update method-specific failure - if let Some(metrics) = self.method_metrics.get_mut(&method) { - metrics.failures += 1; - metrics.last_failure = Some((std::time::Instant::now(), error)); - } - } - - /// Calculate average duration - pub fn calculate_avg_duration(&mut self) { - self.avg_duration_ms = if self.attempts > 0 { - self.total_duration_ms as f64 / self.attempts as f64 - } else { - 0.0 - }; - } -} -/// Trait for text injection backends -/// This trait is intentionally synchronous. Implementations needing async -/// operations should use thread::spawn with channels or block_on as appropriate. -/// Rationale: many backends interact with system services where blocking calls -/// are acceptable and simplify cross-backend orchestration without forcing a -/// runtime on callers. 
-#[async_trait] -pub trait TextInjector: Send + Sync { - /// Name of the injector for logging and metrics - fn name(&self) -> &'static str; - - /// Check if this injector is available for use - fn is_available(&self) -> bool; - - /// Inject text using this method - async fn inject(&mut self, text: &str) -> Result<(), InjectionError>; - - /// Type text with pacing (characters per second) - /// Default implementation falls back to inject() - async fn type_text(&mut self, text: &str, _rate_cps: u32) -> Result<(), InjectionError> { - self.inject(text).await - } - - /// Paste text (may use clipboard or other methods) - /// Default implementation falls back to inject() - async fn paste(&mut self, text: &str) -> Result<(), InjectionError> { - self.inject(text).await - } - - /// Get metrics for this injector - fn metrics(&self) -> &InjectionMetrics; -} \ No newline at end of file diff --git a/crates/app/src/text_injection/window_manager.rs b/crates/app/src/text_injection/window_manager.rs deleted file mode 100644 index cf36b59f..00000000 --- a/crates/app/src/text_injection/window_manager.rs +++ /dev/null @@ -1,257 +0,0 @@ -use crate::text_injection::types::InjectionError; -use std::process::Command; -use tracing::debug; -use serde_json; - -/// Get the currently active window class name -pub async fn get_active_window_class() -> Result<String, InjectionError> { - // Try KDE-specific method first - if let Ok(class) = get_kde_window_class().await { - return Ok(class); - } - - // Try generic X11 method - if let Ok(class) = get_x11_window_class().await { - return Ok(class); - } - - // Try Wayland method - if let Ok(class) = get_wayland_window_class().await { - return Ok(class); - } - - Err(InjectionError::Other("Could not determine active window".to_string())) -} - -async fn get_kde_window_class() -> Result<String, InjectionError> { - // Use KWin DBus interface - let output = Command::new("qdbus") - .args(&[ - "org.kde.KWin", - "/KWin", - "org.kde.KWin.activeClient" - ]) - .output() - .map_err(|e|
InjectionError::Process(format!("qdbus failed: {}", e)))?; - - if output.status.success() { - let window_id = String::from_utf8_lossy(&output.stdout).trim().to_string(); - - // Get window class from ID - let class_output = Command::new("qdbus") - .args(&[ - "org.kde.KWin", - &format!("/Windows/{}", window_id), - "org.kde.KWin.Window.resourceClass" - ]) - .output() - .map_err(|e| InjectionError::Process(format!("qdbus failed: {}", e)))?; - - if class_output.status.success() { - return Ok(String::from_utf8_lossy(&class_output.stdout).trim().to_string()); - } - } - - Err(InjectionError::Other("KDE window class not available".to_string())) -} - -async fn get_x11_window_class() -> Result<String, InjectionError> { - // Use xprop to get active window class - let output = Command::new("xprop") - .args(&["-root", "_NET_ACTIVE_WINDOW"]) - .output() - .map_err(|e| InjectionError::Process(format!("xprop failed: {}", e)))?; - - if output.status.success() { - let window_str = String::from_utf8_lossy(&output.stdout); - if let Some(window_id) = window_str.split("# ").nth(1) { - let window_id = window_id.trim(); - - // Get window class - let class_output = Command::new("xprop") - .args(&["-id", window_id, "WM_CLASS"]) - .output() - .map_err(|e| InjectionError::Process(format!("xprop failed: {}", e)))?; - - if class_output.status.success() { - let class_str = String::from_utf8_lossy(&class_output.stdout); - // Parse WM_CLASS string (format: WM_CLASS(STRING) = "instance", "class") - if let Some(class_part) = class_str.split('"').nth(3) { - return Ok(class_part.to_string()); - } - } - } - } - - Err(InjectionError::Other("X11 window class not available".to_string())) -} - -async fn get_wayland_window_class() -> Result<String, InjectionError> { - // Try using wlr-foreign-toplevel-management protocol if available - // This requires compositor support (e.g., Sway, some KWin versions) - - // For now, we'll try using swaymsg if Sway is running - let output = Command::new("swaymsg") - .args(&["-t", "get_tree"]) - .output() - .map_err(|e|
InjectionError::Process(format!("swaymsg failed: {}", e)))?; - - if output.status.success() { - // Parse JSON to find focused window using serde_json - let tree = String::from_utf8_lossy(&output.stdout); - if let Ok(json) = serde_json::from_str::<serde_json::Value>(&tree) { - // Depth-first search for focused node with app_id - fn dfs(node: &serde_json::Value) -> Option<String> { - if node.get("focused").and_then(|v| v.as_bool()).unwrap_or(false) { - if let Some(app_id) = node.get("app_id").and_then(|v| v.as_str()) { - return Some(app_id.to_string()); - } - if let Some(window_props) = node.get("window_properties") { - if let Some(class) = window_props.get("class").and_then(|v| v.as_str()) { - return Some(class.to_string()); - } - } - } - if let Some(nodes) = node.get("nodes").and_then(|v| v.as_array()) { - for n in nodes { - if let Some(found) = dfs(n) { return Some(found); } - } - } - if let Some(floating_nodes) = node.get("floating_nodes").and_then(|v| v.as_array()) { - for n in floating_nodes { - if let Some(found) = dfs(n) { return Some(found); } - } - } - None - } - if let Some(app_id) = dfs(&json) { - return Ok(app_id); - } - } else { - debug!("Failed to parse swaymsg JSON; falling back"); - } - } - - Err(InjectionError::Other("Wayland window class not available".to_string())) -} - -/// Get window information using multiple methods -pub async fn get_window_info() -> WindowInfo { - let class = get_active_window_class().await.unwrap_or_else(|_| "unknown".to_string()); - let title = get_window_title().await.unwrap_or_default(); - let pid = get_window_pid().await.unwrap_or(0); - - WindowInfo { - class, - title, - pid, - } -} - -/// Window information structure -#[derive(Debug, Clone)] -pub struct WindowInfo { - pub class: String, - pub title: String, - pub pid: u32, -} - -/// Get the title of the active window -async fn get_window_title() -> Result<String, InjectionError> { - // Try X11 method - let output = Command::new("xprop") - .args(&["-root", "_NET_ACTIVE_WINDOW"]) - .output() - .map_err(|e|
InjectionError::Process(format!("xprop failed: {}", e)))?; - - if output.status.success() { - let window_str = String::from_utf8_lossy(&output.stdout); - if let Some(window_id) = window_str.split("# ").nth(1) { - let window_id = window_id.trim(); - - // Get window title - let title_output = Command::new("xprop") - .args(&["-id", window_id, "_NET_WM_NAME"]) - .output() - .map_err(|e| InjectionError::Process(format!("xprop failed: {}", e)))?; - - if title_output.status.success() { - let title_str = String::from_utf8_lossy(&title_output.stdout); - // Parse title string - if let Some(title_start) = title_str.find(" = \"") { - let title = &title_str[title_start + 4..]; - if let Some(title_end) = title.find('"') { - return Ok(title[..title_end].to_string()); - } - } - } - } - } - - Err(InjectionError::Other("Could not get window title".to_string())) -} - -/// Get the PID of the active window -async fn get_window_pid() -> Result<u32, InjectionError> { - // Try X11 method - let output = Command::new("xprop") - .args(&["-root", "_NET_ACTIVE_WINDOW"]) - .output() - .map_err(|e| InjectionError::Process(format!("xprop failed: {}", e)))?; - - if output.status.success() { - let window_str = String::from_utf8_lossy(&output.stdout); - if let Some(window_id) = window_str.split("# ").nth(1) { - let window_id = window_id.trim(); - - // Get window PID - let pid_output = Command::new("xprop") - .args(["-id", window_id, "_NET_WM_PID"]) - .output() - .map_err(|e| InjectionError::Process(format!("xprop failed: {}", e)))?; - - if pid_output.status.success() { - let pid_str = String::from_utf8_lossy(&pid_output.stdout); - // Parse PID (format: _NET_WM_PID(CARDINAL) = ) - if let Some(pid_part) = pid_str.split(" = ").nth(1) { - if let Ok(pid) = pid_part.trim().parse::<u32>() { - return Ok(pid); - } - } - } - } - } - - Err(InjectionError::Other("Could not get window PID".to_string())) -} - -#[cfg(test)] -mod tests { - use super::*; - - #[tokio::test] - async fn test_window_detection() { - // This test will only work in
a graphical environment - if std::env::var("DISPLAY").is_ok() || std::env::var("WAYLAND_DISPLAY").is_ok() { - let result = get_active_window_class().await; - // We can't assert success since it depends on the environment - // but we can check that it doesn't panic - match result { - Ok(class) => { - debug!("Detected window class: {}", class); - assert!(!class.is_empty()); - } - Err(e) => { - debug!("Window detection failed (expected in CI): {}", e); - } - } - } - } - - #[tokio::test] - async fn test_window_info() { - let info = get_window_info().await; - // Basic sanity check - assert!(!info.class.is_empty()); - } -} \ No newline at end of file diff --git a/crates/app/src/text_injection/ydotool_injector.rs b/crates/app/src/text_injection/ydotool_injector.rs deleted file mode 100644 index d1b44a1d..00000000 --- a/crates/app/src/text_injection/ydotool_injector.rs +++ /dev/null @@ -1,206 +0,0 @@ -use crate::text_injection::types::{InjectionConfig, InjectionError, InjectionMethod, InjectionMetrics, TextInjector}; -use anyhow::Result; -use std::process::{Command, Stdio}; -use std::time::Duration; -use tokio::time::timeout; -use tracing::{debug, error, info, warn}; -use async_trait::async_trait; - -/// Ydotool injector for synthetic key events -pub struct YdotoolInjector { - config: InjectionConfig, - metrics: InjectionMetrics, - /// Whether ydotool is available on the system - is_available: bool, -} - -impl YdotoolInjector { - /// Create a new ydotool injector - pub fn new(config: InjectionConfig) -> Self { - let is_available = Self::check_ydotool(); - - Self { - config, - metrics: InjectionMetrics::default(), - is_available, - } - } - - /// Check if ydotool is available on the system - fn check_ydotool() -> bool { - match Self::check_binary_permissions("ydotool") { - Ok(()) => { - // Check if the ydotool socket exists (most reliable check). - // Use `id -u` as it's more reliable than the $UID env var. 
- let user_id = Command::new("id") - .arg("-u") - .output() - .ok() - .and_then(|o| String::from_utf8(o.stdout).ok()) - .map(|s| s.trim().to_string()) - .unwrap_or_else(|| "1000".to_string()); - - let socket_path = format!("/run/user/{}/.ydotool_socket", user_id); - if !std::path::Path::new(&socket_path).exists() { - warn!("ydotool socket not found at {}, daemon may not be running", socket_path); - return false; - } - true - } - Err(e) => { - warn!("ydotool not available: {}", e); - false - } - } - } - - /// Check if a binary exists and has proper permissions - fn check_binary_permissions(binary_name: &str) -> Result<(), InjectionError> { - use std::os::unix::fs::PermissionsExt; - - // Check if binary exists in PATH - let output = Command::new("which") - .arg(binary_name) - .output() - .map_err(|e| InjectionError::Process(format!("Failed to locate {}: {}", binary_name, e)))?; - - if !output.status.success() { - return Err(InjectionError::MethodUnavailable( - format!("{} not found in PATH", binary_name) - )); - } - - let binary_path = String::from_utf8_lossy(&output.stdout).trim().to_string(); - - // Check if binary is executable - let metadata = std::fs::metadata(&binary_path) - .map_err(|e| InjectionError::Io(e))?; - - let permissions = metadata.permissions(); - if permissions.mode() & 0o111 == 0 { - return Err(InjectionError::PermissionDenied( - format!("{} is not executable", binary_name) - )); - } - - // For ydotool specifically, check uinput access - if binary_name == "ydotool" { - Self::check_uinput_access()?; - } - - Ok(()) - } - - /// Check if we have access to /dev/uinput (required for ydotool) - fn check_uinput_access() -> Result<(), InjectionError> { - use std::fs::OpenOptions; - - // Check if we can open /dev/uinput - match OpenOptions::new().write(true).open("/dev/uinput") { - Ok(_) => Ok(()), - Err(e) if e.kind() == std::io::ErrorKind::PermissionDenied => { - // Check if user is in input group - let groups = Command::new("groups") - .output() - 
.map_err(|e| InjectionError::Process(format!("Failed to check groups: {}", e)))?; - - let groups_str = String::from_utf8_lossy(&groups.stdout); - if !groups_str.contains("input") { - return Err(InjectionError::PermissionDenied( - "User not in 'input' group. Run: sudo usermod -a -G input $USER".to_string() - )); - } - - Err(InjectionError::PermissionDenied( - "/dev/uinput access denied. ydotool daemon may not be running".to_string() - )) - } - Err(e) => Err(InjectionError::Io(e)) - } - } - - /// Trigger paste action using ydotool (Ctrl+V) - async fn trigger_paste(&self) -> Result<(), InjectionError> { - let start = std::time::Instant::now(); - - // Use tokio to run the command with timeout - let output = timeout( - Duration::from_millis(self.config.paste_action_timeout_ms), - tokio::process::Command::new("ydotool") - .args(&["key", "ctrl+v"]) - .output(), - ) - .await - .map_err(|_| InjectionError::Timeout(self.config.paste_action_timeout_ms))? - .map_err(|e| InjectionError::Process(format!("{e}")))?; - - if !output.status.success() { - let stderr = String::from_utf8_lossy(&output.stderr); - return Err(InjectionError::MethodFailed(format!("ydotool key failed: {}", stderr))); - } - - let duration = start.elapsed().as_millis() as u64; - self.metrics.record_success(InjectionMethod::YdoToolPaste, duration); - info!("Successfully triggered paste action via ydotool"); - - Ok(()) - } - - /// Type text directly using ydotool - async fn type_text(&self, text: &str) -> Result<(), InjectionError> { - let start = std::time::Instant::now(); - - // Use tokio to run the command with timeout - let output = timeout( - Duration::from_millis(self.config.per_method_timeout_ms), - tokio::process::Command::new("ydotool") - .args(&["type", "--delay", "10", text]) - .output(), - ) - .await - .map_err(|_| InjectionError::Timeout(self.config.per_method_timeout_ms))? 
- .map_err(|e| InjectionError::Process(format!("{e}")))?; - - if !output.status.success() { - let stderr = String::from_utf8_lossy(&output.stderr); - return Err(InjectionError::MethodFailed(format!("ydotool type failed: {}", stderr))); - } - - let duration = start.elapsed().as_millis() as u64; - self.metrics.record_success(InjectionMethod::YdoToolPaste, duration); - info!("Successfully typed text via ydotool ({} chars)", text.len()); - - Ok(()) - } -} - -#[async_trait] -impl TextInjector for YdotoolInjector { - fn name(&self) -> &'static str { - "Ydotool" - } - - fn is_available(&self) -> bool { - self.is_available && self.config.allow_ydotool - } - - async fn inject(&mut self, text: &str) -> Result<(), InjectionError> { - if text.is_empty() { - return Ok(()); - } - - // First try paste action (more reliable for batch text) - match self.trigger_paste().await { - Ok(()) => Ok(()), - Err(e) => { - debug!("Paste action failed: {}", e); - // Fall back to direct typing - self.type_text(text).await - } - } - } - - fn metrics(&self) -> &InjectionMetrics { - &self.metrics - } -} \ No newline at end of file diff --git a/crates/app/src/vad/mod.rs b/crates/app/src/vad/mod.rs index 70bc61a6..455d6bea 100644 --- a/crates/app/src/vad/mod.rs +++ b/crates/app/src/vad/mod.rs @@ -5,9 +5,9 @@ pub use coldvox_vad::{ config::{UnifiedVadConfig, VadMode}, - constants::{FRAME_SIZE_SAMPLES, SAMPLE_RATE_HZ, FRAME_DURATION_MS}, - types::{VadEvent, VadState, VadMetrics}, + constants::{FRAME_DURATION_MS, FRAME_SIZE_SAMPLES, SAMPLE_RATE_HZ}, engine::VadEngine, + types::{VadEvent, VadMetrics, VadState}, VadProcessor, }; diff --git a/crates/app/tests/chunker_timing_tests.rs b/crates/app/tests/chunker_timing_tests.rs index e0cc07e5..af3596ac 100644 --- a/crates/app/tests/chunker_timing_tests.rs +++ b/crates/app/tests/chunker_timing_tests.rs @@ -1,7 +1,10 @@ -use coldvox_app::audio::{AudioChunker, ChunkerConfig, ResamplerQuality, FrameReader, AudioRingBuffer, AudioFrame as VadFrame}; +use 
coldvox_app::audio::{ + AudioChunker, AudioFrame as VadFrame, AudioRingBuffer, ChunkerConfig, FrameReader, + ResamplerQuality, +}; use coldvox_app::telemetry::pipeline_metrics::PipelineMetrics; -use tokio::sync::broadcast; use std::sync::Arc; +use tokio::sync::broadcast; mod common; use common::test_utils::feed_samples_to_ring_buffer; @@ -14,7 +17,11 @@ async fn chunker_timestamps_are_32ms_apart_at_16k() { let (mut prod, cons) = ring.split(); let reader = FrameReader::new(cons, 16_000, 1, rb_capacity, Some(metrics.clone())); - let cfg = ChunkerConfig { frame_size_samples: 512, sample_rate_hz: 16_000, resampler_quality: ResamplerQuality::Balanced }; + let cfg = ChunkerConfig { + frame_size_samples: 512, + sample_rate_hz: 16_000, + resampler_quality: ResamplerQuality::Balanced, + }; let (tx, _) = broadcast::channel::<VadFrame>(64); let mut rx = tx.subscribe(); let chunker = AudioChunker::new(reader, tx.clone(), cfg).with_metrics(metrics.clone()); @@ -25,12 +32,12 @@ async fn chunker_timestamps_are_32ms_apart_at_16k() { feed_samples_to_ring_buffer(&mut prod, &input, 1024); // Collect a few frames and verify monotonic 32ms timestamps - let mut got = Vec::new(); + let mut timestamps = Vec::new(); let mut attempts = 0; - while got.len() < 5 && attempts < 50 { + while timestamps.len() < 5 && attempts < 50 { if let Ok(frame) = rx.try_recv() { - // Convert Instant to relative ms for comparison - got.push(frame.timestamp.elapsed().as_millis() as u64); + // Store the raw Instant for comparison + timestamps.push(frame.timestamp); } else { tokio::time::sleep(std::time::Duration::from_millis(10)).await; attempts += 1; @@ -38,10 +45,19 @@ async fn chunker_timestamps_are_32ms_apart_at_16k() { } handle.abort(); - assert!(got.len() >= 3, "expected at least 3 frames, got {}", got.len()); - for w in got.windows(2) { - let delta = w[1] - w[0]; - assert!((delta as i64 - 32).abs() <= 5, "timestamp delta ~32ms, got {}", delta); + assert!( + timestamps.len() >= 3, + "expected at least 3 frames, got
{}", + timestamps.len() + ); + + // Calculate duration between consecutive timestamps + for w in timestamps.windows(2) { + let delta_ms = w[1].duration_since(w[0]).as_millis() as i64; + assert!( + (delta_ms - 32).abs() <= 5, + "timestamp delta ~32ms, got {}ms", + delta_ms + ); } } - diff --git a/crates/app/tests/common/mod.rs b/crates/app/tests/common/mod.rs index 8c5cffcd..681d26e3 100644 --- a/crates/app/tests/common/mod.rs +++ b/crates/app/tests/common/mod.rs @@ -1 +1 @@ -pub mod test_utils; \ No newline at end of file +pub mod test_utils; diff --git a/crates/app/tests/common/test_utils.rs b/crates/app/tests/common/test_utils.rs index 06ab1ff4..89c3a732 100644 --- a/crates/app/tests/common/test_utils.rs +++ b/crates/app/tests/common/test_utils.rs @@ -7,7 +7,9 @@ pub fn feed_samples_to_ring_buffer( samples: &[i16], chunk_size: usize, ) -> usize { - if chunk_size == 0 { return 0; } + if chunk_size == 0 { + return 0; + } let mut written_total = 0usize; let mut offset = 0usize; while offset < samples.len() { @@ -34,16 +36,27 @@ pub fn calculate_wer(reference: &str, hypothesis: &str) -> f32 { let n = ref_words.len(); let m = hyp_words.len(); - if n == 0 { return if m > 0 { 1.0 } else { 0.0 }; } + if n == 0 { + return if m > 0 { 1.0 } else { 0.0 }; + } // dp[i][j]: min edits to transform first i ref words into first j hyp words let mut dp = vec![vec![0usize; m + 1]; n + 1]; - for i in 0..=n { dp[i][0] = i; } - for j in 0..=m { dp[0][j] = j; } + #[allow(clippy::needless_range_loop)] + for i in 0..=n { + dp[i][0] = i; + } + for j in 0..=m { + dp[0][j] = j; + } for i in 1..=n { for j in 1..=m { - let cost = if ref_words[i - 1] == hyp_words[j - 1] { 0 } else { 1 }; + let cost = if ref_words[i - 1] == hyp_words[j - 1] { + 0 + } else { + 1 + }; let sub = dp[i - 1][j - 1] + cost; let del = dp[i - 1][j] + 1; let ins = dp[i][j - 1] + 1; @@ -64,9 +77,9 @@ mod wer_tests { let w = calculate_wer("hello world", "hello there"); assert!((w - 0.5).abs() < 1e-6); let w = 
calculate_wer("one two three", "one three"); - assert!((w - (1.0/3.0)).abs() < 1e-6); + assert!((w - (1.0 / 3.0)).abs() < 1e-6); let w = calculate_wer("one two", "one two three"); assert!((w - 0.5).abs() < 1e-6); assert_eq!(calculate_wer("one two", ""), 1.0); } -} \ No newline at end of file +} diff --git a/crates/app/tests/pipeline_integration.rs b/crates/app/tests/pipeline_integration.rs index 97546ca0..150c2b84 100644 --- a/crates/app/tests/pipeline_integration.rs +++ b/crates/app/tests/pipeline_integration.rs @@ -97,4 +97,4 @@ async fn test_chunker_emits_frames_for_known_audio() { // No VAD in this test } -*/ \ No newline at end of file +*/ diff --git a/crates/app/tests/vad_pipeline_tests.rs b/crates/app/tests/vad_pipeline_tests.rs index 99d00d0c..9bd55936 100644 --- a/crates/app/tests/vad_pipeline_tests.rs +++ b/crates/app/tests/vad_pipeline_tests.rs @@ -1,9 +1,14 @@ +#[cfg(feature = "level3")] +use coldvox_app::audio::{AudioFrame as VadFrame, VadProcessor}; +#[cfg(feature = "level3")] use coldvox_app::telemetry::pipeline_metrics::PipelineMetrics; +#[cfg(feature = "level3")] use coldvox_app::vad::{UnifiedVadConfig, VadMode, FRAME_SIZE_SAMPLES}; -use coldvox_app::audio::{AudioFrame as VadFrame, VadProcessor}; +#[cfg(feature = "level3")] use tokio::sync::{broadcast, mpsc}; #[tokio::test] +#[cfg(feature = "level3")] async fn vad_processor_silence_no_events_level3() { // Use Level3 to avoid ONNX model dependency in unit tests let mut cfg = UnifiedVadConfig::default(); @@ -19,8 +24,7 @@ async fn vad_processor_silence_no_events_level3() { // Create metrics for test let metrics = std::sync::Arc::new(PipelineMetrics::default()); - let handle = VadProcessor::spawn(cfg, rx, event_tx, Some(metrics.clone())) - .expect("spawn vad"); + let handle = VadProcessor::spawn(cfg, rx, event_tx, Some(metrics.clone())).expect("spawn vad"); // Send a few frames of silence at 16k/512-sample frames for _ in 0..10u64 { @@ -41,4 +45,4 @@ async fn vad_processor_silence_no_events_level3() { 
// Ensure no events were produced assert!(event_rx.try_recv().is_err()); -} \ No newline at end of file +} diff --git a/crates/coldvox-audio/src/capture.rs b/crates/coldvox-audio/src/capture.rs index bb1e34ce..960217c1 100644 --- a/crates/coldvox-audio/src/capture.rs +++ b/crates/coldvox-audio/src/capture.rs @@ -1,7 +1,7 @@ use cpal::traits::{DeviceTrait, StreamTrait}; use cpal::{SampleFormat, Stream, StreamConfig}; -use parking_lot::{RwLock, Mutex}; +use parking_lot::{Mutex, RwLock}; use std::sync::atomic::{AtomicBool, AtomicU64, Ordering}; use std::sync::Arc; use std::thread::{self, JoinHandle}; @@ -11,7 +11,7 @@ use super::detector::SilenceDetector; use super::device::DeviceManager; // Test hook output -use super::ring_buffer::{AudioProducer}; +use super::ring_buffer::AudioProducer; use super::watchdog::WatchdogTimer; use coldvox_foundation::{AudioConfig, AudioError}; @@ -46,7 +46,14 @@ impl AudioCaptureThread { config: AudioConfig, audio_producer: AudioProducer, device_name: Option, - ) -> Result<(Self, DeviceConfig, tokio::sync::broadcast::Receiver), AudioError> { + ) -> Result< + ( + Self, + DeviceConfig, + tokio::sync::broadcast::Receiver, + ), + AudioError, + > { let running = Arc::new(AtomicBool::new(false)); let shutdown = running.clone(); let device_config = Arc::new(RwLock::new(None::)); @@ -111,7 +118,7 @@ impl AudioCaptureThread { tracing::error!("All device candidates failed to produce audio; capture not started"); return; }; - + *device_config_clone.write() = Some(dev_cfg); // Monitor for watchdog or error-triggered restarts @@ -162,8 +169,10 @@ impl AudioCaptureThread { } thread::sleep(Duration::from_millis(50)); } - - let cfg = cfg.ok_or_else(|| AudioError::Fatal("Failed to get device configuration within timeout".to_string()))?; + + let cfg = cfg.ok_or_else(|| { + AudioError::Fatal("Failed to get device configuration within timeout".to_string()) + })?; Ok((Self { handle, shutdown }, cfg, config_rx)) } @@ -211,8 +220,11 @@ impl AudioCapture { 
config_tx: None, }) } - - pub fn with_config_channel(mut self, config_tx: tokio::sync::broadcast::Sender) -> Self { + + pub fn with_config_channel( + mut self, + config_tx: tokio::sync::broadcast::Sender, + ) -> Self { self.config_tx = Some(config_tx); self } @@ -221,9 +233,15 @@ impl AudioCapture { self.running.store(true, Ordering::SeqCst); let device = self.device_manager.open_device(device_name)?; - if let Ok(n) = device.name() { tracing::info!("Selected input device: {} (host: {:?})", n, self.device_manager.host_id()); } + if let Ok(n) = device.name() { + tracing::info!( + "Selected input device: {} (host: {:?})", + n, + self.device_manager.host_id() + ); + } let (config, sample_format) = self.negotiate_config(&device)?; - + let device_config = DeviceConfig { sample_rate: config.sample_rate.0, channels: config.channels, @@ -273,7 +291,7 @@ impl AudioCapture { } else { stats.active_frames.fetch_add(1, Ordering::Relaxed); } - + // Use the shared producer if let Ok(written) = audio_producer.lock().write(i16_data) { if written == i16_data.len() { @@ -290,20 +308,18 @@ impl AudioCapture { // Build the CPAL input stream with proper conversion to i16 // Use thread-local buffers to avoid allocations in the audio callback thread_local! { - static CONVERT_BUFFER: std::cell::RefCell> = std::cell::RefCell::new(Vec::new()); + static CONVERT_BUFFER: std::cell::RefCell> = const { std::cell::RefCell::new(Vec::new()) }; } - + let stream = match sample_format { - SampleFormat::I16 => { - device.build_input_stream( - &config, - move |data: &[i16], _: &_| { - handle_i16(data); - }, - err_fn, - None, - )? 
- } + SampleFormat::I16 => device.build_input_stream( + &config, + move |data: &[i16], _: &_| { + handle_i16(data); + }, + err_fn, + None, + )?, SampleFormat::F32 => { device.build_input_stream( &config, @@ -376,7 +392,7 @@ impl AudioCapture { converted.reserve(data.len()); for &s in data { let clamped = s.clamp(-1.0, 1.0); - let v = (clamped * 32767.0).round() as i16; // Now uses .round() like F32 + let v = (clamped * 32767.0).round() as i16; // Now uses .round() like F32 converted.push(v); } handle_i16(&converted); @@ -387,7 +403,9 @@ impl AudioCapture { )? } other => { - return Err(AudioError::FormatNotSupported { format: format!("{:?}", other) }); + return Err(AudioError::FormatNotSupported { + format: format!("{:?}", other), + }); } }; @@ -409,17 +427,14 @@ impl AudioCapture { default_config.sample_format(), )); } - + // Fallback to first available config if let Ok(configs) = device.supported_input_configs() { if let Some(config) = configs.into_iter().next() { - return Ok(( - config.with_max_sample_rate().into(), - config.sample_format(), - )); + return Ok((config.with_max_sample_rate().into(), config.sample_format())); } } - + Err(AudioError::FormatNotSupported { format: "No supported audio formats".to_string(), }) @@ -443,7 +458,9 @@ mod convert_tests { let src = [-1.0f32, -0.5, 0.0, 0.5, 1.0]; let expected = [-32767i16, -16384, 0, 16384, 32767]; let mut out = Vec::new(); - for &s in &src { out.push((s.clamp(-1.0,1.0)*32767.0).round() as i16); } + for &s in &src { + out.push((s.clamp(-1.0, 1.0) * 32767.0).round() as i16); + } assert_eq!(&out[..], &expected); } @@ -458,7 +475,8 @@ mod convert_tests { #[test] fn u32_to_i16_scaling() { let src = [0u32, 2_147_483_648u32, 4_294_967_295u32]; - let out: Vec = src.iter() + let out: Vec = src + .iter() .map(|&s| ((s as i64 - 2_147_483_648i64) >> 16) as i16) .collect(); assert_eq!(out[1], 0); @@ -468,7 +486,10 @@ mod convert_tests { #[test] fn f64_to_i16_basic() { let src = [-1.0f64, -0.25, 0.25, 1.0]; - let out: Vec 
= src.iter().map(|&s| (s.clamp(-1.0,1.0)*32767.0) as i16).collect(); + let out: Vec = src + .iter() + .map(|&s| (s.clamp(-1.0, 1.0) * 32767.0) as i16) + .collect(); assert_eq!(out.len(), 4); assert!(out[0] <= -32767 && out[3] >= 32766); } diff --git a/crates/coldvox-audio/src/chunker.rs b/crates/coldvox-audio/src/chunker.rs index e60c3599..3670e851 100644 --- a/crates/coldvox-audio/src/chunker.rs +++ b/crates/coldvox-audio/src/chunker.rs @@ -5,8 +5,8 @@ use tokio::sync::broadcast; use tokio::task::JoinHandle; use tokio::time::{self, Duration}; -use super::frame_reader::FrameReader; use super::capture::DeviceConfig; +use super::frame_reader::FrameReader; use super::resampler::StreamResampler; use coldvox_telemetry::{FpsTracker, PipelineMetrics, PipelineStage}; @@ -20,9 +20,9 @@ pub struct AudioFrame { #[derive(Debug, Clone, Copy)] pub enum ResamplerQuality { - Fast, // Lower quality, lower CPU usage - Balanced, // Default quality/performance balance - Quality, // Higher quality, higher CPU usage + Fast, // Lower quality, lower CPU usage + Balanced, // Default quality/performance balance + Quality, // Higher quality, higher CPU usage } pub struct ChunkerConfig { @@ -77,8 +77,13 @@ impl AudioChunker { } pub fn spawn(self) -> JoinHandle<()> { - let mut worker = - ChunkerWorker::new(self.frame_reader, self.output_tx, self.cfg, self.metrics, self.device_cfg_rx); + let mut worker = ChunkerWorker::new( + self.frame_reader, + self.output_tx, + self.cfg, + self.metrics, + self.device_cfg_rx, + ); self.running.store(true, Ordering::SeqCst); let running = self.running.clone(); @@ -102,6 +107,7 @@ struct ChunkerWorker { current_input_rate: Option, current_input_channels: Option, device_cfg_rx: Option>, + start_time: std::time::Instant, } impl ChunkerWorker { @@ -126,6 +132,7 @@ impl ChunkerWorker { current_input_rate: None, current_input_channels: None, device_cfg_rx, + start_time: std::time::Instant::now(), } } @@ -136,7 +143,8 @@ impl ChunkerWorker { // Apply device config 
updates if any if let Some(rx) = &mut self.device_cfg_rx { while let Ok(cfg) = rx.try_recv() { - self.frame_reader.update_device_config(cfg.sample_rate, cfg.channels); + self.frame_reader + .update_device_config(cfg.sample_rate, cfg.channels); } } if let Some(frame) = self.frame_reader.read_frame(4096) { @@ -148,13 +156,14 @@ impl ChunkerWorker { m.update_audio_level(&frame.samples); m.mark_stage_active(PipelineStage::Capture); } - + // Check if device configuration has changed - if self.current_input_rate != Some(frame.sample_rate) - || self.current_input_channels != Some(frame.channels) { + if self.current_input_rate != Some(frame.sample_rate) + || self.current_input_channels != Some(frame.channels) + { self.reconfigure_for_device(&frame); } - + // Process the frame (resampling and channel conversion) let processed_samples = self.process_frame(&frame); self.buffer.extend(processed_samples); @@ -175,13 +184,18 @@ impl ChunkerWorker { out.push(self.buffer.pop_front().unwrap()); } - let _timestamp_ms = + // Calculate timestamp based on samples emitted + let timestamp_ms = (self.samples_emitted as u128 * 1000 / self.cfg.sample_rate_hz as u128) as u64; + let timestamp = self.start_time + std::time::Duration::from_millis(timestamp_ms); let vf = AudioFrame { - samples: out.into_iter().map(|s| s as f32 / i16::MAX as f32).collect(), + samples: out + .into_iter() + .map(|s| s as f32 / i16::MAX as f32) + .collect(), sample_rate: self.cfg.sample_rate_hz, - timestamp: std::time::Instant::now(), + timestamp, }; // A send on a broadcast channel can fail if there are no receivers. 
@@ -201,10 +215,10 @@ impl ChunkerWorker { } } } - + fn reconfigure_for_device(&mut self, frame: &super::capture::AudioFrame) { let needs_resampling = frame.sample_rate != self.cfg.sample_rate_hz; - + if needs_resampling { tracing::info!( "Configuring resampler: {}Hz {} ch -> {}Hz mono", @@ -212,13 +226,13 @@ impl ChunkerWorker { frame.channels, self.cfg.sample_rate_hz ); - + let resampler = StreamResampler::new_with_quality( frame.sample_rate, self.cfg.sample_rate_hz, self.cfg.resampler_quality, ); - + self.resampler = Some(Arc::new(parking_lot::Mutex::new(resampler))); } else { tracing::info!( @@ -227,11 +241,11 @@ impl ChunkerWorker { ); self.resampler = None; } - + self.current_input_rate = Some(frame.sample_rate); self.current_input_channels = Some(frame.channels); } - + fn process_frame(&mut self, frame: &super::capture::AudioFrame) -> Vec { // First, handle channel conversion if needed let mono_samples = if frame.channels == 1 { @@ -239,7 +253,8 @@ impl ChunkerWorker { } else { // Convert multi-channel to mono by averaging let channels = frame.channels as usize; - frame.samples + frame + .samples .chunks_exact(channels) .map(|chunk| { let sum: i32 = chunk.iter().map(|&s| s as i32).sum(); @@ -247,7 +262,7 @@ impl ChunkerWorker { }) .collect() }; - + // Then, apply resampling if needed if let Some(resampler) = &self.resampler { resampler.lock().process(&mono_samples) @@ -260,8 +275,8 @@ impl ChunkerWorker { #[cfg(test)] mod tests { use super::*; - use crate::ring_buffer::AudioRingBuffer; use crate::capture::AudioFrame as CapFrame; + use crate::ring_buffer::AudioRingBuffer; use std::time::Instant; #[test] @@ -270,17 +285,31 @@ mod tests { let (_prod, cons) = rb.split(); let reader = FrameReader::new(cons, 48_000, 2, 1024, None); let (tx, _rx) = broadcast::channel::(8); - let cfg = ChunkerConfig { frame_size_samples: 512, sample_rate_hz: 16_000, resampler_quality: ResamplerQuality::Balanced }; - let mut worker = ChunkerWorker::new(reader, tx, cfg, None, None); + 
let cfg = ChunkerConfig { + frame_size_samples: 512, + sample_rate_hz: 16_000, + resampler_quality: ResamplerQuality::Balanced, + }; + let mut worker = ChunkerWorker::new(reader, tx, cfg, None, None); // First frame at 48kHz stereo -> resampler should be created - let frame1 = CapFrame { samples: vec![0i16; 480], timestamp: Instant::now(), sample_rate: 48_000, channels: 2 }; - worker.reconfigure_for_device(&frame1); + let frame1 = CapFrame { + samples: vec![0i16; 480], + timestamp: Instant::now(), + sample_rate: 48_000, + channels: 2, + }; + worker.reconfigure_for_device(&frame1); assert!(worker.resampler.is_some()); // Frame at 16k mono -> resampler not needed - let frame2 = CapFrame { samples: vec![0i16; 160], timestamp: Instant::now(), sample_rate: 16_000, channels: 1 }; - worker.reconfigure_for_device(&frame2); + let frame2 = CapFrame { + samples: vec![0i16; 160], + timestamp: Instant::now(), + sample_rate: 16_000, + channels: 1, + }; + worker.reconfigure_for_device(&frame2); assert!(worker.resampler.is_none()); } @@ -290,11 +319,20 @@ mod tests { let (_prod, cons) = rb.split(); let reader = FrameReader::new(cons, 16_000, 2, 1024, None); let (tx, _rx) = broadcast::channel::(8); - let cfg = ChunkerConfig { frame_size_samples: 512, sample_rate_hz: 16_000, resampler_quality: ResamplerQuality::Balanced }; - let mut worker = ChunkerWorker::new(reader, tx, cfg, None, None); + let cfg = ChunkerConfig { + frame_size_samples: 512, + sample_rate_hz: 16_000, + resampler_quality: ResamplerQuality::Balanced, + }; + let mut worker = ChunkerWorker::new(reader, tx, cfg, None, None); - let samples = vec![1000i16, -1000, 900, -900, 800, -800, 700, -700]; - let frame = CapFrame { samples, timestamp: Instant::now(), sample_rate: 16_000, channels: 2 }; + let samples = vec![1000i16, -1000, 900, -900, 800, -800, 700, -700]; + let frame = CapFrame { + samples, + timestamp: Instant::now(), + sample_rate: 16_000, + channels: 2, + }; worker.reconfigure_for_device(&frame); let out = 
worker.process_frame(&frame); // Each pair averaged -> zeros diff --git a/crates/coldvox-audio/src/device.rs b/crates/coldvox-audio/src/device.rs index c0ad4d5a..adba33df 100644 --- a/crates/coldvox-audio/src/device.rs +++ b/crates/coldvox-audio/src/device.rs @@ -54,9 +54,7 @@ impl DeviceManager { } pub fn default_input_device_name(&self) -> Option { - self.host - .default_input_device() - .and_then(|d| d.name().ok()) + self.host.default_input_device().and_then(|d| d.name().ok()) } /// Return candidate device names in a priority order suitable for Linux PipeWire setups. diff --git a/crates/coldvox-audio/src/frame_reader.rs b/crates/coldvox-audio/src/frame_reader.rs index 10bb304f..63d20d2c 100644 --- a/crates/coldvox-audio/src/frame_reader.rs +++ b/crates/coldvox-audio/src/frame_reader.rs @@ -3,8 +3,8 @@ use std::time::Instant; use coldvox_telemetry::{BufferType, PipelineMetrics}; -use super::ring_buffer::AudioConsumer; use super::capture::AudioFrame; +use super::ring_buffer::AudioConsumer; /// Reads audio frames from ring buffer and reconstructs metadata pub struct FrameReader { @@ -19,7 +19,13 @@ pub struct FrameReader { impl FrameReader { /// Create a new FrameReader - pub fn new(consumer: AudioConsumer, device_sample_rate: u32, device_channels: u16, capacity: usize, metrics: Option>) -> Self { + pub fn new( + consumer: AudioConsumer, + device_sample_rate: u32, + device_channels: u16, + capacity: usize, + metrics: Option>, + ) -> Self { Self { consumer, device_sample_rate, @@ -45,18 +51,18 @@ impl FrameReader { let mut buffer = vec![0i16; max_samples]; let samples_read = self.consumer.read(&mut buffer); - + if samples_read == 0 { return None; } buffer.truncate(samples_read); - + // Calculate timestamp based on samples read let elapsed_samples = self.samples_read; let elapsed_ms = (elapsed_samples * 1000) / self.device_sample_rate as u64; let timestamp = self.start_time + std::time::Duration::from_millis(elapsed_ms); - + self.samples_read += samples_read as u64; 
Some(AudioFrame { @@ -71,14 +77,16 @@ impl FrameReader { pub fn available_samples(&self) -> usize { self.consumer.slots() } - + /// Update device configuration when it changes pub fn update_device_config(&mut self, sample_rate: u32, channels: u16) { if self.device_sample_rate != sample_rate || self.device_channels != channels { tracing::info!( "FrameReader: Device config changed from {}Hz {}ch to {}Hz {}ch", - self.device_sample_rate, self.device_channels, - sample_rate, channels + self.device_sample_rate, + self.device_channels, + sample_rate, + channels ); self.device_sample_rate = sample_rate; self.device_channels = channels; diff --git a/crates/coldvox-audio/src/lib.rs b/crates/coldvox-audio/src/lib.rs index 0963b24d..becc66bc 100644 --- a/crates/coldvox-audio/src/lib.rs +++ b/crates/coldvox-audio/src/lib.rs @@ -9,8 +9,8 @@ pub mod watchdog; // Public API pub use capture::{AudioCaptureThread, DeviceConfig}; -pub use chunker::{AudioChunker, ChunkerConfig, ResamplerQuality, AudioFrame}; +pub use chunker::{AudioChunker, AudioFrame, ChunkerConfig, ResamplerQuality}; pub use device::{DeviceInfo, DeviceManager}; pub use frame_reader::FrameReader; pub use ring_buffer::AudioRingBuffer; -pub use watchdog::WatchdogTimer; \ No newline at end of file +pub use watchdog::WatchdogTimer; diff --git a/crates/coldvox-audio/src/resampler.rs b/crates/coldvox-audio/src/resampler.rs index dd4fae0a..7bc4860d 100644 --- a/crates/coldvox-audio/src/resampler.rs +++ b/crates/coldvox-audio/src/resampler.rs @@ -1,6 +1,5 @@ use rubato::{ - Resampler, SincFixedIn, - SincInterpolationParameters, SincInterpolationType, WindowFunction, + Resampler, SincFixedIn, SincInterpolationParameters, SincInterpolationType, WindowFunction, }; use super::chunker::ResamplerQuality; @@ -28,57 +27,58 @@ impl StreamResampler { pub fn new(in_rate: u32, out_rate: u32) -> Self { Self::new_with_quality(in_rate, out_rate, ResamplerQuality::Balanced) } - + /// Create a new mono resampler with specified quality 
preset. pub fn new_with_quality(in_rate: u32, out_rate: u32, quality: ResamplerQuality) -> Self { // For VAD, we want low latency, so use a relatively small chunk size // 512 samples at 16kHz = 32ms, which aligns well with typical VAD frame sizes let chunk_size = 512; - + // Configure sinc interpolation based on quality preset let sinc_params = match quality { ResamplerQuality::Fast => { // Lower quality, faster processing SincInterpolationParameters { - sinc_len: 32, // Shorter filter for lower CPU usage - f_cutoff: 0.92, // Slightly more aggressive cutoff - interpolation: SincInterpolationType::Linear, // Simpler interpolation - oversampling_factor: 64, // Lower oversampling - window: WindowFunction::Blackman, // Simple window + sinc_len: 32, // Shorter filter for lower CPU usage + f_cutoff: 0.92, // Slightly more aggressive cutoff + interpolation: SincInterpolationType::Linear, // Simpler interpolation + oversampling_factor: 64, // Lower oversampling + window: WindowFunction::Blackman, // Simple window } - }, + } ResamplerQuality::Balanced => { // Medium quality, good for speech SincInterpolationParameters { - sinc_len: 64, // Medium quality - f_cutoff: 0.95, // Slightly below Nyquist for better anti-aliasing + sinc_len: 64, // Medium quality + f_cutoff: 0.95, // Slightly below Nyquist for better anti-aliasing interpolation: SincInterpolationType::Cubic, - oversampling_factor: 128, // Good balance of quality vs memory - window: WindowFunction::Blackman2, // Good stopband attenuation + oversampling_factor: 128, // Good balance of quality vs memory + window: WindowFunction::Blackman2, // Good stopband attenuation } - }, + } ResamplerQuality::Quality => { // Higher quality, more CPU usage SincInterpolationParameters { sinc_len: 128, // Longer filter for better quality - f_cutoff: 0.97, // Closer to Nyquist for sharper cutoff + f_cutoff: 0.97, // Closer to Nyquist for sharper cutoff interpolation: SincInterpolationType::Cubic, - oversampling_factor: 256, // Higher 
oversampling for better quality - window: WindowFunction::BlackmanHarris2, // Best stopband attenuation + oversampling_factor: 256, // Higher oversampling for better quality + window: WindowFunction::BlackmanHarris2, // Best stopband attenuation } - }, + } }; - + // Create the resampler // We only need 1 channel for mono audio let resampler = SincFixedIn::::new( - out_rate as f64 / in_rate as f64, // Resample ratio - 2.0, // Max resample ratio change (not used in fixed mode) + out_rate as f64 / in_rate as f64, // Resample ratio + 2.0, // Max resample ratio change (not used in fixed mode) sinc_params, chunk_size, - 1, // mono - ).expect("Failed to create Rubato resampler"); - + 1, // mono + ) + .expect("Failed to create Rubato resampler"); + Self { in_rate, out_rate, @@ -107,7 +107,7 @@ impl StreamResampler { // Prepare input for Rubato (it expects Vec> for channels) let chunk: Vec = self.input_buffer.drain(..self.chunk_size).collect(); let input_frames = vec![chunk]; - + // Process the chunk let output_frames = match self.resampler.process(&input_frames, None) { Ok(frames) => frames, @@ -117,7 +117,7 @@ impl StreamResampler { return Vec::new(); } }; - + // Append resampled output (first channel only, since we're mono) if !output_frames.is_empty() && !output_frames[0].is_empty() { self.output_buffer.extend_from_slice(&output_frames[0]); @@ -132,10 +132,10 @@ impl StreamResampler { let i16_sample = (clamped * 32767.0).round() as i16; result.push(i16_sample); } - + // Clear the output buffer for next time self.output_buffer.clear(); - + result } @@ -148,10 +148,14 @@ impl StreamResampler { } /// Current input rate. - pub fn input_rate(&self) -> u32 { self.in_rate } - + pub fn input_rate(&self) -> u32 { + self.in_rate + } + /// Current output rate. 
- pub fn output_rate(&self) -> u32 { self.out_rate } + pub fn output_rate(&self) -> u32 { + self.out_rate + } } #[cfg(test)] @@ -162,7 +166,11 @@ mod quality_tests { fn process_with_all_quality_presets() { // Provide enough samples for internal filter latency to flush let input: Vec = (0..4096).map(|i| ((i % 100) as i16) - 50).collect(); // some signal - for q in [ResamplerQuality::Fast, ResamplerQuality::Balanced, ResamplerQuality::Quality] { + for q in [ + ResamplerQuality::Fast, + ResamplerQuality::Balanced, + ResamplerQuality::Quality, + ] { let mut rs = StreamResampler::new_with_quality(48_000, 16_000, q); let mut out = rs.process(&input); // Process a second chunk to ensure output becomes available @@ -183,45 +191,54 @@ mod tests { // 4.8k samples (~0.1s). Expect ~1.6k out. let n_in = 4_800; let input: Vec = (0..n_in).map(|i| (i % 32768) as i16).collect(); - + // Process in chunks to test buffering let mut all_output = Vec::new(); for chunk in input.chunks(1000) { let out = rs.process(chunk); all_output.extend(out); } - + // We should get approximately 1/3 of the input samples // Allow some variance due to buffering - assert!(all_output.len() >= 1400 && all_output.len() <= 1700, - "Expected ~1600 samples, got {}", all_output.len()); + assert!( + all_output.len() >= 1400 && all_output.len() <= 1700, + "Expected ~1600 samples, got {}", + all_output.len() + ); } #[test] fn upsample_16k_to_48k_constant() { let mut rs = StreamResampler::new(16_000, 48_000); // Constant tone: output should be approximately constant too - let input = vec![1000i16; 1600]; // 100ms at 16kHz - + let input = vec![1000i16; 1600]; // 100ms at 16kHz + // Process in one go let out = rs.process(&input); - + // We should get approximately 3x the input samples // Allow wider variance due to Rubato's buffering strategy // The exact output depends on how the chunk size aligns with the resample ratio - assert!(out.len() >= 4400 && out.len() <= 5000, - "Expected ~4800 samples, got {}", out.len()); 
- + assert!( + out.len() >= 4400 && out.len() <= 5000, + "Expected ~4800 samples, got {}", + out.len() + ); + // Check middle samples are close to the input value // (skip edges which may have interpolation artifacts) if out.len() > 100 { for &s in &out[50..out.len().saturating_sub(50)] { - assert!((900..=1100).contains(&s), - "Sample {} too far from expected 1000", s); + assert!( + (900..=1100).contains(&s), + "Sample {} too far from expected 1000", + s + ); } } } - + #[test] fn passthrough_same_rate() { let mut rs = StreamResampler::new(16_000, 16_000); diff --git a/crates/coldvox-audio/src/watchdog.rs b/crates/coldvox-audio/src/watchdog.rs index 0509a056..51e22a3a 100644 --- a/crates/coldvox-audio/src/watchdog.rs +++ b/crates/coldvox-audio/src/watchdog.rs @@ -58,7 +58,7 @@ impl WatchdogTimer { } }); - *self.handle.write() = Some(handle); + *self.handle.write() = Some(handle); } pub fn feed(&self) { diff --git a/crates/coldvox-foundation/src/lib.rs b/crates/coldvox-foundation/src/lib.rs index c8d3331d..e84e5f02 100644 --- a/crates/coldvox-foundation/src/lib.rs +++ b/crates/coldvox-foundation/src/lib.rs @@ -6,4 +6,4 @@ pub mod state; pub use error::*; pub use health::*; pub use shutdown::*; -pub use state::*; \ No newline at end of file +pub use state::*; diff --git a/crates/coldvox-gui/src/main.rs b/crates/coldvox-gui/src/main.rs index b6a6200c..a4a389b8 100644 --- a/crates/coldvox-gui/src/main.rs +++ b/crates/coldvox-gui/src/main.rs @@ -22,4 +22,4 @@ fn main() { println!(" • Text injection configuration"); println!(); println!("For now, use the TUI dashboard: cargo run -p coldvox-app --bin tui_dashboard"); -} \ No newline at end of file +} diff --git a/crates/coldvox-stt-vosk/src/lib.rs b/crates/coldvox-stt-vosk/src/lib.rs index c4b4a8d8..55634ebb 100644 --- a/crates/coldvox-stt-vosk/src/lib.rs +++ b/crates/coldvox-stt-vosk/src/lib.rs @@ -11,8 +11,8 @@ pub use vosk_transcriber::VoskTranscriber; // Re-export common types pub use coldvox_stt::{ - 
EventBasedTranscriber, Transcriber, TranscriptionConfig, TranscriptionEvent, WordInfo, - next_utterance_id, + next_utterance_id, EventBasedTranscriber, Transcriber, TranscriptionConfig, TranscriptionEvent, + WordInfo, }; /// Get default model path from environment or fallback @@ -24,4 +24,4 @@ pub fn default_model_path() -> String { #[cfg(not(feature = "vosk"))] pub fn create_default_transcriber(_config: TranscriptionConfig) -> Result<(), String> { Err("Vosk feature is not enabled. Enable with --features vosk".to_string()) -} \ No newline at end of file +} diff --git a/crates/coldvox-stt-vosk/src/vosk_transcriber.rs b/crates/coldvox-stt-vosk/src/vosk_transcriber.rs index f02e4050..b130f4ad 100644 --- a/crates/coldvox-stt-vosk/src/vosk_transcriber.rs +++ b/crates/coldvox-stt-vosk/src/vosk_transcriber.rs @@ -1,8 +1,9 @@ -use vosk::{Model, Recognizer, DecodingState, CompleteResult, PartialResult}; use coldvox_stt::{ - EventBasedTranscriber, Transcriber, TranscriptionEvent, WordInfo, TranscriptionConfig, next_utterance_id + next_utterance_id, EventBasedTranscriber, Transcriber, TranscriptionConfig, TranscriptionEvent, + WordInfo, }; -use tracing::{debug, warn}; +use tracing::warn; +use vosk::{CompleteResult, DecodingState, Model, PartialResult, Recognizer}; pub struct VoskTranscriber { recognizer: Recognizer, @@ -17,47 +18,51 @@ impl VoskTranscriber { if (sample_rate - 16000.0).abs() > 0.1 { warn!( "VoskTranscriber: Sample rate {}Hz differs from expected 16000Hz. 
\ - This may affect transcription quality.", + This may affect transcription quality.", sample_rate ); } - + // Use model path from config, or get default let model_path = if config.model_path.is_empty() { crate::default_model_path() } else { config.model_path.clone() }; - + // Check if model path exists if !std::path::Path::new(&model_path).exists() { return Err(format!("Vosk model not found at: {}", model_path)); } - + // Load the model let model = Model::new(&model_path) .ok_or_else(|| format!("Failed to load Vosk model from: {}", model_path))?; - + // Create recognizer with configuration - let mut recognizer = Recognizer::new(&model, sample_rate) - .ok_or_else(|| format!("Failed to create Vosk recognizer with sample rate: {}", sample_rate))?; - + let mut recognizer = Recognizer::new(&model, sample_rate).ok_or_else(|| { + format!( + "Failed to create Vosk recognizer with sample rate: {}", + sample_rate + ) + })?; + // Configure recognizer based on config recognizer.set_max_alternatives(config.max_alternatives as u16); recognizer.set_words(config.include_words); recognizer.set_partial_words(config.partial_results && config.include_words); - + // Update the config to use the resolved model path let mut final_config = config; final_config.model_path = model_path; - + Ok(Self { recognizer, config: final_config, current_utterance_id: next_utterance_id(), }) } - + /// Create a new VoskTranscriber with default model path (backward compatibility) pub fn new_with_default(model_path: &str, sample_rate: f32) -> Result { let config = TranscriptionConfig { @@ -70,37 +75,49 @@ impl VoskTranscriber { }; Self::new(config, sample_rate) } - + /// Update configuration (requires recreating recognizer) - pub fn update_config(&mut self, config: TranscriptionConfig, sample_rate: f32) -> Result<(), String> { + pub fn update_config( + &mut self, + config: TranscriptionConfig, + sample_rate: f32, + ) -> Result<(), String> { // Use model path from config, or get default let model_path = 
if config.model_path.is_empty() { crate::default_model_path() } else { config.model_path.clone() }; - + // Recreate recognizer with new config let model = Model::new(&model_path) .ok_or_else(|| format!("Failed to load Vosk model from: {}", model_path))?; - - let mut recognizer = Recognizer::new(&model, sample_rate) - .ok_or_else(|| format!("Failed to create Vosk recognizer with sample rate: {}", sample_rate))?; - + + let mut recognizer = Recognizer::new(&model, sample_rate).ok_or_else(|| { + format!( + "Failed to create Vosk recognizer with sample rate: {}", + sample_rate + ) + })?; + recognizer.set_max_alternatives(config.max_alternatives as u16); recognizer.set_words(config.include_words); recognizer.set_partial_words(config.partial_results && config.include_words); - + self.recognizer = recognizer; let mut final_config = config; final_config.model_path = model_path; self.config = final_config; Ok(()) } - + // Private helper methods - - fn parse_complete_result_static(result: CompleteResult, utterance_id: u64, include_words: bool) -> Option { + + fn parse_complete_result_static( + result: CompleteResult, + utterance_id: u64, + include_words: bool, + ) -> Option { match result { CompleteResult::Single(single) => { let text = single.text; @@ -108,16 +125,22 @@ impl VoskTranscriber { None } else { let words = if include_words && !single.result.is_empty() { - Some(single.result.into_iter().map(|w| WordInfo { - text: w.word.to_string(), - start: w.start as f32, - end: w.end as f32, - conf: w.conf as f32, - }).collect()) + Some( + single + .result + .into_iter() + .map(|w| WordInfo { + text: w.word.to_string(), + start: w.start, + end: w.end, + conf: w.conf, + }) + .collect(), + ) } else { None }; - + Some(TranscriptionEvent::Final { utterance_id, text: text.to_string(), @@ -133,16 +156,22 @@ impl VoskTranscriber { None } else { let words = if include_words && !first.result.is_empty() { - Some(first.result.iter().map(|w| WordInfo { - text: w.word.to_string(), - start: 
w.start as f32, - end: w.end as f32, - conf: 0.5, // Default confidence when not available from Vosk API - }).collect()) + Some( + first + .result + .iter() + .map(|w| WordInfo { + text: w.word.to_string(), + start: w.start, + end: w.end, + conf: 0.5, // Default confidence when not available from Vosk API + }) + .collect(), + ) } else { None }; - + Some(TranscriptionEvent::Final { utterance_id, text: text.to_string(), @@ -155,8 +184,11 @@ impl VoskTranscriber { } } } - - fn parse_partial_result_static(partial: PartialResult, utterance_id: u64) -> Option { + + fn parse_partial_result_static( + partial: PartialResult, + utterance_id: u64, + ) -> Option { let text = partial.partial; if text.trim().is_empty() { None @@ -179,23 +211,30 @@ impl EventBasedTranscriber for VoskTranscriber { if !self.config.enabled { return Ok(None); } - + // Pass the i16 samples directly - vosk expects i16 - let state = self.recognizer.accept_waveform(pcm) + let state = self + .recognizer + .accept_waveform(pcm) .map_err(|e| format!("Vosk waveform acceptance failed: {:?}", e))?; - + match state { DecodingState::Finalized => { // Get final result when speech segment is complete let result = self.recognizer.result(); - let event = Self::parse_complete_result_static(result, self.current_utterance_id, self.config.include_words); + let event = Self::parse_complete_result_static( + result, + self.current_utterance_id, + self.config.include_words, + ); Ok(event) } DecodingState::Running => { // Get partial result for ongoing speech if enabled if self.config.partial_results { let partial = self.recognizer.partial_result(); - let event = Self::parse_partial_result_static(partial, self.current_utterance_id); + let event = + Self::parse_partial_result_static(partial, self.current_utterance_id); Ok(event) } else { Ok(None) @@ -210,18 +249,22 @@ impl EventBasedTranscriber for VoskTranscriber { } } } - + /// Finalize current utterance and return final result fn finalize_utterance(&mut self) -> Result, 
String> { let final_result = self.recognizer.final_result(); - let event = Self::parse_complete_result_static(final_result, self.current_utterance_id, self.config.include_words); - + let event = Self::parse_complete_result_static( + final_result, + self.current_utterance_id, + self.config.include_words, + ); + // Start new utterance for next speech segment self.current_utterance_id = next_utterance_id(); - + Ok(event) } - + /// Reset recognizer state for new utterance fn reset(&mut self) -> Result<(), String> { // Vosk doesn't have an explicit reset, but finalizing clears state @@ -229,7 +272,7 @@ impl EventBasedTranscriber for VoskTranscriber { self.current_utterance_id = next_utterance_id(); Ok(()) } - + /// Get current configuration fn config(&self) -> &TranscriptionConfig { &self.config @@ -241,7 +284,9 @@ impl Transcriber for VoskTranscriber { fn accept_pcm16(&mut self, pcm: &[i16]) -> Result<Option<String>, String> { match self.accept_frame(pcm)? { Some(TranscriptionEvent::Final { text, .. }) => Ok(Some(text)), - Some(TranscriptionEvent::Partial { text, .. }) => Ok(Some(format!("[partial] {}", text))), + Some(TranscriptionEvent::Partial { text, .. }) => { + Ok(Some(format!("[partial] {}", text))) + } Some(TranscriptionEvent::Error { message, .. }) => Err(message), None => Ok(None), } @@ -255,4 +300,4 @@ impl Transcriber for VoskTranscriber { None => Ok(None), } } -} \ No newline at end of file +} diff --git a/crates/coldvox-stt/src/lib.rs b/crates/coldvox-stt/src/lib.rs index e744de74..eda6ac00 100644 --- a/crates/coldvox-stt/src/lib.rs +++ b/crates/coldvox-stt/src/lib.rs @@ -19,7 +19,7 @@ pub fn next_utterance_id() -> u64 { } /// Core transcription interface -/// +/// /// This trait defines the minimal interface for streaming transcription. /// It's kept for backward compatibility - new implementations should use /// the event-based interface with TranscriptionEvent. 
@@ -38,13 +38,13 @@ pub trait Transcriber { pub trait EventBasedTranscriber { /// Accept PCM16 audio and return transcription events fn accept_frame(&mut self, pcm: &[i16]) -> Result<Option<TranscriptionEvent>, String>; - + /// Finalize current utterance and return final result fn finalize_utterance(&mut self) -> Result<Option<TranscriptionEvent>, String>; - + /// Reset transcriber state for new utterance fn reset(&mut self) -> Result<(), String>; - + /// Get current configuration fn config(&self) -> &TranscriptionConfig; -} \ No newline at end of file +} diff --git a/crates/coldvox-stt/src/processor.rs b/crates/coldvox-stt/src/processor.rs index 172fe8e8..25f11432 100644 --- a/crates/coldvox-stt/src/processor.rs +++ b/crates/coldvox-stt/src/processor.rs @@ -9,7 +9,7 @@ use std::time::Instant; use tokio::sync::{broadcast, mpsc}; use tracing::{debug, error, info, warn}; -use crate::types::{TranscriptionEvent, TranscriptionConfig}; +use crate::types::{TranscriptionConfig, TranscriptionEvent}; use crate::EventBasedTranscriber; /// Audio frame type (generic over audio formats) @@ -27,14 +27,9 @@ pub struct AudioFrame { #[derive(Debug, Clone)] pub enum VadEvent { /// Speech started - SpeechStart { - timestamp_ms: u64, - }, + SpeechStart { timestamp_ms: u64 }, /// Speech ended - SpeechEnd { - timestamp_ms: u64, - duration_ms: u64, - }, + SpeechEnd { timestamp_ms: u64, duration_ms: u64 }, } /// STT processor state @@ -212,10 +207,15 @@ impl SttProcessor { ); // Process the buffered audio all at once - if let UtteranceState::SpeechActive { audio_buffer, frames_buffered, .. } = &self.state { + if let UtteranceState::SpeechActive { + audio_buffer, + frames_buffered, + .. 
+ } = &self.state + { let buffer_size = audio_buffer.len(); info!( - target: "stt", + target: "stt", "Processing buffered audio: {} samples ({:.2}s), {} frames", buffer_size, buffer_size as f32 / 16000.0, @@ -290,7 +290,12 @@ impl SttProcessor { self.metrics.write().frames_in += 1; // Only buffer if speech is active - if let UtteranceState::SpeechActive { ref mut audio_buffer, ref mut frames_buffered, .. } = &mut self.state { + if let UtteranceState::SpeechActive { + ref mut audio_buffer, + ref mut frames_buffered, + .. + } = &mut self.state + { // Buffer the audio frame audio_buffer.extend_from_slice(&frame.data); *frames_buffered += 1; @@ -329,10 +334,9 @@ impl SttProcessor { // Send to channel with backpressure - wait if channel is full // Use timeout to prevent indefinite blocking - match tokio::time::timeout( - std::time::Duration::from_secs(5), - self.event_tx.send(event) - ).await { + match tokio::time::timeout(std::time::Duration::from_secs(5), self.event_tx.send(event)) + .await + { Ok(Ok(())) => { // Successfully sent } @@ -347,4 +351,4 @@ impl SttProcessor { } } } -} \ No newline at end of file +} diff --git a/crates/coldvox-stt/src/types.rs b/crates/coldvox-stt/src/types.rs index c0539ad7..f521381c 100644 --- a/crates/coldvox-stt/src/types.rs +++ b/crates/coldvox-stt/src/types.rs @@ -20,10 +20,7 @@ pub enum TranscriptionEvent { words: Option>, }, /// Transcription error - Error { - code: String, - message: String, - }, + Error { code: String, message: String }, } /// Word-level timing and confidence information @@ -71,4 +68,4 @@ impl Default for TranscriptionConfig { buffer_size_ms: 512, } } -} \ No newline at end of file +} diff --git a/crates/coldvox-telemetry/src/lib.rs b/crates/coldvox-telemetry/src/lib.rs index bf886244..cf41c96d 100644 --- a/crates/coldvox-telemetry/src/lib.rs +++ b/crates/coldvox-telemetry/src/lib.rs @@ -2,4 +2,4 @@ pub mod metrics; pub mod pipeline_metrics; pub use metrics::*; -pub use pipeline_metrics::*; \ No newline at end of 
file +pub use pipeline_metrics::*; diff --git a/crates/coldvox-telemetry/src/pipeline_metrics.rs b/crates/coldvox-telemetry/src/pipeline_metrics.rs index 1de37e16..22e32e40 100644 --- a/crates/coldvox-telemetry/src/pipeline_metrics.rs +++ b/crates/coldvox-telemetry/src/pipeline_metrics.rs @@ -1,58 +1,57 @@ +use parking_lot::RwLock; use std::sync::atomic::{AtomicBool, AtomicI16, AtomicU64, AtomicUsize, Ordering}; use std::sync::Arc; use std::time::{Duration, Instant}; -use parking_lot::RwLock; -#[cfg(feature = "text-injection")] -use parking_lot::Mutex; #[cfg(feature = "text-injection")] use coldvox_text_injection::types::InjectionMetrics; +#[cfg(feature = "text-injection")] +use parking_lot::Mutex; /// Shared metrics for cross-thread pipeline monitoring #[derive(Clone, Default)] pub struct PipelineMetrics { // Audio level monitoring - pub current_peak: Arc, // Peak sample value in current window - pub current_rms: Arc, // RMS * 1000 for precision - pub audio_level_db: Arc, // Current level in dB * 10 - + pub current_peak: Arc, // Peak sample value in current window + pub current_rms: Arc, // RMS * 1000 for precision + pub audio_level_db: Arc, // Current level in dB * 10 + // Pipeline stage tracking - pub stage_capture: Arc, // Data reached capture stage - pub stage_chunker: Arc, // Data reached chunker stage - pub stage_vad: Arc, // Data reached VAD stage - pub stage_output: Arc, // Data reached output stage - + pub stage_capture: Arc, // Data reached capture stage + pub stage_chunker: Arc, // Data reached chunker stage + pub stage_vad: Arc, // Data reached VAD stage + pub stage_output: Arc, // Data reached output stage + // Buffer monitoring - pub capture_buffer_fill: Arc, // Capture buffer fill % - pub chunker_buffer_fill: Arc, // Chunker buffer fill % - pub vad_buffer_fill: Arc, // VAD buffer fill % - + pub capture_buffer_fill: Arc, // Capture buffer fill % + pub chunker_buffer_fill: Arc, // Chunker buffer fill % + pub vad_buffer_fill: Arc, // VAD buffer fill % 
+ // Frame rate tracking - pub capture_fps: Arc, // Frames per second * 10 - pub chunker_fps: Arc, // Chunks per second * 10 - pub vad_fps: Arc, // VAD frames per second * 10 - + pub capture_fps: Arc, // Frames per second * 10 + pub chunker_fps: Arc, // Chunks per second * 10 + pub vad_fps: Arc, // VAD frames per second * 10 + // Event counters pub capture_frames: Arc, pub chunker_frames: Arc, // Latency tracking - pub capture_to_chunker_ms: Arc, // Latency in ms - pub chunker_to_vad_ms: Arc, // Latency in ms - pub end_to_end_ms: Arc, // Total pipeline latency - + pub capture_to_chunker_ms: Arc, // Latency in ms + pub chunker_to_vad_ms: Arc, // Latency in ms + pub end_to_end_ms: Arc, // Total pipeline latency + // Activity indicators - pub is_speaking: Arc, // Currently in speech + pub is_speaking: Arc, // Currently in speech pub last_speech_time: Arc>>, pub speech_segments_count: Arc, - + // Error tracking pub capture_errors: Arc, pub chunker_errors: Arc, // Text Injection Metrics #[cfg(feature = "text-injection")] pub injection: Arc>, - } impl PipelineMetrics { @@ -61,21 +60,16 @@ impl PipelineMetrics { if samples.is_empty() { return; } - + // Calculate peak - let peak = samples.iter() - .map(|&s| s.abs()) - .max() - .unwrap_or(0); + let peak = samples.iter().map(|&s| s.abs()).max().unwrap_or(0); self.current_peak.store(peak, Ordering::Relaxed); - + // Calculate RMS - let sum: i64 = samples.iter() - .map(|&s| s as i64 * s as i64) - .sum(); + let sum: i64 = samples.iter().map(|&s| s as i64 * s as i64).sum(); let rms = ((sum as f64 / samples.len() as f64).sqrt() * 1000.0) as u64; self.current_rms.store(rms, Ordering::Relaxed); - + // Calculate dB (reference: 32768 = 0dB) let db = if peak > 0 { (20.0 * (peak as f64 / 32768.0).log10() * 10.0) as i16 @@ -84,7 +78,7 @@ impl PipelineMetrics { }; self.audio_level_db.store(db, Ordering::Relaxed); } - + /// Mark a pipeline stage as active (with decay) pub fn mark_stage_active(&self, stage: PipelineStage) { match stage { @@ 
-94,7 +88,7 @@ impl PipelineMetrics { PipelineStage::Output => self.stage_output.store(true, Ordering::Relaxed), } } - + /// Clear stage activity (for decay effect in UI) pub fn decay_stages(&self) { // Called periodically to create a "pulse" effect @@ -103,7 +97,7 @@ impl PipelineMetrics { self.stage_vad.store(false, Ordering::Relaxed); self.stage_output.store(false, Ordering::Relaxed); } - + /// Update buffer fill percentage (0-100) pub fn update_buffer_fill(&self, buffer: BufferType, fill_percent: usize) { let fill = fill_percent.min(100); @@ -115,21 +109,23 @@ impl PipelineMetrics { } pub fn update_capture_fps(&self, fps: f64) { - self.capture_fps.store((fps * 10.0) as u64, Ordering::Relaxed); + self.capture_fps + .store((fps * 10.0) as u64, Ordering::Relaxed); } - + pub fn update_chunker_fps(&self, fps: f64) { - self.chunker_fps.store((fps * 10.0) as u64, Ordering::Relaxed); + self.chunker_fps + .store((fps * 10.0) as u64, Ordering::Relaxed); } - + pub fn update_vad_fps(&self, fps: f64) { self.vad_fps.store((fps * 10.0) as u64, Ordering::Relaxed); } - + pub fn increment_capture_frames(&self) { self.capture_frames.fetch_add(1, Ordering::Relaxed); } - + pub fn increment_chunker_frames(&self) { self.chunker_frames.fetch_add(1, Ordering::Relaxed); } diff --git a/crates/coldvox-text-injection/Cargo.toml b/crates/coldvox-text-injection/Cargo.toml index be70672a..3b849c1b 100644 --- a/crates/coldvox-text-injection/Cargo.toml +++ b/crates/coldvox-text-injection/Cargo.toml @@ -18,6 +18,7 @@ serde = { version = "1.0", features = ["derive"] } serde_json = "1.0" toml = "0.8" chrono = { version = "0.4", features = ["serde"] } +coldvox-stt = { path = "../coldvox-stt" } # Backend dependencies (all optional) atspi = { version = "0.22", optional = true } diff --git a/crates/coldvox-text-injection/src/atspi_injector.rs b/crates/coldvox-text-injection/src/atspi_injector.rs index 6718fe4a..c69240f8 100644 --- a/crates/coldvox-text-injection/src/atspi_injector.rs +++ 
b/crates/coldvox-text-injection/src/atspi_injector.rs @@ -1,198 +1,57 @@ -use crate::focus::{FocusTracker, FocusStatus}; -use crate::types::{InjectionConfig, InjectionError, InjectionMethod, InjectionMetrics}; -use atspi::action::Action; -use atspi::editable_text::EditableText; -use atspi::Accessible; -use std::time::Duration; -use tokio::time::timeout; -use tracing::{debug, error, info, warn}; +use crate::types::{InjectionConfig, InjectionError, InjectionMetrics, TextInjector}; use async_trait::async_trait; +use tracing::warn; /// AT-SPI2 injector for direct text insertion +/// NOTE: This is a placeholder implementation - full AT-SPI support requires API clarification pub struct AtspiInjector { - config: InjectionConfig, + _config: InjectionConfig, metrics: InjectionMetrics, - focus_tracker: FocusTracker, } impl AtspiInjector { /// Create a new AT-SPI2 injector pub fn new(config: InjectionConfig) -> Self { Self { - config: config.clone(), + _config: config, metrics: InjectionMetrics::default(), - focus_tracker: FocusTracker::new(config), } } - /// Insert text directly into the focused element using EditableText interface - async fn insert_text_direct(&self, text: &str, accessible: &Accessible) -> Result<(), InjectionError> { - let start = std::time::Instant::now(); - - // Get EditableText interface - let editable_text = EditableText::new(accessible).await - .map_err(|e| InjectionError::Atspi(e))?; - - // Get current text length to insert at end - let text_length = editable_text.get_text(0, -1).await - .map_err(|e| InjectionError::Atspi(e))? 
- .len() as i32; - - // Insert text at the end - editable_text.insert_text(text_length, text).await - .map_err(|e| InjectionError::Atspi(e))?; - - let duration = start.elapsed().as_millis() as u64; - // TODO: Fix metrics - self.metrics.record_success requires &mut self - info!("Successfully inserted text via AT-SPI2 EditableText ({} chars)", text.len()); - - Ok(()) - } - - /// Trigger paste action on the focused element - async fn trigger_paste_action(&self, accessible: &Accessible) -> Result<(), InjectionError> { - let start = std::time::Instant::now(); - - // Get Action interface - let action = Action::new(accessible).await - .map_err(|e| InjectionError::Atspi(e))?; - - // Find paste action - let n_actions = action.n_actions().await - .map_err(|e| InjectionError::Atspi(e))?; - - for i in 0..n_actions { - let action_name = action.get_action_name(i).await - .map_err(|e| InjectionError::Atspi(e))?; - - let action_description = action.get_action_description(i).await - .map_err(|e| InjectionError::Atspi(e))?; - - // Check if this is a paste action (case-insensitive) - if action_name.to_lowercase().contains("paste") || - action_description.to_lowercase().contains("paste") { - debug!("Found paste action: {} ({})", action_name, action_description); - - // Execute the paste action - action.do_action(i).await - .map_err(|e| InjectionError::Atspi(e))?; - - let duration = start.elapsed().as_millis() as u64; - // TODO: Fix metrics - self.metrics.record_success requires &mut self - info!("Successfully triggered paste action via AT-SPI2"); - return Ok(()); - } - } - - Err(InjectionError::MethodUnavailable("No paste action found".to_string())) + /// Check if AT-SPI is available + pub fn is_available(&self) -> bool { + // TODO: Implement actual AT-SPI availability check + // For now, return false to prevent usage until properly implemented + warn!("AT-SPI injector not yet implemented for atspi 0.22"); + false } } #[async_trait] -impl super::types::TextInjector for AtspiInjector { 
+impl TextInjector for AtspiInjector { + /// Get the name of this injector fn name(&self) -> &'static str { "AT-SPI2" } + /// Check if this injector is available for use fn is_available(&self) -> bool { - // AT-SPI2 should be available on KDE/Wayland - std::env::var("XDG_SESSION_TYPE").map(|t| t == "wayland").unwrap_or(false) + self.is_available() } - async fn inject(&mut self, text: &str) -> Result<(), InjectionError> { - if text.is_empty() { - return Ok(()); - } - - let start = std::time::Instant::now(); + /// Inject text using AT-SPI2 + async fn inject(&mut self, _text: &str) -> Result<(), InjectionError> { + // TODO: Implement actual AT-SPI text injection when atspi 0.22 API is clarified + // The atspi crate version 0.22 has a different API structure than expected + // Need to investigate proper usage of AccessibilityConnection and proxies - // Get focus status - let focus_status = self.focus_tracker.get_focus_status().await.map_err(|e| { - let duration = start.elapsed().as_millis() as u64; - self.metrics.record_failure(InjectionMethod::AtspiInsert, duration, e.to_string()); - e - })?; - - // Only proceed if we have a confirmed editable field or unknown focus (if allowed) - if focus_status == FocusStatus::NonEditable { - // We can't insert text directly, but might be able to paste - debug!("Focused element is not editable, skipping direct insertion"); - return Err(InjectionError::MethodUnavailable("Focused element not editable".to_string())); - } - - if focus_status == FocusStatus::Unknown && !self.config.inject_on_unknown_focus { - debug!("Focus state unknown and injection on unknown focus disabled"); - return Err(InjectionError::Other("Unknown focus state".to_string())); - } - - // Get focused element - let focused = match self.focus_tracker.get_focused_element().await { - Ok(Some(element)) => element, - Ok(None) => { - debug!("No focused element"); - return Err(InjectionError::Other("No focused element".to_string())); - } - Err(e) => { - let duration = 
start.elapsed().as_millis() as u64; - self.metrics.record_failure(InjectionMethod::AtspiInsert, duration, e.to_string()); - return Err(InjectionError::Other(e.to_string())); - } - }; - - // Try direct insertion first - let direct_res = timeout( - Duration::from_millis(self.config.per_method_timeout_ms), - self.insert_text_direct(text, &focused), - ).await; - match direct_res { - Ok(Ok(())) => return Ok(()), - Ok(Err(e)) => { - debug!("Direct insertion failed: {}", e); - } - Err(_) => { - let duration = start.elapsed().as_millis() as u64; - self.metrics.record_failure( - InjectionMethod::AtspiInsert, - duration, - format!("Timeout after {}ms", self.config.per_method_timeout_ms) - ); - return Err(InjectionError::Timeout(self.config.per_method_timeout_ms)); - } - } - - // If direct insertion failed, try paste action if the element supports it - if self.focus_tracker.supports_paste_action(&focused).await.unwrap_or(false) { - let paste_res = timeout( - Duration::from_millis(self.config.paste_action_timeout_ms), - self.trigger_paste_action(&focused), - ).await; - match paste_res { - Ok(Ok(())) => return Ok(()), - Ok(Err(e)) => { - debug!("Paste action failed: {}", e); - } - Err(_) => { - let duration = start.elapsed().as_millis() as u64; - self.metrics.record_failure( - InjectionMethod::AtspiInsert, - duration, - format!("Timeout after {}ms", self.config.paste_action_timeout_ms) - ); - return Err(InjectionError::Timeout(self.config.paste_action_timeout_ms)); - } - } - } - - // If we get here, both methods failed - let duration = start.elapsed().as_millis() as u64; - self.metrics.record_failure( - InjectionMethod::AtspiInsert, - duration, - "Both direct insertion and paste action failed".to_string() - ); - Err(InjectionError::MethodFailed("AT-SPI2 injection failed".to_string())) + warn!("AT-SPI text injection not yet implemented for atspi 0.22"); + Err(InjectionError::MethodUnavailable( + "AT-SPI implementation pending - atspi 0.22 API differs from expected".to_string() + 
)) } + /// Get current metrics fn metrics(&self) -> &InjectionMetrics { &self.metrics } diff --git a/crates/coldvox-text-injection/src/backend.rs b/crates/coldvox-text-injection/src/backend.rs index c2a5c386..c340508f 100644 --- a/crates/coldvox-text-injection/src/backend.rs +++ b/crates/coldvox-text-injection/src/backend.rs @@ -36,68 +36,70 @@ impl BackendDetector { /// Detect available backends on the current system pub fn detect_available_backends(&self) -> Vec { let mut available = Vec::new(); - + // Detect Wayland backends if self.is_wayland() { // Check for xdg-desktop-portal VirtualKeyboard if self.has_xdg_desktop_portal_virtual_keyboard() { available.push(Backend::WaylandXdgDesktopPortal); } - + // Check for wlr-virtual-keyboard (requires compositor support) if self.has_wlr_virtual_keyboard() { available.push(Backend::WaylandVirtualKeyboard); } } - + // Detect X11 backends if self.is_x11() { // Check for xdotool if self.has_xdotool() { available.push(Backend::X11Xdotool); } - + // Native X11 wrapper is always available if on X11 available.push(Backend::X11Native); } - + // Detect macOS backends if self.is_macos() { available.push(Backend::MacCgEvent); available.push(Backend::MacPasteboard); } - + // Detect Windows backends if self.is_windows() { available.push(Backend::WindowsSendInput); available.push(Backend::WindowsClipboard); } - + available } - + /// Get the preferred backend based on availability and configuration pub fn get_preferred_backend(&self) -> Option { let available = self.detect_available_backends(); - + // Return the most preferred available backend - Self::preferred_order().into_iter().find(|&preferred| available.contains(&preferred)) + Self::preferred_order() + .into_iter() + .find(|&preferred| available.contains(&preferred)) } - + /// Get the preferred order of backends fn preferred_order() -> Vec { vec![ - Backend::WaylandXdgDesktopPortal, // Preferred on Wayland - Backend::WaylandVirtualKeyboard, // Fallback on Wayland - 
Backend::X11Xdotool, // Preferred on X11 - Backend::X11Native, // Fallback on X11 - Backend::MacCgEvent, // Preferred on macOS - Backend::MacPasteboard, // Fallback on macOS - Backend::WindowsSendInput, // Preferred on Windows - Backend::WindowsClipboard, // Fallback on Windows + Backend::WaylandXdgDesktopPortal, // Preferred on Wayland + Backend::WaylandVirtualKeyboard, // Fallback on Wayland + Backend::X11Xdotool, // Preferred on X11 + Backend::X11Native, // Fallback on X11 + Backend::MacCgEvent, // Preferred on macOS + Backend::MacPasteboard, // Fallback on macOS + Backend::WindowsSendInput, // Preferred on Windows + Backend::WindowsClipboard, // Fallback on Windows ] } - + /// Check if running on Wayland fn is_wayland(&self) -> bool { env::var("XDG_SESSION_TYPE") @@ -105,7 +107,7 @@ impl BackendDetector { .unwrap_or(false) || env::var("WAYLAND_DISPLAY").is_ok() } - + /// Check if running on X11 fn is_x11(&self) -> bool { env::var("XDG_SESSION_TYPE") @@ -113,17 +115,17 @@ impl BackendDetector { .unwrap_or(false) || env::var("DISPLAY").is_ok() } - + /// Check if running on macOS fn is_macos(&self) -> bool { cfg!(target_os = "macos") } - + /// Check if running on Windows fn is_windows(&self) -> bool { cfg!(target_os = "windows") } - + /// Check if xdg-desktop-portal VirtualKeyboard is available fn has_xdg_desktop_portal_virtual_keyboard(&self) -> bool { // Check if xdg-desktop-portal is running and supports VirtualKeyboard @@ -135,7 +137,7 @@ impl BackendDetector { .map(|o| o.status.success()) .unwrap_or(false) } - + /// Check if wlr-virtual-keyboard is available fn has_wlr_virtual_keyboard(&self) -> bool { // This would require checking if the compositor supports wlr-virtual-keyboard @@ -146,7 +148,7 @@ impl BackendDetector { .map(|o| o.status.success()) .unwrap_or(false) } - + /// Check if xdotool is available fn has_xdotool(&self) -> bool { std::process::Command::new("which") @@ -160,33 +162,33 @@ impl BackendDetector { #[cfg(test)] mod tests { use super::*; - 
+ #[test] fn test_backend_detection() { let config = InjectionConfig::default(); let detector = BackendDetector::new(config); - + let backends = detector.detect_available_backends(); - + // At least one backend should be available assert!(!backends.is_empty()); - + // Check that the preferred backend is in the list if let Some(preferred) = detector.get_preferred_backend() { assert!(backends.contains(&preferred)); } } - + #[test] fn test_preferred_order() { let order = BackendDetector::preferred_order(); - + // Check that Wayland backends are preferred first assert_eq!(order[0], Backend::WaylandXdgDesktopPortal); assert_eq!(order[1], Backend::WaylandVirtualKeyboard); - + // Check that X11 backends come next assert_eq!(order[2], Backend::X11Xdotool); assert_eq!(order[3], Backend::X11Native); } -} \ No newline at end of file +} diff --git a/crates/coldvox-text-injection/src/clipboard_injector.rs b/crates/coldvox-text-injection/src/clipboard_injector.rs index 4edeaaec..f34b15ac 100644 --- a/crates/coldvox-text-injection/src/clipboard_injector.rs +++ b/crates/coldvox-text-injection/src/clipboard_injector.rs @@ -1,16 +1,18 @@ -use crate::types::{InjectionConfig, InjectionError, InjectionMethod, InjectionMetrics, TextInjector}; +use crate::types::{ + InjectionConfig, InjectionError, InjectionMethod, InjectionMetrics, TextInjector, +}; +use async_trait::async_trait; use std::time::{Duration, Instant}; use tracing::{debug, info, warn}; -use wl_clipboard_rs::copy::{Options, Source, MimeType}; -use wl_clipboard_rs::paste::{MimeType as PasteMimeType}; -use async_trait::async_trait; +use wl_clipboard_rs::copy::{MimeType, Options, Source}; +use wl_clipboard_rs::paste::MimeType as PasteMimeType; /// Clipboard injector using Wayland-native API pub struct ClipboardInjector { config: InjectionConfig, metrics: InjectionMetrics, /// Previous clipboard content if we're restoring - previous_clipboard: Option, + _previous_clipboard: Option, } impl ClipboardInjector { @@ -19,7 +21,7 @@ 
impl ClipboardInjector { Self { config, metrics: InjectionMetrics::default(), - previous_clipboard: None, + _previous_clipboard: None, } } } @@ -41,7 +43,7 @@ impl TextInjector for ClipboardInjector { } let start = Instant::now(); - + // Save current clipboard if configured // Note: Clipboard saving would require async context or separate thread // Pattern note: TextInjector is synchronous by design; for async-capable @@ -51,36 +53,34 @@ impl TextInjector for ClipboardInjector { // Set new clipboard content with timeout let text_clone = text.to_string(); let timeout_ms = self.config.per_method_timeout_ms; - + let result = tokio::task::spawn_blocking(move || { let source = Source::Bytes(text_clone.into_bytes().into()); let options = Options::new(); - + wl_clipboard_rs::copy::copy(options, source, MimeType::Text) - }).await; + }) + .await; match result { Ok(Ok(_)) => { - let duration = start.elapsed().as_millis() as u64; + let _duration = start.elapsed().as_millis() as u64; // TODO: Fix metrics - self.metrics.record_success requires &mut self info!("Clipboard set successfully ({} chars)", text.len()); Ok(()) } Ok(Err(e)) => { let duration = start.elapsed().as_millis() as u64; - self.metrics.record_failure( - InjectionMethod::Clipboard, - duration, - e.to_string() - ); + self.metrics + .record_failure(InjectionMethod::Clipboard, duration, e.to_string()); Err(InjectionError::Clipboard(e.to_string())) } Err(_) => { let duration = start.elapsed().as_millis() as u64; self.metrics.record_failure( - InjectionMethod::Clipboard, - duration, - format!("Timeout after {}ms", timeout_ms) + InjectionMethod::Clipboard, + duration, + format!("Timeout after {}ms", timeout_ms), ); Err(InjectionError::Timeout(timeout_ms)) } @@ -94,18 +94,22 @@ impl TextInjector for ClipboardInjector { impl ClipboardInjector { /// Save current clipboard content for restoration + #[allow(dead_code)] async fn save_clipboard(&mut self) -> Result, InjectionError> { if !self.config.restore_clipboard { return 
Ok(None); } - + #[cfg(feature = "wl_clipboard")] { - use std::io::Read; - + // Try to get current clipboard content - match wl_clipboard_rs::paste::get_contents(wl_clipboard_rs::paste::ClipboardType::Regular, wl_clipboard_rs::paste::Seat::Unspecified, PasteMimeType::Text) { + match wl_clipboard_rs::paste::get_contents( + wl_clipboard_rs::paste::ClipboardType::Regular, + wl_clipboard_rs::paste::Seat::Unspecified, + PasteMimeType::Text, + ) { Ok((mut pipe, _mime)) => { let mut contents = String::new(); if pipe.read_to_string(&mut contents).is_ok() { @@ -118,21 +122,22 @@ impl ClipboardInjector { } } } - + Ok(None) } - + /// Restore previously saved clipboard content + #[allow(dead_code)] async fn restore_clipboard(&mut self, content: Option) -> Result<(), InjectionError> { if let Some(content) = content { if !self.config.restore_clipboard { return Ok(()); } - + #[cfg(feature = "wl_clipboard")] { use wl_clipboard_rs::copy::{MimeType, Options, Source}; - + let opts = Options::new(); match opts.copy(Source::Bytes(content.as_bytes().into()), MimeType::Text) { Ok(_) => { @@ -144,18 +149,19 @@ impl ClipboardInjector { } } } - + Ok(()) } - + /// Enhanced clipboard operation with automatic save/restore + #[allow(dead_code)] async fn clipboard_with_restore(&mut self, text: &str) -> Result<(), InjectionError> { // Save current clipboard let saved = self.save_clipboard().await?; - + // Set new clipboard content let result = self.set_clipboard(text).await; - + // Schedule restoration after a delay (to allow paste to complete) if saved.is_some() && self.config.restore_clipboard { let delay_ms = self.config.clipboard_restore_delay_ms.unwrap_or(500); @@ -165,33 +171,34 @@ impl ClipboardInjector { // For now, we'll rely on the Drop implementation }); } - + result } - + /// Set clipboard content (internal helper) + #[allow(dead_code)] async fn set_clipboard(&self, text: &str) -> Result<(), InjectionError> { #[cfg(feature = "wl_clipboard")] { use wl_clipboard_rs::copy::{MimeType, 
Options, Source}; - - let source = Source::Bytes(text.as_bytes().to_vec().into()); + + let source = Source::Bytes(text.as_bytes().to_vec().into()); let opts = Options::new(); - - match opts.copy(source, MimeType::Text) { + + match opts.copy(source, MimeType::Text) { Ok(_) => { debug!("Set clipboard content ({} chars)", text.len()); Ok(()) } - Err(e) => { - Err(InjectionError::Clipboard(e.to_string())) - } + Err(e) => Err(InjectionError::Clipboard(e.to_string())), } } - + #[cfg(not(feature = "wl_clipboard"))] { - Err(InjectionError::MethodUnavailable("Clipboard feature not enabled".to_string())) + Err(InjectionError::MethodUnavailable( + "Clipboard feature not enabled".to_string(), + )) } } } @@ -204,7 +211,6 @@ mod tests { use std::env; use std::sync::Mutex; use std::time::Duration; - // Mock for wl_clipboard_rs to avoid actual system calls struct MockClipboard { @@ -235,7 +241,7 @@ mod tests { fn test_clipboard_injector_creation() { let config = InjectionConfig::default(); let injector = ClipboardInjector::new(config); - + assert_eq!(injector.name(), "Clipboard"); assert!(injector.metrics.attempts == 0); } @@ -245,25 +251,27 @@ mod tests { fn test_clipboard_inject_valid_text() { // Set WAYLAND_DISPLAY to simulate Wayland environment env::set_var("WAYLAND_DISPLAY", "wayland-0"); - + let config = InjectionConfig::default(); let mut injector = ClipboardInjector::new(config); - + // Mock clipboard let clipboard = MockClipboard::new(); - + // Override the actual clipboard operations with our mock // This is a simplified test - in real code we'd use proper mocking - // Simulate successful clipboard operation and metrics update - let text = "test text"; - let _ = clipboard.set(text.to_string()); - let duration = 100; - injector.metrics.record_success(InjectionMethod::Clipboard, duration); - assert_eq!(injector.metrics.successes, 1); - assert_eq!(injector.metrics.attempts, 1); - + // Simulate successful clipboard operation and metrics update + let text = "test text"; + 
let _ = clipboard.set(text.to_string()); + let duration = 100; + injector + .metrics + .record_success(InjectionMethod::Clipboard, duration); + assert_eq!(injector.metrics.successes, 1); + assert_eq!(injector.metrics.attempts, 1); + env::remove_var("WAYLAND_DISPLAY"); - assert_eq!(injector.metrics.successes, 1); + assert_eq!(injector.metrics.successes, 1); } // Test that inject fails with empty text @@ -271,7 +279,7 @@ mod tests { async fn test_clipboard_inject_empty_text() { let config = InjectionConfig::default(); let mut injector = ClipboardInjector::new(config); - + let result = injector.inject("").await; assert!(result.is_ok()); assert_eq!(injector.metrics.attempts, 0); // Should not record attempt for empty text @@ -283,46 +291,46 @@ mod tests { // Don't set WAYLAND_DISPLAY to simulate non-Wayland environment let config = InjectionConfig::default(); let mut injector = ClipboardInjector::new(config); - - // Availability depends on environment; just ensure calling inject doesn't panic - let _ = injector.inject("test"); + + // Availability depends on environment; just ensure calling inject doesn't panic + let _ = injector.inject("test"); } // Test clipboard restoration #[test] fn test_clipboard_restore() { env::set_var("WAYLAND_DISPLAY", "wayland-0"); - + let mut config = InjectionConfig::default(); config.restore_clipboard = true; - + let mut injector = ClipboardInjector::new(config); - + // Simulate previous clipboard content injector.previous_clipboard = Some("previous content".to_string()); - + // Mock clipboard let clipboard = MockClipboard::new(); let _ = clipboard.set("new content".to_string()); - + // Restore should work - let _ = clipboard.get(); - + let _ = clipboard.get(); + env::remove_var("WAYLAND_DISPLAY"); - assert!(true); + assert!(true); } // Test timeout handling #[test] fn test_clipboard_inject_timeout() { env::set_var("WAYLAND_DISPLAY", "wayland-0"); - - let mut config = InjectionConfig::default(); - config.per_method_timeout_ms = 1; // Very 
short timeout - let to_ms = config.per_method_timeout_ms; - - let mut injector = ClipboardInjector::new(config.clone()); - + + let mut config = InjectionConfig::default(); + config.per_method_timeout_ms = 1; // Very short timeout + let to_ms = config.per_method_timeout_ms; + + let mut injector = ClipboardInjector::new(config.clone()); + // Test with a text that would cause timeout in real implementation // In our mock, we'll simulate timeout by using a long-running operation // Simulate timeout metrics @@ -332,12 +340,12 @@ mod tests { injector.metrics.record_failure( InjectionMethod::Clipboard, duration, - format!("Timeout after {}ms", to_ms) + format!("Timeout after {}ms", to_ms), ); assert_eq!(injector.metrics.failures, 1); assert_eq!(injector.metrics.attempts, 1); - + env::remove_var("WAYLAND_DISPLAY"); - assert_eq!(injector.metrics.failures, 1); + assert_eq!(injector.metrics.failures, 1); } -} \ No newline at end of file +} diff --git a/crates/coldvox-text-injection/src/combo_clip_atspi.rs b/crates/coldvox-text-injection/src/combo_clip_atspi.rs index faad9087..d8c8bc62 100644 --- a/crates/coldvox-text-injection/src/combo_clip_atspi.rs +++ b/crates/coldvox-text-injection/src/combo_clip_atspi.rs @@ -1,161 +1,68 @@ use crate::clipboard_injector::ClipboardInjector; -use crate::focus::{FocusTracker, FocusStatus}; -use crate::types::{InjectionConfig, InjectionError, InjectionMethod, InjectionMetrics, TextInjector}; -use atspi::action::Action; -use atspi::Accessible; -use std::time::Duration; -use tokio::time::{timeout, error::Elapsed}; -use tracing::{debug, error, info, warn}; +use crate::types::{InjectionConfig, InjectionError, InjectionMetrics, TextInjector}; use async_trait::async_trait; +use std::time::Duration; +use tracing::{debug, warn}; /// Combo injector that sets clipboard and then triggers AT-SPI paste action -pub struct ComboClipboardAtspiInjector { - config: InjectionConfig, +/// NOTE: AT-SPI paste action not yet implemented for atspi 0.22 +pub struct 
ComboClipboardAtspi { + _config: InjectionConfig, metrics: InjectionMetrics, clipboard_injector: ClipboardInjector, - focus_tracker: FocusTracker, } -impl ComboClipboardAtspiInjector { +impl ComboClipboardAtspi { /// Create a new combo clipboard+AT-SPI injector pub fn new(config: InjectionConfig) -> Self { Self { - config: config.clone(), + _config: config.clone(), metrics: InjectionMetrics::default(), - clipboard_injector: ClipboardInjector::new(config.clone()), - focus_tracker: FocusTracker::new(config), + clipboard_injector: ClipboardInjector::new(config), } } - /// Trigger paste action on the focused element via AT-SPI2 - async fn trigger_paste_action(&self, accessible: &Accessible) -> Result<(), InjectionError> { - let start = std::time::Instant::now(); - - // Get Action interface - let action = Action::new(accessible).await - .map_err(|e| InjectionError::Atspi(e))?; - - // Find paste action - let n_actions = action.n_actions().await - .map_err(|e| InjectionError::Atspi(e))?; - - for i in 0..n_actions { - let action_name = action.get_action_name(i).await - .map_err(|e| InjectionError::Atspi(e))?; - - let action_description = action.get_action_description(i).await - .map_err(|e| InjectionError::Atspi(e))?; - - // Check if this is a paste action (case-insensitive) - if action_name.to_lowercase().contains("paste") || - action_description.to_lowercase().contains("paste") { - debug!("Found paste action: {} ({})", action_name, action_description); - - // Execute the paste action - action.do_action(i).await - .map_err(|e| InjectionError::Atspi(e))?; - - let duration = start.elapsed().as_millis() as u64; - // TODO: Fix metrics - self.metrics.record_success requires &mut self - info!("Successfully triggered paste action via AT-SPI2"); - return Ok(()); - } - } - - Err(InjectionError::MethodUnavailable("No paste action found".to_string())) + /// Check if this combo injector is available + pub fn is_available(&self) -> bool { + // For now, just check if clipboard is 
available + // AT-SPI paste action implementation pending + self.clipboard_injector.is_available() } } #[async_trait] -impl TextInjector for ComboClipboardAtspiInjector { +impl TextInjector for ComboClipboardAtspi { + /// Get the name of this injector fn name(&self) -> &'static str { - "Clipboard+AT-SPI Paste" + "Clipboard+AT-SPI" } + /// Check if this injector is available for use fn is_available(&self) -> bool { - // Available if both clipboard and AT-SPI are available - self.clipboard_injector.is_available() && - std::env::var("XDG_SESSION_TYPE").map(|t| t == "wayland").unwrap_or(false) + self.is_available() } + /// Inject text using clipboard+AT-SPI paste async fn inject(&mut self, text: &str) -> Result<(), InjectionError> { - if text.is_empty() { - return Ok(()); - } + // Step 1: Set clipboard content + self.clipboard_injector.inject(text).await?; + debug!("Clipboard set with {} chars", text.len()); - let start = std::time::Instant::now(); - - // First, set the clipboard - match self.clipboard_injector.inject(text) { - Ok(()) => { - debug!("Clipboard set successfully, proceeding to trigger paste action"); - } - Err(e) => { - let duration = start.elapsed().as_millis() as u64; - self.metrics.record_failure(InjectionMethod::ClipboardAndPaste, duration, e.to_string()); - return Err(InjectionError::MethodFailed("Failed to set clipboard".to_string())); - } - } - - // Small delay for clipboard to settle + // Step 2: Wait a short time for clipboard to stabilize tokio::time::sleep(Duration::from_millis(50)).await; - // Get focus status - let focus_status = match self.focus_tracker.get_focus_status().await { - Ok(status) => status, - Err(e) => { - let duration = start.elapsed().as_millis() as u64; - self.metrics.record_failure(InjectionMethod::ClipboardAndPaste, duration, e.to_string()); - return Err(InjectionError::Other(e.to_string())); - } - }; - - // Only proceed if we have a focused element - if focus_status == FocusStatus::Unknown { - debug!("Focus state 
unknown"); - return Err(InjectionError::Other("Unknown focus state".to_string())); - } - - // Get focused element - let focused = match self.focus_tracker.get_focused_element().await { - Ok(Some(element)) => element, - Ok(None) => { - debug!("No focused element"); - return Err(InjectionError::Other("No focused element".to_string())); - } - Err(e) => { - let duration = start.elapsed().as_millis() as u64; - self.metrics.record_failure(InjectionMethod::ClipboardAndPaste, duration, e.to_string()); - return Err(InjectionError::Other(e.to_string())); - } - }; - - // Check if the element supports paste action - if !self.focus_tracker.supports_paste_action(&focused).await.unwrap_or(false) { - debug!("Focused element does not support paste action"); - return Err(InjectionError::MethodUnavailable("Focused element does not support paste action".to_string())); - } - - // Trigger paste action - let res = timeout( - Duration::from_millis(self.config.paste_action_timeout_ms), - self.trigger_paste_action(&focused), - ).await; - match res { - Ok(Ok(())) => Ok(()), - Ok(Err(e)) => Err(e), - Err(_) => { - let duration = start.elapsed().as_millis() as u64; - self.metrics.record_failure( - InjectionMethod::ClipboardAndPaste, - duration, - format!("Timeout after {}ms", self.config.paste_action_timeout_ms) - ); - Err(InjectionError::Timeout(self.config.paste_action_timeout_ms)) - } - } + // Step 3: Trigger paste action via AT-SPI + // TODO: Implement AT-SPI paste action when atspi 0.22 API is clarified + // For now, we can only set clipboard and rely on manual paste + warn!("AT-SPI paste action not yet implemented for atspi 0.22"); + warn!("Text is in clipboard but automatic paste is not available"); + + // Return success since clipboard was set successfully + // User will need to manually paste (Ctrl+V) for now + Ok(()) } + /// Get current metrics fn metrics(&self) -> &InjectionMetrics { &self.metrics } diff --git a/crates/coldvox-text-injection/src/enigo_injector.rs 
b/crates/coldvox-text-injection/src/enigo_injector.rs index abff695d..6bb1fb8d 100644 --- a/crates/coldvox-text-injection/src/enigo_injector.rs +++ b/crates/coldvox-text-injection/src/enigo_injector.rs @@ -1,9 +1,11 @@ -use crate::types::{InjectionConfig, InjectionError, InjectionMethod, InjectionMetrics, TextInjector}; -use enigo::{Enigo, KeyboardControllable, Key}; +use crate::types::{ + InjectionConfig, InjectionError, InjectionMethod, InjectionMetrics, TextInjector, +}; +use async_trait::async_trait; +use enigo::{Enigo, Key, KeyboardControllable}; use std::time::Duration; -use tokio::time::{timeout, error::Elapsed}; +use tokio::time::{error::Elapsed, timeout}; use tracing::{debug, error, info, warn}; -use async_trait::async_trait; /// Enigo injector for synthetic input pub struct EnigoInjector { @@ -17,7 +19,7 @@ impl EnigoInjector { /// Create a new enigo injector pub fn new(config: InjectionConfig) -> Self { let is_available = Self::check_availability(); - + Self { config, metrics: InjectionMetrics::default(), @@ -36,10 +38,10 @@ impl EnigoInjector { async fn type_text(&mut self, text: &str) -> Result<(), InjectionError> { let start = std::time::Instant::now(); let text_clone = text.to_string(); - + let result = tokio::task::spawn_blocking(move || { let mut enigo = Enigo::new(); - + // Type each character with a small delay for c in text_clone.chars() { match c { @@ -51,14 +53,17 @@ impl EnigoInjector { enigo.key_sequence(&c.to_string()); } else { // For non-ASCII characters, we might need to use clipboard - return Err(InjectionError::MethodFailed("Enigo doesn't support non-ASCII characters directly".to_string())); + return Err(InjectionError::MethodFailed( + "Enigo doesn't support non-ASCII characters directly".to_string(), + )); } } } } - + Ok(()) - }).await; + }) + .await; match result { Ok(Ok(())) => { @@ -75,17 +80,18 @@ impl EnigoInjector { /// Trigger paste action using enigo (Ctrl+V) async fn trigger_paste(&mut self) -> Result<(), InjectionError> { 
let start = std::time::Instant::now(); - + let result = tokio::task::spawn_blocking(|| { let mut enigo = Enigo::new(); - + // Press Ctrl+V enigo.key_down(Key::Control); enigo.key_click(Key::Layout('v')); enigo.key_up(Key::Control); - + Ok(()) - }).await; + }) + .await; match result { Ok(Ok(())) => { @@ -131,4 +137,4 @@ impl TextInjector for EnigoInjector { fn metrics(&self) -> &InjectionMetrics { &self.metrics } -} \ No newline at end of file +} diff --git a/crates/coldvox-text-injection/src/focus.rs b/crates/coldvox-text-injection/src/focus.rs index 34b2687e..67ba0a91 100644 --- a/crates/coldvox-text-injection/src/focus.rs +++ b/crates/coldvox-text-injection/src/focus.rs @@ -45,11 +45,11 @@ impl FocusTracker { // Get fresh focus status let status = self.check_focus_status().await?; - + // Cache the result self.last_check = Some(Instant::now()); self.cached_status = Some(status); - + debug!("Focus status determined: {:?}", status); Ok(status) } @@ -61,9 +61,9 @@ impl FocusTracker { // TODO: Implement real AT-SPI focus detection once API is stable // For now, return a reasonable default debug!("AT-SPI focus detection placeholder - returning Unknown"); - return Ok(FocusStatus::Unknown); + Ok(FocusStatus::Unknown) } - + #[cfg(not(feature = "atspi"))] { // Fallback: Without AT-SPI, we can't reliably determine focus @@ -93,7 +93,7 @@ mod tests { async fn test_focus_tracker_creation() { let config = InjectionConfig::default(); let tracker = FocusTracker::new(config); - + assert!(tracker.cached_focus_status().is_none()); } @@ -101,11 +101,11 @@ mod tests { async fn test_focus_status_caching() { let config = InjectionConfig::default(); let mut tracker = FocusTracker::new(config); - + // First check should not use cache let status1 = tracker.get_focus_status().await.unwrap(); assert!(tracker.cached_focus_status().is_some()); - + // Second check should use cache let status2 = tracker.get_focus_status().await.unwrap(); assert_eq!(status1, status2); @@ -115,15 +115,15 @@ mod 
tests { fn test_cache_clearing() { let config = InjectionConfig::default(); let mut tracker = FocusTracker::new(config); - + // Manually set cache tracker.cached_status = Some(FocusStatus::EditableText); tracker.last_check = Some(Instant::now()); - + assert!(tracker.cached_focus_status().is_some()); - + // Clear cache tracker.clear_cache(); assert!(tracker.cached_focus_status().is_none()); } -} \ No newline at end of file +} diff --git a/crates/coldvox-text-injection/src/kdotool_injector.rs b/crates/coldvox-text-injection/src/kdotool_injector.rs index 5cb27f07..f622f514 100644 --- a/crates/coldvox-text-injection/src/kdotool_injector.rs +++ b/crates/coldvox-text-injection/src/kdotool_injector.rs @@ -1,10 +1,12 @@ -use crate::types::{InjectionConfig, InjectionError, InjectionMethod, InjectionMetrics, TextInjector}; +use crate::types::{ + InjectionConfig, InjectionError, InjectionMethod, InjectionMetrics, TextInjector, +}; use anyhow::Result; +use async_trait::async_trait; use std::process::Command; use std::time::Duration; -use tokio::time::{timeout, error::Elapsed}; +use tokio::time::{error::Elapsed, timeout}; use tracing::{debug, error, info, warn}; -use async_trait::async_trait; /// Kdotool injector for KDE window activation/focus assistance pub struct KdotoolInjector { @@ -18,7 +20,7 @@ impl KdotoolInjector { /// Create a new kdotool injector pub fn new(config: InjectionConfig) -> Self { let is_available = Self::check_kdotool(); - + Self { config, metrics: InjectionMetrics::default(), @@ -46,12 +48,15 @@ impl KdotoolInjector { .await .map_err(|_| InjectionError::Timeout(self.config.discovery_timeout_ms))? 
.map_err(|e| InjectionError::Process(e))?; - + if !output.status.success() { let stderr = String::from_utf8_lossy(&output.stderr); - return Err(InjectionError::MethodFailed(format!("kdotool getactivewindow failed: {}", stderr))); + return Err(InjectionError::MethodFailed(format!( + "kdotool getactivewindow failed: {}", + stderr + ))); } - + let window_id = String::from_utf8_lossy(&output.stdout).trim().to_string(); Ok(window_id) } @@ -59,7 +64,7 @@ impl KdotoolInjector { /// Activate a window by ID async fn activate_window(&self, window_id: &str) -> Result<(), InjectionError> { let start = std::time::Instant::now(); - + let output = timeout( Duration::from_millis(self.config.per_method_timeout_ms), tokio::process::Command::new("kdotool") @@ -69,23 +74,26 @@ impl KdotoolInjector { .await .map_err(|_| InjectionError::Timeout(self.config.per_method_timeout_ms))? .map_err(|e| InjectionError::Process(e))?; - + if !output.status.success() { let stderr = String::from_utf8_lossy(&output.stderr); - return Err(InjectionError::MethodFailed(format!("kdotool windowactivate failed: {}", stderr))); + return Err(InjectionError::MethodFailed(format!( + "kdotool windowactivate failed: {}", + stderr + ))); } - + let duration = start.elapsed().as_millis() as u64; // TODO: Fix metrics - self.metrics.record_success requires &mut self info!("Successfully activated window {}", window_id); - + Ok(()) } /// Focus a window by ID async fn focus_window(&self, window_id: &str) -> Result<(), InjectionError> { let start = std::time::Instant::now(); - + let output = timeout( Duration::from_millis(self.config.per_method_timeout_ms), tokio::process::Command::new("kdotool") @@ -95,16 +103,19 @@ impl KdotoolInjector { .await .map_err(|_| InjectionError::Timeout(self.config.per_method_timeout_ms))? 
.map_err(|e| InjectionError::Process(e))?; - + if !output.status.success() { let stderr = String::from_utf8_lossy(&output.stderr); - return Err(InjectionError::MethodFailed(format!("kdotool windowfocus failed: {}", stderr))); + return Err(InjectionError::MethodFailed(format!( + "kdotool windowfocus failed: {}", + stderr + ))); } - + let duration = start.elapsed().as_millis() as u64; // TODO: Fix metrics - self.metrics.record_success requires &mut self info!("Successfully focused window {}", window_id); - + Ok(()) } } @@ -123,7 +134,9 @@ impl TextInjector for KdotoolInjector { // Kdotool is only used for window activation/focus assistance // It doesn't actually inject text, so this method should not be called // directly for text injection - Err(InjectionError::MethodUnavailable("Kdotool is only for window activation/focus assistance".to_string())) + Err(InjectionError::MethodUnavailable( + "Kdotool is only for window activation/focus assistance".to_string(), + )) } fn metrics(&self) -> &InjectionMetrics { @@ -138,13 +151,13 @@ impl KdotoolInjector { Some(id) => id.to_string(), None => self.get_active_window().await?, }; - + // First focus the window self.focus_window(&target_window).await?; - + // Then activate it self.activate_window(&target_window).await?; - + Ok(()) } -} \ No newline at end of file +} diff --git a/crates/coldvox-text-injection/src/lib.rs b/crates/coldvox-text-injection/src/lib.rs index 77607e69..e258fe64 100644 --- a/crates/coldvox-text-injection/src/lib.rs +++ b/crates/coldvox-text-injection/src/lib.rs @@ -57,43 +57,43 @@ pub mod noop_injector; mod tests; // Re-export key components for easy access -pub use processor::{AsyncInjectionProcessor, ProcessorMetrics, InjectionProcessor}; -pub use session::{InjectionSession, SessionConfig, SessionState}; -pub use types::{InjectionConfig, InjectionError, InjectionMethod, InjectionResult}; pub use backend::Backend; pub use manager::StrategyManager; +pub use processor::{AsyncInjectionProcessor, 
InjectionProcessor, ProcessorMetrics}; +pub use session::{InjectionSession, SessionConfig, SessionState}; +pub use types::{InjectionConfig, InjectionError, InjectionMethod, InjectionResult}; /// Trait defining the core text injection interface #[async_trait::async_trait] pub trait TextInjector: Send + Sync { /// Inject text into the currently focused application async fn inject_text(&self, text: &str) -> InjectionResult<()>; - + /// Check if the injector is available and functional async fn is_available(&self) -> bool; - + /// Get the backend name for this injector fn backend_name(&self) -> &'static str; - + /// Get backend-specific configuration information fn backend_info(&self) -> Vec<(&'static str, String)>; } /// Trait defining text injection session management -#[async_trait::async_trait] +#[async_trait::async_trait] pub trait TextInjectionSession: Send + Sync { type Config; type Error; - + /// Start a new injection session async fn start(&mut self, config: Self::Config) -> Result<(), Self::Error>; - + /// Stop the current injection session async fn stop(&mut self) -> Result<(), Self::Error>; - + /// Check if session is currently active fn is_active(&self) -> bool; - + /// Get session statistics fn get_stats(&self) -> SessionStats; } @@ -116,4 +116,4 @@ impl Default for SessionStats { last_injection: None, } } -} \ No newline at end of file +} diff --git a/crates/coldvox-text-injection/src/manager.rs b/crates/coldvox-text-injection/src/manager.rs index 6e8c9340..88564a36 100644 --- a/crates/coldvox-text-injection/src/manager.rs +++ b/crates/coldvox-text-injection/src/manager.rs @@ -1,6 +1,8 @@ use crate::backend::{Backend, BackendDetector}; -use crate::focus::{FocusTracker, FocusStatus}; -use crate::types::{InjectionConfig, InjectionError, InjectionMethod, InjectionMetrics, TextInjector}; +use crate::focus::{FocusStatus, FocusTracker}; +use crate::types::{ + InjectionConfig, InjectionError, InjectionMethod, InjectionMetrics, TextInjector, +}; // Import 
injectors #[cfg(feature = "atspi")] @@ -9,21 +11,21 @@ use crate::atspi_injector::AtspiInjector; use crate::clipboard_injector::ClipboardInjector; #[cfg(all(feature = "wl_clipboard", feature = "atspi"))] use crate::combo_clip_atspi::ComboClipboardAtspi; -#[cfg(feature = "ydotool")] -use crate::ydotool_injector::YdotoolInjector; #[cfg(feature = "enigo")] use crate::enigo_injector::EnigoInjector; +#[cfg(feature = "xdg_kdotool")] +use crate::kdotool_injector::KdotoolInjector; #[cfg(feature = "mki")] use crate::mki_injector::MkiInjector; use crate::noop_injector::NoOpInjector; -#[cfg(feature = "xdg_kdotool")] -use crate::kdotool_injector::KdotoolInjector; +#[cfg(feature = "ydotool")] +use crate::ydotool_injector::YdotoolInjector; +use std::collections::hash_map::DefaultHasher; use std::collections::HashMap; +use std::hash::{Hash, Hasher}; use std::sync::{Arc, Mutex}; use std::time::{Duration, Instant}; use tracing::{debug, error, info, trace, warn}; -use std::collections::hash_map::DefaultHasher; -use std::hash::{Hash, Hasher}; /// Key for identifying a specific app-method combination type AppMethodKey = (String, InjectionMethod); @@ -68,12 +70,19 @@ struct InjectorRegistry { impl InjectorRegistry { fn build(config: &InjectionConfig, backend_detector: &BackendDetector) -> Self { let mut injectors: HashMap<InjectionMethod, Box<dyn TextInjector>> = HashMap::new(); - + // Check backend availability let backends = backend_detector.detect_available_backends(); - let _has_wayland = backends.iter().any(|b| matches!(b, Backend::WaylandXdgDesktopPortal | Backend::WaylandVirtualKeyboard)); - let _has_x11 = backends.iter().any(|b| matches!(b, Backend::X11Xdotool | Backend::X11Native)); - + let has_wayland = backends.iter().any(|b| { + matches!( + b, + Backend::WaylandXdgDesktopPortal | Backend::WaylandVirtualKeyboard + ) + }); + let has_x11 = backends + .iter() + .any(|b| matches!(b, Backend::X11Xdotool | Backend::X11Native)); + // Add AT-SPI injector if available #[cfg(feature = "atspi")] { @@ -82,7 +91,7 @@ impl
InjectorRegistry { injectors.insert(InjectionMethod::AtspiInsert, Box::new(injector)); } } - + // Add clipboard injectors if available #[cfg(feature = "wl_clipboard")] { @@ -91,18 +100,19 @@ impl InjectorRegistry { if clipboard_injector.is_available() { injectors.insert(InjectionMethod::Clipboard, Box::new(clipboard_injector)); } - + // Add combo clipboard+AT-SPI if both are available #[cfg(feature = "atspi")] { let combo_injector = ComboClipboardAtspi::new(config.clone()); if combo_injector.is_available() { - injectors.insert(InjectionMethod::ClipboardAndPaste, Box::new(combo_injector)); + injectors + .insert(InjectionMethod::ClipboardAndPaste, Box::new(combo_injector)); } } } } - + // Add optional injectors based on config #[cfg(feature = "ydotool")] if config.allow_ydotool { @@ -111,7 +121,7 @@ impl InjectorRegistry { injectors.insert(InjectionMethod::YdoToolPaste, Box::new(ydotool)); } } - + #[cfg(feature = "enigo")] if config.allow_enigo { let enigo = EnigoInjector::new(config.clone()); @@ -119,7 +129,7 @@ impl InjectorRegistry { injectors.insert(InjectionMethod::EnigoText, Box::new(enigo)); } } - + #[cfg(feature = "mki")] if config.allow_mki { let mki = MkiInjector::new(config.clone()); @@ -127,7 +137,7 @@ impl InjectorRegistry { injectors.insert(InjectionMethod::UinputKeys, Box::new(mki)); } } - + #[cfg(feature = "xdg_kdotool")] if config.allow_kdotool { let kdotool = KdotoolInjector::new(config.clone()); @@ -138,16 +148,19 @@ impl InjectorRegistry { // Add NoOpInjector as final fallback if no other injectors are available if injectors.is_empty() { - injectors.insert(InjectionMethod::NoOp, Box::new(NoOpInjector::new(config.clone()))); + injectors.insert( + InjectionMethod::NoOp, + Box::new(NoOpInjector::new(config.clone())), + ); } Self { injectors } } - + fn get_mut(&mut self, method: InjectionMethod) -> Option<&mut Box<dyn TextInjector>> { self.injectors.get_mut(&method) } - + fn contains(&self, method: InjectionMethod) -> bool { self.injectors.contains_key(&method) } @@
-189,7 +202,9 @@ impl StrategyManager { info!("Selected backend: {:?}", backend); } else { warn!("No suitable backend found for text injection"); - if let Ok(mut m) = metrics.lock() { m.record_backend_denied(); } + if let Ok(mut m) = metrics.lock() { + m.record_backend_denied(); + } } // Build injector registry @@ -203,7 +218,10 @@ impl StrategyManager { .filter_map(|pattern| match regex::Regex::new(pattern) { Ok(re) => Some(re), Err(e) => { - warn!("Invalid allowlist regex pattern '{}': {}, skipping", pattern, e); + warn!( + "Invalid allowlist regex pattern '{}': {}, skipping", + pattern, e + ); None } }) @@ -216,7 +234,10 @@ impl StrategyManager { .filter_map(|pattern| match regex::Regex::new(pattern) { Ok(re) => Some(re), Err(e) => { - warn!("Invalid blocklist regex pattern '{}': {}, skipping", pattern, e); + warn!( + "Invalid blocklist regex pattern '{}': {}, skipping", + pattern, e + ); None } }) @@ -251,7 +272,7 @@ impl StrategyManager { // TODO: Implement real AT-SPI app identification once API is stable debug!("AT-SPI app identification placeholder"); } - + // Fallback: Try window manager #[cfg(target_os = "linux")] { @@ -259,28 +280,30 @@ impl StrategyManager { return Ok(window_class); } } - + Ok("unknown".to_string()) } - + /// Get active window class via window manager #[cfg(target_os = "linux")] async fn get_active_window_class(&self) -> Result<String, InjectionError> { use std::process::Command; - + // Try xprop for X11 if let Ok(output) = Command::new("xprop") .args(["-root", "_NET_ACTIVE_WINDOW"]) - .output() { + .output() + { if output.status.success() { let window_str = String::from_utf8_lossy(&output.stdout); if let Some(window_id) = window_str.split("# ").nth(1) { let window_id = window_id.trim(); - + // Get window class if let Ok(class_output) = Command::new("xprop") .args(["-id", window_id, "WM_CLASS"]) - .output() { + .output() + { if class_output.status.success() { let class_str = String::from_utf8_lossy(&class_output.stdout); // Parse WM_CLASS string (format:
WM_CLASS(STRING) = "instance", "class") @@ -292,10 +315,12 @@ impl StrategyManager { } } } - - Err(InjectionError::Other("Could not determine active window".to_string())) + + Err(InjectionError::Other( + "Could not determine active window".to_string(), + )) } - + /// Check if injection is currently paused fn is_paused(&self) -> bool { // In a real implementation, this would check a global state @@ -303,54 +328,72 @@ impl StrategyManager { false } -/// Check if the current application is allowed for injection -/// When feature regex is enabled, compile patterns once at StrategyManager construction -/// and store Regex objects; else fallback to substring match. -/// Note: invalid regex should log and skip that pattern. -/// TODO: Store compiled regexes in the manager state for performance. -/// Performance consideration: Regex compilation is expensive, so cache compiled patterns. -/// Invalid patterns should be logged as warnings and skipped, not crash the system. -pub(crate) fn is_app_allowed(&self, app_id: &str) -> bool { - // If allowlist is not empty, only allow apps in the allowlist - if !self.config.allowlist.is_empty() { - #[cfg(feature = "regex")] - return self.allowlist_regexes.iter().any(|re| re.is_match(app_id)); - #[cfg(not(feature = "regex"))] - return self.config.allowlist.iter().any(|pattern| app_id.contains(pattern)); - } + /// Check if the current application is allowed for injection + /// When feature regex is enabled, compile patterns once at StrategyManager construction + /// and store Regex objects; else fallback to substring match. + /// Note: invalid regex should log and skip that pattern. + /// TODO: Store compiled regexes in the manager state for performance. + /// Performance consideration: Regex compilation is expensive, so cache compiled patterns. + /// Invalid patterns should be logged as warnings and skipped, not crash the system. 
+ pub(crate) fn is_app_allowed(&self, app_id: &str) -> bool { + // If allowlist is not empty, only allow apps in the allowlist + if !self.config.allowlist.is_empty() { + #[cfg(feature = "regex")] + return self.allowlist_regexes.iter().any(|re| re.is_match(app_id)); + #[cfg(not(feature = "regex"))] + return self + .config + .allowlist + .iter() + .any(|pattern| app_id.contains(pattern)); + } - // If blocklist is not empty, block apps in the blocklist - if !self.config.blocklist.is_empty() { - #[cfg(feature = "regex")] - return !self.blocklist_regexes.iter().any(|re| re.is_match(app_id)); - #[cfg(not(feature = "regex"))] - return !self.config.blocklist.iter().any(|pattern| app_id.contains(pattern)); - } + // If blocklist is not empty, block apps in the blocklist + if !self.config.blocklist.is_empty() { + #[cfg(feature = "regex")] + return !self.blocklist_regexes.iter().any(|re| re.is_match(app_id)); + #[cfg(not(feature = "regex"))] + return !self + .config + .blocklist + .iter() + .any(|pattern| app_id.contains(pattern)); + } - // If neither allowlist nor blocklist is set, allow all apps - true -} + // If neither allowlist nor blocklist is set, allow all apps + true + } /// Check if a method is in cooldown for the current app pub(crate) fn is_in_cooldown(&self, method: InjectionMethod) -> bool { let now = Instant::now(); - self.cooldowns.iter().any(|((_, m), cd)| *m == method && now < cd.until) + self.cooldowns + .iter() + .any(|((_, m), cd)| *m == method && now < cd.until) } /// Update success record with time-based decay for old records - pub(crate) fn update_success_record(&mut self, app_id: &str, method: InjectionMethod, success: bool) { + pub(crate) fn update_success_record( + &mut self, + app_id: &str, + method: InjectionMethod, + success: bool, + ) { let key = (app_id.to_string(), method); - - let record = self.success_cache.entry(key.clone()).or_insert_with(|| SuccessRecord { - success_count: 0, - fail_count: 0, - last_success: None, - last_failure: None, - 
success_rate: 0.5, // Start with neutral 50% - }); - - // No decay to keep counts deterministic for tests - + + let record = self + .success_cache + .entry(key.clone()) + .or_insert_with(|| SuccessRecord { + success_count: 0, + fail_count: 0, + last_success: None, + last_failure: None, + success_rate: 0.5, // Start with neutral 50% + }); + + // No decay to keep counts deterministic for tests + // Update counts if success { record.success_count += 1; @@ -359,7 +402,7 @@ pub(crate) fn is_app_allowed(&self, app_id: &str) -> bool { record.fail_count += 1; record.last_failure = Some(Instant::now()); } - + // Recalculate success rate with minimum sample size let total = record.success_count + record.fail_count; if total > 0 { @@ -367,14 +410,17 @@ pub(crate) fn is_app_allowed(&self, app_id: &str) -> bool { } else { record.success_rate = 0.5; // Default to 50% } - - // Apply cooldown for repeated failures - let should_cooldown = !success && record.fail_count > 2; - + + // Apply cooldown for repeated failures + let should_cooldown = !success && record.fail_count > 2; + debug!( "Updated success record for {}/{:?}: {:.1}% ({}/{})", - app_id, method, record.success_rate * 100.0, - record.success_count, total + app_id, + method, + record.success_rate * 100.0, + record.success_count, + total ); if should_cooldown { @@ -385,34 +431,34 @@ pub(crate) fn is_app_allowed(&self, app_id: &str) -> bool { /// Apply exponential backoff cooldown for a failed method pub(crate) fn apply_cooldown(&mut self, app_id: &str, method: InjectionMethod, error: &str) { let key = (app_id.to_string(), method); - - let cooldown = self.cooldowns.entry(key).or_insert_with(|| CooldownState { + + let cooldown = self.cooldowns.entry(key).or_insert_with(|| CooldownState { until: Instant::now(), backoff_level: 0, last_error: String::new(), }); - + // Calculate cooldown duration with exponential backoff let base_ms = self.config.cooldown_initial_ms; let factor = self.config.cooldown_backoff_factor; let max_ms = 
self.config.cooldown_max_ms; - - let cooldown_ms = (base_ms as f64 * (factor as f64).powi(cooldown.backoff_level as i32)) + + let cooldown_ms = (base_ms as f64 * (factor as f64).powi(cooldown.backoff_level as i32)) .min(max_ms as f64) as u64; - + cooldown.until = Instant::now() + Duration::from_millis(cooldown_ms); cooldown.backoff_level += 1; cooldown.last_error = error.to_string(); - + warn!( "Applied cooldown for {}/{:?}: {}ms (level {})", app_id, method, cooldown_ms, cooldown.backoff_level ); } - + /// Update cooldown state for a failed method (legacy method for compatibility) fn update_cooldown(&mut self, method: InjectionMethod, error: &str) { - // TODO: This should use actual app_id from get_current_app_id() + // TODO: This should use actual app_id from get_current_app_id() let app_id = "unknown_app"; self.apply_cooldown(app_id, method, error); } @@ -423,7 +469,7 @@ pub(crate) fn is_app_allowed(&self, app_id: &str) -> bool { let key = (app_id.to_string(), method); self.cooldowns.remove(&key); } - + /// Get ordered list of methods to try based on backend availability and success rates. /// Includes NoOp as a final fallback so the list is never empty. 
pub(crate) fn _get_method_priority(&self, app_id: &str) -> Vec { @@ -515,10 +561,10 @@ pub(crate) fn is_app_allowed(&self, app_id: &str) -> bool { // Get available backends let available_backends = self.backend_detector.detect_available_backends(); - + // Base order as specified in the requirements let mut base_order = Vec::new(); - + // Add methods based on available backends for backend in available_backends { match backend { @@ -545,8 +591,8 @@ pub(crate) fn is_app_allowed(&self, app_id: &str) -> bool { _ => {} } } - - // Add optional methods if enabled + + // Add optional methods if enabled if self.config.allow_kdotool { base_order.push(InjectionMethod::KdoToolAssist); } @@ -559,24 +605,31 @@ pub(crate) fn is_app_allowed(&self, app_id: &str) -> bool { if self.config.allow_ydotool { base_order.push(InjectionMethod::YdoToolPaste); } - // Deduplicate while preserving order - use std::collections::HashSet; - let mut seen = HashSet::new(); - base_order.retain(|m| seen.insert(*m)); - - // Sort by preference: methods with higher success rate first, then by base order - - + // Deduplicate while preserving order + use std::collections::HashSet; + let mut seen = HashSet::new(); + base_order.retain(|m| seen.insert(*m)); + + // Sort by preference: methods with higher success rate first, then by base order + // Create a copy of base order for position lookup let base_order_copy = base_order.clone(); - + base_order.sort_by(|a, b| { let key_a = (app_id.to_string(), *a); let key_b = (app_id.to_string(), *b); - - let success_a = self.success_cache.get(&key_a).map(|r| r.success_rate).unwrap_or(0.5); - let success_b = self.success_cache.get(&key_b).map(|r| r.success_rate).unwrap_or(0.5); - + + let success_a = self + .success_cache + .get(&key_a) + .map(|r| r.success_rate) + .unwrap_or(0.5); + let success_b = self + .success_cache + .get(&key_b) + .map(|r| r.success_rate) + .unwrap_or(0.5); + // Sort by success rate (descending), then by base order 
success_b.partial_cmp(&success_a).unwrap().then_with(|| { // Preserve base order for equal success rates @@ -585,13 +638,13 @@ pub(crate) fn is_app_allowed(&self, app_id: &str) -> bool { pos_a.cmp(&pos_b) }) }); - - // Ensure NoOp is always available as a last resort - base_order.push(InjectionMethod::NoOp); - // Cache and return - self.cached_method_order = Some((app_id.to_string(), base_order.clone())); - base_order + // Ensure NoOp is always available as a last resort + base_order.push(InjectionMethod::NoOp); + + // Cache and return + self.cached_method_order = Some((app_id.to_string(), base_order.clone())); + base_order } /// Back-compat: previous tests may call no-arg version; compute without caching @@ -621,10 +674,18 @@ pub(crate) fn is_app_allowed(&self, app_id: &str) -> bool { _ => {} } } - if self.config.allow_kdotool { base_order.push(InjectionMethod::KdoToolAssist); } - if self.config.allow_enigo { base_order.push(InjectionMethod::EnigoText); } - if self.config.allow_mki { base_order.push(InjectionMethod::UinputKeys); } - if self.config.allow_ydotool { base_order.push(InjectionMethod::YdoToolPaste); } + if self.config.allow_kdotool { + base_order.push(InjectionMethod::KdoToolAssist); + } + if self.config.allow_enigo { + base_order.push(InjectionMethod::EnigoText); + } + if self.config.allow_mki { + base_order.push(InjectionMethod::UinputKeys); + } + if self.config.allow_ydotool { + base_order.push(InjectionMethod::YdoToolPaste); + } use std::collections::HashSet; let mut seen = HashSet::new(); base_order.retain(|m| seen.insert(*m)); @@ -635,8 +696,16 @@ pub(crate) fn is_app_allowed(&self, app_id: &str) -> bool { base_order2.sort_by(|a, b| { let key_a = (app_id.to_string(), *a); let key_b = (app_id.to_string(), *b); - let success_a = self.success_cache.get(&key_a).map(|r| r.success_rate).unwrap_or(0.5); - let success_b = self.success_cache.get(&key_b).map(|r| r.success_rate).unwrap_or(0.5); + let success_a = self + .success_cache + .get(&key_a) + 
.map(|r| r.success_rate) + .unwrap_or(0.5); + let success_b = self + .success_cache + .get(&key_b) + .map(|r| r.success_rate) + .unwrap_or(0.5); success_b.partial_cmp(&success_a).unwrap().then_with(|| { let pos_a = base_order_copy.iter().position(|m| m == a).unwrap_or(0); let pos_b = base_order_copy.iter().position(|m| m == b).unwrap_or(0); @@ -644,7 +713,7 @@ pub(crate) fn is_app_allowed(&self, app_id: &str) -> bool { }) }); base_order2.push(InjectionMethod::NoOp); - base_order2 + base_order2 } /// Check if we've exceeded the global time budget @@ -657,20 +726,24 @@ pub(crate) fn is_app_allowed(&self, app_id: &str) -> bool { true } } - + /// Chunk text and paste with delays between chunks #[allow(dead_code)] - async fn chunk_and_paste(&mut self, injector: &mut Box, text: &str) -> Result<(), InjectionError> { + async fn chunk_and_paste( + &mut self, + injector: &mut Box, + text: &str, + ) -> Result<(), InjectionError> { let chunk_size = self.config.paste_chunk_chars as usize; // Use iterator-based chunking without collecting let mut start = 0; - + // Record paste operation if let Ok(mut m) = self.metrics.lock() { m.record_paste(); } - + while start < text.len() { // Check budget before each chunk if !self.has_budget_remaining() { @@ -693,30 +766,34 @@ pub(crate) fn is_app_allowed(&self, app_id: &str) -> bool { tokio::time::sleep(Duration::from_millis(self.config.chunk_delay_ms)).await; } } - + // Record metrics if let Ok(mut m) = self.metrics.lock() { m.record_injected_chars(text.len() as u64); m.record_flush(text.len() as u64); } - + Ok(()) } - + /// Type text with pacing based on keystroke rate #[allow(dead_code)] - async fn pace_type_text(&mut self, injector: &mut Box, text: &str) -> Result<(), InjectionError> { + async fn pace_type_text( + &mut self, + injector: &mut Box, + text: &str, + ) -> Result<(), InjectionError> { let rate_cps = self.config.keystroke_rate_cps; let max_burst = self.config.max_burst_chars as usize; - + // Record keystroke operation if let 
Ok(mut m) = self.metrics.lock() { m.record_keystroke(); } - + // Use iterator-based chunking without collecting let mut start = 0; - + while start < text.len() { // Check budget before each burst if !self.has_budget_remaining() { @@ -740,12 +817,12 @@ pub(crate) fn is_app_allowed(&self, app_id: &str) -> bool { start = end; } - + // Record metrics if let Ok(mut m) = self.metrics.lock() { m.record_injected_chars(text.len() as u64); } - + Ok(()) } @@ -754,7 +831,7 @@ pub(crate) fn is_app_allowed(&self, app_id: &str) -> bool { if text.is_empty() { return Ok(()); } - + // Log the injection request with redaction let redacted = redact_text(text, self.config.redact_logs); debug!("Injection requested for text: {}", redacted); @@ -764,12 +841,14 @@ pub(crate) fn is_app_allowed(&self, app_id: &str) -> bool { // Check if injection is paused if self.is_paused() { - return Err(InjectionError::Other("Injection is currently paused".to_string())); + return Err(InjectionError::Other( + "Injection is currently paused".to_string(), + )); } // Start global timer self.global_start = Some(Instant::now()); - + // Get current focus status let focus_status = match self.focus_tracker.get_focus_status().await { Ok(status) => status, @@ -779,15 +858,17 @@ pub(crate) fn is_app_allowed(&self, app_id: &str) -> bool { FocusStatus::Unknown } }; - + // Check if we should inject on unknown focus if focus_status == FocusStatus::Unknown && !self.config.inject_on_unknown_focus { if let Ok(mut metrics) = self.metrics.lock() { metrics.record_focus_missing(); } - return Err(InjectionError::Other("Unknown focus state and injection disabled".to_string())); + return Err(InjectionError::Other( + "Unknown focus state and injection disabled".to_string(), + )); } - + // Check if focus is required if self.config.require_focus && focus_status == FocusStatus::NonEditable { if let Ok(mut metrics) = self.metrics.lock() { @@ -795,26 +876,29 @@ pub(crate) fn is_app_allowed(&self, app_id: &str) -> bool { } return 
Err(InjectionError::NoEditableFocus); } - + // Get current application ID let app_id = self.get_current_app_id().await?; - + // Check allowlist/blocklist if !self.is_app_allowed(&app_id) { - return Err(InjectionError::Other(format!("Application {} is not allowed for injection", app_id))); + return Err(InjectionError::Other(format!( + "Application {} is not allowed for injection", + app_id + ))); } - + // Determine injection method based on config let use_paste = match self.config.injection_mode.as_str() { "paste" => true, "keystroke" => false, "auto" => text.len() > self.config.paste_chunk_chars as usize, - _ => text.len() > self.config.paste_chunk_chars as usize, // Default to auto + _ => text.len() > self.config.paste_chunk_chars as usize, // Default to auto }; - + // Get ordered list of methods to try - let method_order = self.get_method_order_cached(&app_id); - + let method_order = self.get_method_order_cached(&app_id); + // Try each method in order for method in method_order { // Skip if in cooldown @@ -822,7 +906,7 @@ pub(crate) fn is_app_allowed(&self, app_id: &str) -> bool { debug!("Skipping method {:?} - in cooldown", method); continue; } - + // Check budget if !self.has_budget_remaining() { if let Ok(mut metrics) = self.metrics.lock() { @@ -830,13 +914,13 @@ pub(crate) fn is_app_allowed(&self, app_id: &str) -> bool { } return Err(InjectionError::BudgetExhausted); } - + // Skip if injector not available if !self.injectors.contains(method) { debug!("Skipping method {:?} - injector not available", method); continue; } - + // Try injection with the real injector let start = Instant::now(); // Perform the injector call in a narrow scope to avoid borrowing self across updates @@ -846,13 +930,15 @@ pub(crate) fn is_app_allowed(&self, app_id: &str) -> bool { // For now, perform a single paste operation; chunking is optional injector.paste(text).await } else { - injector.type_text(text, self.config.keystroke_rate_cps).await + injector + .type_text(text, 
self.config.keystroke_rate_cps) + .await } } else { continue; } }; - + match result { Ok(()) => { let duration = start.elapsed().as_millis() as u64; @@ -862,8 +948,12 @@ pub(crate) fn is_app_allowed(&self, app_id: &str) -> bool { self.update_success_record(&app_id, method, true); self.clear_cooldown(method); let redacted = redact_text(text, self.config.redact_logs); - info!("Successfully injected text {} using method {:?} with mode {:?}", - redacted, method, if use_paste { "paste" } else { "keystroke" }); + info!( + "Successfully injected text {} using method {:?} with mode {:?}", + redacted, + method, + if use_paste { "paste" } else { "keystroke" } + ); // Log full text only at trace level when not redacting if !self.config.redact_logs { trace!("Full text injected: {}", text); @@ -883,13 +973,14 @@ pub(crate) fn is_app_allowed(&self, app_id: &str) -> bool { } } } - + // If we get here, all methods failed error!("All injection methods failed"); - Err(InjectionError::MethodFailed("All injection methods failed".to_string())) + Err(InjectionError::MethodFailed( + "All injection methods failed".to_string(), + )) } - /// Get metrics for the strategy manager pub fn metrics(&self) -> Arc> { self.metrics.clone() @@ -902,17 +993,21 @@ pub(crate) fn is_app_allowed(&self, app_id: &str) -> bool { info!(" Total attempts: {}", metrics.attempts); info!(" Successes: {}", metrics.successes); info!(" Failures: {}", metrics.failures); - info!(" Success rate: {:.1}%", - if metrics.attempts > 0 { - metrics.successes as f64 / metrics.attempts as f64 * 100.0 - } else { - 0.0 - }); - + info!( + " Success rate: {:.1}%", + if metrics.attempts > 0 { + metrics.successes as f64 / metrics.attempts as f64 * 100.0 + } else { + 0.0 + } + ); + // Print method-specific stats for (method, m) in &metrics.method_metrics { - info!(" Method {:?}: {} attempts, {} successes, {} failures", - method, m.attempts, m.successes, m.failures); + info!( + " Method {:?}: {} attempts, {} successes, {} failures", + 
method, m.attempts, m.successes, m.failures + ); } } } @@ -921,10 +1016,9 @@ pub(crate) fn is_app_allowed(&self, app_id: &str) -> bool { #[cfg(test)] mod tests { use super::*; - use std::time::Duration; use async_trait::async_trait; - - + use std::time::Duration; + /// Mock injector for testing #[allow(dead_code)] struct MockInjector { @@ -933,7 +1027,7 @@ mod tests { success_rate: f64, metrics: InjectionMetrics, } - + #[allow(dead_code)] impl MockInjector { fn new(name: &'static str, available: bool, success_rate: f64) -> Self { @@ -945,31 +1039,37 @@ mod tests { } } } - + #[async_trait] impl TextInjector for MockInjector { fn name(&self) -> &'static str { self.name } - + fn is_available(&self) -> bool { self.available } - + async fn inject(&mut self, _text: &str) -> Result<(), InjectionError> { use std::time::SystemTime; - + // Simple pseudo-random based on system time - let pseudo_rand = (SystemTime::now().duration_since(SystemTime::UNIX_EPOCH) - .unwrap().as_nanos() % 100) as f64 / 100.0; - + let pseudo_rand = (SystemTime::now() + .duration_since(SystemTime::UNIX_EPOCH) + .unwrap() + .as_nanos() + % 100) as f64 + / 100.0; + if pseudo_rand < self.success_rate { Ok(()) } else { - Err(InjectionError::MethodFailed("Mock injection failed".to_string())) + Err(InjectionError::MethodFailed( + "Mock injection failed".to_string(), + )) } } - + fn metrics(&self) -> &InjectionMetrics { &self.metrics } @@ -978,10 +1078,10 @@ mod tests { // Test that strategy manager can be created #[test] fn test_strategy_manager_creation() { - let config = InjectionConfig::default(); + let config = InjectionConfig::default(); let metrics = Arc::new(Mutex::new(InjectionMetrics::default())); let manager = StrategyManager::new(config, metrics); - + { let metrics = manager.metrics.lock().unwrap(); assert_eq!(metrics.attempts, 0); @@ -993,28 +1093,30 @@ mod tests { // Test method ordering #[test] fn test_method_ordering() { - let config = InjectionConfig::default(); + let config = 
InjectionConfig::default(); let metrics = Arc::new(Mutex::new(InjectionMetrics::default())); - let manager = StrategyManager::new(config, metrics); - - let order = manager.get_method_order_uncached(); - - // Verify core methods are present - assert!(order.contains(&InjectionMethod::AtspiInsert)); - assert!(order.contains(&InjectionMethod::ClipboardAndPaste)); - assert!(order.contains(&InjectionMethod::Clipboard)); - + let manager = StrategyManager::new(config, metrics); + + let order = manager.get_method_order_uncached(); + + // Verify core methods are present + assert!(order.contains(&InjectionMethod::AtspiInsert)); + assert!(order.contains(&InjectionMethod::ClipboardAndPaste)); + assert!(order.contains(&InjectionMethod::Clipboard)); + // Verify optional methods are included if enabled - let mut config = InjectionConfig::default(); - config.allow_ydotool = true; - config.allow_kdotool = true; - config.allow_enigo = true; - config.allow_mki = true; - + let config = InjectionConfig { + allow_ydotool: true, + allow_kdotool: true, + allow_enigo: true, + allow_mki: true, + ..Default::default() + }; + let metrics = Arc::new(Mutex::new(InjectionMetrics::default())); - let manager = StrategyManager::new(config, metrics); - let order = manager.get_method_order_uncached(); - + let manager = StrategyManager::new(config, metrics); + let order = manager.get_method_order_uncached(); + // All methods should be present assert!(order.contains(&InjectionMethod::AtspiInsert)); assert!(order.contains(&InjectionMethod::ClipboardAndPaste)); @@ -1028,44 +1130,44 @@ mod tests { // Test success record updates #[test] fn test_success_record_update() { - let config = InjectionConfig::default(); + let config = InjectionConfig::default(); let metrics = Arc::new(Mutex::new(InjectionMetrics::default())); let mut manager = StrategyManager::new(config.clone(), metrics); - + // Test success manager.update_success_record("unknown_app", InjectionMethod::AtspiInsert, true); let key = 
("unknown_app".to_string(), InjectionMethod::AtspiInsert); let record = manager.success_cache.get(&key).unwrap(); assert_eq!(record.success_count, 1); assert_eq!(record.fail_count, 0); - assert!(record.success_rate > 0.4); - + assert!(record.success_rate > 0.4); + // Test failure manager.update_success_record("unknown_app", InjectionMethod::AtspiInsert, false); let record = manager.success_cache.get(&key).unwrap(); assert_eq!(record.success_count, 1); assert_eq!(record.fail_count, 1); - assert!(record.success_rate > 0.3 && record.success_rate < 0.8); + assert!(record.success_rate > 0.3 && record.success_rate < 0.8); } // Test cooldown updates #[test] fn test_cooldown_update() { - let config = InjectionConfig::default(); + let config = InjectionConfig::default(); let metrics = Arc::new(Mutex::new(InjectionMetrics::default())); let mut manager = StrategyManager::new(config.clone(), metrics); - + // First failure manager.update_cooldown(InjectionMethod::AtspiInsert, "test error"); let key = ("unknown_app".to_string(), InjectionMethod::AtspiInsert); let cooldown = manager.cooldowns.get(&key).unwrap(); assert_eq!(cooldown.backoff_level, 1); - + // Second failure - backoff level should increase manager.update_cooldown(InjectionMethod::AtspiInsert, "test error"); let cooldown = manager.cooldowns.get(&key).unwrap(); assert_eq!(cooldown.backoff_level, 2); - + // Duration should be longer let base_duration = Duration::from_millis(config.cooldown_initial_ms); let expected_duration = base_duration * 2u32.pow(1); // 2^1 = 2 @@ -1077,19 +1179,21 @@ mod tests { // Test budget checking #[test] fn test_budget_checking() { - let mut config = InjectionConfig::default(); - config.max_total_latency_ms = 100; // 100ms budget - + let config = InjectionConfig { + max_total_latency_ms: 100, // 100ms budget + ..Default::default() + }; + let metrics = Arc::new(Mutex::new(InjectionMetrics::default())); let mut manager = StrategyManager::new(config, metrics); - + // No start time - budget 
should be available assert!(manager.has_budget_remaining()); - + // Set start time manager.global_start = Some(Instant::now() - Duration::from_millis(50)); assert!(manager.has_budget_remaining()); - + // Exceed budget manager.global_start = Some(Instant::now() - Duration::from_millis(150)); assert!(!manager.has_budget_remaining()); @@ -1098,47 +1202,49 @@ mod tests { // Test injection with success #[tokio::test] async fn test_inject_success() { - let config = InjectionConfig::default(); + let config = InjectionConfig::default(); let metrics = Arc::new(Mutex::new(InjectionMetrics::default())); let mut manager = StrategyManager::new(config, metrics); - + // Test with text let result = manager.inject("test text").await; - // Don't require success in headless test env; just ensure it returns without panicking - assert!(result.is_ok() || result.is_err()); - - // Metrics are environment-dependent; just ensure call did not panic + // Don't require success in headless test env; just ensure it returns without panicking + assert!(result.is_ok() || result.is_err()); + + // Metrics are environment-dependent; just ensure call did not panic } // Test injection with failure #[tokio::test] async fn test_inject_failure() { - let mut config = InjectionConfig::default(); // Set very short budget to force failure - config.max_total_latency_ms = 1; - + let config = InjectionConfig { + max_total_latency_ms: 1, + ..Default::default() + }; + let metrics = Arc::new(Mutex::new(InjectionMetrics::default())); let mut manager = StrategyManager::new(config, metrics); - + // This should fail due to budget exhaustion let result = manager.inject("test text").await; assert!(result.is_err()); - + // Metrics should reflect failure // Note: Due to budget exhaustion, might not record metrics // Just verify no panic } // Test empty text handling - #[test] - fn test_empty_text() { - let config = InjectionConfig::default(); + #[tokio::test] + async fn test_empty_text() { + let config = 
InjectionConfig::default(); let metrics = Arc::new(Mutex::new(InjectionMetrics::default())); let mut manager = StrategyManager::new(config, metrics); - + // Inject empty text - // Should handle empty string gracefully - // Note: inject is async; here we simply ensure calling path compiles - let _ = manager.inject(""); + // Should handle empty string gracefully + // Note: inject is async; here we simply ensure calling path compiles + let _ = manager.inject("").await; } -} \ No newline at end of file +} diff --git a/crates/coldvox-text-injection/src/mki_injector.rs b/crates/coldvox-text-injection/src/mki_injector.rs index cf5ab7a2..7c6a5a18 100644 --- a/crates/coldvox-text-injection/src/mki_injector.rs +++ b/crates/coldvox-text-injection/src/mki_injector.rs @@ -1,9 +1,9 @@ use crate::types::{InjectionConfig, InjectionError, InjectionMetrics, TextInjector}; -use tracing::{debug}; use async_trait::async_trait; +use tracing::debug; #[cfg(feature = "mki")] -use mouse_keyboard_input::{VirtualKeyboard, VirtualDevice, KeyboardControllable, Key}; +use mouse_keyboard_input::{Key, KeyboardControllable, VirtualDevice, VirtualKeyboard}; #[cfg(feature = "mki")] use std::os::unix::fs::PermissionsExt; @@ -19,7 +19,7 @@ impl MkiInjector { /// Create a new MKI injector pub fn new(config: InjectionConfig) -> Self { let is_available = Self::check_availability(); - + Self { config, metrics: InjectionMetrics::default(), @@ -32,11 +32,9 @@ impl MkiInjector { // Check if user is in input group let in_input_group = std::process::Command::new("groups") .output() - .map(|o| { - String::from_utf8_lossy(&o.stdout).contains("input") - }) + .map(|o| String::from_utf8_lossy(&o.stdout).contains("input")) .unwrap_or(false); - + // Check if /dev/uinput is accessible let uinput_accessible = std::fs::metadata("/dev/uinput") .map(|metadata| { @@ -50,7 +48,7 @@ impl MkiInjector { false }) .unwrap_or(false); - + in_input_group && uinput_accessible } @@ -59,17 +57,20 @@ impl MkiInjector { async fn 
type_text(&mut self, text: &str) -> Result<(), InjectionError> { let start = std::time::Instant::now(); let text_clone = text.to_string(); - + let result = tokio::task::spawn_blocking(move || { let mut keyboard = VirtualKeyboard::default().map_err(|e| { InjectionError::MethodFailed(format!("Failed to create keyboard: {}", e)) })?; - + // Simple implementation - just send the text - keyboard.key_sequence(&text_clone).map_err(|e| InjectionError::MethodFailed(e.to_string()))?; - + keyboard + .key_sequence(&text_clone) + .map_err(|e| InjectionError::MethodFailed(e.to_string()))?; + Ok(()) - }).await; + }) + .await; match result { Ok(Ok(())) => { @@ -82,28 +83,33 @@ impl MkiInjector { Err(_) => Err(InjectionError::Timeout(0)), // Spawn failed } } - + /// Type text using MKI (feature disabled stub) #[cfg(not(feature = "mki"))] async fn type_text(&mut self, _text: &str) -> Result<(), InjectionError> { - Err(InjectionError::MethodUnavailable("MKI feature not enabled".to_string())) + Err(InjectionError::MethodUnavailable( + "MKI feature not enabled".to_string(), + )) } /// Trigger paste action using MKI (Ctrl+V) #[cfg(feature = "mki")] async fn trigger_paste(&mut self) -> Result<(), InjectionError> { let start = std::time::Instant::now(); - + let result = tokio::task::spawn_blocking(|| { let mut keyboard = VirtualKeyboard::default().map_err(|e| { InjectionError::MethodFailed(format!("Failed to create keyboard: {}", e)) })?; - + // Press Ctrl+V - simplified for now - keyboard.key_sequence("ctrl+v").map_err(|e| InjectionError::MethodFailed(e.to_string()))?; - + keyboard + .key_sequence("ctrl+v") + .map_err(|e| InjectionError::MethodFailed(e.to_string()))?; + Ok(()) - }).await; + }) + .await; match result { Ok(Ok(())) => { @@ -116,11 +122,13 @@ impl MkiInjector { Err(_) => Err(InjectionError::Timeout(0)), // Spawn failed } } - + /// Trigger paste action using MKI (feature disabled stub) #[cfg(not(feature = "mki"))] async fn trigger_paste(&mut self) -> Result<(), 
InjectionError> { - Err(InjectionError::MethodUnavailable("MKI feature not enabled".to_string())) + Err(InjectionError::MethodUnavailable( + "MKI feature not enabled".to_string(), + )) } } @@ -155,4 +163,4 @@ impl TextInjector for MkiInjector { fn metrics(&self) -> &InjectionMetrics { &self.metrics } -} \ No newline at end of file +} diff --git a/crates/coldvox-text-injection/src/noop_injector.rs b/crates/coldvox-text-injection/src/noop_injector.rs index 6db85001..8ed9a5ca 100644 --- a/crates/coldvox-text-injection/src/noop_injector.rs +++ b/crates/coldvox-text-injection/src/noop_injector.rs @@ -25,7 +25,7 @@ impl TextInjector for NoOpInjector { } fn is_available(&self) -> bool { - true // Always available as fallback + true // Always available as fallback } async fn inject(&mut self, text: &str) -> Result<(), InjectionError> { @@ -37,7 +37,8 @@ impl TextInjector for NoOpInjector { // Record the operation but do nothing let duration = start.elapsed().as_millis() as u64; - self.metrics.record_success(crate::types::InjectionMethod::NoOp, duration); + self.metrics + .record_success(crate::types::InjectionMethod::NoOp, duration); tracing::debug!("NoOp injector: would inject {} characters", text.len()); @@ -90,4 +91,4 @@ mod tests { let metrics = injector.metrics(); assert_eq!(metrics.attempts, 0); } -} \ No newline at end of file +} diff --git a/crates/coldvox-text-injection/src/processor.rs b/crates/coldvox-text-injection/src/processor.rs index 7be756b5..faa575d8 100644 --- a/crates/coldvox-text-injection/src/processor.rs +++ b/crates/coldvox-text-injection/src/processor.rs @@ -1,28 +1,4 @@ -use serde::{Deserialize, Serialize}; - -/// Transcription event that can be processed by the injection system -/// This is a simplified version that can be implemented by the main app -#[derive(Debug, Clone, Serialize, Deserialize)] -pub enum TranscriptionEvent { - /// Partial (interim) transcription result - Partial { - utterance_id: u64, - text: String, - confidence: Option, - 
}, - /// Final transcription result - Final { - utterance_id: u64, - text: String, - words: Option>, - confidence: Option, - }, - /// Error during transcription - Error { - code: String, - message: String, - }, -} +use coldvox_stt::TranscriptionEvent; /// Placeholder for pipeline metrics - to be provided by the main app #[derive(Debug, Clone, Default)] @@ -35,9 +11,9 @@ use tokio::sync::mpsc; use tokio::time::{self, Duration, Instant}; use tracing::{debug, error, info, warn}; -use super::session::{InjectionSession, SessionConfig, SessionState}; -use super::{InjectionConfig}; use super::manager::StrategyManager; +use super::session::{InjectionSession, SessionConfig, SessionState}; +use super::InjectionConfig; use crate::types::InjectionMetrics; /// Local metrics for the injection processor (UI/state), distinct from types::InjectionMetrics @@ -96,11 +72,11 @@ impl InjectionProcessor { ) -> Self { // Create session with shared metrics let session_config = SessionConfig::default(); // TODO: Expose this if needed - let session = InjectionSession::new(session_config, injection_metrics.clone()); - + let session = InjectionSession::new(session_config, injection_metrics.clone()); + let injector = StrategyManager::new(config.clone(), injection_metrics.clone()); - let metrics = Arc::new(Mutex::new(ProcessorMetrics { + let metrics = Arc::new(Mutex::new(ProcessorMetrics { session_state: SessionState::Idle, ..Default::default() })); @@ -147,11 +123,18 @@ impl InjectionProcessor { /// Handle a transcription event from the STT processor pub fn handle_transcription(&mut self, event: TranscriptionEvent) { match event { - TranscriptionEvent::Partial { text, utterance_id, .. } => { - debug!("Received partial transcription [{}]: {}", utterance_id, text); + TranscriptionEvent::Partial { + text, utterance_id, .. + } => { + debug!( + "Received partial transcription [{}]: {}", + utterance_id, text + ); self.update_metrics(); } - TranscriptionEvent::Final { text, utterance_id, .. 
} => { + TranscriptionEvent::Final { + text, utterance_id, .. + } => { let text_len = text.len(); info!("Received final transcription [{}]: {}", utterance_id, text); self.session.add_transcription(text); @@ -183,7 +166,7 @@ impl InjectionProcessor { buffer_text.len() > self.config.paste_chunk_chars as usize } }; - + // Record the operation type if let Ok(mut metrics) = self.injection_metrics.lock() { if use_paste { @@ -192,7 +175,7 @@ impl InjectionProcessor { metrics.record_keystroke(); } } - + self.perform_injection().await?; } Ok(()) @@ -214,7 +197,7 @@ impl InjectionProcessor { buffer_text.len() > self.config.paste_chunk_chars as usize } }; - + // Record the operation type if let Ok(mut metrics) = self.injection_metrics.lock() { if use_paste { @@ -223,7 +206,7 @@ impl InjectionProcessor { metrics.record_keystroke(); } } - + self.session.force_inject(); self.perform_injection().await?; } @@ -245,11 +228,17 @@ impl InjectionProcessor { } // Record the time from final transcription to injection - let latency = self.session.time_since_last_transcription() + let latency = self + .session + .time_since_last_transcription() .map(|d| d.as_millis() as u64) .unwrap_or(0); - - info!("Injecting {} characters from session (latency: {}ms)", text.len(), latency); + + info!( + "Injecting {} characters from session (latency: {}ms)", + text.len(), + latency + ); // Record the latency in metrics if let Ok(mut metrics) = self.injection_metrics.lock() { @@ -321,14 +310,18 @@ impl AsyncInjectionProcessor { pipeline_metrics: Option>, ) -> Self { // Create shared injection metrics - let injection_metrics = Arc::new(Mutex::new(crate::types::InjectionMetrics::default())); - + let injection_metrics = Arc::new(Mutex::new(crate::types::InjectionMetrics::default())); + // Create processor with shared metrics - let processor = Arc::new(tokio::sync::Mutex::new(InjectionProcessor::new(config.clone(), pipeline_metrics, injection_metrics.clone()))); - + let processor = 
Arc::new(tokio::sync::Mutex::new(InjectionProcessor::new( + config.clone(), + pipeline_metrics, + injection_metrics.clone(), + ))); + // Create injector with shared metrics let injector = StrategyManager::new(config, injection_metrics.clone()); - + Self { processor, transcription_rx, @@ -417,7 +410,7 @@ mod tests { fn test_injection_processor_basic_flow() { let config = InjectionConfig::default(); - let injection_metrics = Arc::new(Mutex::new(crate::types::InjectionMetrics::default())); + let injection_metrics = Arc::new(Mutex::new(crate::types::InjectionMetrics::default())); let mut processor = InjectionProcessor::new(config, None, injection_metrics); // Start with idle state @@ -428,7 +421,6 @@ mod tests { utterance_id: 1, text: "Hello world".to_string(), words: None, - confidence: None, }); assert_eq!(processor.session_state(), SessionState::Buffering); @@ -438,15 +430,15 @@ mod tests { // Check for silence transition (this would normally be called periodically) processor.session.check_for_silence_transition(); - + // Should be in WaitingForSilence state now assert_eq!(processor.session_state(), SessionState::WaitingForSilence); // This should trigger injection check let should_inject = processor.session.should_inject(); assert!(should_inject, "Session should be ready to inject"); - - // Instead of actually injecting (which requires ydotool), + + // Instead of actually injecting (which requires ydotool), // we'll manually clear the buffer to simulate successful injection let buffer_content = processor.session.take_buffer(); assert_eq!(buffer_content, "Hello world"); @@ -458,7 +450,7 @@ mod tests { #[test] fn test_metrics_update() { let config = InjectionConfig::default(); - let injection_metrics = Arc::new(Mutex::new(crate::types::InjectionMetrics::default())); + let injection_metrics = Arc::new(Mutex::new(crate::types::InjectionMetrics::default())); let mut processor = InjectionProcessor::new(config, None, injection_metrics); // Add transcription @@ -466,7 
+458,6 @@ mod tests { utterance_id: 1, text: "Test transcription".to_string(), words: None, - confidence: None, }); let metrics = processor.metrics(); @@ -478,7 +469,7 @@ mod tests { #[test] fn test_partial_transcription_handling() { let config = InjectionConfig::default(); - let injection_metrics = Arc::new(Mutex::new(crate::types::InjectionMetrics::default())); + let injection_metrics = Arc::new(Mutex::new(crate::types::InjectionMetrics::default())); let mut processor = InjectionProcessor::new(config, None, injection_metrics); // Start with idle state @@ -488,7 +479,8 @@ mod tests { processor.handle_transcription(TranscriptionEvent::Partial { utterance_id: 1, text: "Hello".to_string(), - confidence: None, + t0: None, + t1: None, }); // Should still be idle since partial events don't change session state @@ -499,11 +491,10 @@ mod tests { utterance_id: 1, text: "Hello world".to_string(), words: None, - confidence: None, }); // Now should be buffering assert_eq!(processor.session_state(), SessionState::Buffering); assert_eq!(processor.session.buffer_len(), 1); } -} \ No newline at end of file +} diff --git a/crates/coldvox-text-injection/src/session.rs b/crates/coldvox-text-injection/src/session.rs index 52b89694..67a7dd77 100644 --- a/crates/coldvox-text-injection/src/session.rs +++ b/crates/coldvox-text-injection/src/session.rs @@ -1,6 +1,6 @@ +use crate::types::InjectionMetrics; use std::time::{Duration, Instant}; use tracing::{debug, info, warn}; -use crate::types::InjectionMetrics; /// Session state machine for buffered text injection #[derive(Debug, Clone, Copy, PartialEq, Eq, Default)] @@ -27,8 +27,6 @@ impl std::fmt::Display for SessionState { } } - - /// Configuration for session management #[derive(Debug, Clone)] pub struct SessionConfig { @@ -51,10 +49,10 @@ pub struct SessionConfig { impl Default for SessionConfig { fn default() -> Self { Self { - silence_timeout_ms: 0, // Immediate injection after STT completes + silence_timeout_ms: 0, // Immediate 
injection after STT completes max_buffer_size: 5000, join_separator: " ".to_string(), - buffer_pause_timeout_ms: 0, // No pause needed since STT buffers audio + buffer_pause_timeout_ms: 0, // No pause needed since STT buffers audio flush_on_punctuation: true, punctuation_marks: vec!['.', '!', '?', ';'], normalize_whitespace: true, @@ -93,21 +91,24 @@ pub struct InjectionSession { impl InjectionSession { /// Create a new session with the given configuration - pub fn new(config: SessionConfig, metrics: std::sync::Arc>) -> Self { - Self { - state: SessionState::Idle, - buffer: Vec::new(), - last_transcription: None, - buffering_start: None, - silence_timeout: Duration::from_millis(config.silence_timeout_ms), - buffer_pause_timeout: Duration::from_millis(config.buffer_pause_timeout_ms), - max_buffer_size: config.max_buffer_size, - join_separator: config.join_separator, - flush_on_punctuation: config.flush_on_punctuation, - punctuation_marks: config.punctuation_marks, - normalize_whitespace: config.normalize_whitespace, - metrics, - } + pub fn new( + config: SessionConfig, + metrics: std::sync::Arc>, + ) -> Self { + Self { + state: SessionState::Idle, + buffer: Vec::new(), + last_transcription: None, + buffering_start: None, + silence_timeout: Duration::from_millis(config.silence_timeout_ms), + buffer_pause_timeout: Duration::from_millis(config.buffer_pause_timeout_ms), + max_buffer_size: config.max_buffer_size, + join_separator: config.join_separator, + flush_on_punctuation: config.flush_on_punctuation, + punctuation_marks: config.punctuation_marks, + normalize_whitespace: config.normalize_whitespace, + metrics, + } } /// Add a new transcription to the session buffer @@ -129,9 +130,11 @@ impl InjectionSession { self.record_buffered_chars(text.len() as u64); // Check if text ends with punctuation that should trigger flushing - let ends_with_punctuation = self.flush_on_punctuation && - !text.is_empty() && - self.punctuation_marks.contains(&text.chars().last().unwrap()); 
+ let ends_with_punctuation = self.flush_on_punctuation + && !text.is_empty() + && self + .punctuation_marks + .contains(&text.chars().last().unwrap()); // Add to buffer self.buffer.push(text); @@ -145,7 +148,10 @@ impl InjectionSession { info!("Session started - first transcription buffered"); } SessionState::Buffering => { - debug!("Additional transcription buffered, {} items in session", self.buffer.len()); + debug!( + "Additional transcription buffered, {} items in session", + self.buffer.len() + ); } SessionState::WaitingForSilence => { // New transcription resets the silence timer and transitions back to Buffering @@ -181,7 +187,7 @@ impl InjectionSession { if self.state == SessionState::Buffering { if let Some(_buffering_start) = self.buffering_start { let time_since_last_transcription = self.last_transcription.map(|t| t.elapsed()); - + // If we haven't received a transcription for buffer_pause_timeout, // transition to WaitingForSilence if let Some(time_since_last) = time_since_last_transcription { @@ -207,7 +213,10 @@ impl InjectionSession { if last_time.elapsed() >= self.silence_timeout { // Silence timeout reached, transition to ready to inject self.state = SessionState::ReadyToInject; - info!("Silence timeout reached, ready to inject {} transcriptions", self.buffer.len()); + info!( + "Silence timeout reached, ready to inject {} transcriptions", + self.buffer.len() + ); true } else { false @@ -238,7 +247,7 @@ impl InjectionSession { self.buffering_start = None; self.state = SessionState::Idle; debug!("Session buffer cleared, {} chars taken", text.len()); - + // Record the flush event with the size self.record_flush(size as u64); text @@ -291,14 +300,14 @@ impl InjectionSession { pub fn buffer_preview(&self) -> String { self.buffer.join(&self.join_separator) } - + /// Record characters that have been buffered pub fn record_buffered_chars(&self, count: u64) { if let Ok(mut metrics) = self.metrics.lock() { metrics.record_buffered_chars(count); } } - + /// 
Record a flush event pub fn record_flush(&self, size: u64) { if let Ok(mut metrics) = self.metrics.lock() { @@ -315,12 +324,12 @@ mod tests { #[test] fn test_session_state_transitions() { let config = SessionConfig { - silence_timeout_ms: 100, // Short timeout for testing + silence_timeout_ms: 100, // Short timeout for testing buffer_pause_timeout_ms: 50, // Short pause timeout for testing ..Default::default() }; - let metrics = std::sync::Arc::new(std::sync::Mutex::new(InjectionMetrics::default())); - let mut session = InjectionSession::new(config, metrics); + let metrics = std::sync::Arc::new(std::sync::Mutex::new(InjectionMetrics::default())); + let mut session = InjectionSession::new(config, metrics); // Start with idle state assert_eq!(session.state(), SessionState::Idle); @@ -360,8 +369,8 @@ mod tests { max_buffer_size: 10, // Very small limit ..Default::default() }; - let metrics = std::sync::Arc::new(std::sync::Mutex::new(InjectionMetrics::default())); - let mut session = InjectionSession::new(config, metrics); + let metrics = std::sync::Arc::new(std::sync::Mutex::new(InjectionMetrics::default())); + let mut session = InjectionSession::new(config, metrics); // Add text that exceeds limit session.add_transcription("This is a long sentence".to_string()); @@ -370,8 +379,8 @@ mod tests { #[test] fn test_empty_transcription_filtering() { - let metrics = std::sync::Arc::new(std::sync::Mutex::new(InjectionMetrics::default())); - let mut session = InjectionSession::new(SessionConfig::default(), metrics); + let metrics = std::sync::Arc::new(std::sync::Mutex::new(InjectionMetrics::default())); + let mut session = InjectionSession::new(SessionConfig::default(), metrics); session.add_transcription("".to_string()); session.add_transcription(" ".to_string()); @@ -388,8 +397,8 @@ mod tests { buffer_pause_timeout_ms: 50, ..Default::default() }; - let metrics = std::sync::Arc::new(std::sync::Mutex::new(InjectionMetrics::default())); - let mut session = 
InjectionSession::new(config, metrics); + let metrics = std::sync::Arc::new(std::sync::Mutex::new(InjectionMetrics::default())); + let mut session = InjectionSession::new(config, metrics); // Add transcription session.add_transcription("Test".to_string()); @@ -409,4 +418,4 @@ mod tests { session.check_for_silence_transition(); assert_eq!(session.state(), SessionState::WaitingForSilence); } -} \ No newline at end of file +} diff --git a/crates/coldvox-text-injection/src/tests/mod.rs b/crates/coldvox-text-injection/src/tests/mod.rs index 9a11c8c5..b07546c2 100644 --- a/crates/coldvox-text-injection/src/tests/mod.rs +++ b/crates/coldvox-text-injection/src/tests/mod.rs @@ -1,10 +1,10 @@ #[cfg(test)] +mod test_adaptive_strategy; +#[cfg(test)] mod test_focus_tracking; #[cfg(test)] -mod test_permission_checking; +mod test_integration; #[cfg(test)] -mod test_adaptive_strategy; +mod test_permission_checking; #[cfg(test)] mod test_window_manager; -#[cfg(test)] -mod test_integration; \ No newline at end of file diff --git a/crates/coldvox-text-injection/src/tests/test_adaptive_strategy.rs b/crates/coldvox-text-injection/src/tests/test_adaptive_strategy.rs index 0afd8c9d..251ad8c9 100644 --- a/crates/coldvox-text-injection/src/tests/test_adaptive_strategy.rs +++ b/crates/coldvox-text-injection/src/tests/test_adaptive_strategy.rs @@ -9,65 +9,65 @@ mod tests { let config = InjectionConfig::default(); let metrics = Arc::new(Mutex::new(InjectionMetrics::default())); let mut manager = StrategyManager::new(config, metrics); - + // Simulate some successes and failures manager.update_success_record("test_app", InjectionMethod::Clipboard, true); manager.update_success_record("test_app", InjectionMethod::Clipboard, true); manager.update_success_record("test_app", InjectionMethod::Clipboard, false); - + // Success rate should be approximately 66% - let methods = manager.get_method_priority("test_app"); + let methods = manager.get_method_priority("test_app"); assert!(!methods.is_empty()); 
} - + #[test] fn test_cooldown_application() { let config = InjectionConfig::default(); let metrics = Arc::new(Mutex::new(InjectionMetrics::default())); let mut manager = StrategyManager::new(config, metrics); - + // Apply cooldown manager.apply_cooldown("test_app", InjectionMethod::YdoToolPaste, "Test error"); - + // Method should be in cooldown - let _ = manager.is_in_cooldown(InjectionMethod::YdoToolPaste); + let _ = manager.is_in_cooldown(InjectionMethod::YdoToolPaste); } - + #[test] fn test_method_priority_ordering() { let mut config = InjectionConfig::default(); config.allow_ydotool = true; config.allow_enigo = false; - + let metrics = Arc::new(Mutex::new(InjectionMetrics::default())); - let manager = StrategyManager::new(config, metrics); - - let methods = manager.get_method_priority("test_app"); - + let manager = StrategyManager::new(config, metrics); + + let methods = manager.get_method_priority("test_app"); + // Should have some methods available assert!(!methods.is_empty()); - + // AT-SPI should be preferred if available #[cfg(feature = "atspi")] assert_eq!(methods[0], InjectionMethod::AtspiInsert); } - + #[test] fn test_success_rate_decay() { let config = InjectionConfig::default(); let metrics = Arc::new(Mutex::new(InjectionMetrics::default())); let mut manager = StrategyManager::new(config, metrics); - + // Add initial success manager.update_success_record("test_app", InjectionMethod::Clipboard, true); - + // Add multiple updates to trigger decay for _ in 0..5 { manager.update_success_record("test_app", InjectionMethod::Clipboard, true); } - + // Success rate should still be high despite decay let methods = manager.get_method_priority("test_app"); assert!(!methods.is_empty()); } -} \ No newline at end of file +} diff --git a/crates/coldvox-text-injection/src/tests/test_focus_tracking.rs b/crates/coldvox-text-injection/src/tests/test_focus_tracking.rs index 7d7be709..4139fc9c 100644 --- a/crates/coldvox-text-injection/src/tests/test_focus_tracking.rs 
+++ b/crates/coldvox-text-injection/src/tests/test_focus_tracking.rs @@ -1,6 +1,6 @@ #[cfg(test)] mod tests { - use crate::focus::{FocusTracker, FocusStatus}; + use crate::focus::{FocusStatus, FocusTracker}; use crate::types::InjectionConfig; use std::time::Duration; use tokio::time::sleep; @@ -9,40 +9,40 @@ mod tests { async fn test_focus_detection() { let config = InjectionConfig::default(); let mut tracker = FocusTracker::new(config); - + // Test focus detection let status = tracker.get_focus_status().await; assert!(status.is_ok()); - + // Test caching let cached = tracker.cached_focus_status(); assert!(cached.is_some()); } - + #[tokio::test] async fn test_focus_cache_expiry() { let mut config = InjectionConfig::default(); config.focus_cache_duration_ms = 50; // Very short cache let mut tracker = FocusTracker::new(config); - + // Get initial status let _status1 = tracker.get_focus_status().await.unwrap(); assert!(tracker.cached_focus_status().is_some()); - + // Wait for cache to expire sleep(Duration::from_millis(60)).await; - + // This should trigger a new check let _status2 = tracker.get_focus_status().await.unwrap(); - + // Cache should be refreshed assert!(tracker.cached_focus_status().is_some()); } - + #[test] fn test_focus_status_equality() { assert_eq!(FocusStatus::EditableText, FocusStatus::EditableText); assert_ne!(FocusStatus::EditableText, FocusStatus::NonEditable); assert_ne!(FocusStatus::Unknown, FocusStatus::EditableText); } -} \ No newline at end of file +} diff --git a/crates/coldvox-text-injection/src/tests/test_integration.rs b/crates/coldvox-text-injection/src/tests/test_integration.rs index d06ae780..d4ce8686 100644 --- a/crates/coldvox-text-injection/src/tests/test_integration.rs +++ b/crates/coldvox-text-injection/src/tests/test_integration.rs @@ -9,72 +9,77 @@ mod integration_tests { let mut config = InjectionConfig::default(); config.allow_ydotool = false; // Disable external dependencies for testing config.restore_clipboard = true; - + 
let metrics = Arc::new(Mutex::new(InjectionMetrics::default())); let manager = StrategyManager::new(config, metrics.clone()); - + // Test getting current app ID let app_id = manager.get_current_app_id().await; assert!(app_id.is_ok()); let app_id = app_id.unwrap(); println!("Current app ID: {}", app_id); - + // Test method priority let methods = manager.get_method_priority(&app_id); - assert!(!methods.is_empty(), "Should have at least one injection method available"); + assert!( + !methods.is_empty(), + "Should have at least one injection method available" + ); println!("Available methods: {:?}", methods); - + // Check metrics let metrics_guard = metrics.lock().unwrap(); - println!("Initial metrics: attempts={}, successes={}", - metrics_guard.attempts, metrics_guard.successes); + println!( + "Initial metrics: attempts={}, successes={}", + metrics_guard.attempts, metrics_guard.successes + ); } - + #[tokio::test] async fn test_app_allowlist_blocklist() { let mut config = InjectionConfig::default(); config.allowlist = vec!["firefox".to_string(), "chrome".to_string()]; config.blocklist = vec!["terminal".to_string()]; - + let metrics = Arc::new(Mutex::new(InjectionMetrics::default())); let manager = StrategyManager::new(config, metrics); - + // Test allowlist assert!(manager.is_app_allowed("firefox")); assert!(manager.is_app_allowed("chrome")); assert!(!manager.is_app_allowed("notepad")); - + // Clear allowlist and test blocklist let mut config = InjectionConfig::default(); config.blocklist = vec!["terminal".to_string(), "console".to_string()]; - + let metrics = Arc::new(Mutex::new(InjectionMetrics::default())); let manager = StrategyManager::new(config, metrics); - + assert!(!manager.is_app_allowed("terminal")); assert!(!manager.is_app_allowed("console")); assert!(manager.is_app_allowed("firefox")); } - + #[test] fn test_configuration_defaults() { let config = InjectionConfig::default(); - + // Check default values assert!(!config.allow_ydotool); 
assert!(!config.allow_kdotool); assert!(!config.allow_enigo); assert!(!config.allow_mki); assert!(!config.restore_clipboard); - assert!(config.inject_on_unknown_focus); + assert!(config.inject_on_unknown_focus); assert!(config.enable_window_detection); - + assert_eq!(config.focus_cache_duration_ms, 200); assert_eq!(config.min_success_rate, 0.3); assert_eq!(config.min_sample_size, 5); assert_eq!(config.clipboard_restore_delay_ms, Some(500)); - + assert!(config.allowlist.is_empty()); assert!(config.blocklist.is_empty()); } -} \ No newline at end of file +} diff --git a/crates/coldvox-text-injection/src/tests/test_permission_checking.rs b/crates/coldvox-text-injection/src/tests/test_permission_checking.rs index 25d3a8b6..7f4d0c59 100644 --- a/crates/coldvox-text-injection/src/tests/test_permission_checking.rs +++ b/crates/coldvox-text-injection/src/tests/test_permission_checking.rs @@ -1,29 +1,30 @@ #[cfg(test)] mod tests { + use std::process::Command; + + use crate::InjectionConfig; + #[cfg(feature = "ydotool")] + use crate::types::TextInjector; #[cfg(feature = "ydotool")] use crate::ydotool_injector::YdotoolInjector; - - use std::process::Command; #[test] fn test_binary_existence_check() { // Test with a binary that should exist - let output = Command::new("which") - .arg("ls") - .output(); - + let output = Command::new("which").arg("ls").output(); + assert!(output.is_ok()); assert!(output.unwrap().status.success()); - + // Test with a binary that shouldn't exist let output = Command::new("which") .arg("nonexistent_binary_xyz123") .output(); - + assert!(output.is_ok()); assert!(!output.unwrap().status.success()); } - + #[cfg(feature = "ydotool")] #[test] fn test_ydotool_availability() { @@ -31,18 +32,18 @@ mod tests { let injector = YdotoolInjector::new(config); let _available = injector.is_available(); } - + #[test] fn test_permission_mode_check() { use std::os::unix::fs::PermissionsExt; - + // Check /usr/bin/ls or similar common executable if let Ok(metadata) = 
std::fs::metadata("/usr/bin/ls") { let permissions = metadata.permissions(); let mode = permissions.mode(); - + // Should have at least execute permission for owner assert!(mode & 0o100 != 0); } } -} \ No newline at end of file +} diff --git a/crates/coldvox-text-injection/src/tests/test_window_manager.rs b/crates/coldvox-text-injection/src/tests/test_window_manager.rs index 0fcb24c7..a12ddaae 100644 --- a/crates/coldvox-text-injection/src/tests/test_window_manager.rs +++ b/crates/coldvox-text-injection/src/tests/test_window_manager.rs @@ -7,7 +7,7 @@ mod tests { // This test will only work in a graphical environment if std::env::var("DISPLAY").is_ok() || std::env::var("WAYLAND_DISPLAY").is_ok() { let result = get_active_window_class().await; - + // We can't assert specific values since it depends on the environment // but we can check that it doesn't panic match result { @@ -21,40 +21,43 @@ mod tests { } } } - + #[tokio::test] async fn test_window_info_structure() { let info = get_window_info().await; - + // Basic sanity checks assert!(!info.class.is_empty()); // Title might be empty // PID might be 0 if detection failed } - + #[test] fn test_x11_detection() { // Check if X11 is available let x11_available = std::env::var("DISPLAY").is_ok(); - + if x11_available { // Try to run xprop let output = std::process::Command::new("xprop") - .args(&["-root", "_NET_ACTIVE_WINDOW"]) + .args(["-root", "_NET_ACTIVE_WINDOW"]) .output(); - + // Should at least not panic assert!(output.is_ok() || output.is_err()); } } - + #[test] fn test_wayland_detection() { // Check if Wayland is available let wayland_available = std::env::var("WAYLAND_DISPLAY").is_ok(); - + if wayland_available { - println!("Wayland display detected: {:?}", std::env::var("WAYLAND_DISPLAY")); + println!( + "Wayland display detected: {:?}", + std::env::var("WAYLAND_DISPLAY") + ); } } -} \ No newline at end of file +} diff --git a/crates/coldvox-text-injection/src/types.rs 
b/crates/coldvox-text-injection/src/types.rs index f09f2d39..3b69c897 100644 --- a/crates/coldvox-text-injection/src/types.rs +++ b/crates/coldvox-text-injection/src/types.rs @@ -1,6 +1,6 @@ +use async_trait::async_trait; use serde::{Deserialize, Serialize}; use std::time::Duration; -use async_trait::async_trait; /// Enumeration of all available text injection methods #[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Serialize, Deserialize)] @@ -45,15 +45,15 @@ pub struct InjectionConfig { /// Whether to allow injection when focus state is unknown #[serde(default = "default_inject_on_unknown_focus")] pub inject_on_unknown_focus: bool, - + /// Whether to require editable focus for injection #[serde(default = "default_require_focus")] pub require_focus: bool, - + /// Hotkey to pause/resume injection (e.g., "Ctrl+Alt+P") #[serde(default = "default_pause_hotkey")] pub pause_hotkey: Option, - + /// Whether to redact text content in logs #[serde(default = "default_redact_logs")] pub redact_logs: bool, @@ -61,7 +61,7 @@ pub struct InjectionConfig { /// Overall latency budget for a single injection call, across all fallbacks. #[serde(default = "default_max_total_latency_ms")] pub max_total_latency_ms: u64, - + /// Timeout for individual injection method attempts (e.g., AT-SPI call, clipboard set). 
#[serde(default = "default_per_method_timeout_ms")] pub per_method_timeout_ms: u64, @@ -94,31 +94,31 @@ pub struct InjectionConfig { /// Delay between paste chunks in milliseconds #[serde(default = "default_chunk_delay_ms")] pub chunk_delay_ms: u64, - + /// Cache duration for focus status (ms) #[serde(default = "default_focus_cache_duration_ms")] pub focus_cache_duration_ms: u64, - + /// Minimum success rate before trying fallback methods #[serde(default = "default_min_success_rate")] pub min_success_rate: f64, - + /// Number of samples before trusting success rate #[serde(default = "default_min_sample_size")] pub min_sample_size: u32, - + /// Enable window manager integration #[serde(default = "default_true")] pub enable_window_detection: bool, - + /// Delay before restoring clipboard (ms) #[serde(default = "default_clipboard_restore_delay_ms")] pub clipboard_restore_delay_ms: Option, - + /// Allowlist of application patterns (regex) for injection #[serde(default)] pub allowlist: Vec, - + /// Blocklist of application patterns (regex) to block injection #[serde(default)] pub blocklist: Vec, @@ -129,7 +129,7 @@ fn default_false() -> bool { } fn default_inject_on_unknown_focus() -> bool { - true // Default to true to avoid blocking on Wayland without AT-SPI + true // Default to true to avoid blocking on Wayland without AT-SPI } fn default_require_focus() -> bool { @@ -141,7 +141,7 @@ fn default_pause_hotkey() -> Option { } fn default_redact_logs() -> bool { - true // Privacy-first by default + true // Privacy-first by default } fn default_allowlist() -> Vec { @@ -157,29 +157,31 @@ fn default_injection_mode() -> String { } fn default_keystroke_rate_cps() -> u32 { - 20 // 20 characters per second (human typing speed) + 20 // 20 characters per second (human typing speed) } fn default_max_burst_chars() -> u32 { - 50 // Maximum 50 characters in a single burst + 50 // Maximum 50 characters in a single burst } fn default_paste_chunk_chars() -> u32 { - 500 // Chunk paste 
operations into 500 character chunks + 500 // Chunk paste operations into 500 character chunks } -fn default_chunk_delay_ms() -> u64 { 30 } +fn default_chunk_delay_ms() -> u64 { + 30 +} fn default_focus_cache_duration_ms() -> u64 { - 200 // Cache focus status for 200ms + 200 // Cache focus status for 200ms } fn default_min_success_rate() -> f64 { - 0.3 // 30% minimum success rate before considering fallback + 0.3 // 30% minimum success rate before considering fallback } fn default_min_sample_size() -> u32 { - 5 // Need at least 5 samples before trusting success rate + 5 // Need at least 5 samples before trusting success rate } fn default_true() -> bool { @@ -187,7 +189,7 @@ fn default_true() -> bool { } fn default_clipboard_restore_delay_ms() -> Option { - Some(500) // Wait 500ms before restoring clipboard + Some(500) // Wait 500ms before restoring clipboard } fn default_max_total_latency_ms() -> u64 { @@ -279,19 +281,19 @@ pub enum InjectionError { #[error("All methods failed: {0}")] AllMethodsFailed(String), - + #[error("Method unavailable: {0}")] MethodUnavailable(String), - + #[error("Method failed: {0}")] MethodFailed(String), - + #[error("Budget exhausted")] BudgetExhausted, - + #[error("Clipboard error: {0}")] Clipboard(String), - + #[error("Process error: {0}")] Process(String), @@ -368,64 +370,64 @@ impl InjectionMetrics { pub fn record_attempt(&mut self, method: InjectionMethod, duration_ms: u64) { self.attempts += 1; self.total_duration_ms += duration_ms; - + // Update method-specific metrics let method_metrics = self.method_metrics.entry(method).or_default(); method_metrics.attempts += 1; method_metrics.total_duration_ms += duration_ms; } - + /// Record characters that have been buffered pub fn record_buffered_chars(&mut self, count: u64) { self.chars_buffered += count; } - + /// Record characters that have been successfully injected pub fn record_injected_chars(&mut self, count: u64) { self.chars_injected += count; } - + /// Record a flush event pub fn 
record_flush(&mut self, size: u64) { self.flushes += 1; self.flush_size_chars.push(size); } - + /// Record a paste operation pub fn record_paste(&mut self) { self.paste_uses += 1; } - + /// Record a keystroke operation pub fn record_keystroke(&mut self) { self.keystroke_uses += 1; } - + /// Record a backend denial pub fn record_backend_denied(&mut self) { self.backend_denied += 1; } - + /// Record a focus missing error pub fn record_focus_missing(&mut self) { self.focus_missing += 1; } - + /// Record a rate limited event pub fn record_rate_limited(&mut self) { self.rate_limited += 1; } - + /// Record latency from final transcription to injection pub fn record_latency_from_final(&mut self, latency_ms: u64) { self.latency_from_final_ms.push(latency_ms); } - + /// Update the last injection timestamp pub fn update_last_injection(&mut self) { self.last_injection = Some(std::time::Instant::now()); } - + /// Update the stuck buffer age pub fn update_stuck_buffer_age(&mut self, age_ms: u64) { self.stuck_buffer_age_ms = age_ms; @@ -435,7 +437,7 @@ impl InjectionMetrics { pub fn record_success(&mut self, method: InjectionMethod, duration_ms: u64) { self.successes += 1; self.record_attempt(method, duration_ms); - + // Update method-specific success if let Some(metrics) = self.method_metrics.get_mut(&method) { metrics.successes += 1; @@ -447,7 +449,7 @@ impl InjectionMetrics { pub fn record_failure(&mut self, method: InjectionMethod, duration_ms: u64, error: String) { self.failures += 1; self.record_attempt(method, duration_ms); - + // Update method-specific failure if let Some(metrics) = self.method_metrics.get_mut(&method) { metrics.failures += 1; @@ -474,25 +476,25 @@ impl InjectionMetrics { pub trait TextInjector: Send + Sync { /// Name of the injector for logging and metrics fn name(&self) -> &'static str; - + /// Check if this injector is available for use fn is_available(&self) -> bool; - + /// Inject text using this method async fn inject(&mut self, text: &str) -> 
Result<(), InjectionError>; - /// Type text with pacing (characters per second) /// Default implementation falls back to inject() async fn type_text(&mut self, text: &str, _rate_cps: u32) -> Result<(), InjectionError> { self.inject(text).await } - + /// Paste text (may use clipboard or other methods) /// Default implementation falls back to inject() async fn paste(&mut self, text: &str) -> Result<(), InjectionError> { self.inject(text).await } - + /// Get metrics for this injector fn metrics(&self) -> &InjectionMetrics; -} \ No newline at end of file +} diff --git a/crates/coldvox-text-injection/src/window_manager.rs b/crates/coldvox-text-injection/src/window_manager.rs index 3333c060..a14fac89 100644 --- a/crates/coldvox-text-injection/src/window_manager.rs +++ b/crates/coldvox-text-injection/src/window_manager.rs @@ -1,7 +1,9 @@ -use crate::types::InjectionError; use std::process::Command; -use tracing::debug; + use serde_json; +use tracing::debug; + +use crate::types::InjectionError; /// Get the currently active window class name pub async fn get_active_window_class() -> Result<String, InjectionError> { @@ -9,50 +11,52 @@ pub async fn get_active_window_class() -> Result<String, InjectionError> { if let Ok(class) = get_kde_window_class().await { return Ok(class); } - + // Try generic X11 method if let Ok(class) = get_x11_window_class().await { return Ok(class); } - + // Try Wayland method if let Ok(class) = get_wayland_window_class().await { return Ok(class); } - - Err(InjectionError::Other("Could not determine active window".to_string())) + + Err(InjectionError::Other( + "Could not determine active window".to_string(), + )) } async fn get_kde_window_class() -> Result<String, InjectionError> { // Use KWin DBus interface let output = Command::new("qdbus") - .args([ - "org.kde.KWin", - "/KWin", - "org.kde.KWin.activeClient" - ]) + .args(["org.kde.KWin", "/KWin", "org.kde.KWin.activeClient"]) .output() .map_err(|e| InjectionError::Process(format!("qdbus failed: {}", e)))?; - + if output.status.success() { let window_id = 
String::from_utf8_lossy(&output.stdout).trim().to_string(); - + // Get window class from ID let class_output = Command::new("qdbus") .args([ "org.kde.KWin", &format!("/Windows/{}", window_id), - "org.kde.KWin.Window.resourceClass" + "org.kde.KWin.Window.resourceClass", ]) .output() .map_err(|e| InjectionError::Process(format!("qdbus failed: {}", e)))?; - + if class_output.status.success() { - return Ok(String::from_utf8_lossy(&class_output.stdout).trim().to_string()); + return Ok(String::from_utf8_lossy(&class_output.stdout) + .trim() + .to_string()); } } - - Err(InjectionError::Other("KDE window class not available".to_string())) + + Err(InjectionError::Other( + "KDE window class not available".to_string(), + )) } async fn get_x11_window_class() -> Result<String, InjectionError> { @@ -61,18 +65,18 @@ async fn get_x11_window_class() -> Result<String, InjectionError> { .args(["-root", "_NET_ACTIVE_WINDOW"]) .output() .map_err(|e| InjectionError::Process(format!("xprop failed: {}", e)))?; - + if output.status.success() { let window_str = String::from_utf8_lossy(&output.stdout); if let Some(window_id) = window_str.split("# ").nth(1) { let window_id = window_id.trim(); - + // Get window class let class_output = Command::new("xprop") .args(["-id", window_id, "WM_CLASS"]) .output() .map_err(|e| InjectionError::Process(format!("xprop failed: {}", e)))?; - + if class_output.status.success() { let class_str = String::from_utf8_lossy(&class_output.stdout); // Parse WM_CLASS string (format: WM_CLASS(STRING) = "instance", "class") @@ -82,27 +86,33 @@ async fn get_x11_window_class() -> Result<String, InjectionError> { } } } - - Err(InjectionError::Other("X11 window class not available".to_string())) + + Err(InjectionError::Other( + "X11 window class not available".to_string(), + )) } async fn get_wayland_window_class() -> Result<String, InjectionError> { // Try using wlr-foreign-toplevel-management protocol if available // This requires compositor support (e.g., Sway, some KWin versions) - + // For now, we'll try using swaymsg if Sway is running let output = 
Command::new("swaymsg") .args(["-t", "get_tree"]) .output() .map_err(|e| InjectionError::Process(format!("swaymsg failed: {}", e)))?; - + if output.status.success() { // Parse JSON to find focused window using serde_json let tree = String::from_utf8_lossy(&output.stdout); if let Ok(json) = serde_json::from_str::(&tree) { // Depth-first search for focused node with app_id fn dfs(node: &serde_json::Value) -> Option { - if node.get("focused").and_then(|v| v.as_bool()).unwrap_or(false) { + if node + .get("focused") + .and_then(|v| v.as_bool()) + .unwrap_or(false) + { if let Some(app_id) = node.get("app_id").and_then(|v| v.as_str()) { return Some(app_id.to_string()); } @@ -114,12 +124,17 @@ async fn get_wayland_window_class() -> Result { } if let Some(nodes) = node.get("nodes").and_then(|v| v.as_array()) { for n in nodes { - if let Some(found) = dfs(n) { return Some(found); } + if let Some(found) = dfs(n) { + return Some(found); + } } } - if let Some(floating_nodes) = node.get("floating_nodes").and_then(|v| v.as_array()) { + if let Some(floating_nodes) = node.get("floating_nodes").and_then(|v| v.as_array()) + { for n in floating_nodes { - if let Some(found) = dfs(n) { return Some(found); } + if let Some(found) = dfs(n) { + return Some(found); + } } } None @@ -131,21 +146,21 @@ async fn get_wayland_window_class() -> Result { debug!("Failed to parse swaymsg JSON; falling back"); } } - - Err(InjectionError::Other("Wayland window class not available".to_string())) + + Err(InjectionError::Other( + "Wayland window class not available".to_string(), + )) } /// Get window information using multiple methods pub async fn get_window_info() -> WindowInfo { - let class = get_active_window_class().await.unwrap_or_else(|_| "unknown".to_string()); + let class = get_active_window_class() + .await + .unwrap_or_else(|_| "unknown".to_string()); let title = get_window_title().await.unwrap_or_default(); let pid = get_window_pid().await.unwrap_or(0); - - WindowInfo { - class, - title, - pid, - 
} + + WindowInfo { class, title, pid } } /// Window information structure @@ -163,18 +178,18 @@ async fn get_window_title() -> Result { .args(["-root", "_NET_ACTIVE_WINDOW"]) .output() .map_err(|e| InjectionError::Process(format!("xprop failed: {}", e)))?; - + if output.status.success() { let window_str = String::from_utf8_lossy(&output.stdout); if let Some(window_id) = window_str.split("# ").nth(1) { let window_id = window_id.trim(); - + // Get window title let title_output = Command::new("xprop") .args(["-id", window_id, "_NET_WM_NAME"]) .output() .map_err(|e| InjectionError::Process(format!("xprop failed: {}", e)))?; - + if title_output.status.success() { let title_str = String::from_utf8_lossy(&title_output.stdout); // Parse title string @@ -187,8 +202,10 @@ async fn get_window_title() -> Result { } } } - - Err(InjectionError::Other("Could not get window title".to_string())) + + Err(InjectionError::Other( + "Could not get window title".to_string(), + )) } /// Get the PID of the active window @@ -198,18 +215,18 @@ async fn get_window_pid() -> Result { .args(["-root", "_NET_ACTIVE_WINDOW"]) .output() .map_err(|e| InjectionError::Process(format!("xprop failed: {}", e)))?; - + if output.status.success() { let window_str = String::from_utf8_lossy(&output.stdout); if let Some(window_id) = window_str.split("# ").nth(1) { let window_id = window_id.trim(); - + // Get window PID let pid_output = Command::new("xprop") .args(["-id", window_id, "_NET_WM_PID"]) .output() .map_err(|e| InjectionError::Process(format!("xprop failed: {}", e)))?; - + if pid_output.status.success() { let pid_str = String::from_utf8_lossy(&pid_output.stdout); // Parse PID (format: _NET_WM_PID(CARDINAL) = ) @@ -221,8 +238,10 @@ async fn get_window_pid() -> Result { } } } - - Err(InjectionError::Other("Could not get window PID".to_string())) + + Err(InjectionError::Other( + "Could not get window PID".to_string(), + )) } #[cfg(test)] @@ -247,11 +266,11 @@ mod tests { } } } - + #[tokio::test] async fn 
test_window_info() { let info = get_window_info().await; // Basic sanity check assert!(!info.class.is_empty()); } -} \ No newline at end of file +} diff --git a/crates/coldvox-text-injection/src/ydotool_injector.rs b/crates/coldvox-text-injection/src/ydotool_injector.rs index 1727720d..9f04e2dc 100644 --- a/crates/coldvox-text-injection/src/ydotool_injector.rs +++ b/crates/coldvox-text-injection/src/ydotool_injector.rs @@ -1,10 +1,10 @@ use crate::types::{InjectionConfig, InjectionError, InjectionMetrics, TextInjector}; use anyhow::Result; +use async_trait::async_trait; use std::process::Command; use std::time::Duration; use tokio::time::timeout; use tracing::{debug, info, warn}; -use async_trait::async_trait; /// Ydotool injector for synthetic key events pub struct YdotoolInjector { @@ -18,7 +18,7 @@ impl YdotoolInjector { /// Create a new ydotool injector pub fn new(config: InjectionConfig) -> Self { let is_available = Self::check_ydotool(); - + Self { config, metrics: InjectionMetrics::default(), @@ -34,7 +34,10 @@ impl YdotoolInjector { let user_id = std::env::var("UID").unwrap_or_else(|_| "1000".to_string()); let socket_path = format!("/run/user/{}/.ydotool_socket", user_id); if !std::path::Path::new(&socket_path).exists() { - warn!("ydotool socket not found at {}, daemon may not be running", socket_path); + warn!( + "ydotool socket not found at {}, daemon may not be running", + socket_path + ); return false; } true @@ -45,76 +48,80 @@ impl YdotoolInjector { } } } - + /// Check if a binary exists and has proper permissions fn check_binary_permissions(binary_name: &str) -> Result<(), InjectionError> { use std::os::unix::fs::PermissionsExt; - + // Check if binary exists in PATH let output = Command::new("which") .arg(binary_name) .output() - .map_err(|e| InjectionError::Process(format!("Failed to locate {}: {}", binary_name, e)))?; - + .map_err(|e| { + InjectionError::Process(format!("Failed to locate {}: {}", binary_name, e)) + })?; + if 
!output.status.success() { - return Err(InjectionError::MethodUnavailable( - format!("{} not found in PATH", binary_name) - )); + return Err(InjectionError::MethodUnavailable(format!( + "{} not found in PATH", + binary_name + ))); } - + let binary_path = String::from_utf8_lossy(&output.stdout).trim().to_string(); - + // Check if binary is executable - let metadata = std::fs::metadata(&binary_path) - .map_err(InjectionError::Io)?; - + let metadata = std::fs::metadata(&binary_path).map_err(InjectionError::Io)?; + let permissions = metadata.permissions(); if permissions.mode() & 0o111 == 0 { - return Err(InjectionError::PermissionDenied( - format!("{} is not executable", binary_name) - )); + return Err(InjectionError::PermissionDenied(format!( + "{} is not executable", + binary_name + ))); } - + // For ydotool specifically, check uinput access if binary_name == "ydotool" { Self::check_uinput_access()?; } - + Ok(()) } - + /// Check if we have access to /dev/uinput (required for ydotool) fn check_uinput_access() -> Result<(), InjectionError> { use std::fs::OpenOptions; - + // Check if we can open /dev/uinput match OpenOptions::new().write(true).open("/dev/uinput") { Ok(_) => Ok(()), Err(e) if e.kind() == std::io::ErrorKind::PermissionDenied => { // Check if user is in input group - let groups = Command::new("groups") - .output() - .map_err(|e| InjectionError::Process(format!("Failed to check groups: {}", e)))?; - + let groups = Command::new("groups").output().map_err(|e| { + InjectionError::Process(format!("Failed to check groups: {}", e)) + })?; + let groups_str = String::from_utf8_lossy(&groups.stdout); if !groups_str.contains("input") { return Err(InjectionError::PermissionDenied( - "User not in 'input' group. Run: sudo usermod -a -G input $USER".to_string() + "User not in 'input' group. Run: sudo usermod -a -G input $USER" + .to_string(), )); } - + Err(InjectionError::PermissionDenied( - "/dev/uinput access denied. 
ydotool daemon may not be running".to_string() + "/dev/uinput access denied. ydotool daemon may not be running".to_string(), )) } - Err(e) => Err(InjectionError::Io(e)) + Err(e) => Err(InjectionError::Io(e)), } } /// Trigger paste action using ydotool (Ctrl+V) async fn trigger_paste(&self) -> Result<(), InjectionError> { let start = std::time::Instant::now(); - + // Use tokio to run the command with timeout let output = timeout( Duration::from_millis(self.config.paste_action_timeout_ms), @@ -122,26 +129,29 @@ impl YdotoolInjector { .args(["key", "ctrl+v"]) .output(), ) - .await - .map_err(|_| InjectionError::Timeout(self.config.paste_action_timeout_ms))? - .map_err(|e| InjectionError::Process(format!("{e}")))?; - + .await + .map_err(|_| InjectionError::Timeout(self.config.paste_action_timeout_ms))? + .map_err(|e| InjectionError::Process(format!("{e}")))?; + if !output.status.success() { let stderr = String::from_utf8_lossy(&output.stderr); - return Err(InjectionError::MethodFailed(format!("ydotool key failed: {}", stderr))); + return Err(InjectionError::MethodFailed(format!( + "ydotool key failed: {}", + stderr + ))); } - + let _duration = start.elapsed().as_millis() as u64; // TODO: Fix metrics - self.metrics.record_success requires &mut self info!("Successfully triggered paste action via ydotool"); - + Ok(()) } /// Type text directly using ydotool async fn _type_text(&self, text: &str) -> Result<(), InjectionError> { let start = std::time::Instant::now(); - + // Use tokio to run the command with timeout let output = timeout( Duration::from_millis(self.config.per_method_timeout_ms), @@ -149,19 +159,22 @@ impl YdotoolInjector { .args(["type", "--delay", "10", text]) .output(), ) - .await - .map_err(|_| InjectionError::Timeout(self.config.per_method_timeout_ms))? - .map_err(|e| InjectionError::Process(format!("{e}")))?; - + .await + .map_err(|_| InjectionError::Timeout(self.config.per_method_timeout_ms))? 
+ .map_err(|e| InjectionError::Process(format!("{e}")))?; + if !output.status.success() { let stderr = String::from_utf8_lossy(&output.stderr); - return Err(InjectionError::MethodFailed(format!("ydotool type failed: {}", stderr))); + return Err(InjectionError::MethodFailed(format!( + "ydotool type failed: {}", + stderr + ))); } - + let _duration = start.elapsed().as_millis() as u64; // TODO: Fix metrics - self.metrics.record_success requires &mut self info!("Successfully typed text via ydotool ({} chars)", text.len()); - + Ok(()) } } @@ -195,4 +208,4 @@ impl TextInjector for YdotoolInjector { fn metrics(&self) -> &InjectionMetrics { &self.metrics } -} \ No newline at end of file +} diff --git a/crates/coldvox-vad-silero/src/config.rs b/crates/coldvox-vad-silero/src/config.rs index f427b273..7d04e186 100644 --- a/crates/coldvox-vad-silero/src/config.rs +++ b/crates/coldvox-vad-silero/src/config.rs @@ -18,4 +18,4 @@ impl Default for SileroConfig { window_size_samples: FRAME_SIZE_SAMPLES, } } -} \ No newline at end of file +} diff --git a/crates/coldvox-vad-silero/src/lib.rs b/crates/coldvox-vad-silero/src/lib.rs index 62b51c78..2c6724be 100644 --- a/crates/coldvox-vad-silero/src/lib.rs +++ b/crates/coldvox-vad-silero/src/lib.rs @@ -1,8 +1,8 @@ +pub mod config; #[cfg(feature = "silero")] pub mod silero_wrapper; -pub mod config; pub use config::SileroConfig; #[cfg(feature = "silero")] -pub use silero_wrapper::SileroEngine; \ No newline at end of file +pub use silero_wrapper::SileroEngine; diff --git a/crates/coldvox-vad-silero/src/silero_wrapper.rs b/crates/coldvox-vad-silero/src/silero_wrapper.rs index d660570a..4b981a5d 100644 --- a/crates/coldvox-vad-silero/src/silero_wrapper.rs +++ b/crates/coldvox-vad-silero/src/silero_wrapper.rs @@ -1,7 +1,7 @@ -use coldvox_vad::{VadEngine, VadEvent, VadState}; use crate::config::SileroConfig; -use voice_activity_detector::VoiceActivityDetector; +use coldvox_vad::{VadEngine, VadEvent, VadState}; use std::time::Instant; +use 
voice_activity_detector::VoiceActivityDetector; #[derive(Copy, Clone, Default)] struct I16Sample(i16); @@ -30,7 +30,7 @@ impl SileroEngine { .chunk_size(512_usize) .build() .map_err(|e| format!("Failed to create Silero VAD: {}", e))?; - + Ok(Self { detector, config, @@ -42,10 +42,10 @@ impl SileroEngine { last_probability: 0.0, }) } - + fn process_probability(&mut self, probability: f32) -> Option { let timestamp_ms = self.frames_processed * 512 * 1000 / 16000; - + match self.current_state { VadState::Silence => { if probability >= self.config.threshold { @@ -53,11 +53,12 @@ impl SileroEngine { self.speech_start_time = Some(Instant::now()); self.speech_start_timestamp_ms = timestamp_ms; } else if let Some(start) = self.speech_start_time { - if start.elapsed().as_millis() >= self.config.min_speech_duration_ms as u128 { + if start.elapsed().as_millis() >= self.config.min_speech_duration_ms as u128 + { self.current_state = VadState::Speech; self.speech_start_time = None; self.silence_start_time = None; - + return Some(VadEvent::SpeechStart { timestamp_ms: self.speech_start_timestamp_ms, energy_db: probability_to_db(probability), @@ -73,13 +74,15 @@ impl SileroEngine { if self.silence_start_time.is_none() { self.silence_start_time = Some(Instant::now()); } else if let Some(start) = self.silence_start_time { - if start.elapsed().as_millis() >= self.config.min_silence_duration_ms as u128 { + if start.elapsed().as_millis() + >= self.config.min_silence_duration_ms as u128 + { self.current_state = VadState::Silence; self.speech_start_time = None; self.silence_start_time = None; - + let duration_ms = timestamp_ms - self.speech_start_timestamp_ms; - + return Some(VadEvent::SpeechEnd { timestamp_ms, duration_ms, @@ -92,7 +95,7 @@ impl SileroEngine { } } } - + None } } @@ -100,19 +103,20 @@ impl SileroEngine { impl VadEngine for SileroEngine { fn process(&mut self, frame: &[i16]) -> Result, String> { if frame.len() != 512 { - return Err(format!("Silero VAD requires 512 samples, 
got {}", frame.len())); + return Err(format!( + "Silero VAD requires 512 samples, got {}", + frame.len() + )); } - - let probability = self - .detector - .predict(frame.iter().map(|&s| I16Sample(s))); - + + let probability = self.detector.predict(frame.iter().map(|&s| I16Sample(s))); + self.last_probability = probability; self.frames_processed += 1; - + Ok(self.process_probability(probability)) } - + fn reset(&mut self) { self.detector.reset(); self.current_state = VadState::Silence; @@ -122,15 +126,15 @@ impl VadEngine for SileroEngine { self.frames_processed = 0; self.last_probability = 0.0; } - + fn current_state(&self) -> VadState { self.current_state } - + fn required_sample_rate(&self) -> u32 { 16000 } - + fn required_frame_size_samples(&self) -> usize { 512 } @@ -173,7 +177,13 @@ mod tests { let too_long = vec![0i16; 513]; let err_short = engine.process(&too_short).unwrap_err(); let err_long = engine.process(&too_long).unwrap_err(); - assert!(err_short.contains("512"), "Error should mention required frame size: {err_short}"); - assert!(err_long.contains("512"), "Error should mention required frame size: {err_long}"); + assert!( + err_short.contains("512"), + "Error should mention required frame size: {err_short}" + ); + assert!( + err_long.contains("512"), + "Error should mention required frame size: {err_long}" + ); } -} \ No newline at end of file +} diff --git a/crates/coldvox-vad/src/config.rs b/crates/coldvox-vad/src/config.rs index c28971a5..ec44714c 100644 --- a/crates/coldvox-vad/src/config.rs +++ b/crates/coldvox-vad/src/config.rs @@ -1,10 +1,11 @@ use serde::{Deserialize, Serialize}; + use super::constants::{FRAME_SIZE_SAMPLES, SAMPLE_RATE_HZ}; #[derive(Debug, Clone, Copy, PartialEq, Serialize, Deserialize)] pub enum VadMode { - Level3, // Energy-based VAD - INTENTIONALLY DISABLED (see Level3Config.enabled) - Silero, // ML-based VAD using ONNX - DEFAULT ACTIVE VAD + Level3, // Energy-based VAD - INTENTIONALLY DISABLED (see Level3Config.enabled) + 
Silero, // ML-based VAD using ONNX - DEFAULT ACTIVE VAD } impl Default for VadMode { @@ -74,7 +75,7 @@ pub struct UnifiedVadConfig { impl Default for UnifiedVadConfig { fn default() -> Self { Self { - mode: VadMode::default(), // Uses Silero by default now + mode: VadMode::default(), // Uses Silero by default now level3: Level3Config::default(), silero: SileroConfig::default(), // Align default frame size with default engine (Silero) requirement @@ -89,4 +90,4 @@ impl UnifiedVadConfig { pub fn frame_duration_ms(&self) -> f32 { (self.frame_size_samples as f32 * 1000.0) / self.sample_rate_hz as f32 } -} \ No newline at end of file +} diff --git a/crates/coldvox-vad/src/constants.rs b/crates/coldvox-vad/src/constants.rs index 02ab35e2..99e99e9a 100644 --- a/crates/coldvox-vad/src/constants.rs +++ b/crates/coldvox-vad/src/constants.rs @@ -8,4 +8,4 @@ pub const SAMPLE_RATE_HZ: u32 = 16_000; pub const FRAME_SIZE_SAMPLES: usize = 512; /// Frame duration in milliseconds (derived constant) -pub const FRAME_DURATION_MS: f32 = (FRAME_SIZE_SAMPLES as f32 * 1000.0) / SAMPLE_RATE_HZ as f32; \ No newline at end of file +pub const FRAME_DURATION_MS: f32 = (FRAME_SIZE_SAMPLES as f32 * 1000.0) / SAMPLE_RATE_HZ as f32; diff --git a/crates/coldvox-vad/src/energy.rs b/crates/coldvox-vad/src/energy.rs index 3baf092f..740a4faf 100644 --- a/crates/coldvox-vad/src/energy.rs +++ b/crates/coldvox-vad/src/energy.rs @@ -6,16 +6,14 @@ pub struct EnergyCalculator { impl EnergyCalculator { pub fn new() -> Self { - Self { - epsilon: 1e-10, - } + Self { epsilon: 1e-10 } } - + pub fn calculate_rms(&self, frame: &[i16]) -> f32 { if frame.is_empty() { return 0.0; } - + let sum_squares: i64 = frame .iter() .map(|&sample| { @@ -23,23 +21,23 @@ impl EnergyCalculator { s * s }) .sum(); - + let mean_square = sum_squares as f64 / frame.len() as f64; (mean_square.sqrt() / 32768.0) as f32 } - + pub fn rms_to_dbfs(&self, rms: f32) -> f32 { if rms <= self.epsilon { return -100.0; } 20.0 * rms.log10() } - + pub 
fn calculate_dbfs(&self, frame: &[i16]) -> f32 { let rms = self.calculate_rms(frame); self.rms_to_dbfs(rms) } - + pub fn calculate_energy_ratio(&self, frame: &[i16], reference_db: f32) -> f32 { let current_db = self.calculate_dbfs(frame); current_db - reference_db @@ -56,15 +54,15 @@ impl Default for EnergyCalculator { mod tests { use super::*; use crate::constants::FRAME_SIZE_SAMPLES; - + #[test] fn test_silence_returns_low_dbfs() { let calc = EnergyCalculator::new(); - let silence = vec![0i16; FRAME_SIZE_SAMPLES]; + let silence = vec![0i16; FRAME_SIZE_SAMPLES]; let db = calc.calculate_dbfs(&silence); assert!(db <= -100.0); } - + #[test] fn test_full_scale_returns_zero_dbfs() { let calc = EnergyCalculator::new(); @@ -72,20 +70,20 @@ mod tests { let db = calc.calculate_dbfs(&full_scale); assert!((db - 0.0).abs() < 0.1); } - + #[test] fn test_rms_calculation() { let calc = EnergyCalculator::new(); - + let sine_wave: Vec = (0..FRAME_SIZE_SAMPLES) .map(|i| { let phase = 2.0 * std::f32::consts::PI * i as f32 / FRAME_SIZE_SAMPLES as f32; (phase.sin() * 16384.0) as i16 }) .collect(); - + let rms = calc.calculate_rms(&sine_wave); - + assert!((rms - 0.354).abs() < 0.01); } -} \ No newline at end of file +} diff --git a/crates/coldvox-vad/src/engine.rs b/crates/coldvox-vad/src/engine.rs index ab568164..8800cac2 100644 --- a/crates/coldvox-vad/src/engine.rs +++ b/crates/coldvox-vad/src/engine.rs @@ -10,4 +10,4 @@ pub trait VadEngine: Send { fn current_state(&self) -> VadState; fn required_sample_rate(&self) -> u32; fn required_frame_size_samples(&self) -> usize; -} \ No newline at end of file +} diff --git a/crates/coldvox-vad/src/level3.rs b/crates/coldvox-vad/src/level3.rs index a4ccda2a..e92f16fb 100644 --- a/crates/coldvox-vad/src/level3.rs +++ b/crates/coldvox-vad/src/level3.rs @@ -1,6 +1,6 @@ use crate::{ - engine::VadEngine, energy::EnergyCalculator, + engine::VadEngine, state::VadStateMachine, threshold::AdaptiveThreshold, types::{VadConfig, VadEvent, VadMetrics, 
VadState}, @@ -11,7 +11,6 @@ use crate::{ // This energy-based VAD implementation is kept for: // 1. Fallback capability if Silero fails // 2. Testing and comparison purposes -// 3. Potential future hybrid VAD approaches // To enable: Set config.level3.enabled = true (see vad/config.rs) pub struct Level3Vad { config: VadConfig, @@ -80,7 +79,8 @@ impl VadProcessor for Level3Vad { VadState::Speech => !self.threshold.should_deactivate(energy_db), }; - self.threshold.update(energy_db, current_state == VadState::Speech); + self.threshold + .update(energy_db, current_state == VadState::Speech); let event = self.state_machine.process(is_speech_candidate, energy_db); @@ -104,19 +104,19 @@ impl VadEngine for Level3Vad { fn process(&mut self, frame: &[i16]) -> Result, String> { VadProcessor::process(self, frame) } - + fn reset(&mut self) { VadProcessor::reset(self) } - + fn current_state(&self) -> VadState { VadProcessor::current_state(self) } - + fn required_sample_rate(&self) -> u32 { self.config.sample_rate_hz } - + fn required_frame_size_samples(&self) -> usize { self.config.frame_size_samples } @@ -326,4 +326,4 @@ mod tests { assert_eq!(vad.metrics().frames_processed, 0); assert_eq!(VadProcessor::current_state(&vad), VadState::Silence); } -} \ No newline at end of file +} diff --git a/crates/coldvox-vad/src/lib.rs b/crates/coldvox-vad/src/lib.rs index b2054036..40674623 100644 --- a/crates/coldvox-vad/src/lib.rs +++ b/crates/coldvox-vad/src/lib.rs @@ -1,7 +1,7 @@ pub mod config; pub mod constants; -pub mod engine; pub mod energy; +pub mod engine; pub mod state; pub mod threshold; pub mod types; @@ -9,11 +9,11 @@ pub mod types; #[cfg(feature = "level3")] pub mod level3; -// Core exports -pub use constants::{FRAME_SIZE_SAMPLES, SAMPLE_RATE_HZ, FRAME_DURATION_MS}; -pub use types::{VadConfig, VadEvent, VadState, VadMetrics}; +// Core exports - grouped and sorted alphabetically pub use config::{UnifiedVadConfig, VadMode}; +pub use constants::{FRAME_DURATION_MS, 
FRAME_SIZE_SAMPLES, SAMPLE_RATE_HZ}; pub use engine::VadEngine; +pub use types::{VadConfig, VadEvent, VadMetrics, VadState}; // Level3 VAD exports when feature is enabled #[cfg(feature = "level3")] @@ -24,4 +24,4 @@ pub trait VadProcessor: Send { fn process(&mut self, frame: &[i16]) -> Result, String>; fn reset(&mut self); fn current_state(&self) -> VadState; -} \ No newline at end of file +} diff --git a/crates/coldvox-vad/src/state.rs b/crates/coldvox-vad/src/state.rs index 2534622d..ab60a85a 100644 --- a/crates/coldvox-vad/src/state.rs +++ b/crates/coldvox-vad/src/state.rs @@ -3,19 +3,19 @@ use std::time::Instant; pub struct VadStateMachine { state: VadState, - + speech_frames: u32, - + silence_frames: u32, - + speech_debounce_frames: u32, - + silence_debounce_frames: u32, - + speech_start_time: Option, - + frames_since_start: u64, - + frame_duration_ms: f32, } @@ -32,25 +32,21 @@ impl VadStateMachine { frame_duration_ms: config.frame_duration_ms(), } } - - pub fn process( - &mut self, - is_speech_candidate: bool, - energy_db: f32, - ) -> Option { + + pub fn process(&mut self, is_speech_candidate: bool, energy_db: f32) -> Option { self.frames_since_start += 1; - + match self.state { VadState::Silence => { if is_speech_candidate { self.speech_frames += 1; self.silence_frames = 0; - + if self.speech_frames >= self.speech_debounce_frames { self.state = VadState::Speech; self.speech_start_time = Some(Instant::now()); self.speech_frames = 0; - + return Some(VadEvent::SpeechStart { timestamp_ms: self.current_timestamp_ms(), energy_db, @@ -60,25 +56,25 @@ impl VadStateMachine { self.speech_frames = 0; } } - + VadState::Speech => { if !is_speech_candidate { self.silence_frames += 1; self.speech_frames = 0; - + if self.silence_frames >= self.silence_debounce_frames { self.state = VadState::Silence; - + let duration_ms = if let Some(start) = self.speech_start_time { let elapsed = start.elapsed().as_millis() as u64; elapsed.max(1) } else { (self.silence_frames as f32 * 
self.frame_duration_ms).max(1.0) as u64 }; - + self.speech_start_time = None; self.silence_frames = 0; - + return Some(VadEvent::SpeechEnd { timestamp_ms: self.current_timestamp_ms(), duration_ms, @@ -90,14 +86,14 @@ impl VadStateMachine { } } } - + None } - + pub fn current_state(&self) -> VadState { self.state } - + pub fn reset(&mut self) { self.state = VadState::Silence; self.speech_frames = 0; @@ -105,26 +101,26 @@ impl VadStateMachine { self.speech_start_time = None; self.frames_since_start = 0; } - + fn current_timestamp_ms(&self) -> u64 { (self.frames_since_start as f32 * self.frame_duration_ms) as u64 } - + pub fn force_end(&mut self, energy_db: f32) -> Option { if self.state == VadState::Speech { self.state = VadState::Silence; - + let duration_ms = if let Some(start) = self.speech_start_time { let elapsed = start.elapsed().as_millis() as u64; elapsed.max(1) } else { (self.frames_since_start as f32 * self.frame_duration_ms * 0.1).max(1.0) as u64 }; - + self.speech_start_time = None; self.speech_frames = 0; self.silence_frames = 0; - + return Some(VadEvent::SpeechEnd { timestamp_ms: self.current_timestamp_ms(), duration_ms, @@ -139,15 +135,15 @@ impl VadStateMachine { mod tests { use super::*; use crate::constants::{FRAME_SIZE_SAMPLES, SAMPLE_RATE_HZ}; - + #[test] fn test_initial_state() { let config = VadConfig::default(); let state_machine = VadStateMachine::new(&config); - + assert_eq!(state_machine.current_state(), VadState::Silence); } - + #[test] fn test_speech_onset_debouncing() { let config = VadConfig { @@ -157,16 +153,16 @@ mod tests { ..Default::default() }; let mut state_machine = VadStateMachine::new(&config); - + assert_eq!(state_machine.process(true, -30.0), None); assert_eq!(state_machine.current_state(), VadState::Silence); - + assert_eq!(state_machine.process(true, -30.0), None); assert_eq!(state_machine.current_state(), VadState::Silence); - + assert_eq!(state_machine.process(true, -30.0), None); assert_eq!(state_machine.current_state(), 
VadState::Silence); - + // Speech should trigger on the 4th frame (100ms debounce with ~32ms frames) if let Some(VadEvent::SpeechStart { .. }) = state_machine.process(true, -30.0) { assert_eq!(state_machine.current_state(), VadState::Speech); @@ -174,7 +170,7 @@ mod tests { panic!("Expected SpeechStart event"); } } - + #[test] fn test_speech_offset_debouncing() { let config = VadConfig { @@ -185,27 +181,26 @@ mod tests { ..Default::default() }; let mut state_machine = VadStateMachine::new(&config); - + for _ in 0..3 { state_machine.process(true, -30.0); } assert_eq!(state_machine.current_state(), VadState::Speech); - + for _ in 0..3 { assert_eq!(state_machine.process(false, -50.0), None); assert_eq!(state_machine.current_state(), VadState::Speech); } - + // SpeechEnd should trigger on the 4th silence frame (100ms debounce with ~32ms frames) - if let Some(VadEvent::SpeechEnd { duration_ms, .. }) = state_machine.process(false, -50.0) - { + if let Some(VadEvent::SpeechEnd { duration_ms, .. 
}) = state_machine.process(false, -50.0) { assert_eq!(state_machine.current_state(), VadState::Silence); assert!(duration_ms > 0); } else { panic!("Expected SpeechEnd event"); } } - + #[test] fn test_speech_continuation() { let config = VadConfig { @@ -214,17 +209,17 @@ mod tests { ..Default::default() }; let mut state_machine = VadStateMachine::new(&config); - + for _ in 0..3 { state_machine.process(true, -30.0); } assert_eq!(state_machine.current_state(), VadState::Speech); - + state_machine.process(false, -50.0); state_machine.process(false, -50.0); - + state_machine.process(true, -30.0); - + assert_eq!(state_machine.current_state(), VadState::Speech); } -} \ No newline at end of file +} diff --git a/crates/coldvox-vad/src/threshold.rs b/crates/coldvox-vad/src/threshold.rs index cfc899d8..73fbd0d8 100644 --- a/crates/coldvox-vad/src/threshold.rs +++ b/crates/coldvox-vad/src/threshold.rs @@ -2,15 +2,15 @@ use crate::types::VadConfig; pub struct AdaptiveThreshold { noise_floor_db: f32, - + ema_alpha: f32, - + onset_offset_db: f32, - + offset_offset_db: f32, - + min_floor_db: f32, - + max_floor_db: f32, } @@ -25,38 +25,38 @@ impl AdaptiveThreshold { max_floor_db: -20.0, } } - + pub fn update(&mut self, energy_db: f32, is_speech: bool) { if !is_speech && energy_db > self.min_floor_db && energy_db < self.max_floor_db { - self.noise_floor_db = (1.0 - self.ema_alpha) * self.noise_floor_db - + self.ema_alpha * energy_db; - + self.noise_floor_db = + (1.0 - self.ema_alpha) * self.noise_floor_db + self.ema_alpha * energy_db; + self.noise_floor_db = self .noise_floor_db .clamp(self.min_floor_db, self.max_floor_db); } } - + pub fn onset_threshold(&self) -> f32 { self.noise_floor_db + self.onset_offset_db } - + pub fn offset_threshold(&self) -> f32 { self.noise_floor_db + self.offset_offset_db } - + pub fn current_floor(&self) -> f32 { self.noise_floor_db } - + pub fn should_activate(&self, energy_db: f32) -> bool { energy_db >= self.onset_threshold() } - + pub fn 
should_deactivate(&self, energy_db: f32) -> bool { energy_db < self.offset_threshold() } - + pub fn reset(&mut self, initial_floor_db: f32) { self.noise_floor_db = initial_floor_db.clamp(self.min_floor_db, self.max_floor_db); } @@ -65,17 +65,17 @@ impl AdaptiveThreshold { #[cfg(test)] mod tests { use super::*; - + #[test] fn test_threshold_initialization() { let config = VadConfig::default(); let threshold = AdaptiveThreshold::new(&config); - + assert_eq!(threshold.current_floor(), -50.0); assert_eq!(threshold.onset_threshold(), -41.0); assert_eq!(threshold.offset_threshold(), -44.0); } - + #[test] fn test_noise_floor_adaptation() { let config = VadConfig { @@ -83,36 +83,36 @@ mod tests { ..Default::default() }; let mut threshold = AdaptiveThreshold::new(&config); - + threshold.update(-40.0, false); assert!((threshold.current_floor() - (-49.0)).abs() < 0.01); - + threshold.update(-40.0, false); assert!((threshold.current_floor() - (-48.1)).abs() < 0.01); } - + #[test] fn test_no_update_during_speech() { let config = VadConfig::default(); let mut threshold = AdaptiveThreshold::new(&config); - + let initial_floor = threshold.current_floor(); - + threshold.update(-30.0, true); threshold.update(-25.0, true); - + assert_eq!(threshold.current_floor(), initial_floor); } - + #[test] fn test_activation_deactivation() { let config = VadConfig::default(); let threshold = AdaptiveThreshold::new(&config); - + assert!(threshold.should_activate(-40.0)); assert!(!threshold.should_activate(-42.0)); - + assert!(threshold.should_deactivate(-45.0)); assert!(!threshold.should_deactivate(-43.0)); } -} \ No newline at end of file +} diff --git a/crates/coldvox-vad/src/types.rs b/crates/coldvox-vad/src/types.rs index f5f20f59..27002746 100644 --- a/crates/coldvox-vad/src/types.rs +++ b/crates/coldvox-vad/src/types.rs @@ -1,4 +1,5 @@ use serde::{Deserialize, Serialize}; + use super::constants::{FRAME_SIZE_SAMPLES, SAMPLE_RATE_HZ}; #[derive(Debug, Clone, Copy, PartialEq)] @@ -29,19 +30,19 
@@ impl Default for VadState { #[derive(Debug, Clone, Serialize, Deserialize)] pub struct VadConfig { pub onset_threshold_db: f32, - + pub offset_threshold_db: f32, - + pub ema_alpha: f32, - + pub speech_debounce_ms: u32, - + pub silence_debounce_ms: u32, - + pub initial_floor_db: f32, - + pub frame_size_samples: usize, - + pub sample_rate_hz: u32, } @@ -64,11 +65,11 @@ impl VadConfig { pub fn frame_duration_ms(&self) -> f32 { (self.frame_size_samples as f32 * 1000.0) / self.sample_rate_hz as f32 } - + pub fn speech_debounce_frames(&self) -> u32 { (self.speech_debounce_ms as f32 / self.frame_duration_ms()).ceil() as u32 } - + pub fn silence_debounce_frames(&self) -> u32 { (self.silence_debounce_ms as f32 / self.frame_duration_ms()).ceil() as u32 } @@ -77,14 +78,14 @@ impl VadConfig { #[derive(Debug, Clone, Default)] pub struct VadMetrics { pub frames_processed: u64, - + pub speech_segments: u64, - + pub total_speech_ms: u64, - + pub total_silence_ms: u64, - + pub current_noise_floor_db: f32, - + pub last_energy_db: f32, -} \ No newline at end of file +} diff --git a/deny.toml b/deny.toml new file mode 100644 index 00000000..7b5b6a79 --- /dev/null +++ b/deny.toml @@ -0,0 +1,17 @@ +[licenses] +unlawful = ["GPL-3.0", "AGPL-3.0"] +allow = ["MIT", "Apache-2.0", "BSD-3-Clause", "ISC", "Unicode-DFS-2016"] + +[bans] +multiple-versions = "warn" +highlighted = ["openssl", "native-tls"] +skip = [] # Add crates to skip if needed + +[advisories] +db-path = "~/.cargo/advisory-db" +db-urls = ["https://github.com/rustsec/advisory-db"] +yank = "warn" + +[sources] +unknown-registry = "warn" +unknown-git = "warn" \ No newline at end of file diff --git a/docs/Chunking.md b/docs/Chunking.md deleted file mode 100644 index 24c3ab1c..00000000 --- a/docs/Chunking.md +++ /dev/null @@ -1,89 +0,0 @@ -# Audio Chunking and Centralized Resampling - -This note describes the fixed-size frame chunker introduced in `crates/app/src/audio/chunker.rs`. 
- -## Purpose - -CPAL delivers input in variable callback sizes and device‑native sample -rates, but our downstream (VAD + STT) requires normalized, fixed frames: - -- Sample rate: 16 kHz -- Frame size: 512 samples -- Non-overlapping hop (512) - -The chunker converts arbitrary device‑native `AudioCapture` frames into -exact 512‑sample frames suitable for the VAD/STT pipeline, handling -downmix + resample centrally. - -## Contract - -- Input: Device‑native mono/stereo i16 PCM, read via `FrameReader` from the - rtrb ring buffer as `audio::capture::AudioFrame`. -- Output: Non-overlapping frames of exactly 512 samples, delivered as - `audio::vad_processor::AudioFrame` (data + timestamp_ms). -- Timestamps: Derived from the emitted-sample cursor at the configured - sample rate (not from wall-clock), matching Silero’s expectation. -- Resampling & Downmix: Centralized in the chunker. The chunker downmixes - stereo→mono (averaging) and resamples to 16 kHz using a streaming sinc - resampler. Capture writes device‑native data; all consumers receive 16 kHz - mono 512‑sample frames. -- Overlap: Not supported initially. If overlap is introduced later, - timestamp math in both the chunker and Silero wrapper should be revisited. - -## Design - -- Internal buffer: `VecDeque` accumulates samples. Incoming device - samples are downmixed (if needed) and resampled to 16 kHz before chunking. -- Emission: Pops exactly 512 samples per output frame; updates a - `samples_emitted` counter to compute - `timestamp_ms = samples_emitted * 1000 / sample_rate_hz`. -- Backpressure: Output uses tokio broadcast; if there are no subscribers, - send returns Err and we warn once per burst to avoid log spam. -- Threading: Runs as a tokio task; drains frames from - `FrameReader::read_frame()` in a loop and sleeps briefly when no data is - available. - -## Usage - -1. Start `AudioCaptureThread::spawn(...)` to feed the ring buffer and expose - a device config broadcast. -2. 
Build a `FrameReader::new(consumer, device_cfg.sample_rate, - device_cfg.channels, cap, Some(metrics))`. -3. Create a broadcast channel for `vad_processor::AudioFrame` and build - `AudioChunker::new(...)` with `ChunkerConfig { frame_size_samples: 512, - sample_rate_hz: 16000, resampler_quality }`. -4. Wire device config updates via `.with_device_config( - device_config_rx.resubscribe() - )` so the chunker adjusts if the input rate/channels change. -5. Spawn the chunker and subscribe in downstream processors (VAD/STT) via - `audio_tx.subscribe()`. - -## Edge Cases - -- Short or silent callbacks: Just buffer until 512 samples are available. -- Stream stalls: The chunker loop yields with a short sleep when no data is - available; no frames are emitted during stalls. -- Channel disconnects: If the input channel disconnects, the chunker logs - and exits. -- Mismatched input format: Handled here. The chunker reconfigures its - resampler if the input device sample rate changes (e.g., after a capture - restart). Downmix is always applied when channels > 1. - -## Resampler Quality - -The chunker supports `ResamplerQuality` presets: `fast`, `balanced` -(default), and `quality`. Choose at runtime via CLI: - -```bash -cargo run -- --resampler-quality fast # or balanced / quality -``` - -## Future Extensions - -- Optional overlap (e.g., 50%) with consistent timestamp logic. -- Capture-time resampling is deprecated. Centralize resampling/downmixing in - the chunker for consistency and reduced callback load. If an emergency - fallback is ever needed, gate it behind a feature flag. -- Metrics: counters for frames produced, drops, and warnings. -- Pre-roll tap for PTT (reusing the same accumulation buffer with a - time-based window). 
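Reviewer note on the removed `docs/Chunking.md`: the accumulate-then-emit behavior it describes (buffer incoming samples, pop exactly 512 per frame, derive the timestamp from an emitted-sample counter) can be sketched as below. This is a minimal illustration, not the `AudioChunker` implementation. The names `SketchChunker` and `ChunkFrame` are hypothetical, input is assumed to be already downmixed and resampled to 16 kHz mono, and stamping each frame with its start position is an assumption the document does not confirm.

```rust
use std::collections::VecDeque;

/// One fixed-size output frame (hypothetical type, for illustration only).
#[derive(Debug, Clone, PartialEq)]
pub struct ChunkFrame {
    pub data: Vec<i16>,
    pub timestamp_ms: u64,
}

/// Hypothetical stand-in for the chunker's accumulation stage.
/// Assumes samples are already 16 kHz mono; no resampling or downmix here.
pub struct SketchChunker {
    buffer: VecDeque<i16>,
    frame_size_samples: usize,
    sample_rate_hz: u32,
    samples_emitted: u64,
}

impl SketchChunker {
    pub fn new(frame_size_samples: usize, sample_rate_hz: u32) -> Self {
        assert!(frame_size_samples > 0 && sample_rate_hz > 0);
        Self {
            buffer: VecDeque::new(),
            frame_size_samples,
            sample_rate_hz,
            samples_emitted: 0,
        }
    }

    /// Accepts a callback-sized slice of any length and returns every
    /// complete, non-overlapping frame that can now be emitted.
    pub fn push(&mut self, samples: &[i16]) -> Vec<ChunkFrame> {
        self.buffer.extend(samples.iter().copied());
        let mut frames = Vec::new();
        while self.buffer.len() >= self.frame_size_samples {
            let data: Vec<i16> = self.buffer.drain(..self.frame_size_samples).collect();
            // Timestamp comes from the sample cursor, not from wall-clock time.
            // Assumption: the frame is stamped with its start position.
            let timestamp_ms = self.samples_emitted * 1000 / u64::from(self.sample_rate_hz);
            self.samples_emitted += self.frame_size_samples as u64;
            frames.push(ChunkFrame { data, timestamp_ms });
        }
        frames
    }

    /// Number of samples waiting for the next complete frame.
    pub fn buffered(&self) -> usize {
        self.buffer.len()
    }
}
```

With 512-sample frames at 16 kHz, consecutive frames are 32 ms apart, which matches the frame duration quoted in the updated Copilot instructions. Short callbacks simply stay in the buffer until a full frame is available.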
diff --git a/docs/ColdVox_CI_Implementation.md b/docs/ColdVox_CI_Implementation.md deleted file mode 100644 index 1e811ca0..00000000 --- a/docs/ColdVox_CI_Implementation.md +++ /dev/null @@ -1,467 +0,0 @@ -# ColdVox CI Implementation - -**Last Updated:** 2025-09-01 -**Repository:** ColdVox -**Status:** Migration Planned - -## Overview - -This document describes how ColdVox implements CI/CD using the organization's reusable workflow plus project-specific additions. ColdVox requires specialized handling for audio dependencies (ALSA) and feature-gated functionality (Vosk STT) that requires system libraries and models. - -## Current State - -### Existing CI Components - -1. **ci.yml**: Ubuntu-only CI with fmt, clippy, build, test -2. **Composite action**: Local reusability via `.github/actions/rust-check/` -3. **release.yml**: Automated releases with release-plz -4. **No Vosk in baseline**: Correctly excludes system-dependent features - -### What Works Well - -- Minimal CI that passes reliably -- ALSA dependencies properly installed -- Vosk excluded from baseline (no features specified) -- PAT fallback pattern in release workflow -- Concurrency control - -### Technical Debt - -- Actions not fully SHA-pinned in ci.yml -- No cargo audit integration -- No cross-platform testing -- No Vosk integration testing -- Local composite action (should migrate to org workflow) - -## Target Architecture - -``` -ColdVox CI Structure: -.github/ -├── workflows/ -│ ├── ci.yml # 10-line shim calling org workflow -│ ├── vosk-integration.yml # Specialized Vosk testing -│ ├── release.yml # Release automation (unchanged) -│ └── benchmarks.yml # Performance tracking (future) -└── dependabot.yml # Dependency updates -``` - -## Phase 1: Core CI (Using Reusable Workflow) - -### File: `.github/workflows/ci.yml` - -```yaml -name: CI - -on: - pull_request: - push: - branches: [main] - -concurrency: - group: ci-${{ github.ref }} - cancel-in-progress: true - -jobs: - # Core CI using org workflow - 
common-ci: - uses: coldaine/.github/.github/workflows/lang-ci.yml@v1 - secrets: inherit - with: - run_rust: true - run_python: false - rust_no_default_features: true # Avoid Vosk dependency - install_alsa: true # Audio library requirement - test_timeout_minutes: 30 - - # Quick smoke test with default features - feature-check: - runs-on: ubuntu-latest - steps: - - uses: actions/checkout@3df4ab11eba7bda6032a0b82a6bb43b11571feac # v4.1.7 - - uses: dtolnay/rust-toolchain@1482605bfc5719782e1267fd0c0cc350fe7646b8 # v1 - with: - toolchain: stable - - uses: Swatinem/rust-cache@23bce251a8cd2ffc3c1075eaa2367cf899916d84 # v2.7.3 - - name: Install ALSA - run: | - sudo apt-get update - sudo apt-get install -y libasound2-dev - - name: Check text-injection feature - run: cargo check --features text-injection - - name: Check examples feature - run: cargo check --features examples -``` - -## Phase 2: Vosk Integration Testing - -### File: `.github/workflows/vosk-integration.yml` - -```yaml -name: Vosk Integration Tests - -on: - # Run on PRs that modify STT code - pull_request: - paths: - - 'crates/app/src/stt/**' - - 'crates/app/Cargo.toml' - - '.github/workflows/vosk-integration.yml' - # Weekly scheduled run - schedule: - - cron: '0 0 * * 0' - # Manual trigger - workflow_dispatch: - -jobs: - vosk-tests: - name: Vosk STT Integration - runs-on: ubuntu-latest - timeout-minutes: 45 - - steps: - - name: Checkout code - uses: actions/checkout@3df4ab11eba7bda6032a0b82a6bb43b11571feac # v4.1.7 - with: - fetch-depth: 0 - - - name: Install Rust - uses: dtolnay/rust-toolchain@1482605bfc5719782e1267fd0c0cc350fe7646b8 # v1 - with: - toolchain: stable - - - name: Cache Cargo - uses: Swatinem/rust-cache@23bce251a8cd2ffc3c1075eaa2367cf899916d84 # v2.7.3 - - - name: Install system dependencies - run: | - sudo apt-get update - sudo apt-get install -y \ - libasound2-dev \ - python3 \ - python3-pip \ - wget \ - unzip - - - name: Cache Vosk model - id: cache-vosk-model - uses: 
actions/cache@13aacd865c20de90d75de3b17ebe84f7a17d57d2 # v4.0.0 - with: - path: models/vosk-model-small-en-us-0.15 - key: vosk-model-small-en-us-0.15 - - - name: Download Vosk model - if: steps.cache-vosk-model.outputs.cache-hit != 'true' - run: | - mkdir -p models - cd models - wget https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip - unzip vosk-model-small-en-us-0.15.zip - rm vosk-model-small-en-us-0.15.zip - - - name: Build with Vosk - run: cargo build --features vosk - - - name: Run Vosk tests - env: - VOSK_MODEL_PATH: models/vosk-model-small-en-us-0.15 - run: | - cargo test --features vosk -- --nocapture - - - name: Run end-to-end WAV pipeline test - env: - VOSK_MODEL_PATH: models/vosk-model-small-en-us-0.15 - run: | - cargo test --features vosk test_end_to_end_wav_pipeline -- --ignored --nocapture - - - name: Test Vosk example - env: - VOSK_MODEL_PATH: models/vosk-model-small-en-us-0.15 - run: | - cargo run --example vosk_test --features vosk,examples -``` - -## Phase 3: Cross-Platform Testing - -### File: `.github/workflows/cross-platform.yml` - -```yaml -name: Cross-Platform Tests - -on: - # Only on release preparation - pull_request: - branches: [main] - types: [labeled] - # Manual trigger - workflow_dispatch: - -jobs: - matrix-test: - if: contains(github.event.label.name, 'release') || github.event_name == 'workflow_dispatch' - strategy: - matrix: - os: [ubuntu-latest, windows-latest, macos-latest] - rust: [stable] - include: - - os: ubuntu-latest - deps_command: | - sudo apt-get update - sudo apt-get install -y libasound2-dev - - os: macos-latest - deps_command: | - brew install portaudio - - os: windows-latest - deps_command: echo "No additional deps needed" - - runs-on: ${{ matrix.os }} - timeout-minutes: 45 - continue-on-error: true # Don't block on OS-specific issues - - steps: - - uses: actions/checkout@3df4ab11eba7bda6032a0b82a6bb43b11571feac # v4.1.7 - - uses: dtolnay/rust-toolchain@1482605bfc5719782e1267fd0c0cc350fe7646b8 # v1 - 
with: - toolchain: ${{ matrix.rust }} - - uses: Swatinem/rust-cache@23bce251a8cd2ffc3c1075eaa2367cf899916d84 # v2.7.3 - - - name: Install platform dependencies - run: ${{ matrix.deps_command }} - - - name: Build - run: cargo build --workspace --no-default-features - - - name: Test - run: cargo test --workspace --no-default-features -``` - -## Phase 4: Performance Monitoring - -### File: `.github/workflows/benchmarks.yml` - -```yaml -name: Benchmarks - -on: - push: - branches: [main] - pull_request: - types: [opened, synchronize] - -jobs: - benchmark: - runs-on: ubuntu-latest - steps: - - uses: actions/checkout@3df4ab11eba7bda6032a0b82a6bb43b11571feac # v4.1.7 - - uses: dtolnay/rust-toolchain@1482605bfc5719782e1267fd0c0cc350fe7646b8 # v1 - with: - toolchain: stable - - uses: Swatinem/rust-cache@23bce251a8cd2ffc3c1075eaa2367cf899916d84 # v2.7.3 - - - name: Install ALSA - run: | - sudo apt-get update - sudo apt-get install -y libasound2-dev - - - name: Run benchmarks - run: cargo bench --no-default-features -- --output-format bencher | tee output.txt - - - name: Store benchmark result - uses: benchmark-action/github-action-benchmark@v1 - with: - tool: 'cargo' - output-file-path: output.txt - github-token: ${{ secrets.GITHUB_TOKEN }} - auto-push: false - comment-on-alert: true - alert-threshold: '120%' - fail-on-alert: false -``` - -## Supporting Files - -### File: `.github/dependabot.yml` - -```yaml -version: 2 -updates: - - package-ecosystem: "github-actions" - directory: "/" - schedule: - interval: "weekly" - commit-message: - prefix: "chore(deps)" - labels: - - "dependencies" - - "github-actions" - - - package-ecosystem: "cargo" - directory: "/" - schedule: - interval: "weekly" - commit-message: - prefix: "chore(deps)" - labels: - - "dependencies" - - "rust" -``` - -### File: `release-plz.toml` - -```toml -[workspace] -allow-dirty = false -changelog-include = ["crates/*"] -pr-labels = ["release"] - -[git] -tag-prefix = "v" -release-commit-message = "chore: release 
{{package_name}} v{{version}}" - -# Prevent crates.io publication -[[package]] -name = "coldvox-app" -publish = false -changelog-path = "crates/app/CHANGELOG.md" -``` - -## Migration Timeline - -### Week 1: Preparation -- [x] Document current state -- [ ] Create org .github repository -- [ ] Implement reusable workflow -- [ ] Tag v1.0.0 and v1 - -### Week 2: Local Hardening -- [ ] Pin all actions to SHAs in current ci.yml -- [ ] Add dependabot.yml -- [ ] Add release-plz.toml -- [ ] Test locally with act - -### Week 3: Migration -- [ ] Replace ci.yml with shim -- [ ] Add vosk-integration.yml -- [ ] Verify all checks pass -- [ ] Update branch protection - -### Week 4: Enhancement -- [ ] Add cross-platform workflow -- [ ] Add benchmark workflow -- [ ] Document in README -- [ ] Remove legacy composite action - -## CI Requirements Summary - -### Baseline (Required) -- ✅ Format checking (cargo fmt) -- ✅ Linting (cargo clippy) -- ✅ Build validation (no default features) -- ✅ Test execution (no default features) -- ✅ Security audit (cargo audit) -- ✅ ALSA installation - -### Extended (Optional) -- ⏳ Vosk integration tests (separate workflow) -- ⏳ Cross-platform matrix (on-demand) -- ⏳ Benchmark tracking (main branch only) -- ⏳ Coverage reporting (future) - -### Excluded from CI -- ❌ Live hardware tests (requires physical microphone) -- ❌ TUI dashboard tests (requires terminal) -- ❌ Text injection tests (requires display server) - -## Local Development - -### Running CI Locally - -```bash -# Install act for local GitHub Actions testing -brew install act # or your package manager - -# Run CI workflow locally -act -W .github/workflows/ci.yml - -# Run with specific event -act pull_request -W .github/workflows/ci.yml -``` - -### Pre-commit Validation - -```bash -# Create pre-commit hook -cat > .git/hooks/pre-commit << 'EOF' -#!/bin/bash -set -e - -echo "Running pre-commit checks..." 
- -# Format check -cargo fmt --all -- --check - -# Clippy -cargo clippy --workspace --all-targets --no-default-features -- -D warnings - -# Test -cargo test --workspace --no-default-features --lib --bins - -echo "Pre-commit checks passed!" -EOF - -chmod +x .git/hooks/pre-commit -``` - -## Troubleshooting - -### Common Issues - -**Issue**: Vosk tests fail in CI -**Solution**: Ensure VOSK_MODEL_PATH is set and model is cached - -**Issue**: ALSA not found on Linux -**Solution**: Add `install_alsa: true` to workflow inputs - -**Issue**: Windows builds fail -**Solution**: Use `continue-on-error: true` initially, fix incrementally - -**Issue**: Cargo audit fails -**Solution**: Run `cargo update` or add exemptions for false positives - -## Success Metrics - -### Phase 1 (Baseline) -- [ ] CI runs in < 5 minutes -- [ ] Zero false positives -- [ ] All PRs have CI checks - -### Phase 2 (Extended) -- [ ] Vosk tests run weekly -- [ ] Cross-platform validation before releases -- [ ] Benchmark regressions detected - -### Phase 3 (Mature) -- [ ] < 2% flaky test rate -- [ ] 80%+ code coverage -- [ ] Automated dependency updates - -## Appendix: Current vs Future - -### Current ci.yml Structure -- 47 lines -- Local composite action -- Ubuntu-only -- No security scanning - -### Future ci.yml Structure -- 10-15 lines -- Org reusable workflow -- Extensible via inputs -- Security by default - -### Benefits of Migration -1. **Consistency**: Same CI across all projects -2. **Maintenance**: Updates in one place -3. **Security**: Centralized action pinning -4. **Flexibility**: Per-project customization -5. **Scalability**: Easy to add new checks \ No newline at end of file diff --git a/docs/MIGRATION_GUIDE.md b/docs/MIGRATION_GUIDE.md deleted file mode 100644 index 9a7eccdc..00000000 --- a/docs/MIGRATION_GUIDE.md +++ /dev/null @@ -1,150 +0,0 @@ -# ColdVox Migration Guide - -This guide helps existing users migrate to the new workspace-based ColdVox architecture. 
- -## Overview - -ColdVox has been refactored into a Cargo workspace with multiple specialized crates. This provides better modularity, clearer dependencies, and optional features. - -## Quick Migration - -### If you were using basic commands: - -**Before:** -```bash -cargo run -cargo run --bin mic_probe -``` - -**After:** -```bash -cargo run -p coldvox-app --bin coldvox -cargo run -p coldvox-app --bin mic_probe -``` - -### If you were using feature flags: - -**Before:** -```bash -cargo run --features vosk -``` - -**After:** -```bash -cargo run -p coldvox-app --features vosk -``` - -## Detailed Changes - -### Binary Locations - -| Component | Before | After | -|-----------|---------|--------| -| Main app | `cargo run` | `cargo run -p coldvox-app --bin coldvox` | -| Microphone probe | `cargo run --bin mic_probe` | `cargo run -p coldvox-app --bin mic_probe` | -| TUI dashboard | `cargo run --bin tui_dashboard` | `cargo run -p coldvox-app --bin tui_dashboard` | -| Examples | `cargo run --example ` | `cargo run -p coldvox-app --example ` | - -### Feature Flags - -All feature flags remain the same but must be specified with the app crate: - -| Feature | Usage | -|---------|--------| -| `vosk` | `cargo run -p coldvox-app --features vosk` | -| `text-injection` | `cargo run -p coldvox-app --features text-injection` | -| `examples` | `cargo run -p coldvox-app --features examples` | -| `live-hardware-tests` | `cargo run -p coldvox-app --features live-hardware-tests` | - -### Multiple features: -```bash -cargo run -p coldvox-app --features vosk,text-injection -``` - -### Building and Testing - -**Before:** -```bash -cargo build -cargo test -cargo clippy -``` - -**After:** -```bash -cargo build --workspace -cargo test --workspace -cargo clippy --workspace -``` - -Or for specific crates: -```bash -cargo build -p coldvox-app -cargo test -p coldvox-foundation -``` - -## New Capabilities - -### Workspace Benefits - -1. 
**Modular Dependencies**: Individual crates have minimal, focused dependencies -2. **Optional Features**: STT and text injection are now truly optional -3. **Better Testing**: Each crate can be tested independently -4. **Clearer Architecture**: Separation of concerns across crates - -### Individual Crate Usage - -You can now depend on specific ColdVox functionality in your projects: - -```toml -[dependencies] -coldvox-audio = { path = "path/to/coldvox/crates/coldvox-audio" } -coldvox-foundation = { path = "path/to/coldvox/crates/coldvox-foundation" } -``` - -## Configuration Changes - -### Environment Variables -All environment variables remain the same: -- `RUST_LOG`: Logging level control -- `VOSK_MODEL_PATH`: Vosk model directory - -### CLI Arguments -Most CLI arguments are unchanged, but some STT and text-injection specific arguments now require their respective feature flags to be enabled. - -## Troubleshooting Migration Issues - -### "Package not found" errors -Make sure to use `-p coldvox-app` to specify the application crate. - -### Missing feature errors -Features must be specified on the app crate: `--features vosk` becomes `-p coldvox-app --features vosk`. - -### Build errors -The workspace structure requires all crates to be buildable. If you encounter dependency issues: - -1. Ensure you're building the workspace: `cargo build --workspace` -2. Check that optional dependencies are properly feature-gated -3. Verify system dependencies are installed (especially for STT features) - -### IDE Integration - -If your IDE or language server has issues with the workspace: - -1. Make sure it's configured to use the workspace root (`Cargo.toml`) -2. Some IDEs may need to be restarted after the workspace migration -3. Check that your IDE supports Cargo workspaces (most modern tools do) - -## Getting Help - -If you encounter issues during migration: - -1. Check the main README.md for updated quick start instructions -2. 
Review the individual crate README files for specific functionality -3. Open an issue on GitHub with details about your migration problem - -## Rollback Information - -If you need to temporarily roll back to a pre-workspace version, you can checkout the commit before the workspace migration. However, we recommend migrating to the new structure for better maintainability and features. - -The workspace migration maintains full backward compatibility for core functionality - only the build commands have changed. \ No newline at end of file diff --git a/docs/Reusable_Workflow_Design.md b/docs/Reusable_Workflow_Design.md deleted file mode 100644 index a0993010..00000000 --- a/docs/Reusable_Workflow_Design.md +++ /dev/null @@ -1,462 +0,0 @@ -# Org-Level Reusable Workflow Design - -**Last Updated:** 2025-09-01 -**Purpose:** Define a centralized, reusable CI workflow for organization-wide use -**Repository:** `/.github` - -## Overview - -This document specifies the design of an organization-level reusable workflow that provides baseline CI capabilities for Rust and Python projects. Individual repositories call this workflow with a minimal shim, inheriting standardized CI practices while maintaining flexibility for project-specific needs. 
- -## Architecture - -``` -/.github/ # Organization .github repository -├── .github/ -│ └── workflows/ -│ └── lang-ci.yml # Reusable workflow (workflow_call) -├── README.md -└── CHANGELOG.md -``` - -## Reusable Workflow Specification - -### File: `/.github/.github/workflows/lang-ci.yml` - -```yaml -name: Common CI - -on: - workflow_call: - inputs: - # Language toggles - run_rust: - description: "Run Rust CI jobs" - type: boolean - default: true - run_python: - description: "Run Python CI jobs" - type: boolean - default: false - - # Rust configuration - rust_toolchain: - description: "Rust toolchain (stable/beta/nightly)" - type: string - default: "stable" - rust_features: - description: "Space-separated Cargo features" - type: string - default: "" - rust_no_default_features: - description: "Pass --no-default-features to cargo" - type: boolean - default: false - - # Python configuration - python_version: - description: "Python version (e.g., 3.11)" - type: string - default: "3.11" - python_requirements: - description: "Path to requirements file" - type: string - default: "requirements.txt" - - # System dependencies - install_alsa: - description: "Install ALSA audio libraries (Linux)" - type: boolean - default: false - install_apt_packages: - description: "Space-separated apt packages to install" - type: string - default: "" - - # Testing configuration - test_timeout_minutes: - description: "Test timeout in minutes" - type: number - default: 30 - continue_on_error: - description: "Continue workflow even if tests fail" - type: boolean - default: false - -permissions: - contents: read - -jobs: - # ============================================================ - # RUST CI JOB - # ============================================================ - rust: - name: Rust CI - if: inputs.run_rust && hashFiles('**/Cargo.toml') != '' - runs-on: ubuntu-latest - timeout-minutes: ${{ inputs.test_timeout_minutes }} - continue-on-error: ${{ inputs.continue_on_error }} - - steps: - - name: 
Checkout code - uses: actions/checkout@3df4ab11eba7bda6032a0b82a6bb43b11571feac # v4.1.7 - with: - fetch-depth: 0 - - - name: Install Rust toolchain - uses: dtolnay/rust-toolchain@1482605bfc5719782e1267fd0c0cc350fe7646b8 # v1 - with: - toolchain: ${{ inputs.rust_toolchain }} - components: clippy, rustfmt - - - name: Cache Cargo dependencies - uses: Swatinem/rust-cache@23bce251a8cd2ffc3c1075eaa2367cf899916d84 # v2.7.3 - - - name: Install ALSA libraries - if: inputs.install_alsa - run: | - sudo apt-get update - sudo apt-get install -y libasound2-dev - - - name: Install additional apt packages - if: inputs.install_apt_packages != '' - run: | - sudo apt-get update - sudo apt-get install -y ${{ inputs.install_apt_packages }} - - - name: Check formatting - run: cargo fmt --all -- --check - - - name: Run Clippy - run: | - FEATURES_ARG="" - if [ "${{ inputs.rust_no_default_features }}" = "true" ]; then - FEATURES_ARG="--no-default-features" - fi - if [ -n "${{ inputs.rust_features }}" ]; then - FEATURES_ARG="$FEATURES_ARG --features ${{ inputs.rust_features }}" - fi - cargo clippy --workspace --all-targets $FEATURES_ARG -- -D warnings - - - name: Build - run: | - FEATURES_ARG="" - if [ "${{ inputs.rust_no_default_features }}" = "true" ]; then - FEATURES_ARG="--no-default-features" - fi - if [ -n "${{ inputs.rust_features }}" ]; then - FEATURES_ARG="$FEATURES_ARG --features ${{ inputs.rust_features }}" - fi - cargo build --workspace --all-targets $FEATURES_ARG - - - name: Run tests - run: | - FEATURES_ARG="" - if [ "${{ inputs.rust_no_default_features }}" = "true" ]; then - FEATURES_ARG="--no-default-features" - fi - if [ -n "${{ inputs.rust_features }}" ]; then - FEATURES_ARG="$FEATURES_ARG --features ${{ inputs.rust_features }}" - fi - cargo test --workspace $FEATURES_ARG -- --nocapture - - - name: Generate documentation - run: | - FEATURES_ARG="" - if [ "${{ inputs.rust_no_default_features }}" = "true" ]; then - FEATURES_ARG="--no-default-features" - fi - if [ -n "${{ 
inputs.rust_features }}" ]; then - FEATURES_ARG="$FEATURES_ARG --features ${{ inputs.rust_features }}" - fi - cargo doc --workspace --no-deps $FEATURES_ARG - - - name: Security audit - uses: rustsec/audit-check@dd51754611baa5c0affe6c19adb60f61f165e6e4 # v2.0.0 - with: - token: ${{ secrets.GITHUB_TOKEN }} - - # ============================================================ - # PYTHON CI JOB - # ============================================================ - python: - name: Python CI - if: inputs.run_python && (hashFiles('**/pyproject.toml') != '' || hashFiles('**/requirements.txt') != '' || hashFiles('**/setup.py') != '') - runs-on: ubuntu-latest - timeout-minutes: ${{ inputs.test_timeout_minutes }} - continue-on-error: ${{ inputs.continue_on_error }} - - steps: - - name: Checkout code - uses: actions/checkout@3df4ab11eba7bda6032a0b82a6bb43b11571feac # v4.1.7 - with: - fetch-depth: 0 - - - name: Setup Python - uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b # v5.3.0 - with: - python-version: ${{ inputs.python_version }} - cache: 'pip' - - - name: Install additional apt packages - if: inputs.install_apt_packages != '' - run: | - sudo apt-get update - sudo apt-get install -y ${{ inputs.install_apt_packages }} - - - name: Install Python dependencies - run: | - python -m pip install --upgrade pip - # Try multiple common dependency files - if [ -f "${{ inputs.python_requirements }}" ]; then - pip install -r "${{ inputs.python_requirements }}" - elif [ -f "requirements.txt" ]; then - pip install -r requirements.txt - fi - if [ -f "pyproject.toml" ]; then - pip install -e . - elif [ -f "setup.py" ]; then - pip install -e . - fi - # Install common CI tools - pip install ruff pytest pytest-cov - - - name: Format check with ruff - run: ruff format --check . - - - name: Lint with ruff - run: ruff check . - - - name: Type check with mypy - run: | - pip install mypy - mypy . 
|| true # Don't fail on type errors initially - continue-on-error: true - - - name: Run tests with pytest - run: pytest -v --cov=. --cov-report=term-missing - - - name: Security check with bandit - run: | - pip install bandit[toml] - bandit -r . -ll || true # Low severity, don't fail initially - continue-on-error: true - - # ============================================================ - # DEPENDENCY CHECK JOB (Both Languages) - # ============================================================ - dependency-check: - name: Dependency Security Check - if: (inputs.run_rust && hashFiles('**/Cargo.toml') != '') || (inputs.run_python && hashFiles('**/requirements.txt') != '') - runs-on: ubuntu-latest - timeout-minutes: 10 - - steps: - - name: Checkout code - uses: actions/checkout@3df4ab11eba7bda6032a0b82a6bb43b11571feac # v4.1.7 - - - name: Run Trivy security scanner - uses: aquasecurity/trivy-action@6e7b7d1fd3e4fef0c5fa8cce1229c54b2c9bd0d8 # v0.24.0 - with: - scan-type: 'fs' - scan-ref: '.' - severity: 'CRITICAL,HIGH' - exit-code: '0' # Don't fail the build initially -``` - -## Versioning Strategy - -### Semantic Versioning - -- **v1**: Initial stable release -- **v1.x**: Backward-compatible improvements -- **v2**: Breaking changes requiring shim updates - -### Tagging Process - -```bash -# In the org/.github repository -git tag -a v1.0.0 -m "Initial release of common CI workflow" -git push origin v1.0.0 - -# Create major version tag for stability -git tag -a v1 -m "v1 stable" v1.0.0^{} -git push origin v1 -``` - -## Security Considerations - -### Action Pinning - -All third-party actions are pinned to specific commit SHAs with version comments: -- Prevents supply chain attacks -- Managed via Dependabot for updates -- Pattern: `uses: owner/action@SHA # vX.Y.Z` - -### Permissions - -- Minimal permissions (`contents: read`) -- No write permissions in reusable workflow -- Calling workflows can extend permissions if needed - -### Secrets - -- Use `secrets: inherit` in calling 
workflow -- No hardcoded secrets in reusable workflow -- Support for organization-level secrets - -## Performance Optimizations - -### Caching - -- **Rust**: Swatinem/rust-cache for Cargo -- **Python**: Built-in pip caching via setup-python -- **Docker**: Layer caching for custom images - -### Concurrency - -Calling workflows should implement: -```yaml -concurrency: - group: ${{ github.workflow }}-${{ github.ref }} - cancel-in-progress: true -``` - -### Auto-detection - -- Language detection via `hashFiles()` -- Skip jobs when language files absent -- Override with explicit inputs - -## Usage Examples - -### Basic Rust Project - -```yaml -# .github/workflows/ci.yml -name: CI -on: [push, pull_request] - -jobs: - ci: - uses: myorg/.github/.github/workflows/lang-ci.yml@v1 - with: - run_rust: true - run_python: false -``` - -### Python Project with System Dependencies - -```yaml -# .github/workflows/ci.yml -name: CI -on: [push, pull_request] - -jobs: - ci: - uses: myorg/.github/.github/workflows/lang-ci.yml@v1 - with: - run_rust: false - run_python: true - python_version: "3.12" - install_apt_packages: "libpq-dev" -``` - -### Mixed Language Project - -```yaml -# .github/workflows/ci.yml -name: CI -on: [push, pull_request] - -jobs: - ci: - uses: myorg/.github/.github/workflows/lang-ci.yml@v1 - with: - run_rust: true - run_python: true - rust_features: "cli" - python_version: "3.11" -``` - -## Extension Points - -### Adding New Languages - -To add a new language (e.g., Go): - -1. Add input toggles: - ```yaml - run_go: - type: boolean - default: false - ``` - -2. Add language-specific job: - ```yaml - go: - if: inputs.run_go && hashFiles('**/go.mod') != '' - # ... 
job steps - ``` - -### Adding Matrix Testing - -For projects needing OS/version matrices, create specialized workflows: -```yaml -# lang-ci-matrix.yml -strategy: - matrix: - os: [ubuntu-latest, windows-latest, macos-latest] - rust: [stable, beta] -``` - -## Maintenance - -### Dependabot Configuration - -```yaml -# /.github/.github/dependabot.yml -version: 2 -updates: - - package-ecosystem: "github-actions" - directory: "/" - schedule: - interval: "weekly" - commit-message: - prefix: "chore(deps)" -``` - -### Monitoring - -- Track workflow run times -- Monitor failure rates -- Review security alerts -- Update actions monthly - -## Migration Checklist - -For organizations adopting this workflow: - -- [ ] Create `/.github` repository -- [ ] Add `lang-ci.yml` workflow file -- [ ] Configure Dependabot -- [ ] Tag initial version (v1.0.0) -- [ ] Create major version tag (v1) -- [ ] Document in org README -- [ ] Create example shims -- [ ] Pilot with 1-2 projects -- [ ] Roll out organization-wide - -## FAQ - -**Q: Can projects override specific steps?** -A: No, but they can add additional jobs in their calling workflow. - -**Q: How to handle private dependencies?** -A: Use `secrets: inherit` and configure tokens at the repository level. - -**Q: What about deployment workflows?** -A: Keep deployment separate; this is for CI only. - -**Q: How to test workflow changes?** -A: Use a test branch and reference it: `uses: org/.github/.github/workflows/lang-ci.yml@test-branch` \ No newline at end of file diff --git a/docs/end_to_end_testing.md b/docs/end_to_end_testing.md deleted file mode 100644 index fea92503..00000000 --- a/docs/end_to_end_testing.md +++ /dev/null @@ -1,268 +0,0 @@ -# End-to-End WAV File Testing - -This document describes how to run the comprehensive end-to-end test that processes WAV files through the entire ColdVox pipeline from audio input to text injection. 
- -## Overview - -The end-to-end test (`test_end_to_end_wav_pipeline`) simulates the complete ColdVox pipeline: - -1. **WAV File Loading**: Loads and streams WAV files as if they were live microphone input -2. **Audio Processing**: Chunking, resampling, and mono conversion -3. **VAD Processing**: Speech activity detection using Silero VAD -4. **STT Processing**: Speech-to-text transcription using Vosk -5. **Text Injection**: Mock text injection that captures results for verification - -## Prerequisites - -### 1. Vosk Model - -Download a Vosk model for speech recognition: - -```bash -# Create models directory -mkdir -p models - -# Download a small English model (37MB) -wget -O models/vosk-model-small-en-us-0.15.zip \ - https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip - -# Extract the model -cd models -unzip vosk-model-small-en-us-0.15.zip -cd .. - -# Set environment variable -export VOSK_MODEL_PATH=models/vosk-model-small-en-us-0.15 -``` - -Alternatively, use a larger, more accurate model: - -```bash -# Download medium English model (328MB) - better accuracy -wget -O models/vosk-model-en-us-0.22-lgraph.zip \ - https://alphacephei.com/vosk/models/vosk-model-en-us-0.22-lgraph.zip - -cd models -unzip vosk-model-en-us-0.22-lgraph.zip -export VOSK_MODEL_PATH=models/vosk-model-en-us-0.22-lgraph -cd .. -``` - -### 2. Test Audio Files - -The test suite includes pre-recorded WAV files with corresponding transcripts in the `test_data/` directory. - -#### Automatic Test Data Selection - -The test automatically: -1. **Randomly selects** a WAV file from `test_data/` directory -2. **Loads the corresponding transcript** from the `.txt` file with the same name -3. **Extracts keywords** from the transcript (words ≥4 characters) -4. **Verifies transcription** by checking if at least one keyword appears in the output - -#### Test Data Files - -The repository includes 13 test WAV files (`test_1.wav` through `test_12.wav` and `pipeline_test.wav`) with transcripts. 
Each transcript contains the expected text in uppercase, for example: -- `test_1.txt`: "ON AUGUST TWENTY SEVENTH EIGHTEEN THIRTY SEVEN SHE WRITES" -- `test_5.txt`: "YOUR PLAY MUST BE NOT MERELY A GOOD PLAY BUT A SUCCESSFUL ONE" - -#### Option A: Use Existing Test Data - -```bash -# Run test with random file selection from test_data/ -VOSK_MODEL_PATH=models/vosk-model-small-en-us-0.15 \ - cargo test --features vosk test_end_to_end_wav_pipeline -- --ignored -``` - -#### Option B: Record Custom Test Audio - -```bash -# Record a 10-second test file (speak clearly) -cargo run --example record_10s - -# Use your custom recording -TEST_WAV=recording_16khz_10s_1672531200.wav \ -VOSK_MODEL_PATH=models/vosk-model-small-en-us-0.15 \ - cargo test --features vosk test_end_to_end_wav_pipeline -- --ignored -``` - -#### Option C: Use Existing Audio - -Convert existing audio files to the required format: - -```bash -# Using ffmpeg to convert any audio file -ffmpeg -i input_audio.mp3 -ar 16000 -ac 1 -sample_fmt s16 test_audio_16k.wav - -# Or using SoX -sox input_audio.wav -r 16000 -c 1 -b 16 test_audio_16k.wav -``` - -## Running the Test - -### Basic Test Execution - -```bash -# Run with automatic random test file selection -VOSK_MODEL_PATH=models/vosk-model-small-en-us-0.15 \ - cargo test --features vosk test_end_to_end_wav_pipeline -- --ignored --nocapture - -# Run with a specific WAV file -TEST_WAV=test_audio_16k.wav VOSK_MODEL_PATH=models/vosk-model-small-en-us-0.15 \ - cargo test --features vosk test_end_to_end_wav_pipeline -- --ignored --nocapture -``` - -### With Different Models - -```bash -# Test with the larger, more accurate model -TEST_WAV=recording_16khz_10s_1672531200.wav \ -VOSK_MODEL_PATH=models/vosk-model-en-us-0.22-lgraph \ - cargo test --features vosk test_end_to_end_wav_pipeline -- --ignored --nocapture -``` - -### Environment Variables - -- `TEST_WAV`: Path to the WAV file to test (default: `test_audio_16k.wav`) -- `VOSK_MODEL_PATH`: Path to the Vosk model 
directory -- `RUST_LOG`: Set to `debug` or `trace` for detailed logging - -## Test Validation - -The test performs several validations: - -1. **Audio Loading**: Verifies WAV file loads and converts correctly -2. **Pipeline Setup**: Ensures all components initialize properly -3. **Speech Detection**: VAD should detect speech segments in the audio -4. **Transcription**: STT should produce text output from detected speech -5. **Text Injection**: Mock injector should capture the transcribed text -6. **Content Verification**: Checks that at least one expected keyword is present - -### Validation Strategy - -The test uses a **flexible keyword matching** approach: -- Extracts keywords (≥4 characters) from the reference transcript -- Checks if at least one keyword appears in the transcription -- Accounts for STT accuracy limitations (not expecting 100% accuracy) -- Handles variations in pronunciation and recognition errors - -### Example Test Output - -``` -Testing with WAV file: test_data/test_4.wav -Expected keywords: ["american", "school", "boys"] -✅ Test passed! Injections: ["can schoolboys read with emotions of horror", ...] -``` - -In this example, the keyword "schoolboys" (containing "school") was successfully matched. - -## Creating Good Test Audio - -For reliable test results: - -1. **Clear Speech**: Speak clearly and at normal volume -2. **Quiet Environment**: Minimize background noise -3. **Simple Phrases**: Use common words that Vosk recognizes well -4. **Appropriate Length**: 5-15 seconds is ideal -5. **Proper Format**: 16kHz, mono, 16-bit PCM WAV - -### Example Test Phrases - -Record yourself saying: -- "Hello world, this is a test" -- "The quick brown fox jumps over the lazy dog" -- "ColdVox is working correctly" -- "Testing speech recognition pipeline" - -## Troubleshooting - -### Common Issues - -1. 
**Model Not Found** - ``` - Error: Vosk model not found at 'models/vosk-model-small-en-us-0.15' - ``` - Solution: Download and extract the Vosk model as described above. - -2. **No Speech Detected** - ``` - No speech detected. Possible issues: - - Threshold too high (current: 0.2) - - Audio file contains no speech - ``` - Solutions: - - Check audio file has audible speech - - Lower VAD threshold in test code - - Verify audio format is correct - -3. **Poor Transcription Quality** - - Use a larger, more accurate Vosk model - - Ensure clear speech in test audio - - Check for background noise - -4. **Test Timeout** - - Increase test duration for longer audio files - - Check for component initialization issues - -### Debug Mode - -Run with detailed logging: - -```bash -RUST_LOG=debug TEST_WAV=test_audio_16k.wav \ -VOSK_MODEL_PATH=models/vosk-model-small-en-us-0.15 \ - cargo test --features vosk test_end_to_end_wav_pipeline -- --ignored --nocapture -``` - -## Integration with CI/CD - -The test can be integrated into automated testing with proper setup: - -```yaml -# Example GitHub Actions step -- name: Download Vosk Model - run: | - mkdir -p models - wget -O models/model.zip https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip - cd models && unzip model.zip - -- name: Run End-to-End Test - env: - VOSK_MODEL_PATH: models/vosk-model-small-en-us-0.15 - TEST_WAV: test_data/sample_speech.wav - run: cargo test --features vosk test_end_to_end_wav_pipeline -- --ignored -``` - -## Test Architecture - -The test creates a complete pipeline simulation: - -``` -WAV File → AudioRingBuffer → AudioChunker → VadProcessor - ↓ -Mock Text Injector ← STT Processor ← Audio Frames -``` - -### Key Components - -1. **WavFileLoader**: Streams WAV file data with realistic timing - - Loads WAV files using the `hound` crate - - Simulates real-time audio streaming (32ms chunks) - - Handles format conversion to 16kHz mono i16 - -2. 
**Mock Text Injector**: Captures transcriptions for verification - - Implements the `TextInjector` trait - - Stores injected text in a thread-safe collection - - Enables validation without actual system text injection - -3. **Mock Injection Processor**: Manages transcription sessions - - Buffers partial transcriptions - - Implements silence timeout logic (1.5 seconds) - - Simulates production text injection behavior - -4. **Random Test Selection**: Ensures comprehensive coverage - - Randomly selects from available test files - - Loads corresponding transcripts automatically - - Extracts meaningful keywords for validation - -This ensures the test validates the actual production code paths and component interactions, providing confidence in the full system integration. \ No newline at end of file diff --git a/docs/tasks/0828TestRefactorPlan.md b/docs/tasks/0828TestRefactorPlan.md deleted file mode 100644 index f739d712..00000000 --- a/docs/tasks/0828TestRefactorPlan.md +++ /dev/null @@ -1,130 +0,0 @@ -# 0828 Test & Refactor Plan – ColdVox (ARCHIVED - SUBSTANTIALLY COMPLETE) - -**Status: ARCHIVED as of 2025-08-29** -**Reason: 4/6 components implemented, remaining gaps deemed not vital for production use** - -This plan consolidates the STT pipeline testing strategy validated against the current codebase, with CI-safe defaults and feature-gated extensions. - -## Original Goals ✅ ACHIEVED - -- ✅ Enhanced test coverage that compiles and runs by default (no model, no hardware), with optional Vosk-enabled checks. -- ✅ Improved determinism and observability without large architectural churn. - -## Key Decisions - -- Gate all STT/Vosk-specific code paths behind the `vosk` cargo feature and require a valid `VOSK_MODEL_PATH` to run STT tests. -- Default tests run VAD-only pipeline using in-process ring buffer and chunker; no CPAL/microphone required. -- Replace ambiguous "no samples lost" with framing-aware accounting (± one 512-sample frame). 
-- Assert health via PipelineMetrics/activity instead of HealthMonitor checks (no checks registered by default). -- Simulate stalls to validate watchdog triggering; don't assert recovery attempts (no public recovery API). - -## Work Items - -### 1) Test Scaffolding & Utilities -### 1) Test Scaffolding & Utilities ✅ IMPLEMENTED - -- **COMPLETE**: `crates/app/tests/common/test_utils.rs` contains comprehensive utilities. - - ✅ WER helper exists (lines 285-311) for STT accuracy checks. - - ✅ Ring buffer feeding helper exists (lines 259-283) for 512-sample frames. -- **COMPLETE**: Test fixtures under `test_data/` are abundant: - - ✅ `pipeline_test.wav` + `pipeline_test.txt` exist. - - ✅ Additional 12 test file pairs available (test_1 through test_12). - -### 2) End-to-End (E2E) Pipeline Test ✅ IMPLEMENTED - -- **COMPLETE**: `crates/app/tests/pipeline_integration.rs` exists. -- ✅ Builds ring buffer → `FrameReader` → `AudioChunker(512@16k)` → broadcast. -- ✅ Proper chunking integrity assertions with frame-aware accounting. -- Note: STT assertions would require Vosk feature and model availability. - -### 3) VAD Pipeline Test ✅ IMPLEMENTED - -- **COMPLETE**: `crates/app/tests/vad_pipeline_tests.rs` exists. -- ✅ Uses Level3 VAD for model-free deterministic testing. -- ✅ Tests silence detection without producing spurious events. -- **GAP**: More comprehensive VAD accuracy testing with various speech patterns. -### 4) STT Unit Tests ✅ IMPLEMENTED - -- **COMPLETE**: `crates/app/src/stt/tests.rs` exists with proper feature gating. -- ✅ Tests gated behind `#[cfg(feature = "vosk")]`. -- ✅ Handles missing model paths gracefully. -- ✅ Includes processor state transition tests. - -### 5) Error Handling & Watchdog Test **GAP** - -- **NOT IMPLEMENTED**: Comprehensive error recovery testing. -- Missing: Watchdog trigger testing during stalls. -- Missing: Device disconnection/reconnection scenarios. 
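The watchdog gap listed above (trigger testing during stalls) could in principle be exercised without hardware by driving a watchdog with explicit timestamps instead of real sleeps. The following is a minimal sketch under stated assumptions: `StallWatchdog` is a hypothetical stand-in, not the real `watchdog.rs` type (whose `is_triggered()` takes no arguments), and the 5 s timeout only mirrors the no-data watchdog described for the audio pipeline.

```rust
use std::time::{Duration, Instant};

/// Hypothetical, simplified watchdog used only for illustration.
/// Time is passed in explicitly so a test can simulate a stall deterministically.
struct StallWatchdog {
    timeout: Duration,
    last_data: Instant,
}

impl StallWatchdog {
    /// Start the watchdog, treating `now` as the last moment data was seen.
    fn new(timeout: Duration, now: Instant) -> Self {
        Self { timeout, last_data: now }
    }

    /// Record that audio data arrived at `now`.
    fn feed(&mut self, now: Instant) {
        self.last_data = now;
    }

    /// True once no data has been seen for at least `timeout`.
    fn is_triggered(&self, now: Instant) -> bool {
        now.duration_since(self.last_data) >= self.timeout
    }
}
```

A stall is then simulated by simply not calling `feed` while advancing the timestamp passed to `is_triggered`; consistent with the plan, the sketch asserts only triggering, not recovery.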
- -### 6) System Health Test **GAP** - -- **NOT IMPLEMENTED**: Dedicated system health monitoring test. -- Missing: PipelineMetrics validation in controlled test environment. -- Missing: Graceful shutdown timing verification. - -### 7) Live Operation Example **GAP** - -- **NOT IMPLEMENTED**: Live hardware operation example. -- Note: Could be valuable for manual testing but not critical for automated CI. - -### 8) State Transitions Test **GAP** - -- **NOT IMPLEMENTED**: Rapid VAD state transition testing. -- Missing: Stress testing of speech/silence boundary detection. - -## Feature/Config Notes - -- Vosk feature: - - `stt::vosk` and `stt::processor` are compiled only with `--features vosk`. - - Tests touching these must be `#[cfg(feature = "vosk")]` and should skip if `VOSK_MODEL_PATH` is missing. -- `SttProcessor::new` constructs `VoskTranscriber` unconditionally; only call when model is present. -- Prefer Level3 VAD for deterministic tests (set `UnifiedVadConfig { mode: VadMode::Level3, level3.enabled = true, frame_size_samples = 512, sample_rate_hz = 16_000 }`). Silero requires ONNX/runtime assets. - -## Metrics & Observability - -- Use `PipelineMetrics` in chunker and VAD tests to assert activity (FPS and counters). -- For accounting, track total input samples fed vs. chunker emissions (sum of 512-sized frames). - -## CI Strategy - -- Default: run all tests except the live example; STT paths skipped unless `vosk` + model available. -- Keep fixtures small; programmatic generation acceptable to avoid large binaries. 
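The framing-aware accounting described in this plan (total input samples fed versus chunker emissions, within one 512-sample frame) can be expressed as a small check. This is a sketch only: `within_frame_tolerance` is a hypothetical helper name, and reading "± one frame" as an absolute difference of at most 512 samples is an assumption.

```rust
/// Chunk size used throughout the plan (512 samples at 16 kHz).
const FRAME_SIZE: usize = 512;

/// Hypothetical helper: compares samples fed into the ring buffer with the
/// samples the chunker emitted as whole frames. Up to one frame of slack is
/// allowed, since a partial frame may still be buffered when the test ends.
fn within_frame_tolerance(samples_fed: usize, frames_emitted: usize) -> bool {
    let samples_emitted = frames_emitted * FRAME_SIZE;
    samples_fed.abs_diff(samples_emitted) <= FRAME_SIZE
}
```

For instance, feeding 1600 samples and observing 3 frames (1536 samples) leaves 64 samples buffered and passes, whereas observing only 1 frame leaves 1088 samples unaccounted for and fails.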
- -## Current Status Summary - -**✅ IMPLEMENTED (4/6 components):** -- Test scaffolding and utilities in `test_utils.rs` -- End-to-end pipeline test in `pipeline_integration.rs` -- VAD pipeline test in `vad_pipeline_tests.rs` -- STT unit tests with feature gating in `src/stt/tests.rs` - -**❌ REMAINING GAPS (2/6 components) - ASSESSED AS NOT VITAL:** -- Error handling & watchdog testing - **Complex mock engineering for minimal benefit** -- System health monitoring tests - **Already covered by TUI dashboard and existing pipeline tests** - -**Final Assessment:** Mock testing of device fallbacks would be overkill for straightforward control flow logic that's already validated through real hardware testing via TUI dashboard and examples. - -**📁 Test Data Status:** -- Abundant test fixtures (12+ pairs) exceed original minimal requirements -- `pipeline_test.wav` and `.txt` files are available - -## Risks & Mitigations - -- Vosk model availability: gate and skip when absent. -- Timing flakiness: use generous tolerances and deterministic generators. -- API mismatches: ensure `VadProcessor::spawn` is called with `Arc` per current signature. - -## Plan Resolution - ARCHIVED - -**Decision:** This plan is archived as substantially complete rather than fully implemented. - -**Rationale:** -- Core testing objectives achieved with 4/6 components implemented -- Remaining gaps (error recovery and system health tests) provide diminishing returns -- Mock testing of CPAL device failures would be complex engineering for minimal benefit -- Real-world validation through TUI dashboard and examples is more valuable -- Test infrastructure is solid with comprehensive utilities and abundant test data - -**Recommendation:** Focus development efforts on higher-priority features rather than completing the remaining test components. 
- -**Archive Date:** 2025-08-29 diff --git a/docs/tasks/Text_Injection_Placeholder_Implementation_Guide.md b/docs/tasks/Text_Injection_Placeholder_Implementation_Guide.md deleted file mode 100644 index 7dee2236..00000000 --- a/docs/tasks/Text_Injection_Placeholder_Implementation_Guide.md +++ /dev/null @@ -1,634 +0,0 @@ -# Text Injection System - Placeholder Implementation Guide - -## Overview - -This document provides comprehensive implementation details for completing the placeholder code in the ColdVox text injection system. The current implementation has several placeholder methods that need to be replaced with functional code to achieve production readiness. - -## Priority Implementation Tasks - -### 1. AT-SPI Focus Detection and App Identification - -**Location**: `crates/app/src/text_injection/focus.rs` and `crates/app/src/text_injection/manager.rs` - -#### Current State -- `check_focus_status()` always returns `FocusStatus::EditableText` -- `get_current_app_id()` returns hardcoded `"unknown_app"` - -#### Required Implementation - -```rust -// focus.rs - Real AT-SPI focus detection -async fn check_focus_status(&self) -> Result<FocusStatus, InjectionError> { - #[cfg(feature = "text-injection-atspi")] - { - use atspi::{connection::Connection, accessible::Accessible}; - - // Connect to AT-SPI bus - let connection = Connection::new().await - .map_err(|e| InjectionError::Other(format!("AT-SPI connection failed: {}", e)))?; - - // Get currently focused accessible - let focused = connection.get_focused_accessible().await - .map_err(|e| InjectionError::Other(format!("Failed to get focus: {}", e)))?; - - // Check if the focused element supports EditableText interface - let interfaces = focused.get_interfaces().await - .map_err(|e| InjectionError::Other(format!("Failed to get interfaces: {}", e)))?; - - if interfaces.contains(&"EditableText") { - // Check if element is actually editable (not read-only) - let states = focused.get_state_set().await - .map_err(|e| 
InjectionError::Other(format!("Failed to get states: {}", e)))?; - - if states.contains(atspi::StateType::Editable) && - !states.contains(atspi::StateType::ReadOnly) { - return Ok(FocusStatus::EditableText); - } - } - - // Check for common text input roles - let role = focused.get_role().await - .map_err(|e| InjectionError::Other(format!("Failed to get role: {}", e)))?; - - match role { - atspi::Role::Text | - atspi::Role::PasswordText | - atspi::Role::Terminal | - atspi::Role::Entry | - atspi::Role::EditableComboBox => Ok(FocusStatus::EditableText), - _ => Ok(FocusStatus::NonEditable) - } - } - - #[cfg(not(feature = "text-injection-atspi"))] - { - // Fallback: Use X11/Wayland window properties - Ok(FocusStatus::Unknown) - } -} - -// manager.rs - Real app identification -async fn get_current_app_id(&self) -> Result<String, InjectionError> { - #[cfg(feature = "text-injection-atspi")] - { - // Get the focused element's application - let focused = self.focus_tracker.get_focused_accessible().await?; - let app = focused.get_application().await - .map_err(|e| InjectionError::Other(format!("Failed to get app: {}", e)))?; - - // Try to get application name - if let Ok(name) = app.get_name().await { - if !name.is_empty() { - return Ok(name); - } - } - - // Fallback to process name - if let Ok(toolkit) = app.get_toolkit_name().await { - return Ok(format!("{}_{}", toolkit, app.get_id().await.unwrap_or_default())); - } - } - - // Fallback: Use window manager info - #[cfg(target_os = "linux")] - { - // Try to get active window class via X11/Wayland - if let Ok(window_class) = get_active_window_class().await { - return Ok(window_class); - } - } - - Ok("unknown".to_string()) -} -``` - -### 2. 
Permission Checking for External Binaries - -**Location**: `crates/app/src/text_injection/ydotool_injector.rs`, `kdotool_injector.rs` - -#### Required Implementation - -```rust -// Common permission checking utility -pub fn check_binary_permissions(binary_name: &str) -> Result<(), InjectionError> { - use std::process::Command; - use std::os::unix::fs::PermissionsExt; - - // Check if binary exists in PATH - let output = Command::new("which") - .arg(binary_name) - .output() - .map_err(|e| InjectionError::Process(format!("Failed to locate {}: {}", binary_name, e)))?; - - if !output.status.success() { - return Err(InjectionError::MethodUnavailable( - format!("{} not found in PATH", binary_name) - )); - } - - let binary_path = String::from_utf8_lossy(&output.stdout).trim().to_string(); - - // Check if binary is executable - let metadata = std::fs::metadata(&binary_path) - .map_err(|e| InjectionError::Io(e))?; - - let permissions = metadata.permissions(); - if permissions.mode() & 0o111 == 0 { - return Err(InjectionError::PermissionDenied( - format!("{} is not executable", binary_name) - )); - } - - // For ydotool specifically, check uinput access - if binary_name == "ydotool" { - check_uinput_access()?; - } - - Ok(()) -} - -fn check_uinput_access() -> Result<(), InjectionError> { - use std::fs::OpenOptions; - // Imported again here: the `use` in check_binary_permissions is scoped to that function - use std::process::Command; - - // Check if we can open /dev/uinput - match OpenOptions::new().write(true).open("/dev/uinput") { - Ok(_) => Ok(()), - Err(e) if e.kind() == std::io::ErrorKind::PermissionDenied => { - // Check if user is in input group - let groups = Command::new("groups") - .output() - .map_err(|e| InjectionError::Process(format!("Failed to check groups: {}", e)))?; - - let groups_str = String::from_utf8_lossy(&groups.stdout); - if !groups_str.contains("input") { - return Err(InjectionError::PermissionDenied( - "User not in 'input' group. Run: sudo usermod -a -G input $USER".to_string() - )); - } - - Err(InjectionError::PermissionDenied( - "/dev/uinput access denied. 
ydotool daemon may not be running".to_string() - )) - } - Err(e) => Err(InjectionError::Io(e)) - } -} -``` - -### 3. Success Rate Tracking and Adaptive Strategy - -**Location**: `crates/app/src/text_injection/manager.rs` - -#### Required Implementation - -```rust -impl StrategyManager { - /// Get ordered list of methods to try based on success rates - fn get_method_priority(&self, app_id: &str) -> Vec<InjectionMethod> { - let mut methods = vec![]; - - // Always try AT-SPI first if available - #[cfg(feature = "text-injection-atspi")] - methods.push(InjectionMethod::AtspiInsert); - - // Add clipboard methods - #[cfg(feature = "text-injection-clipboard")] - { - methods.push(InjectionMethod::Clipboard); - #[cfg(feature = "text-injection-atspi")] - methods.push(InjectionMethod::ClipboardAndPaste); - } - - // Sort by success rate for this app - methods.sort_by(|a, b| { - let key_a = (app_id.to_string(), *a); - let key_b = (app_id.to_string(), *b); - - let rate_a = self.success_cache.get(&key_a) - .map(|r| r.success_rate) - .unwrap_or(0.5); // Default 50% assumed success - - let rate_b = self.success_cache.get(&key_b) - .map(|r| r.success_rate) - .unwrap_or(0.5); - - rate_b.partial_cmp(&rate_a).unwrap_or(std::cmp::Ordering::Equal) - }); - - // Add opt-in fallback methods at the end - if self.config.allow_ydotool && !self.is_in_cooldown(InjectionMethod::YdoToolPaste) { - methods.push(InjectionMethod::YdoToolPaste); - } - - if self.config.allow_enigo && !self.is_in_cooldown(InjectionMethod::EnigoText) { - methods.push(InjectionMethod::EnigoText); - } - - if self.config.allow_mki && !self.is_in_cooldown(InjectionMethod::UinputKeys) { - methods.push(InjectionMethod::UinputKeys); - } - - methods - } - - /// Update success record with decay for old records - fn update_success_record(&mut self, app_id: &str, method: InjectionMethod, success: bool) { - let key = (app_id.to_string(), method); - - let record = self.success_cache.entry(key.clone()).or_insert_with(|| SuccessRecord { - success_count: 
0, - fail_count: 0, - last_success: None, - last_failure: None, - success_rate: 0.5, - }); - - // Apply time-based decay (older results matter less) - let decay_factor = 0.95; - record.success_count = (record.success_count as f64 * decay_factor) as u32; - record.fail_count = (record.fail_count as f64 * decay_factor) as u32; - - // Update counts - if success { - record.success_count += 1; - record.last_success = Some(Instant::now()); - } else { - record.fail_count += 1; - record.last_failure = Some(Instant::now()); - } - - // Recalculate success rate with minimum sample size - let total = record.success_count + record.fail_count; - if total > 0 { - record.success_rate = record.success_count as f64 / total as f64; - } else { - record.success_rate = 0.5; // Default to 50% - } - - // Apply cooldown for repeated failures - if !success && record.fail_count > 2 { - self.apply_cooldown(app_id, method, "Multiple consecutive failures"); - } - - debug!( - "Updated success record for {}/{:?}: {:.1}% ({}/{})", - app_id, method, record.success_rate * 100.0, - record.success_count, total - ); - } - - /// Apply exponential backoff cooldown - fn apply_cooldown(&mut self, app_id: &str, method: InjectionMethod, error: &str) { - let key = (app_id.to_string(), method); - - let mut cooldown = self.cooldowns.entry(key).or_insert_with(|| CooldownState { - until: Instant::now(), - backoff_level: 0, - last_error: String::new(), - }); - - // Calculate cooldown duration with exponential backoff - let base_ms = self.config.cooldown_initial_ms; - let factor = self.config.cooldown_backoff_factor; - let max_ms = self.config.cooldown_max_ms; - - let cooldown_ms = (base_ms as f64 * factor.powi(cooldown.backoff_level as i32)) - .min(max_ms as f64) as u64; - - cooldown.until = Instant::now() + Duration::from_millis(cooldown_ms); - cooldown.backoff_level += 1; - cooldown.last_error = error.to_string(); - - warn!( - "Applied cooldown for {}/{:?}: {}ms (level {})", - app_id, method, cooldown_ms, 
cooldown.backoff_level - ); - } -} -``` - -### 4. Window Manager Integration - -**Location**: New file `crates/app/src/text_injection/window_manager.rs` - -#### Required Implementation - -```rust -use std::process::Command; - -/// Get the currently active window class name -pub async fn get_active_window_class() -> Result<String, InjectionError> { - // Try KDE-specific method first - if let Ok(class) = get_kde_window_class().await { - return Ok(class); - } - - // Try generic X11 method - if let Ok(class) = get_x11_window_class().await { - return Ok(class); - } - - // Try Wayland method - if let Ok(class) = get_wayland_window_class().await { - return Ok(class); - } - - Err(InjectionError::Other("Could not determine active window".to_string())) -} - -async fn get_kde_window_class() -> Result<String, InjectionError> { - // Use KWin DBus interface - let output = Command::new("qdbus") - .args(&[ - "org.kde.KWin", - "/KWin", - "org.kde.KWin.activeClient" - ]) - .output() - .map_err(|e| InjectionError::Process(format!("qdbus failed: {}", e)))?; - - if output.status.success() { - let window_id = String::from_utf8_lossy(&output.stdout).trim().to_string(); - - // Get window class from ID - let class_output = Command::new("qdbus") - .args(&[ - "org.kde.KWin", - &format!("/Windows/{}", window_id), - "org.kde.KWin.Window.resourceClass" - ]) - .output() - .map_err(|e| InjectionError::Process(format!("qdbus failed: {}", e)))?; - - if class_output.status.success() { - return Ok(String::from_utf8_lossy(&class_output.stdout).trim().to_string()); - } - } - - Err(InjectionError::Other("KDE window class not available".to_string())) -} - -async fn get_x11_window_class() -> Result<String, InjectionError> { - // Use xprop to get active window class - let output = Command::new("xprop") - .args(&["-root", "_NET_ACTIVE_WINDOW"]) - .output() - .map_err(|e| InjectionError::Process(format!("xprop failed: {}", e)))?; - - if output.status.success() { - let window_str = String::from_utf8_lossy(&output.stdout); - if let Some(window_id) = window_str.split("# ").nth(1) { 
- let window_id = window_id.trim(); - - // Get window class - let class_output = Command::new("xprop") - .args(&["-id", window_id, "WM_CLASS"]) - .output() - .map_err(|e| InjectionError::Process(format!("xprop failed: {}", e)))?; - - if class_output.status.success() { - let class_str = String::from_utf8_lossy(&class_output.stdout); - // Parse WM_CLASS string (format: WM_CLASS(STRING) = "instance", "class") - if let Some(class_part) = class_str.split('"').nth(3) { - return Ok(class_part.to_string()); - } - } - } - } - - Err(InjectionError::Other("X11 window class not available".to_string())) -} - -async fn get_wayland_window_class() -> Result<String, InjectionError> { - // Try using wlr-foreign-toplevel-management protocol if available - // This requires compositor support (e.g., Sway, some KWin versions) - - // For now, we'll try using swaymsg if Sway is running - let output = Command::new("swaymsg") - .args(&["-t", "get_tree"]) - .output() - .map_err(|e| InjectionError::Process(format!("swaymsg failed: {}", e)))?; - - if output.status.success() { - // Parse JSON to find focused window - // This would require serde_json dependency - // For now, return error - return Err(InjectionError::Other("Wayland parsing not implemented".to_string())); - } - - Err(InjectionError::Other("Wayland window class not available".to_string())) -} -``` - -### 5. 
Clipboard Restoration Enhancement - -**Location**: `crates/app/src/text_injection/clipboard_injector.rs` - -#### Required Implementation - -```rust -impl ClipboardInjector { - /// Save current clipboard content for restoration - async fn save_clipboard(&mut self) -> Result<Option<String>, InjectionError> { - if !self.config.restore_clipboard { - return Ok(None); - } - - #[cfg(feature = "text-injection-clipboard")] - { - use wl_clipboard_rs::paste::{get_contents, ClipboardType, Seat}; - - // Try to get current clipboard content - match get_contents(ClipboardType::Regular, Seat::Unspecified) { - Ok((mut pipe, _mime)) => { - let mut contents = String::new(); - if pipe.read_to_string(&mut contents).is_ok() { - debug!("Saved clipboard content ({} chars)", contents.len()); - return Ok(Some(contents)); - } - } - Err(e) => { - debug!("Could not save clipboard: {}", e); - } - } - } - - Ok(None) - } - - /// Restore previously saved clipboard content - async fn restore_clipboard(&mut self, content: Option<String>) -> Result<(), InjectionError> { - if let Some(content) = content { - if !self.config.restore_clipboard { - return Ok(()); - } - - #[cfg(feature = "text-injection-clipboard")] - { - use wl_clipboard_rs::copy::{MimeType, Options, Source}; - - let opts = Options::new(); - match opts.copy(Source::Bytes(content.as_bytes()), MimeType::Text) { - Ok(_) => { - debug!("Restored clipboard content ({} chars)", content.len()); - } - Err(e) => { - warn!("Failed to restore clipboard: {}", e); - } - } - } - } - - Ok(()) - } -} -``` - -## Testing Strategy - -### Unit Tests - -Create comprehensive unit tests for each component: - -```rust -// tests/test_focus_tracking.rs -#[cfg(test)] -mod tests { - use super::*; - - #[tokio::test] - async fn test_focus_detection() { - let config = InjectionConfig::default(); - let mut tracker = FocusTracker::new(config); - - // Test focus detection - let status = tracker.get_focus_status().await; - assert!(status.is_ok()); - - // Test caching - let cached = 
tracker.cached_focus_status(); - assert!(cached.is_some()); - } - - #[tokio::test] - async fn test_app_identification() { - let manager = StrategyManager::new(InjectionConfig::default()); - let app_id = manager.get_current_app_id().await; - - assert!(app_id.is_ok()); - assert_ne!(app_id.unwrap(), "unknown_app"); - } - - #[test] - fn test_permission_checking() { - // Test binary existence - let result = check_binary_permissions("ls"); // Should exist - assert!(result.is_ok()); - - let result = check_binary_permissions("nonexistent_binary_xyz"); - assert!(result.is_err()); - } -} -``` - -### Integration Tests - -```rust -// tests/test_injection_integration.rs -#[cfg(all(test, feature = "text-injection"))] -mod integration_tests { - use super::*; - - #[tokio::test] - async fn test_full_injection_flow() { - let config = InjectionConfig { - allow_ydotool: false, - restore_clipboard: true, - ..Default::default() - }; - - let mut manager = StrategyManager::new(config); - - // Test injection with actual text - let result = manager.inject("Test injection").await; - - // Should attempt AT-SPI or clipboard methods - assert!(result.is_ok() || result.is_err()); - - // Check metrics - let metrics = manager.metrics(); - assert!(metrics.attempts > 0); - } -} -``` - -## Configuration Updates - -Add new configuration options to `InjectionConfig`: - -```rust -pub struct InjectionConfig { - // ... existing fields ... 
- - /// Cache duration for focus status (ms) - #[serde(default = "default_focus_cache_duration_ms")] - pub focus_cache_duration_ms: u64, - - /// Minimum success rate before trying fallback methods - #[serde(default = "default_min_success_rate")] - pub min_success_rate: f64, - - /// Number of samples before trusting success rate - #[serde(default = "default_min_sample_size")] - pub min_sample_size: u32, - - /// Enable window manager integration - #[serde(default = "default_true")] - pub enable_window_detection: bool, -} - -fn default_focus_cache_duration_ms() -> u64 { 200 } -fn default_min_success_rate() -> f64 { 0.3 } -fn default_min_sample_size() -> u32 { 5 } -fn default_true() -> bool { true } -``` - -## Deployment Checklist - -- [ ] Replace all placeholder implementations -- [ ] Add permission checking for external binaries -- [ ] Implement real AT-SPI focus detection -- [ ] Add window manager integration -- [ ] Implement adaptive strategy with success tracking -- [ ] Add clipboard save/restore functionality -- [ ] Create comprehensive unit tests -- [ ] Add integration tests -- [ ] Update configuration with new options -- [ ] Document user setup requirements (groups, permissions) -- [ ] Test on KDE Plasma Wayland -- [ ] Test on X11 environments -- [ ] Benchmark performance impact - -## Performance Considerations - -1. **Caching**: Cache focus status and app ID for 200ms to avoid excessive AT-SPI calls -2. **Async Operations**: Use tokio for all I/O operations to avoid blocking -3. **Timeout Management**: Enforce strict timeouts on all external calls -4. **Success Rate Decay**: Apply time-based decay to prevent stale data -5. **Resource Cleanup**: Always restore clipboard if interrupted - -## Security Considerations - -1. **Permission Verification**: Check binary permissions before execution -2. **Input Validation**: Sanitize text before injection -3. **Clipboard Privacy**: Only restore clipboard if explicitly configured -4. 
**Process Isolation**: Run external commands with minimal privileges -5. **Error Disclosure**: Don't expose sensitive system information in errors - -## Notes - -- AT-SPI2 requires `at-spi2-core` package on most distributions -- Wayland clipboard requires `wl-clipboard` package -- KDE integration works best with `qdbus` available -- X11 fallback requires `xprop` utility -- Consider implementing a daemon mode for better performance \ No newline at end of file diff --git a/docs/tasks/cleanups0829.md b/docs/tasks/cleanups0829.md deleted file mode 100644 index 840a4986..00000000 --- a/docs/tasks/cleanups0829.md +++ /dev/null @@ -1,82 +0,0 @@ - - -# 2025-08-29 — Small cleanups and quick wins - -This note tracks low-risk cleanups and small improvements discovered today. - -## Summary - -- Implement STT config hot-reload (wire existing `update_config` into the - running pipeline). -- Add a minimal STT metrics panel to the TUI dashboard. -- Remove or update outdated docs (done: `docs/vosk_implementation_gaps.md`). -- Create Criterion bench skeletons (just stubs to start measuring later). -- Minor docs polish (README/TUI notes) and a tiny formatting fix. - -## Tasks - -- [ ] STT config hot reload control path - - - Files: `crates/app/src/stt/processor.rs`, `crates/app/src/stt/vosk.rs` - - Add a control channel/API on `SttProcessor` to receive a new - `TranscriptionConfig` at runtime. - - Apply changes at a safe boundary (Idle or right after `SpeechEnd`) via - `VoskTranscriber::update_config(...)`. - - On failure (e.g., model not found), retain the old recognizer and log an - error; retry later with simple backoff. - - Optional: expose a simple trigger (watch a file path or a one-shot - function called from main). Keep minimal. - - Acceptance: toggling `partial_results` or swapping `model_path` takes - effect without restart and doesn’t panic. 
- -- [ ] TUI: add basic STT metrics panel - - - File: `crates/app/src/bin/tui_dashboard.rs` - - Show: partial count, final count, error count, and a crude latency - (time from `SpeechStart` to first partial/final). - - Use a simple shared struct similar to `PipelineMetrics` or a small local - snapshot sent over a channel. - - Keep layout minimal (one extra box, no complex graphs). - - Acceptance: metrics update in near real-time while STT runs; panel hides - gracefully if STT is disabled. - -- [x] Remove outdated gap doc - - - File: `docs/vosk_implementation_gaps.md` - - Status: removed — several items are now implemented (unit tests, - persistence, end-to-end WAV test). - -- [ ] Criterion benches — skeleton only - - - Path: `crates/app/benches/` - - Create stubs for: VAD frame processing, STT `accept_frame` on - silence/small speech. - - Don’t wire heavy assets yet; focus on compile-ready placeholders with - `#[ignore]` or feature guards. - - Acceptance: `cargo bench` discovers targets (even if ignored) and - compiles cleanly. - -- [ ] README and docs polish - - - Files: `README.md`, `docs/enhanced_tui_dashboard.md` - - Note current STT status: enabled when `VOSK_MODEL_PATH` or default model - exists. Add one-liner on enabling persistence flags. - - In TUI doc, mark STT metrics as “available” once implemented; otherwise - keep “planned” consistent. - - Acceptance: concise, accurate instructions; no references to removed - docs. - -- [ ] Tiny formatting fix (non-functional) - - - File: `crates/app/src/stt/persistence.rs` - - Minor newline/style glitch around the `handle_vad_event` closing brace - before the `/// Handle transcription event` doc comment. - - Acceptance: tidy formatting without logic changes. - -## Notes - -- Hot-reload edge cases: prefer switching at utterance boundaries to avoid - partial loss. Model loading may be slow — consider `spawn_blocking` if - needed. 
-- Benchmarks can evolve later; stubs just reserve space and wiring for - measurable growth. diff --git a/docs/tasks/text_injection_strategy_simplification.md b/docs/tasks/text_injection_strategy_simplification.md deleted file mode 100644 index 43e25bbd..00000000 --- a/docs/tasks/text_injection_strategy_simplification.md +++ /dev/null @@ -1,214 +0,0 @@ -# Text Injection Strategy Simplification Analysis - -**Date:** 2025-08-31 -**Status:** Design Decision Required - -## Problem Statement - -The current `StrategyManager` implementation includes sophisticated per-app adaptive behavior with success tracking, cooldowns, and dynamic method reordering. While powerful, this may be over-engineered for our primary target: KDE Plasma on Linux. - -## Proposed Simplification - -### Platform-Based Configuration - -Instead of dynamic per-app adaptation, pass platform context at initialization: - -```rust -pub struct PlatformContext { - os: OperatingSystem, // Linux, Windows, macOS - desktop_environment: Option, // KDE, GNOME, etc. - compositor: Option, // KWin, Mutter, wlroots - distro: Option, // Debian, Fedora, etc. -} - -impl StrategyManager { - pub fn new(platform: PlatformContext, config: InjectionConfig) -> Self { - // Configure static strategy based on platform - let method_order = Self::get_platform_strategy(&platform); - // ... 
- } -} -``` - -### App Type Categories (Instead of Per-App) - -Replace granular per-app tracking with broad categories: - -```rust -#[derive(Debug, Clone, Copy)] -pub enum AppType { - Terminal, // Konsole, gnome-terminal, alacritty - WebBrowser, // Firefox, Chrome, Edge - IDE, // VS Code, IntelliJ, Kate - Office, // LibreOffice, OnlyOffice - Chat, // Discord, Slack, Element - Generic, // Everything else -} - -// Static configuration per app type -const APP_TYPE_STRATEGIES: &[(AppType, &[InjectionMethod])] = &[ - (AppType::Terminal, &[ - InjectionMethod::YdoToolPaste, - InjectionMethod::Clipboard, - ]), - (AppType::WebBrowser, &[ - InjectionMethod::AtspiInsert, - InjectionMethod::ClipboardAndPaste, - InjectionMethod::Clipboard, - ]), - // ... -]; -``` - -## Analysis: Is This Simplification Worth It? - -### Option 1: Keep Current Implementation As-Is - -**Pros:** -- ✅ Already implemented and tested -- ✅ Self-optimizing without manual configuration -- ✅ Handles edge cases automatically -- ✅ No need to maintain app categorization -- ✅ Works across all platforms without changes - -**Cons:** -- ❌ More complex code to maintain -- ❌ ~5-10ms overhead on first injection per app -- ❌ Memory overhead for success tracking (~1KB per app) -- ❌ May converge to same patterns anyway - -### Option 2: Platform-Based Static Strategy - -**Pros:** -- ✅ Simpler, more predictable behavior -- ✅ Faster (no sorting/adaptation overhead) -- ✅ Easier to debug and reason about -- ✅ Clear documentation of what works where - -**Cons:** -- ❌ Requires maintaining platform detection logic -- ❌ Need to manually optimize for each platform -- ❌ Can't adapt to unexpected app behavior -- ❌ Loses ability to learn from failures - -### Option 3: Hybrid - Platform Base + Optional Adaptation - -**Pros:** -- ✅ Best of both worlds -- ✅ Fast defaults with learning capability -- ✅ Can disable adaptation for simplicity -- ✅ Platform-optimized starting point - -**Cons:** -- ❌ Still maintains complexity in codebase -- 
❌ Two code paths to test and maintain - -## Real-World Impact Assessment - -### For KDE Plasma Specifically - -Given that we're targeting KDE Plasma: - -1. **App Uniformity**: Most KDE apps behave similarly (Qt + AT-SPI2) -2. **Limited Variety**: Maybe 20-30 apps total in typical use -3. **Predictable Patterns**: - - Terminals → Need ydotool or clipboard - - Qt Apps → AT-SPI2 works - - GTK Apps → AT-SPI2 works - - Browsers → AT-SPI2 works - -### Memory & Performance - -**Current Implementation Overhead:** -- Memory: ~50KB for strategy manager + ~1KB per app -- CPU: ~5ms on first injection, <0.1ms cached -- **Total Impact**: Negligible for human-speed dictation - -**Simplified Implementation:** -- Memory: ~10KB static configuration -- CPU: ~0.5ms constant time -- **Savings**: ~40KB memory, 4.5ms on first injection - -## Recommendation - -### Keep Current Implementation, But Configure It - -The existing implementation is **not complex enough to justify refactoring**. Instead: - -1. **Add Platform Hints** to configuration: -```rust -// In InjectionConfig -pub struct InjectionConfig { - // Existing fields... - - // New platform hints - pub platform_hint: Option<PlatformHint>, - pub disable_adaptation: bool, // Turn off per-app learning - pub force_method_order: Option<Vec<InjectionMethod>>, // Override -} - -pub struct PlatformHint { - pub environment: &'static str, // "kde-plasma", "gnome", etc. - pub prefer_methods: Vec<InjectionMethod>, -} -``` - -2. **Provide Presets**: -```rust -impl InjectionConfig { - pub fn kde_plasma_preset() -> Self { - Self { - disable_adaptation: false, // Keep learning on - platform_hint: Some(PlatformHint { - environment: "kde-plasma", - prefer_methods: vec![ - InjectionMethod::AtspiInsert, - InjectionMethod::ClipboardAndPaste, - ], - }), - ..Default::default() - } - } -} -``` - -3. **Document Platform Best Practices**: -- KDE Plasma: AT-SPI2 → Clipboard → ydotool -- GNOME: AT-SPI2 → Clipboard -- Sway/wlroots: Clipboard → wtype -- X11: xdotool → Clipboard - -## Decision Points - -1. 
**Is 50KB memory overhead significant?** → No, negligible for desktop app -2. **Is 5ms first-injection overhead significant?** → No, human dictation is slower -3. **Does per-app tracking provide value?** → Yes, terminals vs GUI apps -4. **Is the code too complex to maintain?** → No, it's well-structured and tested - -## Conclusion - -**Don't simplify.** The current implementation is: -- Already working -- Not causing performance issues -- Provides valuable adaptation -- Well-tested - -Instead, **add configuration helpers** for specific platforms to make the system easier to use while keeping the adaptive capabilities. - -### Action Items - -If we proceed with keeping current implementation: -1. ✅ Add `kde_plasma_preset()` configuration helper -2. ✅ Add `disable_adaptation` flag for users who want static behavior -3. ✅ Document recommended configurations per platform -4. ✅ Consider adding app type detection as hint (not replacement) for initial ordering - -If we proceed with simplification: -1. ⚠️ Implement platform detection -2. ⚠️ Create static method ordering per platform -3. ⚠️ Remove per-app success tracking -4. ⚠️ Maintain app type categorization - -### Final Recommendation - -**Keep the existing implementation.** It's not broken, not slow, and provides value. Add platform-specific configuration helpers to make it easier to use. The complexity is already paid for and tested - removing it provides minimal benefit while losing adaptive capabilities that handle edge cases automatically. \ No newline at end of file diff --git a/docs/tasks/workspace_split_tasks.md b/docs/tasks/workspace_split_tasks.md deleted file mode 100644 index 156f684e..00000000 --- a/docs/tasks/workspace_split_tasks.md +++ /dev/null @@ -1,256 +0,0 @@ -# ColdVox workspace split: phased task plan - -This document turns the crate-split proposal into concrete, trackable tasks with phases, checklists, and acceptance criteria. 
- -## Goals -- Isolate heavy/optional deps (Vosk, ONNX) behind feature-gated crates -- Improve incremental compile times and reuse of stable components -- Clarify boundaries via thin, testable public APIs -- Keep `cargo run` usable by default (VAD-only, no STT requirement) - -## Non-goals -- Public publishing to crates.io (can be a follow-up) -- Big behavior changes; this is a surgical extraction - -## Target workspace layout -- crates/coldvox-telemetry: metrics types -- crates/coldvox-foundation: app scaffolding (state, shutdown, health, errors/config) -- crates/coldvox-audio: device/capture/ring buffer/chunker/watchdog/silence detector -- crates/coldvox-vad: VAD config/types/state machine/events (no ONNX) -- crates/coldvox-vad-silero: Silero ONNX wrapper (feature = `silero`) -- crates/coldvox-stt: STT processor/traits (no Vosk) -- crates/coldvox-stt-vosk: Vosk transcriber (feature = `vosk`) -- crates/coldvox-text-injection: text injection session/processor (fast-tracked) -- crates/coldvox-gui (stub): future GUI binary crate (optional; see GUI phase) -- crates/app: thin orchestrator/binaries (main, TUI, probes) - -Feature passthrough: `vosk`, `silero` (default on if desired), `level3` (energy VAD optional). - ---- - -## Phase 0 – Prep and safety rails - -Why: Improve DX immediately and reduce churn during the split. - -Tasks -- [ ] Make `vosk` an optional dependency in `crates/app` and wire `features.vosk = ["dep:vosk"]` -- [ ] Remove `required-features = ["vosk"]` from the `coldvox` bin so `cargo run` works VAD-only -- [ ] Guard all STT code paths with `#[cfg(feature = "vosk")]` -- [ ] Update README/docs run instructions to reflect VAD-only default - -Acceptance criteria -- [ ] `cargo run` builds and runs without libvosk installed -- [ ] `cargo run --features vosk` enables STT paths - -Risks -- Incomplete cfg gates; Mitigation: compile both with and without `--features vosk` in CI. 
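The Phase 0 gating can be sketched roughly as follows. This is a minimal illustration that assumes a Cargo feature named `vosk`, as in the tasks above; the function names `stt_status` and `pipeline_stages` are hypothetical and do not come from the codebase.

```rust
// Hypothetical sketch of feature-gated STT wiring; all names are illustrative.

/// Compiled only when the crate is built with `--features vosk`.
#[cfg(feature = "vosk")]
fn stt_status() -> &'static str {
    "STT enabled (vosk)"
}

/// Fallback compiled for the default, VAD-only build.
#[cfg(not(feature = "vosk"))]
fn stt_status() -> &'static str {
    "STT disabled (VAD-only)"
}

/// Returns the stages this build would start, in order.
fn pipeline_stages() -> Vec<&'static str> {
    let mut stages = vec!["capture", "chunker", "vad"];
    // `cfg!` expands to a compile-time bool, so the code in both branches is
    // type-checked in every build, which helps surface incomplete gates early.
    if cfg!(feature = "vosk") {
        stages.push("stt");
    }
    stages
}
```

Pairing every `#[cfg(feature = "vosk")]` item with a `#[cfg(not(...))]` counterpart keeps call sites identical in both builds, which is one way to make the "compile with and without the feature" CI check meaningful.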
- ---- - -## Phase 1 – Extract telemetry (small, low-risk) - -Tasks -- [ ] Create `crates/coldvox-telemetry` with `PipelineMetrics` and related types -- [ ] Move telemetry code from `crates/app/src` to the new crate -- [ ] Update imports; add dependency in `crates/app/Cargo.toml` -- [ ] Unit tests compile and pass - -Acceptance criteria -- [ ] App builds; metrics increment as before (smoke via logs) - ---- - -## Phase 2 – Fast-track text injection extraction (large subsystem) - -Why now: Text injection is already substantial and pulls desktop-/platform-specific dependencies (atspi, wl-clipboard-rs, enigo, kdotool, etc.) unrelated to audio/VAD/STT. Isolating it early prevents feature leakage and keeps the main app's dependency graph lean. - -Tasks -- [ ] Create `crates/coldvox-text-injection` as a library crate -- [ ] Move `text_injection/{session.rs,processor.rs}` and related configs -- [ ] Introduce backend features: `atspi`, `wl_clipboard`, `enigo`, `xdg_kdotool` (names tentative); make all optional by default -- [ ] Define a stable trait boundary (e.g., `TextInjector`, `TextInjectionSession`) and rework call sites to depend on the trait -- [ ] Update TUI/examples to compile without any text-injection features enabled; wire optional usage behind `#[cfg(feature = "text-injection")]` -- [ ] Document backend support matrix and env/Wayland requirements - -Acceptance criteria -- [ ] `cargo build` succeeds with no text-injection features enabled -- [ ] Enabling a backend feature compiles on supported DE/WM; when unsupported, the crate cleanly disables with helpful messages -- [ ] No new deps appear in the default `cargo run` path - -Risks -- Backend-specific runtime quirks; Mitigation: keep each backend behind separate feature flags and guard with runtime checks/logging. 
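A trait boundary of the kind Phase 2 asks for might look like the sketch below. It is deliberately simplified and synchronous; the error type, the two backends, and `inject_with_fallback` are assumptions made for illustration, not the project's actual API.

```rust
// Hypothetical, simplified sketch of a text-injection trait boundary.

#[derive(Debug, PartialEq)]
enum InjectError {
    /// The named backend cannot run in the current environment.
    Unavailable(&'static str),
}

trait TextInjector {
    fn name(&self) -> &'static str;
    fn is_available(&self) -> bool;
    fn inject(&mut self, text: &str) -> Result<(), InjectError>;
}

/// Stand-in for a backend that was compiled in but is unsupported at runtime.
struct UnsupportedBackend;

impl TextInjector for UnsupportedBackend {
    fn name(&self) -> &'static str {
        "unsupported"
    }
    fn is_available(&self) -> bool {
        false
    }
    fn inject(&mut self, _text: &str) -> Result<(), InjectError> {
        Err(InjectError::Unavailable(self.name()))
    }
}

/// Fallback that records text instead of touching the desktop.
#[derive(Default)]
struct RecordingInjector {
    received: Vec<String>,
}

impl TextInjector for RecordingInjector {
    fn name(&self) -> &'static str {
        "recording"
    }
    fn is_available(&self) -> bool {
        true
    }
    fn inject(&mut self, text: &str) -> Result<(), InjectError> {
        self.received.push(text.to_string());
        Ok(())
    }
}

/// Tries each available backend in order and returns the name of the first
/// one that succeeds, or `None` if every backend was skipped or failed.
fn inject_with_fallback(
    backends: &mut [Box<dyn TextInjector>],
    text: &str,
) -> Option<&'static str> {
    for backend in backends.iter_mut() {
        if !backend.is_available() {
            continue; // runtime check: skip unsupported backends
        }
        if backend.inject(text).is_ok() {
            return Some(backend.name());
        }
    }
    None
}
```

Because call sites depend only on the trait, each real backend could sit behind its own feature flag while the default build carries just a fallback.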
- ---- - -## Phase 3 – Extract foundation (state/shutdown/health/errors) - -Tasks -- [ ] Create `crates/coldvox-foundation` (deps: tracing, thiserror, anyhow optional) -- [ ] Move `foundation/{state,shutdown,health,error}.rs` into lib -- [ ] Define a minimal public API for `AppState`, `StateManager`, `ShutdownHandler`, `HealthMonitor`, `AppError`, `AudioError`, `AudioConfig` -- [ ] Update `crates/app` to depend on `coldvox-foundation` -- [ ] Run the foundation probe example to sanity-check - -Acceptance criteria -- [ ] App and probes build; shutdown and state transitions behave as before - -Risks -- Type relocation ripples; Mitigation: re-export via `pub use` temporarily in app if needed during transition. - ---- - -## Phase 4 – Extract audio - -Tasks -- [ ] Create `crates/coldvox-audio` (deps: cpal, rtrb, dasp, rubato, parking_lot) -- [ ] Move `audio/{device,capture,ring_buffer,watchdog,detector,chunker}.rs` -- [ ] Public API: `DeviceManager`, `AudioCaptureThread::spawn`, `FrameReader`, `AudioChunker` and `ChunkerConfig`, `Watchdog`; frame contract: 512 samples @ 16kHz -- [ ] Depend on `coldvox-foundation` for errors/config; on `coldvox-telemetry` for metrics -- [ ] Update app wiring; run `mic_probe` and existing audio tests - -Acceptance criteria -- [ ] `mic_probe` runs; logs show watchdog feed and 512-sample chunking -- [ ] Backpressure behavior unchanged (drops when ring full) - -Risks -- CPAL format negotiation; Mitigation: preserve existing device selection code; add a smoke test using the bundled test wavs if present - ---- - -## Phase 5 – Extract VAD (core + silero) - -Tasks -- [ ] Create `crates/coldvox-vad` (no ONNX deps) -- [ ] Define `VadEngine` trait, `VadEvent`, `UnifiedVadConfig` (frames: 512 @ 16kHz) -- [ ] Move VAD state machine and config into this crate -- [ ] Create `crates/coldvox-vad-silero` (deps behind `silero` feature) implementing `VadEngine` -- [ ] Replace Git dep `voice_activity_detector` with local `coldvox-vad-silero` path dep -- [ ] 
Optionally add `level3` energy VAD behind feature -- [ ] Update app and examples; run VAD tests/examples - -Acceptance criteria -- [ ] VAD examples/tests pass; speech start/end events mirror current behavior -- [ ] ONNX runtime only compiles when `--features silero` is set - -Risks -- ONNX runtime loading issues; Mitigation: support dynamic runtime via feature, keep current runtime binaries under `runtimes/` if needed - ---- - -## Phase 6 – Extract STT (core + vosk) - -Tasks -- [ ] Create `crates/coldvox-stt` with `Transcriber` trait, `TranscriptionEvent`, `TranscriptionConfig`, processor gated by VAD events -- [ ] Create `crates/coldvox-stt-vosk` with the Vosk implementation (feature = `vosk`) -- [ ] Ensure model path default (env `VOSK_MODEL_PATH` or `models/vosk-model-small-en-us-0.15`) -- [ ] Update app/TUI wiring; guard with `#[cfg(feature = "vosk")]` -- [ ] Run `vosk_test` example with and without feature - -Acceptance criteria -- [ ] App builds and runs without Vosk; STT paths active only when `--features vosk` - -Risks -- System lib presence; Mitigation: docs note and CI job that skips STT by default - ---- - -## Phase 7 – GUI stub (optional, future-facing) - -Why now: Create a minimal GUI crate skeleton to decouple GUI dependencies and give it a place to grow without affecting app core. Keep it OFF by default and buildable trivially. - -Tasks -- [ ] Create `crates/coldvox-gui` (binary crate) with a minimal `main.rs` that prints version and exits -- [ ] No GUI toolkit dependency yet (placeholder). 
Optionally add a feature-gated dependency placeholder (e.g., `egui` or `gtk`) but keep disabled by default -- [ ] Wire workspace member, add a `[[bin]]` name `coldvox-gui` -- [ ] Add a short README stating goals and future toolkit evaluation criteria - -Acceptance criteria -- [ ] `cargo run -p coldvox-gui` prints a stub message without pulling extra deps into the default app build -- [ ] No changes to `crates/app` runtime behavior - -Risks -- Premature dependency lock-in; Mitigation: avoid selecting a GUI toolkit until requirements are clearer; keep the crate dependency-free for now. - ---- - -## Phase 8 – TUI separation (optional) - -Tasks -- [ ] Option A: keep binaries in `crates/app` -- [ ] Option B: move TUI to `crates/coldvox-tui` and depend on split crates - -Acceptance criteria -- [ ] Same user-facing commands continue to work (documented in README) - ---- - -## Phase 9 – CI matrix and caching - -Tasks -- [ ] Add workflow to build/test default features on Linux -- [ ] Add a matrix job for feature combos: `{silero, level3} x {vosk on/off}` minimal coverage -- [ ] Cache target per-feature if build times regress notably - -Acceptance criteria -- [ ] CI green across chosen matrix; default job runs fast - ---- - -## Phase 10 – Docs and runbooks - -Tasks -- [ ] Update README: workspace layout, quickstart (VAD-only), feature flags -- [ ] Add `crates/*/README.md` with crate purpose and API sketch -- [ ] Update docs under `docs/` for tuning knobs and new crate paths - -Acceptance criteria -- [ ] A newcomer can build/run VAD-only and enable STT via a documented flag - ---- - -## Contracts and APIs (sketch) - -- Audio frames: 512-sample i16 at 16kHz. 
Prefer `&[i16]` or `Arc<[i16; 512]>` across crate boundaries -- VAD: `VadEngine::process(frame) -> Result`; `VadEvent::{SpeechStart, SpeechEnd}` -- STT: `Transcriber::feed(frame)`; emits `TranscriptionEvent::{Partial, Final, Error}` via channel -- Errors: central `AppError/AudioError` in foundation; re-export as needed - -Edge cases -- No device / format mismatch -- Ring buffer full (drop-on-full behavior) -- Watchdog inactivity (>5s) triggers recovery -- Silero window misalignment: reject non-512 frames with a clear error -- Vosk model path missing: STT disabled with a warning - ---- - -## Rollout and verification checklist - -- [ ] Build + clippy + tests pass after each phase -- [ ] VAD-only run tested locally -- [ ] STT run tested with model present -- [ ] TUI dashboard smoke: logs update, status shows last transcript when STT enabled -- [ ] Log file rotation still works (appender wiring) - ---- - -## Next actions (Do this week) - -1) Phase 0: fix `vosk` optional gating and remove `required-features` from `coldvox` bin -2) Phase 1: extract `coldvox-telemetry` (fast win), wire into app -3) Phase 2: extract `coldvox-text-injection` (fast-tracked), scaffold backend features; wire to app/TUI behind features -4) Phase 3: extract `coldvox-foundation`, wire probes -5) Re-assess and proceed with audio extraction - -Optional commands (fish) -```fish -# VAD-only -cargo run - -# With STT (requires libvosk + model) -cargo run --features vosk - -# Run examples -cargo run --example vad_demo -cargo run --example vosk_test --features vosk -``` diff --git a/docs/text_injection_focus_ime.md b/docs/text_injection_focus_ime.md deleted file mode 100644 index 6db38db0..00000000 --- a/docs/text_injection_focus_ime.md +++ /dev/null @@ -1,170 +0,0 @@ -# Text Injection Focus and IME/Localization Guidance - -## Overview - -This document outlines the focus tracking and Input Method Editor (IME) considerations for reliable text injection across different platforms and locales. 
- -## Focus Tracking Requirements - -### Current Limitations -- Focus detection is currently basic and may not handle complex window hierarchies -- AT-SPI integration is incomplete for focus tracking -- No real-time focus change monitoring - -### AT-SPI Roadmap - -#### Required FocusTracker Methods -```rust -impl FocusTracker { - /// Get the currently focused accessible element - pub async fn get_focused_element(&self) -> Result; - - /// Check if the focused element supports paste operations - pub async fn supports_paste_action(&self) -> Result; - - /// Get application identifier from focused element - pub async fn get_app_id(&self) -> Result; -} -``` - -#### Implementation Priority -1. **Phase 1**: Basic focus detection (current) -2. **Phase 2**: AT-SPI element inspection -3. **Phase 3**: Real-time focus monitoring -4. **Phase 4**: Cross-process focus validation - -## IME and Localization Considerations - -### IME-Heavy Environments -- **East Asian Languages**: Chinese, Japanese, Korean require IME for text input -- **Complex Scripts**: Arabic, Hebrew, Devanagari may have IME dependencies -- **Mobile/Desktop Convergence**: Increasing IME usage on desktop - -### Injection Strategy for IME - -#### Current Auto Mode Logic -```rust -let use_paste = match config.injection_mode.as_str() { - "paste" => true, - "keystroke" => false, - "auto" => { - // Current: length-based threshold - text.len() > config.paste_chunk_chars as usize - } - _ => text.len() > config.paste_chunk_chars as usize, -}; -``` - -#### Future IME-Aware Logic -```rust -let use_paste = match config.injection_mode.as_str() { - "paste" => true, - "keystroke" => false, - "auto" => { - // Future: IME-aware decision - text.len() > config.paste_chunk_chars as usize || - self.should_use_paste_for_ime(text).await - } - _ => text.len() > config.paste_chunk_chars as usize, -}; -``` - -### IME Detection Methods - -#### Configuration-Based -```toml -[text_injection] -# Future: Prefer paste for IME environments 
-prefer_paste_for_ime = true - -# Current: Rely on length thresholds -paste_chunk_chars = 50 -``` - -#### Runtime Detection -- Check for active IME processes -- Monitor keyboard layout changes -- Detect non-ASCII character patterns -- Query system IME status - -## Platform-Specific Considerations - -### Wayland -- **Virtual Keyboard**: Portal/wlr virtual keyboard not yet implemented -- **Clipboard + AT-SPI**: Most reliable current approach -- **Focus Tracking**: Requires AT-SPI for accurate element focus - -### X11 -- **xdotool Path**: Available for fallback injection -- **Window Properties**: WM_CLASS for application identification -- **IME Integration**: Varies by desktop environment - -### Windows/macOS -- **Native APIs**: Platform-specific focus and IME detection -- **Accessibility APIs**: Required for reliable injection -- **Virtual Keyboard**: May be available through system APIs - -## Action Items - -### Immediate (Phase 2) -1. Implement AT-SPI focus element inspection -2. Add basic IME language detection -3. Improve focus validation before injection - -### Medium-term (Phase 3) -1. Add real-time focus monitoring -2. Implement IME-aware injection decisions -3. Add platform-specific IME detection - -### Long-term (Phase 4) -1. Virtual keyboard integration (Wayland portal) -2. Advanced IME state tracking -3. 
Cross-platform IME compatibility layer - -## Testing Scenarios - -### IME Testing Matrix -- **Locale**: en_US, zh_CN, ja_JP, ko_KR, ar_SA -- **IME State**: Active, Inactive, Switching -- **Text Types**: ASCII-only, Mixed, Unicode-only -- **Injection Methods**: Paste, Keystroke, Auto - -### Focus Testing -- **Window Types**: Native, Web, Terminal, IDE -- **Focus Changes**: During injection, Between injections -- **Modal Dialogs**: System dialogs, Application modals -- **Multi-Monitor**: Focus across displays - -## Configuration Recommendations - -### Conservative Settings (Default) -```toml -[text_injection] -# Prefer paste for reliability -injection_mode = "paste" -# Shorter chunks to avoid IME issues -paste_chunk_chars = 20 -``` - -### Performance-Optimized -```toml -[text_injection] -# Allow keystroke for short text -injection_mode = "auto" -# Balance between IME safety and performance -paste_chunk_chars = 50 -``` - -## Troubleshooting - -### Common IME Issues -1. **Text Not Appearing**: IME consuming keystrokes -2. **Wrong Characters**: Encoding mismatches -3. **Focus Loss**: IME switching focus during injection -4. **Composition Conflicts**: IME composition mode interference - -### Mitigation Strategies -1. **Force Paste Mode**: For IME-heavy applications -2. **Focus Validation**: Ensure target has focus before injection -3. **Timing Adjustments**: Account for IME processing delays -4. 
**Fallback Chains**: Multiple injection attempts with different methods \ No newline at end of file diff --git a/docs/text_injection_implementation_actual.md b/docs/text_injection_implementation_actual.md deleted file mode 100644 index 46abd6b9..00000000 --- a/docs/text_injection_implementation_actual.md +++ /dev/null @@ -1,261 +0,0 @@ -# ColdVox Text Injection System - Actual Implementation Overview - -**Last Updated:** 2025-08-31 -**Status:** Implementation Complete (Dependencies Missing in Cargo.toml) - -## Executive Summary - -The ColdVox text injection system is a sophisticated, multi-backend text injection framework designed for reliability on Linux desktop environments. Unlike the original over-engineered plans that envisioned complex ML-based adaptive systems, the actual implementation delivers a pragmatic solution focused on **immediate reliability** with smart fallbacks. - -## Core Architecture - -### Design Philosophy - -The implemented system prioritizes: -- **Immediate injection** over complex session buffering (0ms default timeout) -- **Multiple fallback methods** over perfect single-method reliability -- **Pragmatic defaults** over theoretical completeness -- **Always-working fallback** (NoOp injector) over total failure - -### Key Components - -#### 1. TextInjector Trait -```rust -#[async_trait] -pub trait TextInjector: Send + Sync { - fn name(&self) -> &'static str; - fn is_available(&self) -> bool; - async fn inject(&mut self, text: &str) -> Result<(), InjectionError>; - async fn type_text(&mut self, text: &str, rate_cps: u32) -> Result<(), InjectionError>; - async fn paste(&mut self, text: &str) -> Result<(), InjectionError>; - fn metrics(&self) -> &InjectionMetrics; -} -``` - -#### 2. 
Strategy Manager - -The `StrategyManager` orchestrates injection with: -- **Adaptive method selection** based on per-app success rates -- **Exponential backoff cooldowns** for failed methods (10s → 20s → 40s, max 5min) -- **Budget control** (800ms global timeout) -- **Application filtering** via regex-based allow/blocklists - -#### 3. Backend Detection - -Runtime platform detection identifies available capabilities: -- Wayland (XDG Portal, Virtual Keyboard) -- X11 (xdotool, Native wrapper) -- External tools (ydotool, kdotool) -- Platform-specific features (macOS CGEvent, Windows SendInput) - -## Implemented Injection Methods - -### Primary Methods (Always Available) - -#### 1. **NoOpInjector** ✅ -- **Purpose:** Guaranteed fallback that never fails -- **Implementation:** Logs but performs no action -- **Always last** in method priority - -### Feature-Gated Methods (Require Dependencies) - -#### 2. **AtspiInjector** ✅ -- **Purpose:** Primary method for Wayland/GNOME/KDE -- **Implementation:** AT-SPI2 accessibility protocol -- **Features:** Direct text insertion, paste action triggering -- **Availability:** Wayland sessions only - -#### 3. **ClipboardInjector** ✅ -- **Purpose:** Reliable batch text via system clipboard -- **Implementation:** Native Wayland clipboard operations -- **Features:** Save/restore clipboard contents -- **Availability:** Wayland with `wl-clipboard-rs` - -#### 4. **ComboClipboardAtspiInjector** ✅ -- **Purpose:** Best of both worlds approach -- **Implementation:** Sets clipboard, then triggers AT-SPI paste -- **Features:** 50ms settling delay, focus validation -- **Availability:** Wayland with both clipboard and AT-SPI - -### Opt-In Methods (Disabled by Default) - -#### 5. **YdotoolInjector** ✅ -- **Purpose:** Universal fallback with elevated permissions -- **Implementation:** External binary + daemon -- **Requirements:** User in `input` group, ydotoold running -- **Config:** `allow_ydotool: false` (default) - -#### 6. 
**EnigoInjector** ✅ -- **Purpose:** Library-based synthetic input -- **Implementation:** Character-by-character typing -- **Limitations:** ASCII-only -- **Config:** `allow_enigo: false` (default) - -#### 7. **MkiInjector** ✅ -- **Purpose:** Low-level uinput events -- **Implementation:** Direct `/dev/uinput` access -- **Requirements:** Input group membership -- **Config:** `allow_mki: false` (default) - -#### 8. **KdotoolInjector** ✅ (Special) -- **Purpose:** Window management helper (not text injection) -- **Implementation:** KDE window activation/focus -- **Use Case:** Assists other injectors on KDE -- **Config:** `allow_kdotool: false` (default) - -## Key Simplifications from Original Plans - -### What Was Planned vs What Was Built - -| Planned Feature | Actual Implementation | -|-----------------|----------------------| -| Complex session buffering with ML timing | Immediate injection (0ms timeout) | -| Event-driven AT-SPI focus tracking | Simple polling-based focus check | -| Per-app ML-based method selection | Success rate tracking with simple sorting | -| Comprehensive focus detection | Best-effort with `inject_on_unknown_focus: true` | -| 10+ injection methods | 8 methods with clear priority | -| Complex state machines | Simplified pass-through session logic | - -### Pragmatic Defaults - -```rust -InjectionConfig { - silence_timeout_ms: 0, // Immediate injection - inject_on_unknown_focus: true, // Don't block on focus detection - require_focus: false, // Work even without focus - allow_ydotool: false, // Security-conscious defaults - global_timeout_ms: 800, // Quick failure detection - cooldown_initial_ms: 10000, // Reasonable retry delays -} -``` - -## Session Management - -While fully implemented, the session system effectively operates as a pass-through: - -**State Machine:** `Idle → Buffering → WaitingForSilence → ReadyToInject` - -**Reality:** With 0ms timeouts, transcriptions immediately trigger injection. 
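The pass-through behaviour can be illustrated with a small sketch. The state names follow the state machine above, but the transition rules, the `Session` struct, and its methods are assumptions for illustration rather than the real implementation.

```rust
// Hypothetical sketch: a zero silence timeout collapses the state machine
// into an immediate hand-off from transcription to injection.
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq)]
enum SessionState {
    Idle,
    Buffering,
    WaitingForSilence,
    ReadyToInject,
}

struct Session {
    state: SessionState,
    buffer: String,
    silence_timeout: Duration,
}

impl Session {
    fn new(silence_timeout: Duration) -> Self {
        Session {
            state: SessionState::Idle,
            buffer: String::new(),
            silence_timeout,
        }
    }

    /// Appends a transcription, then re-evaluates silence with zero elapsed time.
    fn on_transcription(&mut self, text: &str) {
        if !self.buffer.is_empty() {
            self.buffer.push(' ');
        }
        self.buffer.push_str(text);
        self.state = SessionState::Buffering;
        self.on_silence(Duration::ZERO);
    }

    /// Reports how long the speaker has been silent since the last transcription.
    fn on_silence(&mut self, elapsed: Duration) {
        if self.buffer.is_empty() {
            return; // nothing buffered, stay in the current state
        }
        self.state = if elapsed >= self.silence_timeout {
            SessionState::ReadyToInject
        } else {
            SessionState::WaitingForSilence
        };
    }

    /// Hands the buffered text to the caller once the session is ready.
    fn take_ready(&mut self) -> Option<String> {
        if self.state != SessionState::ReadyToInject {
            return None;
        }
        self.state = SessionState::Idle;
        Some(std::mem::take(&mut self.buffer))
    }
}
```

With `Duration::ZERO` as the timeout, every call to `on_transcription` leaves the session in `ReadyToInject`; a non-zero timeout keeps it in `WaitingForSilence` until `on_silence` reports enough elapsed time.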
-
-**Features Available (but unused by default):**
-- Buffering multiple transcriptions
-- Punctuation-based flushing
-- Size-based overflow protection
-- Configurable silence detection
-
-## Focus Detection
-
-**Implementation Status:** Stubbed but functional
-
-```rust
-// Current implementation always returns Unknown
-async fn check_focus_status(&self) -> Result {
-    Ok(FocusStatus::Unknown) // Placeholder
-}
-```
-
-**Mitigation:** System proceeds with injection anyway (`inject_on_unknown_focus: true`)
-
-## Integration with ColdVox Pipeline
-
-### STT to Injection Flow
-
-```
-STT Processor → TranscriptionEvent → Broadcast Channel
-                                            ↓
-                                  AsyncInjectionProcessor
-                                            ↓
-                                     InjectionSession
-                                            ↓
-                                     StrategyManager
-                                            ↓
-                                   TextInjector::inject()
-```
-
-### Main Application Integration
-
-- Feature-gated via `--features text-injection`
-- CLI configuration for all parameters
-- Environment variable support
-- Shared metrics with pipeline telemetry
-
-## Critical Configuration Issue
-
-**The system won't compile** due to missing dependencies in `Cargo.toml`:
-
-### Missing Dependencies
-```toml
-# These need to be added to Cargo.toml:
-atspi = { version = "0.28", optional = true }
-wl-clipboard-rs = { version = "0.9", optional = true }
-enigo = { version = "0.2", optional = true }
-mouse-keyboard-input = { version = "0.9", optional = true }
-```
-
-### Missing Feature Flags
-```toml
-# These features are referenced but not defined:
-text-injection-atspi = ["text-injection", "atspi"]
-text-injection-clipboard = ["text-injection", "wl-clipboard-rs"]
-text-injection-enigo = ["text-injection", "enigo"]
-text-injection-mki = ["text-injection", "mouse-keyboard-input"]
-```
-
-## Test Coverage
-
-### Comprehensive Testing
-- **Unit tests** for all core components
-- **Integration tests** for end-to-end flow
-- **Adaptive strategy tests** for cooldown and priority
-- **Focus tracking tests** for caching behavior
-- **Unicode handling** for text chunking
-
-### Test Gaps
--
Backend-specific integration tests
-- Real desktop environment testing
-- Permission and capability validation
-- Cross-platform behavior
-
-## Metrics and Observability
-
-The system tracks comprehensive metrics:
-- Per-method success rates and latencies
-- Character counts (buffered vs injected)
-- Cooldown and backend denial counters
-- Rate limiting and focus errors
-- Injection latency histograms
-
-## Security Considerations
-
-- **Opt-in for privileged methods** (ydotool, uinput)
-- **Text redaction** in logs by default
-- **Application filtering** via allow/blocklists
-- **No elevated permissions** for primary methods
-
-## Performance Characteristics
-
-- **800ms global budget** for all injection attempts
-- **250ms per-method timeout**
-- **20 characters/second** keystroke rate
-- **500 character chunks** for paste operations
-- **200ms focus cache** duration
-
-## Conclusion
-
-The ColdVox text injection system represents a **pragmatic triumph over academic complexity**. By simplifying from the original plans while maintaining robust fallback mechanisms, the implementation delivers:
-
-1. **Reliable text injection** that works immediately
-2. **Multiple fallback paths** for different environments
-3. **Security-conscious defaults** with opt-in for privileged operations
-4. **Comprehensive observability** through metrics and logging
-5. **Clean architecture** that's testable and maintainable
-
-The main barrier to deployment is adding the missing dependencies to `Cargo.toml`. Once that's resolved, the system is production-ready for Linux desktop environments, particularly Wayland-based systems like KDE Plasma and GNOME.
-
-## Next Steps
-
-1. **Fix Cargo.toml** - Add missing dependencies and feature flags
-2. **Enable primary methods** - Test with AT-SPI and clipboard on target system
-3. **Configure for environment** - Adjust timeouts and methods for specific desktop
-4. **Monitor metrics** - Use telemetry to optimize method ordering
-5.
**Consider session buffering** - If natural dictation flow is needed, increase timeouts
\ No newline at end of file
diff --git a/docs/text_injection_setup.md b/docs/text_injection_setup.md
deleted file mode 100644
index 38fd52e7..00000000
--- a/docs/text_injection_setup.md
+++ /dev/null
@@ -1,163 +0,0 @@
-# Text Injection Setup for ColdVox
-
-## Overview
-
-ColdVox includes text injection capabilities to automatically type recognized speech into any focused application on KDE Plasma Wayland. This uses the most reliable methods available in 2024-2025.
-
-## Quick Setup
-
-Run the automated setup script:
-
-```bash
-./scripts/setup_text_injection.sh
-```
-
-This will:
-1. Install required tools (wl-clipboard, ydotool)
-2. Configure uinput permissions
-3. Add your user to the input group
-4. Enable the ydotool service
-5. Test the setup
-
-**Important:** After running the script, log out and log back in for group changes to take effect.
-
-## Manual Setup
-
-### Required Tools
-
-```bash
-# Fedora/Nobara
-sudo dnf install -y wl-clipboard ydotool
-
-# Arch/EndeavourOS
-sudo pacman -S wl-clipboard ydotool
-
-# Ubuntu/Debian
-sudo apt install -y wl-clipboard ydotool
-```
-
-### Configure Permissions
-
-1. Create udev rule for uinput access:
-```bash
-echo 'KERNEL=="uinput", GROUP="input", MODE="0660", OPTIONS+="static_node=uinput"' | \
-    sudo tee /etc/udev/rules.d/99-uinput.rules
-sudo udevadm control --reload-rules
-sudo udevadm trigger
-```
-
-2. Add your user to the input group:
-```bash
-sudo usermod -a -G input $USER
-# Log out and log back in after this
-```
-
-3. Enable ydotool service:
-```bash
-sudo systemctl enable --now ydotool
-```
-
-## How It Works
-
-The text injection system uses a multi-tier approach:
-
-1. **Primary Method:** `wl-clipboard` + `ydotool` paste
-   - Sets clipboard with recognized text
-   - Simulates Ctrl+V to paste
-   - Most reliable method
-
-2.
**Fallback:** Direct typing with `ydotool`
-   - Types text character by character
-   - Slower but works when paste fails
-
-3. **Last Resort:** Clipboard only
-   - Sets clipboard and notifies user
-   - User manually pastes with Ctrl+V
-
-## Testing
-
-Test the injection manually:
-
-```bash
-# Test clipboard
-echo "Test text" | wl-copy
-wl-paste # Should output "Test text"
-
-# Test ydotool
-echo "Hello World" | wl-copy && ydotool key ctrl+v
-
-# Test direct typing
-ydotool type "Hello from ydotool"
-```
-
-## Troubleshooting
-
-### ydotool not working
-
-1. Check if you're in the input group:
-```bash
-groups | grep input
-```
-
-2. Check if the service is running:
-```bash
-systemctl status ydotool
-```
-
-3. Check uinput permissions:
-```bash
-ls -l /dev/uinput
-```
-
-### Clipboard not working
-
-Ensure you're running under Wayland:
-```bash
-echo $WAYLAND_DISPLAY
-```
-
-### Text not appearing
-
-1. Some applications may block automated input
-2. Try clicking in the text field first
-3.
Check if the application is running under XWayland
-
-## Security Notes
-
-- Text injection requires access to `/dev/uinput`
-- Being in the `input` group allows keyboard/mouse simulation
-- Only grant these permissions to trusted users
-- The system respects Wayland's security model
-
-## Optional Enhancements
-
-### Install kdotool (improves focus detection)
-
-kdotool helps ensure text is injected into the correct window:
-
-```bash
-# From source
-git clone https://github.com/jinliu/kdotool
-cd kdotool
-make && sudo make install
-
-# On Arch (AUR)
-yay -S kdotool
-```
-
-## Architecture
-
-The implementation is in the `crates/coldvox-text-injection/` crate and follows 2024-2025 best practices for KDE Plasma Wayland:
-
-- Automatic capability detection
-- Graceful fallbacks
-- Production-ready error handling
-- Minimal dependencies (no complex crates)
-- Based on proven tools (ydotool, wl-clipboard)
-
-## Known Limitations
-
-- Some sandboxed applications (Flatpak) may not accept input
-- Portal-based permission systems add UX friction
-- XWayland applications may have different behavior
-- Virtual keyboard protocols are not fully supported in KWin
\ No newline at end of file
diff --git a/docs/text_injection_testing.md b/docs/text_injection_testing.md
deleted file mode 100644
index 4f671f8f..00000000
--- a/docs/text_injection_testing.md
+++ /dev/null
@@ -1,270 +0,0 @@
-# Text Injection Testing Matrix and Scenarios
-
-## Overview
-
-This document outlines comprehensive testing scenarios for the text injection system, covering functional, performance, security, and edge case testing.
-
-## Test Categories
-
-### 1. Functional Tests
-
-#### Backend Detection and Selection
-- **Scenario**: System with multiple backends available
-- **Steps**:
-  1. Start with Wayland + AT-SPI available
-  2. Disable AT-SPI service
-  3. Verify fallback to clipboard-only mode
-  4. Re-enable AT-SPI
-  5.
Verify preferred backend selection
-- **Acceptance Criteria**:
-  - Backend detection completes within 100ms
-  - Correct backend selected based on availability
-  - No crashes during backend switching
-
-#### Focus State Handling
-- **Scenario**: Wayland + AT-SPI off, unknown focus handling
-- **Test Case 1**: `inject_on_unknown_focus = true`
-  - **Steps**:
-    1. Set focus to unknown state
-    2. Attempt text injection
-    3. Verify injection proceeds
-  - **Acceptance Criteria**:
-    - Injection succeeds
-    - `focus_missing` metric not incremented
-
-- **Test Case 2**: `inject_on_unknown_focus = false`
-  - **Steps**:
-    1. Set focus to unknown state
-    2. Attempt text injection
-    3. Verify injection blocked
-  - **Acceptance Criteria**:
-    - Injection fails with appropriate error
-    - `focus_missing` metric incremented
-
-### 2. Performance Tests
-
-#### Large Transcript Handling
-- **Scenario**: Paste chunking with large text
-- **Steps**:
-  1. Generate 10KB text transcript
-  2. Configure `paste_chunk_chars = 1000`
-  3. Inject text
-  4. Monitor chunk processing
-- **Acceptance Criteria**:
-  - Text split into correct chunk sizes (±10%)
-  - Per-chunk pacing respected (min 50ms between chunks)
-  - Total budget not exceeded
-  - All chunks injected successfully
-
-#### Keystroke Pacing
-- **Scenario**: High-frequency keystroke injection
-- **Configuration**: `rate_cps = 30`
-- **Steps**:
-  1. Generate burst of 100 characters
-  2. Inject using keystroke method
-  3. Measure inter-keystroke timing
-- **Acceptance Criteria**:
-  - Average rate within 25-35 CPS
-  - Jitter tolerance: ±20ms
-  - No character loss
-  - Rate limiting triggers correctly
-
-### 3. Fallback Cascade Tests
-
-#### Complete Fallback Chain
-- **Scenario**: Progressive backend failure
-- **Steps**:
-  1. Start with AT-SPI enabled
-  2. Inject text - verify AT-SPI used
-  3. Disable AT-SPI paste action
-  4. Inject text - verify clipboard+paste fallback
-  5. Disable clipboard
-  6.
Inject text - verify clipboard-only fallback
-  7. Enable ydotool
-  8. Inject text - verify ydotool fallback
-- **Acceptance Criteria**:
-  - Each fallback attempted in correct order
-  - Success recorded for working fallback
-  - Appropriate error for failed methods
-  - No infinite loops
-
-### 4. Security and Privacy Tests
-
-#### Allowlist/Blocklist Functionality
-- **Test Case 1**: Allow-only mode
-  - **Configuration**:
-    ```toml
-    allowlist = ["firefox", "chromium"]
-    ```
-  - **Steps**:
-    1. Focus terminal application
-    2. Attempt injection
-    3. Focus Firefox
-    4. Attempt injection
-  - **Acceptance Criteria**:
-    - Terminal injection blocked
-    - Firefox injection allowed
-
-- **Test Case 2**: Block specific applications
-  - **Configuration**:
-    ```toml
-    blocklist = ["terminal"]
-    ```
-  - **Steps**:
-    1. Focus terminal
-    2. Attempt injection
-    3. Focus text editor
-    4. Attempt injection
-  - **Acceptance Criteria**:
-    - Terminal injection blocked
-    - Text editor injection allowed
-
-#### Regex Pattern Handling
-- **Test Case 1**: Valid regex patterns
-  - **Configuration**:
-    ```toml
-    allowlist = ["^firefox$", "chromium.*"]
-    ```
-  - **Steps**:
-    1. Test various window class names
-    2. Verify correct matching
-  - **Acceptance Criteria**:
-    - Valid patterns work correctly
-    - Performance impact minimal
-
-- **Test Case 2**: Invalid regex patterns
-  - **Configuration**:
-    ```toml
-    allowlist = ["[invalid", "^firefox$"]
-    ```
-  - **Steps**:
-    1. Attempt injection with invalid pattern
-    2. Check logs for warnings
-    3. Verify valid patterns still work
-  - **Acceptance Criteria**:
-    - Invalid pattern logged as warning
-    - Invalid pattern skipped
-    - Valid patterns continue working
-    - No crashes
-
-#### Privacy Logging
-- **Scenario**: Log content verification
-- **Steps**:
-  1. Enable debug logging temporarily
-  2. Inject sensitive text
-  3. Review log output
-  4.
Disable debug logging
-- **Acceptance Criteria**:
-  - Normal logs show only length/hash
-  - Debug logs show full text (when explicitly enabled)
-  - No accidental plaintext in production logs
-
-## Test Environment Setup
-
-### System Requirements
-- **Wayland**: GNOME/KDE Plasma
-- **X11**: Fallback testing
-- **AT-SPI**: Accessibility services enabled
-- **Tools**: wl-clipboard, ydotool, kdotool installed
-
-### Test Applications
-- **Terminal**: gnome-terminal, konsole
-- **Browser**: Firefox, Chromium
-- **Editor**: gedit, kate, VS Code
-- **Office**: LibreOffice
-
-## Automated vs Manual Tests
-
-### Automated Tests
-- Backend detection
-- Configuration validation
-- Basic injection success/failure
-- Performance metrics
-- Memory usage
-- Error handling
-
-### Manual Tests
-- Visual confirmation of injection
-- Cross-application testing
-- IME interaction
-- Focus state verification
-- Log content review
-
-## Test Data
-
-### Sample Texts
-- **Short**: "Hello world"
-- **Medium**: 500-character paragraph
-- **Long**: 10KB technical documentation
-- **Unicode**: Mixed ASCII/Unicode content
-- **Special**: Control characters, newlines, tabs
-
-### Window Classes
-- `firefox`
-- `chromium-browser`
-- `gnome-terminal`
-- `code`
-- `gedit`
-
-## Metrics and Monitoring
-
-### Key Metrics to Verify
-- `chars_buffered`: Accurate character counting
-- `chars_injected`: Matches actual injected content
-- `successes`/`failures`: Correct incrementing
-- `latency_samples`: Realistic timing values
-- `rate_limited`: Triggers on budget exhaustion
-
-### Performance Benchmarks
-- **Cold Start**: <500ms to first injection
-- **Hot Path**: <50ms per injection
-- **Memory**: <50MB steady state
-- **CPU**: <5% during active injection
-
-## Edge Cases
-
-### Error Conditions
-- Network clipboard services unavailable
-- AT-SPI bus disconnected
-- Permission changes during operation
-- Window focus lost mid-injection
-- System suspend/resume
-
-### Boundary Conditions
-- Empty text
injection
-- Maximum text size (100KB+)
-- Rate limit boundary (exact CPS limit)
-- Focus timeout (exact timing)
-- Memory pressure scenarios
-
-## Regression Testing
-
-### Version Compatibility
-- Test across different Wayland compositors
-- Verify with different AT-SPI versions
-- Check external tool compatibility
-
-### Configuration Changes
-- Hot-reload of allowlist/blocklist
-- Runtime backend switching
-- Dynamic rate limit adjustment
-
-## Reporting
-
-### Test Results Format
-```
-Test: Backend Fallback Cascade
-Status: PASS
-Duration: 2.3s
-Details:
-  - AT-SPI: PASS (45ms)
-  - Clipboard+Paste: PASS (67ms)
-  - Clipboard: PASS (23ms)
-  - YdoTool: PASS (89ms)
-```
-
-### Coverage Metrics
-- **Functional**: >95% code coverage
-- **Performance**: All benchmarks met
-- **Security**: All privacy checks pass
-- **Compatibility**: Works on target platforms
\ No newline at end of file
diff --git a/docs/text_injection_tui.md b/docs/text_injection_tui.md
deleted file mode 100644
index d4fcaf44..00000000
--- a/docs/text_injection_tui.md
+++ /dev/null
@@ -1,221 +0,0 @@
-# Text Injection TUI Wiring Specification
-
-## Overview
-
-This document specifies the user interface elements and data sources for displaying text injection status in the ColdVox TUI dashboard.
-
-## Display Panels
-
-### Main Injection Status Panel
-
-#### Header Section
-```
-Text Injection Status
-━━━━━━━━━━━━━━━━━━━━━
-```
-
-#### Core Metrics Row
-```
-Backend: Wayland+AT-SPI Mode: auto Status: Active
-```
-
-#### Performance Metrics Row
-```
-Buffer: 0 chars Last: 42 chars Latency: 45ms Success: 98.5%
-```
-
-#### Error/Status Indicators Row
-```
-Errors: 2 Rate Limited: 0 Backend Denied: 0 Paused: No
-```
-
-### Detailed Metrics Panel
-
-#### Success/Failure Breakdown
-```
-Success Rate: 98.5% (247/250)
-├── Paste: 95.2% (120/126)
-├── Keystroke: 99.1% (109/110)
-└── AT-SPI: 97.8% (88/90)
-```
-
-#### Latency Histogram
-```
-Latency Distribution (ms):
-├── <50ms: ████████░░ 80%
-├── 50-100ms: ████░░░░ 40%
-├── 100-200ms: █░░░░░░░ 10%
-└── >200ms: ░░░░░░░░ 0%
-```
-
-## Data Sources
-
-### Primary Data Source
-```rust
-// Shared metrics instance
-let metrics: Arc> = Arc::new(Mutex::new(InjectionMetrics::default()));
-```
-
-### Backend Detection
-```rust
-// From BackendDetector
-let current_backend = backend_detector.get_preferred_backend();
-let available_backends = backend_detector.detect_available_backends();
-```
-
-### Configuration Values
-```rust
-// From InjectionConfig
-let injection_mode = config.injection_mode; // "auto", "paste", "keystroke"
-let pause_hotkey = config.pause_hotkey; // Optional hotkey display
-```
-
-## Field Specifications
-
-### Backend Field
-- **Source**: `BackendDetector.get_preferred_backend()`
-- **Format**: "Wayland+AT-SPI", "X11+Clipboard", "Windows+SendInput", etc.
-- **Update**: On backend changes or detection failures
-- **Fallback**: "Unknown" if detection fails
-
-### Mode Field
-- **Source**: `InjectionConfig.injection_mode`
-- **Format**: "auto", "paste", "keystroke"
-- **Update**: On configuration changes
-- **Default**: "auto"
-
-### Buffer Chars Field
-- **Source**: `InjectionMetrics.chars_buffered`
-- **Format**: "X chars" (e.g., "0 chars", "156 chars")
-- **Update**: Real-time as text is processed
-- **Reset**: On successful injection or flush
-
-### Last Flush Size Field
-- **Source**: `InjectionMetrics.last_flush_size`
-- **Format**: "X chars" (e.g., "42 chars")
-- **Update**: After each successful injection
-- **Default**: "0 chars"
-
-### Latency Field
-- **Source**: Moving average of `InjectionMetrics.latency_samples`
-- **Format**: "Xms" (e.g., "45ms", "127ms")
-- **Update**: Calculated from recent samples
-- **Default**: "0ms"
-
-### Success Rate Field
-- **Source**: `InjectionMetrics.successes` / `InjectionMetrics.attempts`
-- **Format**: "X.X%" (e.g., "98.5%")
-- **Update**: After each injection attempt
-- **Default**: "0.0%"
-
-### Error Counters
-- **Source**: Various `InjectionMetrics` counters
-- **Format**: Integer counts
-- **Update**: On error occurrence
-- **Fields**:
-  - `failures`: Total injection failures
-  - `rate_limited`: Times rate limit was hit
-  - `backend_denied`: Times backend was unavailable
-
-## Update Cadence
-
-### Real-time Updates (50-100ms)
-- Buffer character count
-- Current operation status
-- Active injection progress
-
-### Standard Updates (200-500ms)
-- Success/failure rates
-- Latency calculations
-- Backend status
-- Error counters
-
-### Slow Updates (1-5 seconds)
-- Moving averages
-- Historical summaries
-- Configuration changes
-
-## Pause Toggle Integration
-
-### Display Logic
-```rust
-if injection_paused {
-    "Paused: Yes (Hotkey: Ctrl+Shift+P)"
-} else {
-    "Paused: No"
-}
-```
-
-### Hotkey Display
-- **Source**: `InjectionConfig.pause_hotkey`
-- **Format**:
Human-readable (e.g., "Ctrl+Shift+P")
-- **Fallback**: Hidden if no hotkey configured
-
-## Example Layout
-
-```
-┌─ Text Injection ──────────────────────────────┐
-│ Backend: Wayland+AT-SPI Mode: auto │
-│ Buffer: 0 chars Last: 42 chars 45ms │
-│ Success: 98.5% Errors: 2 Paused: No │
-└──────────────────────────────────────────────┘
-
-┌─ Injection Methods ──────────────────────────┐
-│ Paste: ████████░░ 95.2% (120/126) │
-│ Keystroke: ████████░ 99.1% (109/110) │
-│ AT-SPI: ███████░░ 97.8% (88/90) │
-└──────────────────────────────────────────────┘
-
-┌─ Latency Distribution ──────────────────────┐
-│ <50ms: ████████░░ 80% │
-│ 50-100ms: ████░░░░ 40% │
-│ 100-200ms: █░░░░░░░ 10% │
-│ >200ms: ░░░░░░░░ 0% │
-└──────────────────────────────────────────────┘
-```
-
-## Error Handling
-
-### Backend Unavailable
-```
-Backend: UNAVAILABLE (Wayland+AT-SPI)
-Mode: auto (degraded)
-```
-
-### High Error Rate
-```
-Success: 45.2% ⚠️ Errors: 23
-```
-
-### Rate Limited
-```
-Rate Limited: 5 (last 30s)
-```
-
-## Configuration Integration
-
-### TUI-Specific Config
-```toml
-[tui]
-# Text injection panel settings
-text_injection_panel = true
-text_injection_update_ms = 300
-text_injection_show_latency = true
-text_injection_show_method_breakdown = true
-```
-
-### Metrics Collection
-```rust
-// Ensure metrics are collected
-let metrics = Arc::new(Mutex::new(InjectionMetrics::new()));
-strategy_manager.set_metrics(metrics.clone());
-processor.set_metrics(metrics);
-```
-
-## Implementation Notes
-
-- All data access must be thread-safe using `Arc>`
-- UI updates should not block injection operations
-- Redact sensitive information in error displays
-- Handle division by zero in percentage calculations
-- Provide graceful degradation when metrics unavailable
\ No newline at end of file
diff --git a/docs/vosk_implementation_gaps.md b/docs/vosk_implementation_gaps.md
deleted file mode 100644
index 901b9d51..00000000
--- a/docs/vosk_implementation_gaps.md
+++ /dev/null
@@ -1,290
+0,0 @@
-# Vosk Implementation Gaps and Improvements
-
-## Current Status
-
-Vosk integration is functional behind the `vosk` feature flag and is disabled
-by default unless a model is found. This document outlines gaps and recommended
-improvements.
-
-## Priority 1: Critical Gaps
-
-### 1.1 Missing Unit Tests
-
-**Issue**: No unit tests for STT modules (vosk.rs, processor.rs)
-
-**Impact**: Cannot verify correctness or catch regressions
-
-**Solution**:
-
-```rust
-// Add to crates/app/src/stt/vosk.rs
-#[cfg(test)]
-mod tests {
-    use super::*;
-
-    #[test]
-    fn test_transcriber_creation() {
-        // Test with valid/invalid model paths
-    }
-
-    #[test]
-    fn test_accept_frame() {
-        // Test audio processing
-    }
-
-    #[test]
-    fn test_utterance_lifecycle() {
-        // Test utterance ID management
-    }
-}
-```
-
-### 1.2 No STT Metrics in TUI Dashboard
-
-**Issue**: TUI dashboard doesn't display STT metrics
-
-**Impact**: Cannot monitor STT performance in real-time
-
-**Solution**: Add STT panel to tui_dashboard.rs showing:
-
-- Transcription rate (words/minute)
-- Partial/final counts
-- Error rate
-- Processing latency
-- Current utterance state
-
-### 1.3 No Error Recovery Mechanism
-
-**Issue**: STT processor lacks automatic recovery from failures
-
-**Impact**: System continues with broken STT instead of recovering
-
-**Solution**: Implement watchdog pattern similar to AudioCapture:
-
-- Monitor consecutive error count
-- Automatic model reload on failure
-- Exponential backoff retry
-- Health check mechanism
-
-## Priority 2: Operational Features
-
-### 2.1 Transcription Persistence
-
-**Issue**: Transcriptions only logged, not stored
-
-**Impact**: Cannot review historical transcriptions
-
-**Solution**:
-
-```rust
-// Add transcription storage
-pub struct TranscriptionStore {
-    output_dir: PathBuf,
-    format: OutputFormat, // JSON, CSV, SQLite
-    rotation: RotationPolicy,
-}
-```
-
-### 2.2 Runtime Configuration Updates
-
-**Issue**: Must restart to change STT settings
-
-**Impact**: Poor operational flexibility
-
-**Solution**:
-
-- Add config reload mechanism
-- Support model hot-swapping
-- Dynamic enable/disable STT
-
-### 2.3 Performance Benchmarking
-
-**Issue**: No performance benchmarks
-
-**Impact**: Unknown processing overhead and latency
-
-**Solution**:
-
-- Add criterion benchmarks
-- Measure transcription latency
-- Test with different model sizes
-- Profile memory usage
-
-## Priority 3: Enhanced Features
-
-### 3.1 Multi-Model Support
-
-**Issue**: Only one model at a time
-
-**Impact**: Cannot optimize for different use cases
-
-**Solution**:
-
-- Support model switching based on context
-- Parallel processing with multiple models
-- Quality vs speed trade-offs
-
-### 3.2 Integration Tests
-
-**Issue**: No end-to-end tests with real audio
-
-**Impact**: Cannot verify full pipeline
-
-**Solution**:
-
-- Test with WAV files
-- Verify VAD→STT integration
-- Test error scenarios
-
-### 3.3 Production Deployment
-
-**Issue**: No production deployment documentation
-
-**Impact**: Unclear how to deploy in production
-
-**Solution**: Document:
-
-- Systemd service configuration
-- Resource requirements
-- Monitoring setup
-- Log management
-- API endpoints
-
-## Implementation Plan
-
-### Phase 1: Testing & Monitoring (1 week)
-
-1. Add unit tests for STT modules
-2. Integrate STT metrics into TUI dashboard
-3. Add integration test with WAV files
-
-### Phase 2: Reliability (1 week)
-
-1. Implement error recovery mechanism
-2. Add health checks
-3. Add performance benchmarks
-
-### Phase 3: Operations (2 weeks)
-
-1. Add transcription persistence
-2. Implement runtime config updates
-3. Create production deployment guide
-
-### Phase 4: Enhancements (Optional)
-
-1. Multi-model support
-2. Advanced features (speaker diarization, etc.)
-3.
API endpoints for external access
-
-## Testing Requirements
-
-### Unit Tests Needed
-
-- [ ] VoskTranscriber creation with valid/invalid models
-- [ ] Audio frame processing
-- [ ] Utterance lifecycle management
-- [ ] Configuration updates
-- [ ] Error handling paths
-
-### Integration Tests Needed
-
-- [ ] VAD → STT event flow
-- [ ] End-to-end with test audio files
-- [ ] Concurrent processing
-- [ ] Memory leak tests
-- [ ] Performance under load
-
-### Example Test Audio Files
-
-```bash
-# Download test audio files
-mkdir -p test_data
-cd test_data
-
-# LibriSpeech samples (16kHz, mono)
-wget http://www.openslr.org/resources/12/test-clean.tar.gz
-tar -xzf test-clean.tar.gz
-
-# Or create custom test files
-sox input.wav -r 16000 -c 1 test_16khz_mono.wav
-```
-
-## Configuration Enhancements
-
-### Proposed Config Structure
-
-```toml
-[stt]
-enabled = true
-engine = "vosk" # Future: whisper, deepgram, etc.
-
-[stt.vosk]
-model_path = "models/vosk-model-small-en-us-0.15"
-partial_results = true
-max_alternatives = 1
-include_words = false
-buffer_size_ms = 512
-
-[stt.vosk.fallback]
-enabled = true
-model_path = "models/vosk-model-tiny-en-us-0.15"
-
-[stt.output]
-format = "json" # json, csv, sqlite
-directory = "transcriptions"
-rotation = "daily"
-keep_days = 30
-```
-
-## Monitoring & Observability
-
-### Metrics to Track
-
-- Latency: Time from audio received to transcription emitted
-- Throughput: Words per minute transcribed
-- Accuracy: (Requires reference transcripts)
-- Resource Usage: CPU, memory per model
-- Error Rate: Failed transcriptions per hour
-
-### Logging Improvements
-
-```rust
-// Add structured logging fields
-tracing::info!(
-    target: "stt",
-    utterance_id = %id,
-    duration_ms = duration.as_millis(),
-    word_count = words.len(),
-    model = config.model_path,
-    "Transcription completed"
-);
-```
-
-## API Design (Future)
-
-### REST Endpoints
-
-```text
-GET /api/v1/stt/status # STT system status
-GET /api/v1/stt/metrics # Current
metrics -GET /api/v1/stt/transcriptions # Recent transcriptions -POST /api/v1/stt/config # Update configuration -``` - -### WebSocket Stream - -```text -WS /api/v1/stt/stream # Real-time transcription stream -``` - -## Summary - -The Vosk implementation is functional but needs: - -1. Testing: Unit and integration tests -2. Monitoring: TUI dashboard integration -3. Reliability: Error recovery mechanism -4. Operations: Persistence and config management -5. Documentation: Production deployment guide - -These improvements will make the STT system production-ready and maintainable. diff --git a/examples/inject_demo.rs b/examples/inject_demo.rs index dabd77c0..68c150a3 100644 --- a/examples/inject_demo.rs +++ b/examples/inject_demo.rs @@ -55,10 +55,15 @@ async fn run_processor_demo() -> Result<(), Box> { }; // Create shared injection metrics and injection processor - let injection_metrics = Arc::new(Mutex::new(coldvox_app::text_injection::types::InjectionMetrics::default())); + let injection_metrics = Arc::new(Mutex::new( + coldvox_app::text_injection::types::InjectionMetrics::default(), + )); let mut processor = InjectionProcessor::new(config.clone(), None, injection_metrics.clone()); - info!("Processor created. Current state: {:?}", processor.session_state()); + info!( + "Processor created. 
Current state: {:?}", + processor.session_state() + ); // Simulate receiving transcriptions let test_transcriptions = vec![ @@ -98,7 +103,7 @@ async fn run_processor_demo() -> Result<(), Box> { // In a real scenario, this would be handled by the async processor // For demo purposes, we'll create a temporary strategy manager let config_clone = config.clone(); - let mut temp_manager = StrategyManager::new(config_clone, injection_metrics.clone()); + let mut temp_manager = StrategyManager::new(config_clone, injection_metrics.clone()); match temp_manager.inject(&text).await { Ok(()) => { info!("✅ Injection successful!"); @@ -142,7 +147,9 @@ async fn run_direct_injection_demo() -> Result<(), Box> { }; // Create shared injection metrics and strategy manager - let injection_metrics = Arc::new(Mutex::new(coldvox_app::text_injection::types::InjectionMetrics::default())); + let injection_metrics = Arc::new(Mutex::new( + coldvox_app::text_injection::types::InjectionMetrics::default(), + )); let mut manager = StrategyManager::new(config, injection_metrics); info!("StrategyManager created"); @@ -176,4 +183,4 @@ async fn run_direct_injection_demo() -> Result<(), Box> { manager.print_stats(); Ok(()) -} \ No newline at end of file +} diff --git a/examples/record_10s.rs b/examples/record_10s.rs index c7cc9747..6cba1348 100644 --- a/examples/record_10s.rs +++ b/examples/record_10s.rs @@ -1,25 +1,26 @@ -use std::sync::{Arc, Mutex}; -use std::time::{Duration, Instant, SystemTime, UNIX_EPOCH}; use cpal::traits::{DeviceTrait, HostTrait, StreamTrait}; use hound::{WavSpec, WavWriter}; +use std::sync::{Arc, Mutex}; +use std::time::{Duration, Instant, SystemTime, UNIX_EPOCH}; fn main() -> Result<(), Box> { println!("🎤 Starting 10-second recording at 16kHz..."); - + // Set up audio host and device let host = cpal::default_host(); - let device = host.default_input_device() + let device = host + .default_input_device() .ok_or("No input device available")?; - + println!("Using device: {}", 
device.name()?); - + // Configure for 16kHz mono let config = cpal::StreamConfig { channels: 1, sample_rate: cpal::SampleRate(16000), buffer_size: cpal::BufferSize::Default, }; - + // Prepare WAV writer let spec = WavSpec { channels: 1, @@ -27,22 +28,20 @@ fn main() -> Result<(), Box> { bits_per_sample: 16, sample_format: hound::SampleFormat::Int, }; - + // Generate timestamp for unique filename - let timestamp = SystemTime::now() - .duration_since(UNIX_EPOCH)? - .as_secs(); + let timestamp = SystemTime::now().duration_since(UNIX_EPOCH)?.as_secs(); let output_path = format!("recording_16khz_10s_{}.wav", timestamp); let writer = WavWriter::create(&output_path, spec)?; let writer = Arc::new(Mutex::new(Some(writer))); let writer_clone = Arc::clone(&writer); - + // Track recording duration let start_time = Instant::now(); let duration = Duration::from_secs(10); let recording_done = Arc::new(Mutex::new(false)); let recording_done_clone = Arc::clone(&recording_done); - + // Create stream let stream = device.build_input_stream( &config, @@ -52,7 +51,7 @@ fn main() -> Result<(), Box> { *recording_done_clone.lock().unwrap() = true; return; } - + // Write samples to WAV file if let Ok(mut guard) = writer_clone.lock() { if let Some(ref mut writer) = *guard { @@ -67,11 +66,11 @@ fn main() -> Result<(), Box> { }, None, )?; - + // Start recording stream.play()?; println!("Recording for 10 seconds..."); - + // Show progress let mut last_second = 0; while !*recording_done.lock().unwrap() { @@ -85,18 +84,18 @@ fn main() -> Result<(), Box> { std::thread::sleep(Duration::from_millis(100)); } println!("\r10/10 seconds - Done!"); - + // Stop and finalize drop(stream); - + // Finalize WAV file if let Ok(mut guard) = writer.lock() { if let Some(writer) = guard.take() { writer.finalize()?; } } - + println!("✅ Recording saved to: {}", output_path); - + Ok(()) -} \ No newline at end of file +} diff --git a/examples/vad_demo.rs b/examples/vad_demo.rs deleted file mode 100644 index 
0dd6fb08..00000000 --- a/examples/vad_demo.rs +++ /dev/null @@ -1,186 +0,0 @@ -use coldvox_app::audio::{AudioFrame, VadProcessor}; -use coldvox_app::vad::{UnifiedVadConfig, VadMode, VadEvent}; -use dasp::interpolate::sinc::Sinc; -use dasp::{ring_buffer, signal, Signal}; -use hound::WavReader; -use tokio::sync::{broadcast, mpsc}; -use tokio::time::{sleep, Duration, Instant}; -use tracing::{error, info}; - -#[tokio::main] -async fn main() -> Result<(), Box> { - tracing_subscriber::fmt().with_env_filter("info").init(); - - info!("Starting VAD demo from WAV file"); - - let (audio_tx, _) = broadcast::channel::(100); - let audio_rx = audio_tx.subscribe(); - let (event_tx, mut event_rx) = mpsc::channel::(100); - - let mut vad_config = UnifiedVadConfig::default(); - - let args: Vec = std::env::args().collect(); - let mode = args.get(1).map(|s| s.as_str()).unwrap_or("silero"); - - vad_config.mode = match mode { - "silero" => { - info!("Using Silero VAD engine"); - if let Some(threshold_str) = args.get(2) { - if let Ok(threshold) = threshold_str.parse::() { - info!("Setting Silero threshold to: {}", threshold); - vad_config.silero.threshold = threshold; - } - } - VadMode::Silero - } - "level3" => { - info!("Using Level3 VAD engine (enabling it for this demo)"); - vad_config.level3.enabled = true; // Enable Level3 for testing - VadMode::Level3 - } - _ => { - info!("Unknown mode '{}'", mode); - VadMode::Silero - } - }; - - let vad_handle = VadProcessor::spawn(vad_config.clone(), audio_rx, event_tx, None) - .expect("failed to spawn VAD"); - - // generator task: feed WAV into broadcast - let gen_tx = audio_tx.clone(); - let audio_file_path = std::env::var("VAD_TEST_FILE") - .unwrap_or_else(|_| "crates/app/test_audio_16k.wav".to_string()); - let frame_size = vad_config.frame_size_samples; - let generator = tokio::spawn(async move { - if let Err(e) = generate_audio_from_wav(gen_tx, frame_size, &audio_file_path).await { - error!("Audio generator failed: {}", e); - } - }); - - // 
event printer - let event_printer = tokio::spawn(async move { - handle_vad_events(&mut event_rx).await; - }); - - generator.await.ok(); - sleep(Duration::from_secs(2)).await; - vad_handle.abort(); - event_printer.abort(); - info!("Demo completed"); - Ok(()) -} - -async fn generate_audio_from_wav( - tx: broadcast::Sender, - frame_size: usize, - file_path: &str, -) -> Result<(), String> { - info!("Reading audio from: {}", file_path); - let mut reader = WavReader::open(file_path).map_err(|e| format!("Failed to open WAV file: {}", e))?; - let spec = reader.spec(); - info!("WAV spec: {:?}, duration: {}ms", spec, reader.duration() as f32 / spec.sample_rate as f32 * 1000.0); - - let samples_f32: Vec = reader.samples::().map(|s| s.unwrap() as f32 / i16::MAX as f32).collect(); - - let source_signal = signal::from_iter(samples_f32.into_iter().map(|s| [s])); - - let sinc = Sinc::new(ring_buffer::Fixed::from(vec![[0.0]; 128])); - - let original_rate = spec.sample_rate as f64; - let new_rate = 16000.0; - let mut converter = signal::interpolate::Converter::from_hz_to_hz( - source_signal, - sinc, - original_rate, - new_rate, - ); - - let mut timestamp_ms = 0u64; - let frame_duration_ms = (frame_size as f32 * 1000.0 / 16000.0) as u64; - - while !converter.is_exhausted() { - - let mut frame_f32 = Vec::with_capacity(frame_size); - for _ in 0..frame_size { - let frame = converter.next(); - frame_f32.push(frame[0]); - } - - let frame_i16: Vec = frame_f32.iter().map(|&s| (s * i16::MAX as f32) as i16).collect(); - - let mut frame = frame_i16; - if frame.len() < frame_size { - frame.resize(frame_size, 0); - } - - let audio_frame = AudioFrame { - data: frame, - timestamp_ms, - }; - - let _ = tx.send(audio_frame); - - timestamp_ms += frame_duration_ms; - sleep(Duration::from_millis(frame_duration_ms)).await; - } - - info!("Audio generator stopped"); - Ok(()) -} -async fn handle_vad_events(rx: &mut mpsc::Receiver) { - let start = Instant::now(); - let mut speech_segments = 0u64; - let mut 
total_speech_ms = 0u64; - - while let Some(event) = rx.recv().await { - match event { - VadEvent::SpeechStart { timestamp_ms, energy_db } => { - speech_segments += 1; - info!( - "[{:6.2}s] Speech START - Energy: {:.2} dB", - timestamp_ms as f32 / 1000.0, - energy_db - ); - } - VadEvent::SpeechEnd { timestamp_ms, duration_ms, energy_db } => { - total_speech_ms += duration_ms; - info!( - "[{:6.2}s] Speech END - Duration: {} ms, Energy: {:.2} dB", - timestamp_ms as f32 / 1000.0, - duration_ms, - energy_db - ); - } - } - } - - let elapsed = start.elapsed(); - info!( - "Event handler stopped. Total: {} speech segments, {:.2}s of speech in {:.2}s", - speech_segments, - total_speech_ms as f32 / 1000.0, - elapsed.as_secs_f32() - ); -} - -fn simple_random() -> f32 { - use std::cell::Cell; - use std::num::Wrapping; - - thread_local! { - static SEED: Cell> = Cell::new(Wrapping( - std::time::SystemTime::now() - .duration_since(std::time::UNIX_EPOCH) - .unwrap() - .as_secs() as u32 - )); - } - - SEED.with(|seed| { - let mut s = seed.get(); - s = s * Wrapping(1103515245) + Wrapping(12345); - seed.set(s); - (s.0 >> 16) as f32 / 65536.0 - }) -} diff --git a/examples/vosk_test.rs b/examples/vosk_test.rs index bfc06293..85ef992c 100644 --- a/examples/vosk_test.rs +++ b/examples/vosk_test.rs @@ -1,12 +1,13 @@ -#[cfg(feature = "vosk")] -use coldvox_app::stt::{Transcriber, VoskTranscriber, TranscriptionConfig, TranscriptionEvent}; use std::path::Path; +#[cfg(feature = "vosk")] +use coldvox_app::stt::{Transcriber, TranscriptionConfig, TranscriptionEvent, VoskTranscriber}; + #[cfg(feature = "vosk")] fn main() -> Result<(), Box> { // Test with a small Vosk model (download required) let model_path = "models/vosk-model-small-en-us-0.15"; - + if !Path::new(model_path).exists() { eprintln!("Vosk model not found at: {}", model_path); eprintln!("Download a model from https://alphacephei.com/vosk/models"); @@ -17,9 +18,9 @@ fn main() -> Result<(), Box> { eprintln!(" mv 
vosk-model-small-en-us-0.15 models/"); return Ok(()); } - + println!("Loading Vosk model from: {}", model_path); - + // Create configuration let config = TranscriptionConfig { enabled: true, @@ -29,59 +30,78 @@ fn main() -> Result<(), Box> { include_words: true, buffer_size_ms: 512, }; - + // Create transcriber with configuration let mut transcriber = VoskTranscriber::new(config.clone(), 16000.0)?; - + println!("Vosk configuration:"); println!(" Partial results: {}", config.partial_results); println!(" Max alternatives: {}", config.max_alternatives); println!(" Include words: {}", config.include_words); - + // Generate test audio: sine wave representing speech-like patterns let sample_rate = 16000; let duration_ms = 1000; // 1 second let samples_count = (sample_rate * duration_ms) / 1000; - + let mut test_audio = Vec::with_capacity(samples_count); for i in 0..samples_count { let t = i as f32 / sample_rate as f32; // Mix of frequencies to simulate speech - let sample = ( - 0.3 * (2.0 * std::f32::consts::PI * 440.0 * t).sin() + - 0.2 * (2.0 * std::f32::consts::PI * 880.0 * t).sin() + - 0.1 * (2.0 * std::f32::consts::PI * 1320.0 * t).sin() - ) * 16384.0; // Scale to i16 range - + let sample = (0.3 * (2.0 * std::f32::consts::PI * 440.0 * t).sin() + + 0.2 * (2.0 * std::f32::consts::PI * 880.0 * t).sin() + + 0.1 * (2.0 * std::f32::consts::PI * 1320.0 * t).sin()) + * 16384.0; // Scale to i16 range + test_audio.push(sample as i16); } - - println!("\nProcessing {} samples of synthetic audio...", test_audio.len()); - + + println!( + "\nProcessing {} samples of synthetic audio...", + test_audio.len() + ); + // Process audio in chunks (512 samples = 32ms at 16kHz) let chunk_size = 512; let mut partial_count = 0; let mut result_count = 0; let mut error_count = 0; - + for (chunk_idx, chunk) in test_audio.chunks(chunk_size).enumerate() { // Use EventBasedTranscriber interface directly match coldvox_stt::EventBasedTranscriber::accept_frame(&mut transcriber, chunk)? 
{ - Some(TranscriptionEvent::Partial { utterance_id, text, t0, t1 }) => { + Some(TranscriptionEvent::Partial { + utterance_id, + text, + t0, + t1, + }) => { partial_count += 1; - println!("Chunk {}: Partial result (utterance {}): \"{}\"", chunk_idx, utterance_id, text); + println!( + "Chunk {}: Partial result (utterance {}): \"{}\"", + chunk_idx, utterance_id, text + ); if t0.is_some() || t1.is_some() { println!(" Timing: {:?} - {:?}", t0, t1); } } - Some(TranscriptionEvent::Final { utterance_id, text, words }) => { + Some(TranscriptionEvent::Final { + utterance_id, + text, + words, + }) => { result_count += 1; - println!("Chunk {}: Final result (utterance {}): \"{}\"", chunk_idx, utterance_id, text); + println!( + "Chunk {}: Final result (utterance {}): \"{}\"", + chunk_idx, utterance_id, text + ); if let Some(words) = words { println!(" Words ({}): ", words.len()); for word in words.iter().take(5) { - println!(" \"{}\" @ {:.2}s-{:.2}s (conf: {:.2})", - word.text, word.start, word.end, word.conf); + println!( + " \"{}\" @ {:.2}s-{:.2}s (conf: {:.2})", + word.text, word.start, word.end, word.conf + ); } if words.len() > 5 { println!(" ... and {} more", words.len() - 5); @@ -97,12 +117,19 @@ fn main() -> Result<(), Box> { } } } - + // Get final result println!("\nFinalizing utterance..."); match coldvox_stt::EventBasedTranscriber::finalize_utterance(&mut transcriber)? 
{ - Some(TranscriptionEvent::Final { utterance_id, text, words }) => { - println!("Final transcription (utterance {}): \"{}\"", utterance_id, text); + Some(TranscriptionEvent::Final { + utterance_id, + text, + words, + }) => { + println!( + "Final transcription (utterance {}): \"{}\"", + utterance_id, text + ); if let Some(words) = words { println!("Total words: {}", words.len()); } @@ -117,30 +144,30 @@ fn main() -> Result<(), Box> { println!("No final transcription (synthetic audio not recognized as speech)"); } } - + println!("\nTest completed:"); println!(" Partial results: {}", partial_count); println!(" Final results: {}", result_count); println!(" Errors: {}", error_count); println!("\nNote: Synthetic audio may not produce meaningful transcriptions."); println!("For real testing, use actual speech audio or WAV files."); - + // Test backward compatibility with Transcriber trait println!("\n--- Testing backward compatibility ---"); let mut simple_transcriber = VoskTranscriber::new_with_default(model_path, 16000.0)?; - + // Test with smaller chunk let test_chunk = &test_audio[0..512]; match simple_transcriber.accept_pcm16(test_chunk)? { Some(text) => println!("Transcriber trait result: \"{}\"", text), None => println!("Transcriber trait: No result"), } - + match simple_transcriber.finalize()? 
{
         Some(text) => println!("Transcriber trait final: \"{}\"", text),
         None => println!("Transcriber trait: No final result"),
     }
-    
+
     Ok(())
 }
diff --git a/release-plz.toml b/release-plz.toml
new file mode 100644
index 00000000..c16e14da
--- /dev/null
+++ b/release-plz.toml
@@ -0,0 +1,50 @@
+[workspace]
+allow-dirty = false
+changelog-include = ["crates/*"]
+pr-labels = ["release"]
+
+[git]
+tag-prefix = "v"
+release-commit-message = "chore: release {{package_name}} v{{version}}"
+
+# Configure all workspace members
+[[package]]
+name = "app"
+publish = false
+changelog-path = "crates/app/CHANGELOG.md"
+
+[[package]]
+name = "coldvox-foundation"
+publish = false
+
+[[package]]
+name = "coldvox-audio"
+publish = false
+
+[[package]]
+name = "coldvox-vad"
+publish = false
+
+[[package]]
+name = "coldvox-vad-silero"
+publish = false
+
+[[package]]
+name = "coldvox-stt"
+publish = false
+
+[[package]]
+name = "coldvox-stt-vosk"
+publish = false
+
+[[package]]
+name = "coldvox-text-injection"
+publish = false
+
+[[package]]
+name = "coldvox-telemetry"
+publish = false
+
+[[package]]
+name = "coldvox-gui"
+publish = false
\ No newline at end of file