
fix: ChromaDB provisioner reliability + single-instance#98

Merged
itstauq merged 1 commit into main from fix/chromadb-reliability-single-instance
Mar 4, 2026

@itstauq (Member) commented Mar 4, 2026

Summary

Production debugging revealed that ChromaDB's internal SQLite database (chroma.sqlite3) had been left corrupted at 0 bytes because process-wick sent SIGKILL to the process group mid-write. The corrupted file then caused cascading failures on subsequent boots.
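Because the failure mode is a chroma.sqlite3 truncated to exactly 0 bytes, the startup recovery can be a plain size check before ChromaDB is launched. The following is a minimal Python sketch, not the provisioner's actual code: the function name remove_empty_sqlite and its data_dir argument (for example ~/.syft-space/chromadb) are hypothetical.

```python
from pathlib import Path


def remove_empty_sqlite(data_dir: Path) -> bool:
    """Delete a 0-byte chroma.sqlite3 so ChromaDB recreates it on start.

    Hypothetical helper; returns True if an empty (corrupted) file was removed.
    """
    db_file = data_dir / "chroma.sqlite3"
    # Only a file that exists and is exactly 0 bytes is treated as corrupt.
    # A non-empty database is left alone, since deleting it would lose data.
    if db_file.is_file() and db_file.stat().st_size == 0:
        db_file.unlink()
        return True
    return False
```

Restricting the check to 0-byte files keeps the recovery conservative: it only discards a file that cannot contain any data.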

  • ChromaDB startup reliability: Kill the orphaned process on health-check timeout, detect early subprocess exit (fail fast), and add a post-health liveness guard against a different process answering on the same port
  • Graceful shutdown: Replace the fixed 2s sleep with a polling loop of up to 10s to reduce the risk of SIGKILL-induced corruption
  • 0-byte DB recovery: Delete a corrupted (0-byte) chroma.sqlite3 on startup so ChromaDB recreates it fresh
  • Ingestion error annotation: Prefix errors with [dataset_type] for easier debugging
  • Single-instance: Add tauri-plugin-single-instance to prevent multiple app instances; second launch focuses existing window
  • Window management: Hide from the taskbar when minimized to tray and restore on show; log window operation errors instead of silently ignoring them
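The startup-reliability item combines three checks in one wait loop. Below is a minimal Python sketch of one way to structure it, assuming ChromaDB runs as a child process started with subprocess.Popen; the function wait_until_healthy and the is_healthy callable (for example, an HTTP probe against port 8100) are hypothetical, and the 60s default mirrors the timeout mentioned in the commit message.

```python
import subprocess
import time
from typing import Callable


def wait_until_healthy(
    proc: subprocess.Popen,
    is_healthy: Callable[[], bool],
    timeout: float = 60.0,
    interval: float = 0.5,
) -> None:
    """Block until is_healthy() succeeds, failing fast if proc exits.

    Hypothetical sketch: on timeout the child is killed so it is not orphaned.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # Fail fast: if the child already exited, waiting longer is pointless.
        code = proc.poll()
        if code is not None:
            raise RuntimeError(f"ChromaDB exited early with code {code}")
        if is_healthy():
            # Liveness guard: a healthy answer while our child is dead means
            # some other process is serving on the same port.
            if proc.poll() is not None:
                raise RuntimeError("health check passed, but our ChromaDB process is gone")
            return
        time.sleep(interval)
    # Timed out: kill and reap the child instead of leaving it running.
    proc.kill()
    proc.wait()
    raise TimeoutError(f"ChromaDB not healthy after {timeout}s")
```

Checking proc.poll() on every iteration is what turns a crash at startup into an immediate error rather than a full timeout.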

Test plan

  • Block port 8100, start app — verify ChromaDB process is killed after timeout, not left orphaned
  • Create 0-byte ~/.syft-space/chromadb/chroma.sqlite3, start app — verify it's deleted and ChromaDB starts fresh
  • Trigger ingestion error — verify error message shows [local_file] Ingestion error: ... prefix
  • Launch app, launch again — verify second launch focuses existing window (including if minimized)
  • Close window, verify it disappears from taskbar; reopen via tray/dock, verify it reappears
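The ingestion test above expects messages of the form "[local_file] Ingestion error: ...". A minimal Python sketch of that annotation follows, assuming ingestion steps are plain callables; IngestionError and run_ingestion are hypothetical names, not the project's API.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class IngestionError(RuntimeError):
    """Hypothetical error type that records which dataset type failed."""

    def __init__(self, dataset_type: str, cause: Exception) -> None:
        # Message format mirrors the test plan: "[local_file] Ingestion error: ..."
        super().__init__(f"[{dataset_type}] Ingestion error: {cause}")
        self.dataset_type = dataset_type


def run_ingestion(dataset_type: str, ingest: Callable[[], T]) -> T:
    """Run one ingestion step, prefixing any failure with its dataset type."""
    try:
        return ingest()
    except Exception as exc:
        # Chain the original exception so the full traceback is preserved.
        raise IngestionError(dataset_type, exc) from exc
```

Chaining with "raise ... from exc" keeps the underlying error available as __cause__, so the prefix adds context without hiding the original traceback.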

…nagement

- Kill orphaned ChromaDB process on health check timeout instead of leaving it running
- Detect early subprocess exit during startup (fail fast instead of waiting 60s)
- Post-health liveness check to guard against wrong process on same port
- Extend graceful shutdown from 2s to 10s polling to reduce SIGKILL corruption risk
- Delete 0-byte chroma.sqlite3 on startup to recover from SIGKILL-corrupted DB
- Annotate ingestion errors with dataset type prefix for easier debugging
- Add tauri-plugin-single-instance to prevent multiple app instances
- Hide app from taskbar when minimized to tray, restore on show
- Log window operation errors instead of silently ignoring them
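The shutdown change replaces a fixed sleep with polling, so the app stops waiting as soon as ChromaDB exits and only escalates to SIGKILL after the full grace period. A minimal Python sketch, assuming a subprocess.Popen child; stop_gracefully is a hypothetical name. Popen.wait(timeout=...) would be an equivalent alternative to the explicit loop.

```python
import subprocess
import time


def stop_gracefully(proc: subprocess.Popen, grace: float = 10.0, interval: float = 0.1) -> int:
    """Ask the child to exit, poll up to `grace` seconds, then force-kill.

    Hypothetical sketch; returns the child's exit code.
    """
    if proc.poll() is not None:
        return proc.returncode
    # terminate() sends SIGTERM on POSIX, giving ChromaDB a chance to
    # finish writes and close SQLite cleanly.
    proc.terminate()
    deadline = time.monotonic() + grace
    while time.monotonic() < deadline:
        if proc.poll() is not None:
            return proc.returncode
        time.sleep(interval)
    # Last resort: SIGKILL is the path that risks a torn database write.
    proc.kill()
    return proc.wait()
```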
@itstauq itstauq merged commit 8898e6d into main Mar 4, 2026
2 checks passed
@itstauq itstauq deleted the fix/chromadb-reliability-single-instance branch March 4, 2026 12:57