This project is an open-source, modular system designed to continuously capture audio, transcribe it using a local speech-to-text engine, store the transcripts, and provide a web interface for browsing and future AI-driven analysis.
The system consists of several components that work together:
```mermaid
graph TD
    A[Microphone] --> LA[LiveKit Agent]
    LA -- Audio Stream --> VB[VoxBox Transcription API]
    VB -- Transcribed Text --> LA
    LA -- Stores Transcript --> DB[SQLite Database]
    FS[Flask Web Server] -- Reads Transcripts --> DB
    FS -- Serves UI --> UserBrowser[User's Browser]
    FS -- Summarization --> OLLAMA[Ollama]
    subgraph Audio_Input_Processing
        A
        LA
        VB
    end
    subgraph Data_Storage_Access
        DB
        FS
    end
    subgraph User_Interface
        UserBrowser
    end
    subgraph AI_Analysis
        OLLAMA
    end
    style LA fill:#f9f,stroke:#333,stroke-width:2px
    style VB fill:#ccf,stroke:#333,stroke-width:2px
    style DB fill:#cfc,stroke:#333,stroke-width:2px
    style FS fill:#ffc,stroke:#333,stroke-width:2px
    style OLLAMA fill:#fcc,stroke:#333,stroke-width:2px
```
- **File:** `agent.py`
- **Functionality:**
  - Captures live audio from the environment using a LiveKit Agent.
  - Uses Silero VAD for voice activity detection.
  - Streams audio to the local VoxBox service for transcription.
  - Receives transcribed text from VoxBox.
  - Saves finalized transcripts to the SQLite database.
  - Manages session information in the database to keep transcripts separate.
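In the real agent, LiveKit and Silero VAD drive this capture loop. As a framework-free sketch of the gating idea alone — buffer audio while speech is detected and emit a complete utterance once silence is observed — with `Frame`, `segment_utterances`, and the silence threshold as illustrative names rather than identifiers from this codebase:

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

@dataclass
class Frame:
    """One chunk of audio plus the VAD's speech/no-speech decision."""
    samples: bytes
    is_speech: bool

def segment_utterances(frames: Iterable[Frame], silence_frames: int = 3) -> Iterator[bytes]:
    """Yield one concatenated audio buffer per utterance.

    An utterance ends after `silence_frames` consecutive non-speech frames,
    mimicking how a VAD-gated agent decides when to send buffered audio
    off for transcription.
    """
    buffer: list[bytes] = []
    silent = 0
    for frame in frames:
        if frame.is_speech:
            buffer.append(frame.samples)
            silent = 0
        elif buffer:
            silent += 1
            if silent >= silence_frames:
                yield b"".join(buffer)
                buffer, silent = [], 0
    if buffer:  # flush a trailing utterance at end of stream
        yield b"".join(buffer)
```

Each yielded buffer corresponds to one chunk the agent would stream to VoxBox for transcription.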
- **Directory:** `vox-box/`
- **Functionality:**
  - A separate Python service (installed via `pip install vox-box`).
  - Hosts a local instance of the `Systran/faster-whisper-small` model (via Hugging Face).
  - Exposes an OpenAI-compatible API endpoint (typically `http://localhost:5002/v1`) that the LiveKit Agent sends audio to.
  - Returns transcribed text to the agent.
  - Requires its own virtual environment due to dependency conflicts.
- **Default File:** `voice_transcripts.db`
- **Schema & ORM:** `app/database/models.py` (SQLAlchemy models: `User`, `Session`, `Transcript`)
- **Functionality:**
  - Stores all user information, recording sessions, and transcription results.
  - Uses UUIDs as primary keys for all tables.
  - Supports metadata such as confidence scores, timestamps, language, and custom JSON fields.
  - Managed via the SQLAlchemy ORM, with helper functions in `app/database/helpers.py` and `app/database/__init__.py`.
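The actual models live in `app/database/models.py` as SQLAlchemy classes. As a rough, stdlib-only illustration of the schema they imply — text UUID primary keys, per-session transcripts, a `summary` column on sessions, and a free-form JSON metadata field — with all column names assumed for the sketch:

```python
import json
import sqlite3
import uuid

# In-memory stand-in for voice_transcripts.db; column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users       (id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE sessions    (id TEXT PRIMARY KEY, user_id TEXT REFERENCES users(id),
                          started_at TEXT, summary TEXT);
CREATE TABLE transcripts (id TEXT PRIMARY KEY, session_id TEXT REFERENCES sessions(id),
                          text TEXT, confidence REAL, language TEXT, extra TEXT);
""")

user_id, session_id = str(uuid.uuid4()), str(uuid.uuid4())
conn.execute("INSERT INTO users VALUES (?, ?)", (user_id, "demo"))
conn.execute("INSERT INTO sessions VALUES (?, ?, ?, ?)",
             (session_id, user_id, "2024-01-01T00:00:00", None))
conn.execute("INSERT INTO transcripts VALUES (?, ?, ?, ?, ?, ?)",
             (str(uuid.uuid4()), session_id, "hello world", 0.97, "en",
              json.dumps({"speaker": "mic-1"})))  # custom JSON metadata

row = conn.execute("SELECT text, confidence FROM transcripts WHERE session_id = ?",
                   (session_id,)).fetchone()
```

In the real application the same relationships are expressed declaratively via SQLAlchemy rather than raw SQL.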
- **File:** `app/web_app.py` (run via `run_webapp.py` or directly)
- **Functionality:**
  - Provides a web interface (HTML templates in `app/templates/`) to:
    - Browse all saved recording sessions (`/`).
    - View detailed transcripts for a specific session (`/session/<session_id>`).
    - Copy/paste transcript text.
  - Includes an analysis page (`/analyze/<session_id>`) that:
    - Retrieves the combined transcript text for a session.
    - Checks whether a summary already exists in the `Session.summary` database field.
    - If a cached summary exists, displays it directly.
    - If no cached summary is found, uses a local Ollama instance (via the `openai` library) to generate one, saves it to the `Session.summary` field for future requests, and displays it.
  - Exposes an API endpoint (`/api/transcripts/<session_id>`) to fetch a session's transcripts as JSON.
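The check-then-generate flow of the analysis page can be expressed independently of Flask. In this sketch, `get_or_create_summary` and its parameters are illustrative names, not the app's actual functions:

```python
from typing import Callable, Optional

def get_or_create_summary(cached: Optional[str], transcript_text: str,
                          generate: Callable[[str], str],
                          save: Callable[[str], None]) -> str:
    """Return the cached summary if present; otherwise generate, persist, and return one."""
    if cached:                            # a value in Session.summary -> no LLM call
        return cached
    summary = generate(transcript_text)   # e.g. a chat completion against local Ollama
    save(summary)                         # write back to Session.summary for next time
    return summary
```

On the second request for the same session, `generate` is never invoked — only the database read happens.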
- **Service:** Uses a locally running Ollama instance.
- **Connection:** The Flask Web Server connects to Ollama using the `openai` Python client.
  - Default Ollama API base URL: `http://localhost:11434/v1` (configurable via `OLLAMA_BASE_URL`).
  - API key: required by the `openai` client; any string works for local Ollama (configurable via `OLLAMA_API_KEY`, defaults to `"ollama"`).
- **Functionality:**
  - Provides text summarization for transcripts on the `/analyze` page.
  - **Caching:** Generated summaries are stored in the `Session.summary` database field. When a summary is requested, the cached version is used if available; otherwise Ollama generates a new one, which is saved for subsequent requests. This avoids redundant LLM processing.
  - **Default model:** `gemma3:4b` (configurable via `OLLAMA_MODEL`).
  - The system is designed to be extensible for other AI-driven insights in the future.
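Because Ollama exposes an OpenAI-compatible API, the summarization call is an ordinary chat-completions request. A stdlib sketch of the request body the `openai` client ends up sending, using the environment variables from this README (the prompt wording and helper name are made up for illustration):

```python
import json
import os

def build_summary_request(transcript: str) -> tuple[str, bytes]:
    """Build the (url, body) pair for an OpenAI-compatible chat completion."""
    base = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434/v1")
    payload = {
        "model": os.getenv("OLLAMA_MODEL", "gemma3:4b"),
        "messages": [
            {"role": "system", "content": "Summarize the following transcript."},
            {"role": "user", "content": transcript},
        ],
    }
    return f"{base}/chat/completions", json.dumps(payload).encode()

url, body = build_summary_request("hello world")
```

The `openai` client hides this plumbing; pointing its `base_url` at Ollama is all the integration requires.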
- `scripts/manage_db.py`: a command-line utility for database operations:
  - `create`: initializes the database schema (uses `app/database/create_tables.py`).
  - `reset`: drops all existing tables and recreates the schema.
  - `seed`: populates the database with sample data for testing.
- `app/database/create_tables.py`: contains the core logic to create database tables from the SQLAlchemy models.
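A utility like `scripts/manage_db.py` typically dispatches on a subcommand. The shape of such a tool, with placeholder actions standing in for the real `create`/`reset`/`seed` implementations:

```python
import argparse

def main(argv=None, actions=None):
    """Dispatch `create` / `reset` / `seed` to the matching database routine."""
    actions = actions or {
        "create": lambda: print("creating schema"),  # real script uses app/database/create_tables.py
        "reset": lambda: print("dropping and recreating all tables"),
        "seed": lambda: print("inserting sample data"),
    }
    parser = argparse.ArgumentParser(description="Database management utility")
    parser.add_argument("command", choices=sorted(actions))
    args = parser.parse_args(argv)
    actions[args.command]()
    return args.command

if __name__ == "__main__":
    main()
```

Invalid subcommands are rejected by `argparse` itself via the `choices` constraint.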
- **Environment Configuration:** uses a `.env` file (see `.env.example`) for settings such as `DATABASE_URL`, `OLLAMA_BASE_URL`, `OLLAMA_API_KEY`, and `OLLAMA_MODEL`.
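The project presumably loads the `.env` file with a library such as python-dotenv; a minimal stdlib approximation shows the precedence involved (process environment, then `.env`, then the defaults documented in this README):

```python
import os

def load_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip()
    return env

def setting(env: dict, key: str, default: str) -> str:
    """Prefer the process environment, then the .env file, then the default."""
    return os.environ.get(key) or env.get(key) or default

# Sample .env content mirroring .env.example
env = load_env("""
# local overrides
DATABASE_URL=sqlite:///voice_transcripts.db
OLLAMA_MODEL=gemma3:4b
""")
model = setting(env, "OLLAMA_MODEL", "gemma3:4b")
base_url = setting(env, "OLLAMA_BASE_URL", "http://localhost:11434/v1")
```

Keys absent from both the environment and the file (here `OLLAMA_BASE_URL`) fall back to the documented defaults.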
- Python 3.10+
- SQLite (comes with Python)
- Ollama installed and running (see the Ollama website). Ensure the desired model (e.g., `gemma3:4b`) is pulled: `ollama pull gemma3:4b`.
- Separate virtual environments for the main application and VoxBox are recommended.
```bash
# Clone the repository (if you haven't already)
# git clone <repository-url>
# cd <repository-directory>

# Create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install dependencies (includes 'openai' for Ollama)
pip install -r requirements.txt

# Copy and configure environment variables
cp .env.example .env
# Edit .env to set DATABASE_URL (default is sqlite:///voice_transcripts.db)
# and Ollama settings if needed:
# OLLAMA_BASE_URL=http://localhost:11434/v1
# OLLAMA_API_KEY=ollama
# OLLAMA_MODEL=gemma3:4b

# Create database tables
python scripts/manage_db.py create

# Optionally seed the database
python scripts/manage_db.py seed
```

```bash
cd vox-box

# Create and activate a separate virtual environment
python3 -m venv venv
source venv/bin/activate

# Install VoxBox
pip install -r requirements.txt  # This installs the 'vox-box' package

cd ..  # Return to project root
```
1. **Start Ollama:** Ensure your Ollama service is running. If you installed Ollama as a desktop application, it may start automatically; otherwise, start it from the command line:

   ```bash
   ollama serve
   ```

   Ensure the model you intend to use (e.g., `gemma3:4b`) is available (`ollama list`); if not, pull it with `ollama pull gemma3:4b`. Keep this service running.

2. **Start VoxBox:** Open a new terminal, navigate to the `vox-box` directory, activate its virtual environment, and run:

   ```bash
   # In vox-box/, with the vox-box venv active
   vox-box start --huggingface-repo-id Systran/faster-whisper-small --data-dir ./cache/data-dir --host 0.0.0.0 --port 5002
   ```

   Keep this service running.

3. **Start the LiveKit Agent:** In another terminal (at the project root, with the main application's venv active):

   ```bash
   python agent.py console
   ```

4. **Start the Flask Web Server:** In yet another terminal (at the project root, with the main application's venv active):

   ```bash
   python run_webapp.py
   ```

   Or, for development mode:

   ```bash
   flask --app app.web_app run --debug --port 5000
   ```

   Access the UI at `http://localhost:5000`.
- `agent.py`: Core logic for audio capture and initial transcript handling.
- `vox-box/`: Setup for the local transcription server.
- `app/database/`: SQLAlchemy models, database initialization, and helper functions.
- `app/web_app.py`: Flask application, routes, and UI logic.
- `app/templates/`: HTML templates for the web interface.
- `scripts/`: Database management scripts.