Smart Docs is a tool that extracts software requirements from audio recordings.
The backend requires several environment variables to be set for proper operation. These variables are defined in `apps/api/.env.example`. When running with Docker, copy this file to `apps/api/.env` and customize the values as needed.
The variables are validated in `apps/api/src/shared/config/envs.ts`.
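For illustration, a minimal sketch of what such validation might look like. The variable names mirror the tables below, but the loader itself is hypothetical; the actual `envs.ts` implementation may differ.

```typescript
// Hypothetical sketch of environment validation; the real envs.ts may differ.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) throw new Error(`Missing required environment variable: ${name}`);
  return value;
}

function loadEnvs() {
  return {
    nodeEnv: process.env.NODE_ENV ?? "dev",
    port: Number(process.env.PORT ?? 8080),
    clientUrl: process.env.CLIENT_URL ?? "http://localhost:3000",
    databaseUrl: requireEnv("DATABASE_URL"), // required, no default
    rabbitmqUrl: requireEnv("RABBITMQ_URL"), // required, no default
    ollamaApiUrl: process.env.OLLAMA_API_URL ?? "http://localhost:11434",
  };
}
```

With this shape, a missing `DATABASE_URL` or `RABBITMQ_URL` fails fast at startup instead of surfacing as a connection error later.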
| Variable | Description | Default |
|---|---|---|
| NODE_ENV | Node.js environment. | dev |
| PORT | Port for the API server. | 8080 |
| CLIENT_URL | URL of the frontend application. | http://localhost:3000 |
| Variable | Description | Required |
|---|---|---|
| DATABASE_URL | URL for your PostgreSQL database. | Yes |
| RABBITMQ_URL | URL for your RabbitMQ instance. | Yes |
| Variable | Description | Default |
|---|---|---|
| OLLAMA_API_URL | URL for your local Ollama API server. | http://localhost:11434 |
| Variable | Description | Required |
|---|---|---|
| TRANSCRIPTION_MODEL | Model used for audio transcription (e.g., 'tiny'). | Yes |
| TRANSCRIPTION_LANGUAGE | Language used for transcription (e.g., 'en', 'pt'). | Yes |
| Variable | Description | Required |
|---|---|---|
| ANALYTICS_MODEL | Ollama model used for analysis (e.g., 'llama3'). | Yes |
| Variable | Description | Default/Required |
|---|---|---|
| GATEKEEPER_TRANSCRIPTION_MODEL | Model used for fast transcription by the Gatekeeper. | Required |
| GATEKEEPER_ANALYTICS_MODEL | Model used for context validation by the Gatekeeper. | Required |
| TRANSCRIPTION_LANGUAGE | Language for transcription. | Required |
| MAX_RETRIES | Maximum number of times to sample the audio. | 3 |
| SAMPLE_DURATION | Duration (in seconds) of each audio sample. | 30 |
These variables allow you to switch between different AI providers. If no provider is specified and no API keys are provided, the system defaults to Ollama (local).
| Variable | Description | Default |
|---|---|---|
| AI_PROVIDER | Preferred provider (gemini, openai, anthropic, openrouter, ollama). | ollama |
| GEMINI_API_KEY | API Key for Google Gemini. | Optional |
| OPENAI_API_KEY | API Key for OpenAI (GPT). | Optional |
| ANTHROPIC_API_KEY | API Key for Anthropic (Claude). | Optional |
| OPENROUTER_API_KEY | API Key for OpenRouter. | Optional |
| AI_MODEL | Specific model to use (e.g., gpt-4o, google/gemini-flash-1.5). | Provider Default |
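As a rough sketch of this fallback behavior: if no provider is set explicitly, pick whichever provider has a key, otherwise fall back to local Ollama. The check order below is an illustrative assumption, not taken from the source.

```typescript
type Provider = "gemini" | "openai" | "anthropic" | "openrouter" | "ollama";

interface AiConfig {
  provider?: Provider; // explicit AI_PROVIDER, if set
  apiKeys: Partial<Record<Provider, string>>; // keys from env or the web UI
}

// Illustrative sketch, not the actual source: an explicit provider wins,
// else the first provider with an API key, else local Ollama.
function resolveProvider(config: AiConfig): Provider {
  if (config.provider) return config.provider;
  const keyed = (["gemini", "openai", "anthropic", "openrouter"] as const)
    .find((p) => config.apiKeys[p]);
  return keyed ?? "ollama";
}
```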
The core philosophy of Smart Docs is "local-first," but it offers the flexibility to use powerful cloud-based AI models when needed. Your data can be processed entirely on your own hardware using Ollama, or you can provide your own API keys for professional-grade analysis via Gemini, OpenAI, Anthropic, or OpenRouter.
- Unified AI Integration: Powered by the Vercel AI SDK, providing a robust and extensible interface for multiple LLM providers.
- Flexible AI Providers: Choose between local-first processing (Ollama) or high-performance cloud models (Gemini, OpenAI, Anthropic, OpenRouter).
- Hybrid Transcription: Support for both local transcription (via Whisper/nodejs-whisper) and high-accuracy cloud transcription (via OpenAI Whisper API).
- User-Provided API Keys: Users can enter their own AI API keys directly in the web interface for custom processing.
- Local Fallback: If no API keys are provided, the system seamlessly falls back to your local Ollama instance (using the OpenAI-compatible bridge).
- Event-Driven Architecture: Built on a robust, scalable architecture using RabbitMQ for asynchronous job processing.
- AI-Powered Filtering: A "Gatekeeper" worker uses a lightweight LLM to quickly discard irrelevant audio (e.g., music, noise).
- Multilingual Transcription: Support for multiple languages with automated cleaning of timestamps and noise.
- Intelligent Analysis: Generates professional Software Requirements Specification (SRS) documents using the provider of your choice.
- Markdown Output: Generates well-structured, readable Markdown documents instead of raw JSON.
- Interactive Editor: View and edit the generated requirements directly in the browser.
- Processing Cache: Avoids re-processing by caching results based on the audio file's hash.
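The processing cache mentioned above can be sketched as follows. The digest choice (SHA-256) and the in-memory map are illustrative assumptions; the project may persist cache entries differently.

```typescript
import { createHash } from "node:crypto";

// Sketch of hash-based caching (SHA-256 is an assumption, not confirmed by the docs).
function audioHash(bytes: Buffer | Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

const cache = new Map<string, string>(); // audio hash -> generated Markdown

function getOrProcess(bytes: Buffer, run: (b: Buffer) => string): string {
  const hash = audioHash(bytes);
  const hit = cache.get(hash);
  if (hit !== undefined) return hit; // identical audio: skip reprocessing
  const result = run(bytes);
  cache.set(hash, result);
  return result;
}
```

Because the key is derived from the file's bytes, re-uploading the same audio returns the cached document without running the pipeline again.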
The system is a TypeScript monorepo managed by Turborepo. The backend is built with Bun and ElysiaJS, communicating with a series of background workers via RabbitMQ. AI interactions are abstracted through the Vercel AI SDK, ensuring consistency and resilience across different providers.
- API (`apps/api`): The main entry point. It receives an audio file and optional AI configuration (provider/key/model), generates a hash, and places a new job in the `q.audio.new` queue.
- Gatekeeper Worker: Consumes from `q.audio.new`. It validates the audio for speech content using the selected AI provider.
- Transcriber Worker: Consumes from `q.audio.transcribe`. It performs transcription using either local Whisper or the OpenAI Whisper API, based on the request parameters.
- Analyst Worker: Consumes from `q.transcript.analyze`. It uses the selected AI provider (local or cloud) via the AI SDK to generate a structured Markdown SRS document.
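The hand-offs between workers can be summarized as a small routing table. The queue names come from the pipeline description above; the helper itself is only illustrative.

```typescript
// Queue names as described in the architecture; routing logic is illustrative.
const QUEUES = {
  gatekeeper: "q.audio.new",
  transcriber: "q.audio.transcribe",
  analyst: "q.transcript.analyze",
} as const;

type Stage = keyof typeof QUEUES;

// On success, each worker publishes the job to the next stage's queue.
function nextQueue(stage: Stage): string | null {
  switch (stage) {
    case "gatekeeper": return QUEUES.transcriber;
    case "transcriber": return QUEUES.analyst;
    case "analyst": return null; // end of pipeline: document is stored
  }
}
```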
- Runtime: Bun
- Backend Framework: ElysiaJS
- Frontend: Next.js with React & Tailwind CSS
- Database: PostgreSQL with Drizzle ORM
- Message Broker: RabbitMQ
- AI Orchestration: Vercel AI SDK (`ai` package)
- AI Providers:
  - Text Generation: Google Gemini, OpenAI (GPT), Anthropic (Claude), OpenRouter, Ollama.
  - Transcription: OpenAI Whisper (cloud) and `nodejs-whisper` (local).
- Audio Processing: FFmpeg
The project runs in Hybrid Mode: infrastructure (PostgreSQL, RabbitMQ) in Docker, and the application (Web, API, Workers) locally via Bun.
- Docker (or OrbStack)
- Bun
- FFmpeg (`brew install ffmpeg` on macOS)
- Ollama (optional, required only for local AI processing)
Clone the repository and run the setup script. It handles everything: environment files, Docker containers, dependencies, and database migrations.
```bash
git clone <your-repository-url>
cd <repository-name>
bun install
bun run setup
```

After setup, start the application:

```bash
bun run dev
```

If you prefer to configure each step manually:
- Clone and install dependencies

  ```bash
  git clone <your-repository-url>
  cd <repository-name>
  bun install
  ```

- Configure environment variables

  Copy the example files and adjust values as needed:

  ```bash
  cp apps/api/.env.example apps/api/.env
  cp apps/web/.env.example apps/web/.env
  ```

- Start infrastructure

  ```bash
  docker compose up -d
  ```

- Run database migrations

  ```bash
  cd apps/api && bun run db:migrate
  ```

- Pull Ollama models (optional, for local AI)

  ```bash
  ollama pull llama3
  ollama pull phi3:mini
  ```

- Start the application

  ```bash
  bun run dev
  ```
| Service | URL |
|---|---|
| Web App | http://localhost:3000 |
| API Server | http://localhost:8080 |
| RabbitMQ Management | http://localhost:15672 (guest/guest) |
| Drizzle Studio | cd apps/api && bun run db:studio |
- Press `Ctrl+C` to stop the application services.
- Run `docker compose down` to stop the infrastructure.
- Open http://localhost:3000 in your browser.
- Upload an audio file (MP3, WAV, M4A, MP4).
- The system will process the file through the pipeline. You can watch the progress in the web interface.
- Once processing is complete, click the "Download Requirements Document" button to get your Markdown file.
- Alternatively, access documents via the API at `http://localhost:8080/gateway/download/{audio_hash}`.
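A minimal client-side sketch of that endpoint. The path comes from this guide; `API_BASE` and the helper name are illustrative.

```typescript
// Illustrative client helper; only the endpoint path is taken from the docs.
const API_BASE = "http://localhost:8080";

function downloadUrl(audioHash: string): string {
  return `${API_BASE}/gateway/download/${audioHash}`;
}

// Usage (requires the API to be running):
//   const res = await fetch(downloadUrl("<audio_hash>"));
//   const markdown = await res.text();
```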