Smart Docs is a tool that extracts software requirements from audio recordings.
The backend requires several environment variables to be set for proper operation. These variables are defined in `apps/api/.env.example`. When running with Docker, copy this file to `apps/api/.env` and customize the values as needed.
The variables are validated in `apps/api/src/shared/config/envs.ts`.
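For illustration, a minimal sketch of what such validation might look like. The variable names mirror the tables below, but the loader itself is hypothetical; the actual `envs.ts` implementation may differ.

```typescript
// Hypothetical sketch of environment validation; the real envs.ts may differ.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) throw new Error(`Missing required environment variable: ${name}`);
  return value;
}

function loadEnvs() {
  return {
    nodeEnv: process.env.NODE_ENV ?? "dev",
    port: Number(process.env.PORT ?? 8080),
    clientUrl: process.env.CLIENT_URL ?? "http://localhost:3000",
    databaseUrl: requireEnv("DATABASE_URL"), // required, no default
    rabbitmqUrl: requireEnv("RABBITMQ_URL"), // required, no default
    ollamaApiUrl: process.env.OLLAMA_API_URL ?? "http://localhost:11434",
  };
}
```

With this shape, a missing `DATABASE_URL` or `RABBITMQ_URL` fails fast at startup instead of surfacing as a connection error later.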
| Variable | Description | Default |
|---|---|---|
| NODE_ENV | Node.js environment. | dev |
| PORT | Port for the API server. | 8080 |
| CLIENT_URL | URL of the frontend application. | http://localhost:3000 |
| Variable | Description | Required |
|---|---|---|
| DATABASE_URL | URL for your PostgreSQL database. | Yes |
| RABBITMQ_URL | URL for your RabbitMQ instance. | Yes |
| Variable | Description | Default |
|---|---|---|
| OLLAMA_API_URL | URL for your local Ollama API server. | http://localhost:11434 |
| Variable | Description | Required |
|---|---|---|
| TRANSCRIPTION_MODEL | Model used for audio transcription (e.g., 'tiny'). | Yes |
| TRANSCRIPTION_LANGUAGE | Language used for transcription (e.g., 'en', 'pt'). | Yes |
| Variable | Description | Required |
|---|---|---|
| ANALYTICS_MODEL | Ollama model used for analysis (e.g., 'llama3'). | Yes |
| Variable | Description | Default/Required |
|---|---|---|
| GATEKEEPER_TRANSCRIPTION_MODEL | Model used for fast transcription by the Gatekeeper. | Required |
| GATEKEEPER_ANALYTICS_MODEL | Model used for context validation by the Gatekeeper. | Required |
| TRANSCRIPTION_LANGUAGE | Language for transcription. | Required |
| MAX_RETRIES | Maximum number of times to sample the audio. | 3 |
| SAMPLE_DURATION | Duration (in seconds) of each audio sample. | 30 |
These variables allow you to switch between different AI providers. If no provider is specified and no API keys are provided, the system defaults to Ollama (local).
| Variable | Description | Default |
|---|---|---|
| AI_PROVIDER | Preferred provider (gemini, openai, anthropic, openrouter, ollama). | ollama |
| GEMINI_API_KEY | API Key for Google Gemini. | Optional |
| OPENAI_API_KEY | API Key for OpenAI (GPT). | Optional |
| ANTHROPIC_API_KEY | API Key for Anthropic (Claude). | Optional |
| OPENROUTER_API_KEY | API Key for OpenRouter. | Optional |
| AI_MODEL | Specific model to use (e.g., gpt-4o, google/gemini-flash-1.5). | Provider Default |
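As a rough sketch of this fallback behavior: if no provider is set explicitly, pick whichever provider has a key, otherwise fall back to local Ollama. The check order below is an illustrative assumption, not taken from the source.

```typescript
type Provider = "gemini" | "openai" | "anthropic" | "openrouter" | "ollama";

interface AiConfig {
  provider?: Provider; // explicit AI_PROVIDER, if set
  apiKeys: Partial<Record<Provider, string>>; // keys from env or the web UI
}

// Illustrative sketch, not the actual source: an explicit provider wins,
// else the first provider with an API key, else local Ollama.
function resolveProvider(config: AiConfig): Provider {
  if (config.provider) return config.provider;
  const keyed = (["gemini", "openai", "anthropic", "openrouter"] as const)
    .find((p) => config.apiKeys[p]);
  return keyed ?? "ollama";
}
```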
The core philosophy of Smart Docs is "local-first," but it offers the flexibility to use powerful cloud-based AI models when needed. Your data can be processed entirely on your own hardware using Ollama, or you can provide your own API keys for professional-grade analysis via Gemini, OpenAI, Anthropic, or OpenRouter.
- Unified AI Integration: Powered by the Vercel AI SDK, providing a robust and extensible interface for multiple LLM providers.
- Flexible AI Providers: Choose between local-first processing (Ollama) or high-performance cloud models (Gemini, OpenAI, Anthropic, OpenRouter).
- Hybrid Transcription: Support for both local transcription (via Whisper/nodejs-whisper) and high-accuracy cloud transcription (via OpenAI Whisper API).
- User-Provided API Keys: Users can enter their own AI API keys directly in the web interface for custom processing.
- Local Fallback: If no API keys are provided, the system seamlessly falls back to your local Ollama instance (using the OpenAI-compatible bridge).
- Event-Driven Architecture: Built on a robust, scalable architecture using RabbitMQ for asynchronous job processing.
- AI-Powered Filtering: A "Gatekeeper" worker uses a lightweight LLM to quickly discard irrelevant audio (e.g., music, noise).
- Multilingual Transcription: Support for multiple languages with automated cleaning of timestamps and noise.
- Intelligent Analysis: Generates professional Software Requirements Specification (SRS) documents using the provider of your choice.
- Markdown Output: Generates well-structured, readable Markdown documents instead of raw JSON.
- Interactive Editor: View and edit the generated requirements directly in the browser.
- Processing Cache: Avoids re-processing by caching results based on the audio file's hash.
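The processing cache mentioned above can be sketched as follows. The digest choice (SHA-256) and the in-memory map are illustrative assumptions; the project may persist cache entries differently.

```typescript
import { createHash } from "node:crypto";

// Sketch of hash-based caching (SHA-256 is an assumption, not confirmed by the docs).
function audioHash(bytes: Buffer | Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

const cache = new Map<string, string>(); // audio hash -> generated Markdown

function getOrProcess(bytes: Buffer, run: (b: Buffer) => string): string {
  const hash = audioHash(bytes);
  const hit = cache.get(hash);
  if (hit !== undefined) return hit; // identical audio: skip reprocessing
  const result = run(bytes);
  cache.set(hash, result);
  return result;
}
```

Because the key is derived from the file's bytes, re-uploading the same audio returns the cached document without running the pipeline again.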
The system is a TypeScript monorepo managed by Turborepo. The backend is built with Bun and ElysiaJS, communicating with a series of background workers via RabbitMQ. AI interactions are abstracted through the Vercel AI SDK, ensuring consistency and resilience across different providers.
- API (`apps/api`): The main entry point. It receives an audio file and optional AI configuration (provider/key/model), generates a hash, and places a new job in the `q.audio.new` queue.
- Gatekeeper Worker: Consumes from `q.audio.new`. It validates the audio for speech content using the selected AI provider.
- Transcriber Worker: Consumes from `q.audio.transcribe`. It performs transcription using either local Whisper or the OpenAI Whisper API, based on the request parameters.
- Analyst Worker: Consumes from `q.transcript.analyze`. It uses the selected AI provider (local or cloud) via the AI SDK to generate a structured Markdown SRS document.
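The hand-offs between workers can be summarized as a small routing table. The queue names come from the pipeline description above; the helper itself is only illustrative.

```typescript
// Queue names as described in the architecture; routing logic is illustrative.
const QUEUES = {
  gatekeeper: "q.audio.new",
  transcriber: "q.audio.transcribe",
  analyst: "q.transcript.analyze",
} as const;

type Stage = keyof typeof QUEUES;

// On success, each worker publishes the job to the next stage's queue.
function nextQueue(stage: Stage): string | null {
  switch (stage) {
    case "gatekeeper": return QUEUES.transcriber;
    case "transcriber": return QUEUES.analyst;
    case "analyst": return null; // end of pipeline: document is stored
  }
}
```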
- Runtime: Bun
- Backend Framework: ElysiaJS
- Frontend: Next.js with React & Tailwind CSS
- Database: PostgreSQL with Drizzle ORM
- Message Broker: RabbitMQ
- AI Orchestration: Vercel AI SDK (`ai` package)
- AI Providers:
  - Text Generation: Google Gemini, OpenAI (GPT), Anthropic (Claude), OpenRouter, Ollama.
  - Transcription: OpenAI Whisper (cloud) and `nodejs-whisper` (local).
- Audio Processing: FFmpeg
The project runs in Hybrid Mode: infrastructure (PostgreSQL, RabbitMQ) in Docker, and the application (Web, API, Workers) locally via Bun.
- Docker (or OrbStack)
- Bun
- FFmpeg (`brew install ffmpeg` on macOS)
- Ollama (optional, required only for local AI processing)
Clone the repository and run the setup script. It handles everything: environment files, Docker containers, dependencies, and database migrations.
```bash
git clone <your-repository-url>
cd <repository-name>
bun install
bun run setup
```

After setup, start the application:

```bash
bun run dev
```

If you prefer to configure each step manually:
- Clone and install dependencies

  ```bash
  git clone <your-repository-url>
  cd <repository-name>
  bun install
  ```

- Configure environment variables

  Copy the example files and adjust values as needed:

  ```bash
  cp apps/api/.env.example apps/api/.env
  cp apps/web/.env.example apps/web/.env
  ```

- Start infrastructure

  ```bash
  docker compose up -d
  ```

- Run database migrations

  ```bash
  cd apps/api && bun run db:migrate
  ```

- Pull Ollama models (optional, for local AI)

  ```bash
  ollama pull llama3
  ollama pull phi3:mini
  ```

- Start the application

  ```bash
  bun run dev
  ```
| Service | URL |
|---|---|
| Web App | http://localhost:3000 |
| API Server | http://localhost:8080 |
| RabbitMQ Management | http://localhost:15672 (guest/guest) |
| Drizzle Studio | cd apps/api && bun run db:studio |
- Press `Ctrl+C` to stop the application services.
- Run `docker compose down` to stop the infrastructure.
- Open http://localhost:3000 in your browser.
- Upload an audio file (MP3, WAV, M4A, MP4).
- The system will process the file through the pipeline. You can watch the progress in the web interface.
- Once processing is complete, click the "Download Requirements Document" button to get your Markdown file.
- Alternatively, access documents via the API at `http://localhost:8080/gateway/download/{audio_hash}`.
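A minimal client-side sketch of that endpoint. The path comes from this guide; `API_BASE` and the helper name are illustrative.

```typescript
// Illustrative client helper; only the endpoint path is taken from the docs.
const API_BASE = "http://localhost:8080";

function downloadUrl(audioHash: string): string {
  return `${API_BASE}/gateway/download/${audioHash}`;
}

// Usage (requires the API to be running):
//   const res = await fetch(downloadUrl("<audio_hash>"));
//   const markdown = await res.text();
```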