A full-stack, locally-hosted AI customer support chatbot built with RAG (Retrieval-Augmented Generation). It answers customer questions by searching a vector knowledge base and generating natural language responses — all running on your machine with zero API costs.
- RAG Pipeline — Retrieves relevant knowledge chunks via vector similarity before generating answers, ensuring accurate and grounded responses
- 100% Local & Free — Uses Ollama with Gemma 3 4B for LLM inference and sentence-transformers for embeddings — no API keys or cloud services required
- Vector Search — PostgreSQL + pgvector for fast cosine similarity search across 384-dimensional embeddings
- Real-time Streaming — Server-Sent Events (SSE) deliver token-by-token responses to the UI as the LLM generates them
- 50-Item Knowledge Base — Pre-built Turkish FAQ covering 14 categories (shipping, returns, payments, account management, etc.)
- Modern Chat UI — Full-page Angular 17 chat interface with typing indicators, source citations, quick-start questions, and responsive design
- Multi-tenant Ready —
tenant_idfield supports multiple knowledge bases in a single database
┌─────────────────┐ HTTP/SSE ┌─────────────────────────────────────┐
│ │ ───────────────► │ FastAPI Backend │
│ Angular 17 │ │ │
│ Chat UI │ ◄─────────────── │ ┌───────────┐ ┌──────────────┐ │
│ (port 4200) │ SSE stream │ │ RAG │ │ Ollama │ │
│ │ │ │ Service │──►│ (Gemma 3) │ │
└─────────────────┘ │ └─────┬─────┘ └──────────────┘ │
│ │ │
│ ┌─────▼─────────────────────────┐ │
│ │ sentence-transformers │ │
│ │ (all-MiniLM-L6-v2, 384d) │ │
│ └─────┬────────────────────────┘ │
│ │ │
│ ┌─────▼─────┐ │
│ │ PostgreSQL │ │
│ │ + pgvector │ │
│ └───────────┘ │
└─────────────────────────────────────┘
How a query flows:
- User sends a message from the Angular chat UI
- Backend embeds the query using sentence-transformers (384-dim vector)
- pgvector performs cosine similarity search against the knowledge base
- Top-5 most relevant chunks are retrieved as context
- Context + question are sent to Ollama (Gemma 3 4B) with a system prompt
- Response streams back token-by-token via SSE to the frontend
| Layer | Technology | Purpose |
|---|---|---|
| Frontend | Angular 17 (standalone components) | Chat UI with SSE streaming |
| Backend | FastAPI + Python 3.11 | Async REST API + SSE endpoints |
| LLM | Ollama (Gemma 3 4B) | Local language model inference |
| Embeddings | sentence-transformers (all-MiniLM-L6-v2) | Local 384-dim text embeddings |
| Database | PostgreSQL 16 + pgvector | Vector similarity search |
| Infra | Docker Compose | Database container orchestration |
ChatBotAI/
├── docker-compose.yml # PostgreSQL + pgvector container
├── knowledge-base.json # 50-item Turkish FAQ knowledge base
│
├── backend/
│ ├── Dockerfile # Python API container
│ ├── requirements.txt # Python dependencies
│ ├── seed_knowledge.py # Seeds knowledge base into pgvector
│ ├── .env.example # Environment variables template
│ └── app/
│ ├── main.py # FastAPI app, CORS, lifespan
│ ├── config.py # Pydantic settings from .env
│ ├── database.py # Async SQLAlchemy + pgvector setup
│ ├── models/
│ │ └── schemas.py # Request/response Pydantic models
│ ├── api/
│ │ ├── chat.py # POST /api/chat, POST /api/chat/stream
│ │ └── knowledge.py # CRUD endpoints for knowledge base
│ └── services/
│ ├── embedding_service.py # Singleton sentence-transformer
│ ├── rag_service.py # Vector search + context builder
│ └── claude_service.py # Ollama LLM client (OpenAI-compatible)
│
└── frontend/
├── angular.json # Angular CLI config with proxy
├── proxy.conf.json # Dev proxy → backend:8000
└── src/app/
├── app.component.ts # Root component
├── models/
│ └── chat.models.ts # TypeScript interfaces
├── services/
│ └── chat.service.ts # HTTP + SSE streaming client
└── components/
└── chat-widget/ # Full-page chat component
├── chat-widget.component.ts
├── chat-widget.component.html
└── chat-widget.component.scss
- Docker Desktop — for PostgreSQL + pgvector
- Python 3.11+ — for the backend
- Node.js 18+ and Angular CLI 17+ — for the frontend
- Ollama — for local LLM inference
git clone https://github.com/ahmetgkdemr/ChatBotAI.git
cd ChatBotAIdocker compose up db -dWait for the health check to pass (~5 seconds).
Download Ollama and pull the model:
ollama pull gemma3:4bMake sure Ollama is running (it starts automatically after installation).
cd backend
pip install -r requirements.txt
cp .env.example .envSeed the knowledge base into pgvector:
python -X utf8 seed_knowledge.pyStart the API server:
uvicorn app.main:app --reload --port 8000cd frontend
npm install
ng serveNavigate to http://localhost:4200 — the chat interface will be ready.
| Method | Endpoint | Description |
|---|---|---|
GET |
/api/health |
Health check |
POST |
/api/chat |
Synchronous chat response |
POST |
/api/chat/stream |
SSE streaming chat response |
GET |
/api/knowledge |
List knowledge base entries |
GET |
/api/knowledge/stats |
Category distribution stats |
POST |
/api/knowledge |
Add new knowledge entry |
DELETE |
/api/knowledge/{id} |
Delete knowledge entry |
curl -X POST http://localhost:8000/api/chat \
-H "Content-Type: application/json" \
-d '{"message": "How can I track my order?", "tenant_id": "demo"}'{
"response": "You can track your order from 'My Account > My Orders' page...",
"sources": [
{
"question": "How do I track my order?",
"category": "Order Tracking",
"score": 0.8923
}
]
}The knowledge base (knowledge-base.json) contains 50 Turkish FAQ entries across 14 categories:
| Category | Count | Topics |
|---|---|---|
| Account Management | 5 | Registration, password reset, 2FA, profile |
| Shipping | 5 | Tracking, delivery times, free shipping |
| Returns & Refunds | 5 | Return policy, refund process, conditions |
| Payment Methods | 5 | Credit card, bank transfer, installments |
| Order Management | 4 | Cancellation, modification, bulk orders |
| Product Info | 3 | Warranty, specs, authenticity |
| Campaigns & Discounts | 3 | Promo codes, loyalty program |
| Technical Support | 3 | App issues, notifications |
| Privacy & Security | 3 | Data protection, KVKK compliance |
| Corporate Sales | 3 | B2B, invoicing |
| Gift Cards | 2 | Purchase, redemption |
| Marketplace | 3 | Third-party sellers |
| Accessibility | 2 | Accessibility features |
| Sustainability | 2 | Eco-friendly packaging |
All configuration is managed through environment variables in backend/.env:
| Variable | Default | Description |
|---|---|---|
OLLAMA_BASE_URL |
http://localhost:11434/v1 |
Ollama API endpoint (OpenAI-compatible) |
OLLAMA_MODEL |
gemma3:4b |
LLM model name in Ollama |
DATABASE_URL |
postgresql+asyncpg://... |
PostgreSQL connection string |
EMBEDDING_MODEL |
all-MiniLM-L6-v2 |
sentence-transformers model name |
You can swap the LLM by changing OLLAMA_MODEL in .env to any Ollama-supported model:
# Larger, higher quality
OLLAMA_MODEL=llama3.1:8b
# Smaller, faster
OLLAMA_MODEL=gemma3:1b
# Multilingual
OLLAMA_MODEL=qwen2.5:7bChat Interface
The full-page chat interface features:
- Welcome screen with quick-start question buttons
- User messages (right, blue) and AI responses (left, gray)
- Real-time streaming with typing indicator
- Source category badges on each response
- Responsive layout (mobile full-screen, desktop centered card)
This project is for educational and demonstration purposes.