This project is a microservices-based application that lets users upload CSV files and use AI to query and analyze the data. The system uses a vector database (Qdrant), AI embeddings, and Redis Pub/Sub for efficient data processing. The architecture is designed to be scalable, containerized with Docker, and deployable in Kubernetes.
- Upload CSV files and track processing status
- Store metadata of uploaded datasets in PostgreSQL
- Vectorize text data and store embeddings in Qdrant
- AI-powered query engine to answer user questions about the data
- Redis Pub/Sub for asynchronous processing
- API Gateway for unified access to microservices
- Kubernetes for deployment and autoscaling
- File Upload Service: Handles file uploads, extracts metadata, and triggers processing (a publish sketch follows this list).
- Data Processing Service: Reads CSV files, generates embeddings using `nomic-embed-text`, and stores them in Qdrant.
- AI Query Service: Uses an AI model (Qwen 2.5 7B) via Ollama to answer user queries.
- Metadata Store (PostgreSQL): Stores information about uploaded datasets.
- Vector Database (Qdrant): Stores embeddings for efficient retrieval.
- Redis Pub/Sub: Manages event-driven communication between services.
- API Gateway: Centralized entry point for the frontend and services.
- Frontend (React + TypeScript): Provides a user interface for uploading files and querying data.
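For concreteness, here is a minimal sketch of how the File Upload Service could record a dataset in PostgreSQL and notify the Data Processing Service over Redis Pub/Sub. It uses the `pg` and `ioredis` packages; the table name, columns, and event payload shape are illustrative assumptions, and only the `dataset_processing` channel name comes from this README.

```ts
// file-upload (sketch): store metadata in PostgreSQL, then publish an event to Redis.
// Table/column names and the event payload are assumptions, not the project's actual schema.
import { Pool } from "pg";
import Redis from "ioredis";

const db = new Pool({ connectionString: process.env.POSTGRES_URL });
const redis = new Redis({
  host: process.env.REDIS_HOST ?? "localhost",
  port: Number(process.env.REDIS_PORT ?? 6379),
});

export async function registerUpload(fileName: string, filePath: string): Promise<string> {
  // Insert a metadata row with an initial "processing" status.
  const { rows } = await db.query(
    `INSERT INTO datasets (file_name, file_path, status)
     VALUES ($1, $2, 'processing') RETURNING id`,
    [fileName, filePath]
  );
  const datasetId = String(rows[0].id);

  // Notify the Data Processing Service via Redis Pub/Sub.
  await redis.publish("dataset_processing", JSON.stringify({ datasetId, filePath }));
  return datasetId;
}
```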
- User uploads a CSV file via the frontend.
- File Upload Service stores metadata in PostgreSQL and publishes an event to Redis.
- Data Processing Service consumes the event, processes the CSV, generates embeddings, and stores them in Qdrant (a consumer sketch follows this list).
- Metadata status is updated to `ready` in PostgreSQL after processing.
- User queries the dataset, and the AI Query Service retrieves relevant data using vector search in Qdrant.
- AI Query Service sends context to the AI model (running via Ollama on a different machine) and returns structured responses.
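Below is a minimal sketch of the consuming side of this flow: subscribe to the event, parse the CSV, embed each row with `nomic-embed-text` via Ollama, and upsert the vectors into Qdrant. It assumes Node 18+ (global `fetch`), the `ioredis` and `csv-parse` packages, a per-dataset collection name, and `OLLAMA_URL`/`QDRANT_URL` settings; these details are assumptions, not the project's actual code.

```ts
// data-processing (sketch): consume upload events, embed rows, store vectors in Qdrant.
import { readFile } from "node:fs/promises";
import { parse } from "csv-parse/sync";
import Redis from "ioredis";

const OLLAMA_URL = process.env.OLLAMA_URL ?? "http://localhost:11434";
const QDRANT_URL = process.env.QDRANT_URL ?? "http://localhost:6333";

const subscriber = new Redis({
  host: process.env.REDIS_HOST ?? "localhost",
  port: Number(process.env.REDIS_PORT ?? 6379),
});

// Ollama embeddings endpoint with the nomic-embed-text model.
async function embed(text: string): Promise<number[]> {
  const res = await fetch(`${OLLAMA_URL}/api/embeddings`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "nomic-embed-text", prompt: text }),
  });
  return ((await res.json()) as { embedding: number[] }).embedding;
}

subscriber.subscribe("dataset_processing");
subscriber.on("message", async (_channel, message) => {
  const { datasetId, filePath } = JSON.parse(message) as { datasetId: string; filePath: string };

  // Parse the CSV into one record per row.
  const rows: Record<string, string>[] = parse(await readFile(filePath, "utf8"), {
    columns: true,
  });

  // Create a per-dataset collection (nomic-embed-text produces 768-dim vectors).
  await fetch(`${QDRANT_URL}/collections/dataset_${datasetId}`, {
    method: "PUT",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ vectors: { size: 768, distance: "Cosine" } }),
  });

  // Embed each row and upsert it with its original values as the payload.
  const points = await Promise.all(
    rows.map(async (row, i) => ({ id: i, vector: await embed(JSON.stringify(row)), payload: row }))
  );
  await fetch(`${QDRANT_URL}/collections/dataset_${datasetId}/points?wait=true`, {
    method: "PUT",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ points }),
  });
  // Here the service would also mark the dataset as `ready` in PostgreSQL.
});
```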
- File Upload Service: Handling CSV uploads, extracting metadata, and publishing processing events.
- AI Query Service: Using an AI model to answer user queries (a query sketch follows this list).
- Data Prep Service: Reading CSV files, generating embeddings, and storing them in Qdrant.
- Redis Pub/Sub: Managing event-driven communication between services (running in Docker).
- API Gateway: Centralized entry point for the frontend and services.
- Frontend (React + TypeScript): Providing a user interface for uploading files and querying data.
- Docker Compose: Configuring Docker containers for the microservices.
- Kubernetes: Deploying the microservices in a Kubernetes cluster.
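As a rough sketch, the AI Query Service boils down to two calls: a vector search against Qdrant, then a generation request to Qwen 2.5 7B served by Ollama. The endpoints below are Qdrant's and Ollama's public REST APIs, but the collection naming, prompt wording, environment variables, and exact model tag are assumptions.

```ts
// ai-query (sketch): retrieve relevant rows from Qdrant, then ask Qwen 2.5 7B via Ollama.
const OLLAMA_URL = process.env.OLLAMA_URL ?? "http://localhost:11434";
const QDRANT_URL = process.env.QDRANT_URL ?? "http://localhost:6333";

// Embed the user's question with the same model used at indexing time.
async function embed(text: string): Promise<number[]> {
  const res = await fetch(`${OLLAMA_URL}/api/embeddings`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "nomic-embed-text", prompt: text }),
  });
  return ((await res.json()) as { embedding: number[] }).embedding;
}

export async function answerQuery(datasetId: string, query: string): Promise<string> {
  // 1. Vector search: find the rows most similar to the question.
  const searchRes = await fetch(`${QDRANT_URL}/collections/dataset_${datasetId}/points/search`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ vector: await embed(query), limit: 5, with_payload: true }),
  });
  const { result } = (await searchRes.json()) as {
    result: { payload: Record<string, unknown> }[];
  };

  // 2. Ask the model, passing the retrieved rows as context.
  const context = result.map((hit) => JSON.stringify(hit.payload)).join("\n");
  const genRes = await fetch(`${OLLAMA_URL}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "qwen2.5:7b",
      prompt: `Answer the question using only this data:\n${context}\n\nQuestion: ${query}`,
      stream: false,
    }),
  });
  return ((await genRes.json()) as { response: string }).response;
}
```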
- Add support for retrieving the actual data from the CSV file.
- Optimize query response ranking with hybrid search.
- Add support for Contextual Querying.
- Add support for different AI models.
git clone https://github.com/your-repo.git
cd your-repo
docker-compose up -d
This starts all supporting services (PostgreSQL, Redis, Qdrant) along with the microservices.
Each microservice has its own `package.json`. To run a service locally:
cd services/file-upload
npm install
npm start
Repeat for other microservices.
- Frontend: `http://localhost:3000`
- API Gateway: `http://localhost:8080`
- PostgreSQL: `localhost:5432`
- Qdrant UI: `http://localhost:6333`
- Redis CLI: `redis-cli`
Each microservice requires environment variables. Create a `.env` file in each service directory. Example `.env` for the File Upload Service:
PORT=5001
POSTGRES_URL=postgres://user:password@localhost:5432/dbname
REDIS_HOST=localhost
REDIS_PORT=6379
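A service might then load these values at startup, for example with `dotenv`; this is just one possible sketch, and the actual services may wire configuration differently.

```ts
// config (sketch): read the service's .env file and expose a typed config object.
import "dotenv/config"; // loads .env into process.env

export const config = {
  port: Number(process.env.PORT ?? 5001),
  postgresUrl: process.env.POSTGRES_URL ?? "",
  redisHost: process.env.REDIS_HOST ?? "localhost",
  redisPort: Number(process.env.REDIS_PORT ?? 6379),
};
```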
Upload a CSV file:
curl -X POST -F "file=@data.csv" http://localhost:5001/upload
Query the AI Service:
curl -X POST http://localhost:5003/query -H "Content-Type: application/json" -d '{"dataset_id": "123", "query": "What do people like about the phone?"}'
Subscribe to dataset processing events:
redis-cli
SUBSCRIBE dataset_processing
Contributions are welcome! Fork the repo and submit a PR.
MIT License. See `LICENSE` for details.