README.md
# Natural Language to SQL Bot (Text to SQL for SQLite)

## 📅 Overview

This project is a **Text-to-SQL Bot** where users can ask questions in simple English, and the system will:
- Generate a SQL query that answers the question.
- Execute the SQL on a **SQLite database** (`employee.db`).
- Return both the **query** and the **output**.

The system is built using **Flask**, **LangChain**, **OpenAI GPT-3.5**, **FAISS**, and **SQLite**.

---

## 📈 Step-by-Step Project Flow

```text
Step 1:
I created a SQLite database with a basic Employee table having fields like name, age, city, gender, total experience, and blood group.

Step 2:
I prepared a CSV file (employee_questions.csv) containing simple natural-language questions, their matching SQL queries, and short descriptions.
These rows serve as few-shot examples to guide the model.

Step 3:
I generated embeddings for these examples using an OpenAI embedding model and stored them in FAISS, a library for fast vector similarity search, so that similar examples can be looked up quickly.

Step 4:
I used LangChain to create:
- A FAISS retriever (to search examples based on user input)
- A Conversational Retrieval Chain that connects the retriever with a language model (LLM).

Step 5:
I connected the system to OpenAI GPT-3.5-turbo as the model.
(The setup is flexible, though: the model can be swapped for Gemini or an open-source model such as Llama.)

Step 6:
I built a Flask API where:
- The user sends a question.
- The system finds similar examples from FAISS.
- It creates a final prompt (schema + examples + user query).
- The model generates the SQL query.
- SQL is cleaned and validated.
- The query is run on the employee.db database.
- The API sends back both the SQL query and the query output.

Step 7:
In the future, I can enhance this system by:
- Adding support for multi-table joins and data-modification queries (INSERT/UPDATE).
- Integrating conversation history, so the bot can use previous context to give smarter, more connected answers.
- Replacing the model with open-source alternatives to reduce cost.
```
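The prompt-formation part of Step 6 can be sketched in a few lines of Python. This is a minimal sketch under assumptions, not the project's code: the CSV column names (`question`, `sql_query`, `description`), the sample rows, the schema string, and the prompt wording are all hypothetical. Every example is passed in directly here, whereas in the project the FAISS retriever would choose the most similar ones.

```python
import csv
import io

# Hypothetical rows in the shape of employee_questions.csv (column names assumed).
SAMPLE_CSV = """question,sql_query,description
How many employees are there?,SELECT COUNT(*) FROM Employee;,Counts all rows
List employees from Delhi.,SELECT name FROM Employee WHERE city = 'Delhi';,Filters by city
"""

# Hypothetical schema text describing the Employee table from Step 1.
SCHEMA = (
    "Table Employee(name TEXT, age INTEGER, city TEXT, gender TEXT, "
    "total_experience INTEGER, blood_group TEXT)"
)


def load_examples(csv_text: str) -> list[dict[str, str]]:
    """Parse few-shot examples from CSV text into one dict per row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def build_prompt(schema: str, examples: list[dict[str, str]], question: str) -> str:
    """Combine the schema, example question/SQL pairs, and the user question."""
    shots = "\n\n".join(
        f"Question: {ex['question']}\nSQL: {ex['sql_query']}" for ex in examples
    )
    return (
        "Write one SQLite query that answers the question.\n\n"
        f"Schema:\n{schema}\n\n"
        f"Examples:\n{shots}\n\n"
        f"Question: {question}\nSQL:"
    )


if __name__ == "__main__":
    examples = load_examples(SAMPLE_CSV)
    print(build_prompt(SCHEMA, examples, "How many employees are older than 30?"))
```

Ending the prompt with `SQL:` nudges the model to reply with the query alone, which keeps the later cleaning step simple.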

---

## 📞 System Flow

```text
User Question
   ↓
Flask API Endpoint (POST /)
   ↓
FAISS Retriever (Semantic Search on employee_questions.csv examples)
   ↓
Prompt Formation (Database Schema + Retrieved Examples + User Question)
   ↓
LLM (OpenAI GPT-3.5 / Gemini / Llama etc.)
   ↓
Generated SQL Query
   ↓
SQL Cleaning & Validation
   ↓
Execution on SQLite Database (employee.db)
   ↓
Return Query + Output as API Response
```
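The "SQL Cleaning & Validation" and "Execution" stages might look roughly like the sketch below. The helper names (`clean_sql`, `validate_sql`, `run_query`) and the rule of accepting only a `SELECT` are assumptions for illustration; the actual checks in `chat.py` may differ. The demo uses an in-memory database with column names assumed from Step 1 instead of `employee.db`.

```python
import re
import sqlite3

FENCE = "`" * 3  # markdown code-fence marker that LLM replies often include

# Matches a reply wrapped in a code fence, optionally tagged "sql".
FENCE_RE = re.compile(
    r"^" + FENCE + r"(?:sql)?\s*(.*?)\s*" + FENCE + r"$", re.DOTALL | re.IGNORECASE
)


def clean_sql(raw: str) -> str:
    """Remove a markdown fence, surrounding whitespace, and trailing semicolons."""
    text = raw.strip()
    match = FENCE_RE.match(text)
    if match:
        text = match.group(1)
    return text.strip().rstrip(";").strip()


def validate_sql(sql: str) -> None:
    """Accept read-only SELECT queries only; raise ValueError for anything else."""
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError(f"only SELECT queries are allowed, got: {sql!r}")


def run_query(conn: sqlite3.Connection, sql: str) -> list[dict]:
    """Run one statement and return rows as dicts keyed by column name.

    sqlite3's execute() refuses input containing more than one statement.
    """
    cursor = conn.execute(sql)
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


if __name__ == "__main__":
    # In-memory stand-in for employee.db; column names are assumed from Step 1.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Employee (name TEXT, age INTEGER, city TEXT, gender TEXT, "
        "total_experience INTEGER, blood_group TEXT)"
    )
    conn.executemany(
        "INSERT INTO Employee VALUES (?, ?, ?, ?, ?, ?)",
        [("Asha", 30, "Pune", "F", 7, "O+"), ("Ravi", 25, "Delhi", "M", 2, "B+")],
    )
    llm_output = FENCE + "sql\nSELECT COUNT(*) FROM Employee WHERE total_experience > 5;\n" + FENCE
    sql = clean_sql(llm_output)
    validate_sql(sql)
    print({"query": sql, "result": run_query(conn, sql)})
```

Rejecting anything that is not a `SELECT` fits the current read-only scope; opening the real file read-only with `sqlite3.connect("file:employee.db?mode=ro", uri=True)` would add a second safeguard at the database level.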

---

## 📊 Technology Stack

| Component | Technology |
|:----------|:-----------|
| API Server | Flask |
| Database | SQLite (employee.db) |
| Embeddings | OpenAI text-embedding-ada-002 |
| Vectorstore | FAISS |
| LLM | OpenAI GPT-3.5-turbo (flexible to switch) |
| Memory | LangChain ConversationBufferMemory |
| Retrieval Chain | LangChain ConversationalRetrievalChain |
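To show what the retriever does, the sketch below ranks stored example questions by cosine similarity to the user's question. It deliberately swaps in a toy bag-of-words vector instead of OpenAI embeddings and a brute-force scan instead of a FAISS index, so it runs with the standard library only; `ToyRetriever` and the sample questions are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyRetriever:
    """Brute-force nearest-neighbour search over stored example questions."""

    def __init__(self, questions: list[str]) -> None:
        self.questions = questions
        self.vectors = [embed(q) for q in questions]

    def top_k(self, query: str, k: int = 2) -> list[str]:
        """Return the k stored questions most similar to the query."""
        query_vec = embed(query)
        scores = [cosine(query_vec, vec) for vec in self.vectors]
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        return [self.questions[i] for i in ranked[:k]]


if __name__ == "__main__":
    retriever = ToyRetriever([
        "How many employees are there?",
        "List employees from Delhi.",
        "What is the average age of employees?",
    ])
    print(retriever.top_k("Which employees live in Delhi?", k=1))
```

FAISS performs the same nearest-neighbour search, but over dense embedding vectors and with indexes designed to stay fast as the number of stored examples grows.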

---

## 🔧 Setup Instructions

### 1. Clone the repository
```bash
git clone https://github.com/kartiktongaria/text2sql-chatbot.git
cd text2sql-chatbot
```

### 2. Create and activate a virtual environment
```bash
python3 -m venv env
source env/bin/activate # For Mac/Linux
# OR
env\Scripts\activate.bat # For Windows
```

### 3. Install dependencies
```bash
pip install -r requirement.txt
```

### 4. Set up environment variables
Create a `.env` file:
```bash
echo "OPENAI_API_KEY=your_openai_api_key_here" > .env
```

### 5. Start the server
```bash
python chat.py
```
The server will run at:
```text
http://127.0.0.1:8000/
```

---

## 📡 API Usage

- **Endpoint:** `POST /`
- **Request Body Example:**
```json
{
  "question": "How many employees have more than 5 years of experience?"
}
```

- **Response Example:**
```json
{
  "response": {
    "query": "SELECT COUNT(*) FROM Employee WHERE total_experience > 5;",
    "result": [{"COUNT(*)": 20}]
  }
}
}
```

You can test using **Postman** or **cURL**.
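The same request can also be sent from Python using only the standard library. This is a sketch that assumes the server is running at the address from the setup section and replies with JSON shaped like the response example; `build_request` and `ask` are hypothetical helper names.

```python
import json
import urllib.request

# Address from the setup section; adjust if the server runs elsewhere.
API_URL = "http://127.0.0.1:8000/"


def build_request(question: str, url: str = API_URL) -> urllib.request.Request:
    """Build a POST request carrying the JSON body from the request example."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, url: str = API_URL, timeout: float = 60.0) -> dict:
    """Send the question to the running server and decode its JSON reply."""
    with urllib.request.urlopen(build_request(question, url), timeout=timeout) as resp:
        return json.load(resp)
```

With the server running, `ask("How many employees have more than 5 years of experience?")["response"]["query"]` should return the generated SQL, and `["response"]["result"]` the rows.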

---

## 🛠️ Future Enhancements
- Add multi-table joins.
- Support Insert, Update, and Delete queries.
- Add conversation history to understand context better.
- Integrate open-source LLMs (to reduce cost and improve control).
- Build a simple frontend UI (Streamlit or React).

---

## 💚 Final Notes

This project shows how natural-language questions can be turned into real SQL queries and executed live on a database.
It's a working example of how **RAG (Retrieval-Augmented Generation)** can make databases talk in human language!

✅ To see the chatbot's outputs, refer to the **`Result_img` folder** in the repository, which contains screenshots of working results.

---

# 🌟 Thank you for exploring the Natural Language to SQL Bot!

requirement.txt
accelerate==0.0.1
aiohttp==3.9.5
aiosignal==1.3.1
annotated-types==0.6.0
anyio==4.0.0
appnope==0.1.3
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.3.0
asttokens==2.4.0
async-lru==2.0.4
attrs==23.1.0
Babel==2.13.0
backcall==0.2.0
beautifulsoup4==4.12.2
bleach==6.1.0
blinker==1.7.0
cachetools==5.3.1
certifi==2023.7.22
cffi==1.16.0
charset-normalizer==3.3.0
click==8.1.7
comm==0.1.4
dataclasses-json==0.6.4
debugpy==1.8.0
decorator==5.1.1
defusedxml==0.7.1
distro==1.9.0
executing==2.0.0
faiss-cpu==1.8.0
fastjsonschema==2.18.1
Flask==3.0.3
fqdn==1.5.1
frozenlist==1.4.1
google-ai-generativelanguage==0.3.3
google-api-core==2.12.0
google-auth==2.23.3
google-generativeai==0.2.1
googleapis-common-protos==1.61.0
grpcio==1.59.0
grpcio-status==1.59.0
h11==0.14.0
httpcore==1.0.5
httpx==0.27.0
hyperplane==0.0.1
idna==3.4
ipykernel==6.25.2
ipython==8.16.1
ipython-genutils==0.2.0
ipywidgets==8.1.1
isoduration==20.11.0
itsdangerous==2.2.0
jedi==0.19.1
Jinja2==3.1.2
json5==0.9.14
jsonpatch==1.33
jsonpointer==2.4
jsonschema==4.19.1
jsonschema-specifications==2023.7.1
jupyter==1.0.0
jupyter-console==6.6.3
jupyter-events==0.8.0
jupyter-lsp==2.2.0
jupyter_client==8.4.0
jupyter_core==5.4.0
jupyter_server==2.8.0
jupyter_server_terminals==0.4.4
jupyterlab==4.0.7
jupyterlab-pygments==0.2.2
jupyterlab-widgets==3.0.9
jupyterlab_server==2.25.0
langchain==0.1.16
langchain-community==0.0.34
langchain-core==0.1.45
langchain-openai==0.1.3
langchain-text-splitters==0.0.1
langsmith==0.1.49
MarkupSafe==2.1.3
marshmallow==3.21.1
matplotlib-inline==0.1.6
mistune==3.0.2
multidict==6.0.5
mypy-extensions==1.0.0
nbclient==0.8.0
nbconvert==7.9.2
nbformat==5.9.2
nest-asyncio==1.5.8
notebook==7.0.6
notebook_shim==0.2.3
numpy==1.26.1
openai==1.23.2
orjson==3.10.1
overrides==7.4.0
packaging==23.2
pandas==2.1.1
pandocfilters==1.5.0
parso==0.8.3
pexpect==4.8.0
pickleshare==0.7.5
platformdirs==3.11.0
prometheus-client==0.17.1
prompt-toolkit==3.0.39
proto-plus==1.22.3
protobuf==4.24.4
psutil==5.9.6
ptyprocess==0.7.0
pure-eval==0.2.2
pyasn1==0.5.0
pyasn1-modules==0.3.0
pycparser==2.21
pydantic==2.11.3
pydantic_core==2.33.1
Pygments==2.16.1
PyPDF2==3.0.1
python-dateutil==2.8.2
python-dotenv==1.0.0
python-json-logger==2.0.7
pytz==2023.3.post1
PyYAML==6.0.1
pyzmq==25.1.1
qtconsole==5.4.4
QtPy==2.4.0
referencing==0.30.2
regex==2024.4.16
requests==2.31.0
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rpds-py==0.10.6
rsa==4.9
Send2Trash==1.8.2
setuptools==68.2.2
six==1.16.0
sniffio==1.3.0
soupsieve==2.5
SQLAlchemy==2.0.29
stack-data==0.6.3
tenacity==8.2.3
terminado==0.17.1
tiktoken==0.6.0
tinycss2==1.2.1
tornado==6.3.3
tqdm==4.66.1
traitlets==5.11.2
types-python-dateutil==2.8.19.14
typing-inspect==0.9.0
typing-inspection==0.4.0
typing_extensions==4.13.2
tzdata==2023.3
uri-template==1.3.0
urllib3==2.0.7
wcwidth==0.2.8
webcolors==1.13
webencodings==0.5.1
websocket-client==1.6.4
Werkzeug==3.0.2
wheel==0.41.2
widgetsnbextension==4.0.9
yarl==1.9.4