A natural-language-to-SQL engine and document Q&A platform built on Azure. Ask questions in plain English — the engine translates them to SQL, executes across six supported databases, and returns grounded answers. For unstructured content it provides RAG-based conversational search over indexed documents.
| Component | Purpose |
|---|---|
| NL-to-SQL Engine | Translates natural language into SQL using Azure OpenAI + Semantic Kernel and executes via a secure data-access layer |
| Document Q&A (RAG) | Conversational search over PDFs, Word docs, and web pages via Azure AI Search |
| Data API Builder | Secure REST & GraphQL endpoints in front of every database — no raw connection strings in application code |
## Architecture

The connectors layer unifies these components into a single, context-aware query engine, combining Cosmos DB vector search for semantic grounding, schema sensing across six databases, and shared Azure services.
```
┌──────────────────────────────────────────────────────────────────────┐
│                            NL2SQL-ENGINE                             │
│                                                                      │
│  ┌─────────────────┐   ┌─────────────────┐   ┌───────────────────┐   │
│  │   Chat with     │   │   NLP-to-SQL    │   │ Data API Builder  │   │
│  │   Your Data     │   │   (Semantic     │   │ (REST / GraphQL   │   │
│  │   (RAG + AI     │   │    Kernel +     │   │  endpoints over   │   │
│  │    Search)      │   │  Azure OpenAI)  │   │  your database)   │   │
│  └────────┬────────┘   └────────┬────────┘   └─────────┬─────────┘   │
│           │                     │                      │             │
│           └──────────┬──────────┘                      │             │
│                      │ Connectors                      │             │
│            ┌─────────▼─────────────────────────────────▼──────┐      │
│            │               Unified Query Router               │      │
│            │  • Routes structured queries → NL2SQL → DAB API  │      │
│            │  • Routes document queries → Chat / AI Search    │      │
│            │  • Schema sensing across all six databases       │      │
│            └──────────────────────────────────────────────────┘      │
└──────────────────────────────────────────────────────────────────────┘
```
**Databases:** Azure SQL · SQL Server · Synapse (SQL DW) · PostgreSQL · MySQL · Cosmos DB

**Azure services:** Azure OpenAI · Azure AI Search · Azure Container Apps · Azure Container Registry · Managed Identity · VNet · Private Endpoints · Private DNS Zones
## Features
- **Natural language → SQL**: Ask questions in plain English; the engine translates them to SQL and executes against your database.
- **Document Q&A (RAG)**: Upload PDFs, Word docs, web pages — chat with them using the RAG pattern backed by Azure AI Search.
- **Secure data access**: Data API Builder sits in front of every database, exposing only REST/GraphQL endpoints — no direct DB credentials in application code.
- **Unified query router**: The connector layer automatically decides whether a query should go to the SQL path or the document-search path.
- **Schema sensing**: Automatically detects database schema so no manual mapping is required.
- **Multi-database support**: Azure SQL, SQL Server, Synapse (SQL DW), PostgreSQL, MySQL, Cosmos DB.
- **One-click Azure deployment**: Full infrastructure-as-code (Bicep + AZD).
## Quick Start
### Prerequisites
| Tool | Minimum version |
|---|---|
| [Azure CLI](https://docs.microsoft.com/cli/azure/install-azure-cli) | 2.50 |
| [Azure Developer CLI (azd)](https://learn.microsoft.com/azure/developer/azure-developer-cli/install-azd) | 1.5 |
| [.NET SDK](https://get.dot.net) | 8.0 (for Data API Builder CLI) |
| [Python](https://python.org) | 3.10 |
| [Docker Desktop](https://docker.com/products/docker-desktop) | 24+ |
| An [Azure subscription](https://azure.microsoft.com/free/) | — |
### 1 — Use this template
Click **"Use this template"** on GitHub, or clone directly:
```bash
git clone https://github.com/codes4humanity/NL2SQL-ENGINE.git
cd NL2SQL-ENGINE
```

### 2 — Configure your environment

```bash
# Run this from the repository root
cp .env.sample .env
# Edit .env with your Azure resource values
```

### 3 — Deploy to Azure

```bash
azd auth login
azd up
```

This provisions all required Azure resources (OpenAI, AI Search, Azure SQL, Synapse, PostgreSQL, MySQL, Cosmos DB, Container Apps, Container Registry) and deploys the application.

### 4 — Run locally with Docker

```bash
docker compose up --build
```

Access the app at http://localhost:3000.

### 5 — Run the services individually

```bash
# Windows
.\scripts\setup.ps1

# macOS / Linux
bash scripts/setup.sh
```

The setup script installs all dependencies, creates `.env` from the template (if needed), creates the database tables, and seeds sample data. After setup finishes:

```bash
# Recommended: use hosted DAB endpoint from deployment outputs in .env
# Optional local fallback only when needed:
# dab start --config src/api/dab-config.json

# Terminal 1 — Start the NL2SQL engine
python -m uvicorn src.nl2sql.app:api --port 8000 --reload

# Terminal 2 — Start the chat backend (run from the repo root)
python -m uvicorn src.chat.backend.app:app --port 5001 --reload

# Terminal 3 — Start the chat frontend (optional)
cd src/chat/frontend
npm install
npm run dev
```

Prerequisites for `dab start`:

- .NET SDK 8+ and the DAB CLI (`dotnet tool install microsoft.dataapibuilder -g`)
- A valid `SQL_CONNECTION_STRING` in your `.env` file (see `.env.sample` for the format)
- Azure login (`az login`) — required for `Active Directory Default` authentication
- If `ENABLE_PRIVATE_NETWORKING=true`, run from a host with private endpoint connectivity; if disabled, allow your source IP in the SQL firewall (for example with `python scripts/update_firewall.py`)
- If you want request-level Microsoft Entra authorization on `/query`, set the `NL2SQL_ENTRA_*`, `NL2SQL_ROLE_POLICIES_JSON`/`NL2SQL_GROUP_POLICIES_JSON`, and `VITE_ENTRA_*` variables in `.env.sample`, then sign in from the React UI
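As a point of reference, a minimal `.env` might look like the fragment below. All values are placeholders, and only variables named in this README are shown; see `.env.sample` for the authoritative, complete list:

```shell
# Data API Builder endpoints (taken from azd deployment outputs)
DAB_ENDPOINT=https://<your-dab-app>.azurecontainerapps.io
DAB_FALLBACK_ENDPOINT=http://localhost:5000

# Azure SQL connection used by `dab start` (Active Directory Default auth)
SQL_CONNECTION_STRING="Server=<server>.database.windows.net;Database=<db>;Authentication=Active Directory Default;"

# Networking toggle: true requires private endpoint connectivity
ENABLE_PRIVATE_NETWORKING=false
```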
- Phase 1: Private endpoint networking for SQL/OpenAI/Search/Cosmos via Bicep.
- Phase 2: Hosted DAB-first runtime using `DAB_ENDPOINT` with optional `DAB_FALLBACK_ENDPOINT`.
- Phase 3: Workload identity role-binding using `RUNTIME_PRINCIPAL_ID` for least-privilege data-plane access.
## Project Structure

```
NL2SQL-ENGINE/
├── README.md                       # This file
├── azure.yaml                      # Azure Developer CLI configuration
├── docker-compose.yml              # Local development orchestration
├── .env.sample                     # Environment variable template
├── .gitignore
│
├── connectors/                     # ★ Data-access and routing layer
│   ├── unified_query.py            # Route queries to NL2SQL or Chat
│   ├── datasource_registry.py      # Registry of all supported data sources
│   ├── dab_connector.py            # Execute SQL via Data API Builder REST/GraphQL
│   ├── sql_connector.py            # Azure SQL / SQL Server direct connector
│   ├── sqldw_connector.py          # Synapse Analytics (SQL DW) connector
│   ├── pg_connector.py             # PostgreSQL connector
│   ├── mysql_connector.py          # MySQL connector
│   ├── cosmos_connector.py         # Cosmos DB connector
│   └── search_connector.py         # Document search via Azure AI Search
│
├── src/
│   ├── nl2sql/                     # NL-to-SQL engine
│   │   ├── app.py                  # Entry point (CLI + HTTP server)
│   │   ├── requirements.txt
│   │   └── plugins/
│   │       ├── nl_to_sql.py        # Semantic Kernel plugin: NL → SQL
│   │       └── query.py            # Semantic Kernel plugin: execute SQL
│   │
│   ├── chat/                       # Document Q&A (RAG) component
│   │   ├── backend/                # Python FastAPI backend (RAG)
│   │   │   ├── app.py
│   │   │   └── requirements.txt
│   │   └── frontend/               # React/TypeScript web UI
│   │       ├── package.json
│   │       └── src/
│   │
│   └── api/                        # Data API Builder configuration
│       └── dab-config.json         # Entities, permissions, relationships
│
├── infra/                          # Infrastructure as Code (Bicep)
│   ├── main.bicep                  # Root template
│   ├── main.parameters.json        # Default parameters
│   └── modules/
│       ├── openai.bicep            # Azure OpenAI
│       ├── sql.bicep               # Azure SQL Database
│       ├── synapse.bicep           # Synapse Analytics (SQL DW)
│       ├── postgresql.bicep        # Azure Database for PostgreSQL
│       ├── mysql.bicep             # Azure Database for MySQL
│       ├── cosmos.bicep            # Azure Cosmos DB
│       ├── search.bicep            # Azure AI Search
│       ├── containerenv.bicep      # Container Apps Environment
│       ├── containerregistry.bicep # Azure Container Registry
│       ├── network.bicep           # VNet, private endpoints, DNS zones
│       └── identity.bicep          # Managed Identity + RBAC
│
├── docs/
│   ├── architecture.md
│   ├── setup-guide.md
│   └── connectors.md
│
├── scripts/
│   ├── setup.sh                    # Local environment setup (macOS / Linux)
│   ├── setup.ps1                   # Local environment setup (Windows)
│   ├── init_db.py                  # Create tables & seed data in Azure SQL
│   ├── update_firewall.py          # Auto-detect public IP & add SQL firewall rule
│   └── postprovision.sh            # Runs after `azd provision`
│
└── .devcontainer/
    └── devcontainer.json           # VS Code / GitHub Codespaces dev container
```
## Connectors

The `connectors/` directory is the engine's data-access and routing layer. It contains nine modules:
### `unified_query.py`

Decides at runtime whether a user query should be:
- Sent to the NL2SQL path (structured database query) — for questions like "How many orders were placed last month?"
- Sent to the Chat/RAG path (unstructured document search) — for questions like "Summarise the Q3 earnings report"
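The routing decision above can be sketched as a minimal heuristic classifier. The real router presumably uses richer signals (an LLM call, schema matching, embeddings); the keyword lists and function name below are illustrative assumptions, not the actual module API:

```python
# Minimal routing sketch: count structured vs. document hints in the question.
# Keyword lists are illustrative only; a production router would use an LLM
# or schema-aware matching instead of substring checks.

STRUCTURED_HINTS = {"how many", "count", "total", "average", "sum", "top", "last month"}
DOCUMENT_HINTS = {"summarise", "summarize", "report", "document", "explain"}

def route_query(question: str) -> str:
    """Return 'nl2sql' for structured questions, 'rag' for document questions."""
    q = question.lower()
    structured = sum(hint in q for hint in STRUCTURED_HINTS)
    document = sum(hint in q for hint in DOCUMENT_HINTS)
    # Ties and unknowns default to the document path, which degrades gracefully.
    return "nl2sql" if structured > document else "rag"
```

For the two example questions above, this sketch routes "How many orders were placed last month?" to the SQL path and "Summarise the Q3 earnings report" to the RAG path.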
### `datasource_registry.py`

Central registry that maps data-source identifiers (`azure_sql`, `sql_server`, `sqldw`, `cosmos_db`, `postgresql`, `mysql`) to their connector classes, metadata, and runtime configuration.
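Such a registry is commonly a mapping from identifier to connector class plus metadata. The sketch below shows one plausible shape; the `DataSource` dataclass, its fields, and the stand-in connector classes are assumptions for illustration, not the real module:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class DataSource:
    """Registry entry: connector factory plus routing metadata (illustrative)."""
    connector: Callable          # class or factory producing a connector instance
    dialect: str                 # SQL dialect hint passed to the NL-to-SQL prompt
    supports_vector: bool = False

REGISTRY: dict[str, DataSource] = {}

def register(name: str, **kwargs) -> None:
    REGISTRY[name] = DataSource(**kwargs)

# Hypothetical connector classes stand in for the real ones in connectors/.
class SqlConnector: ...
class CosmosConnector: ...

register("azure_sql", connector=SqlConnector, dialect="tsql")
register("sqldw", connector=SqlConnector, dialect="tsql")
register("cosmos_db", connector=CosmosConnector, dialect="nosql", supports_vector=True)

def get_connector(name: str):
    """Look up and instantiate the connector for a data-source identifier."""
    return REGISTRY[name].connector()
```

Keeping the dialect in the registry lets the NL-to-SQL prompt be parameterized per database without the router knowing connector internals.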
### `dab_connector.py`

After `nl_to_sql.py` generates a SQL statement, this connector:
- Introspects the Data API Builder REST/GraphQL endpoint
- Translates the raw SQL into a safe API call (no direct DB access)
- Returns structured JSON results
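For a narrow class of statements, the SQL-to-API translation can be sketched as rewriting a single-table `SELECT ... WHERE col = 'value'` into a Data API Builder REST URL (DAB's REST endpoints accept OData-style `$select` and `$filter` parameters). The parser below is deliberately naive and the base URL is a placeholder; the real connector is more general:

```python
import re
from urllib.parse import quote

def sql_to_dab_rest(sql: str, base_url: str = "http://localhost:5000/api") -> str:
    """Rewrite a simple single-table SELECT into a DAB REST URL (sketch only)."""
    m = re.match(
        r"SELECT\s+(?P<cols>[\w,\s*]+?)\s+FROM\s+(?P<table>\w+)"
        r"(?:\s+WHERE\s+(?P<col>\w+)\s*=\s*'(?P<val>[^']*)')?\s*;?\s*$",
        sql.strip(), re.IGNORECASE,
    )
    if not m:
        raise ValueError("only simple single-table SELECTs are supported here")
    params = []
    cols = m["cols"].replace(" ", "")
    if cols != "*":
        params.append(f"$select={cols}")
    if m["col"]:
        # OData filter syntax: column eq 'value'
        params.append(f"$filter={quote(m['col'] + ' eq ' + repr(m['val']))}")
    query = "?" + "&".join(params) if params else ""
    return f"{base_url}/{m['table']}{query}"
```

The key point is that the application never opens a database connection; it only issues HTTP requests that DAB validates against its entity permissions.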
### Database connectors

| Module | Database |
|---|---|
| `sql_connector.py` | Azure SQL Database / SQL Server |
| `sqldw_connector.py` | Azure Synapse Analytics (SQL DW) |
| `pg_connector.py` | Azure Database for PostgreSQL |
| `mysql_connector.py` | Azure Database for MySQL |
| `cosmos_connector.py` | Azure Cosmos DB |
Each connector handles authentication (via `DefaultAzureCredential`), schema introspection, query execution, and result formatting for its respective database.
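One common schema-introspection pattern is flattening `INFORMATION_SCHEMA.COLUMNS` rows into a compact schema string for the NL-to-SQL prompt. The exact format the engine uses is not documented here, so the sketch below is an assumed shape:

```python
from collections import defaultdict

def format_schema(rows: list[tuple[str, str, str]]) -> str:
    """Flatten (table, column, type) rows, e.g. from INFORMATION_SCHEMA.COLUMNS,
    into a compact schema description for an NL-to-SQL prompt (illustrative)."""
    tables: dict[str, list[str]] = defaultdict(list)
    for table, column, dtype in rows:
        tables[table].append(f"{column} {dtype}")
    return "\n".join(
        f"TABLE {name} ({', '.join(cols)})" for name, cols in sorted(tables.items())
    )
```

This is what makes "schema sensing" possible: the prompt always reflects the live schema, so no manual table mapping is maintained.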
### `search_connector.py`

Wraps the Azure AI Search SDK to:
- Index new documents uploaded by the user
- Perform hybrid (keyword + vector) search for the RAG pipeline
- Return ranked chunks to the Chat backend for answer generation
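Hybrid search fuses a keyword ranking and a vector ranking into one result list; Azure AI Search does this server-side with Reciprocal Rank Fusion (RRF), which can be sketched client-side as follows (function name and list shapes are illustrative):

```python
def rrf_merge(keyword_ranked: list[str], vector_ranked: list[str], k: int = 60) -> list[str]:
    """Fuse two ranked lists of document IDs with Reciprocal Rank Fusion.
    score(d) = sum over lists of 1 / (k + rank_d); k=60 is the conventional constant."""
    scores: dict[str, float] = {}
    for ranking in (keyword_ranked, vector_ranked):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A document that appears high in both rankings (like `"b"` in `rrf_merge(["a", "b", "c"], ["b", "d"])`) accumulates score from each list and rises to the top of the fused result.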
## Configuration

See `.env.sample` for all required and optional environment variables.
## Deployment

See `docs/setup-guide.md` for detailed deployment instructions, including:
- Azure subscription setup
- Resource group and naming conventions
- Role assignments
- Post-deployment validation
## Contributing

See `CONTRIBUTING.md`.
## Acknowledgements

NL2SQL-ENGINE draws inspiration from the following Microsoft Azure open-source projects, and we thank their contributors for advancing the ecosystem:
- Azure/data-api-builder — Secure REST/GraphQL data-access patterns
- Azure-Samples/chat-with-your-data-solution-accelerator — RAG pipeline design patterns
- Azure-Samples/nlp-sql-in-a-box — Natural-language-to-SQL concepts with Semantic Kernel
## License

This project is licensed under the MIT License.