
# NL2SQL-ENGINE

A natural-language-to-SQL engine and document Q&A platform built on Azure. Ask questions in plain English — the engine translates them to SQL, executes across six supported databases, and returns grounded answers. For unstructured content it provides RAG-based conversational search over indexed documents.

| Component | Purpose |
|---|---|
| NL-to-SQL Engine | Translates natural language into SQL using Azure OpenAI + Semantic Kernel and executes via a secure data-access layer |
| Document Q&A (RAG) | Conversational search over PDFs, Word docs, and web pages via Azure AI Search |
| Data API Builder | Secure REST & GraphQL endpoints in front of every database; no raw connection strings in application code |

The connectors layer unifies these components into a single context-aware, semantically enhanced engine powered by Cosmos DB vector search, schema sensing across six databases, and Azure services.

## Architecture

```text
┌──────────────────────────────────────────────────────────────────────┐
│                         NL2SQL-ENGINE                                │
│                                                                      │
│  ┌─────────────────┐   ┌─────────────────┐   ┌───────────────────┐   │
│  │   Chat with     │   │   NLP-to-SQL    │   │  Data API Builder │   │
│  │   Your Data     │   │   (Semantic     │   │  (REST / GraphQL  │   │
│  │   (RAG + AI     │   │    Kernel +     │   │   endpoints over  │   │
│  │    Search)      │   │   Azure OpenAI) │   │   your database)  │   │
│  └────────┬────────┘   └────────┬────────┘   └─────────┬─────────┘   │
│           │                     │                      │             │
│           └──────────┬──────────┘                      │             │
│                      │          Connectors             │             │
│            ┌─────────▼─────────────────────────────────▼──────┐      │
│            │          Unified Query Router                    │      │
│            │  • Routes structured queries → NL2SQL → DAB API  │      │
│            │  • Routes document queries   → Chat / AI Search  │      │
│            │  • Schema sensing across all six databases       │      │
│            └──────────────────────────────────────────────────┘      │
└──────────────────────────────────────────────────────────────────────┘

Databases:
  Azure SQL · SQL Server · Synapse (SQL DW) · PostgreSQL · MySQL · Cosmos DB

Azure Services:
  Azure OpenAI  │  Azure AI Search  │  Azure Container Apps
  Azure Container Registry  │  Managed Identity
  VNet  │  Private Endpoints  │  Private DNS Zones
```


## Features

- **Natural language → SQL**: Ask questions in plain English; the engine translates them to SQL and executes against your database.
- **Document Q&A (RAG)**: Upload PDFs, Word docs, web pages — chat with them using the RAG pattern backed by Azure AI Search.
- **Secure data access**: Data API Builder sits in front of every database, exposing only REST/GraphQL endpoints — no direct DB credentials in application code.
- **Unified query router**: The connector layer automatically decides whether a query should go to the SQL path or the document-search path.
- **Schema sensing**: Automatically detects database schema so no manual mapping is required.
- **Multi-database support**: Azure SQL, SQL Server, Synapse (SQL DW), PostgreSQL, MySQL, Cosmos DB.
- **One-click Azure deployment**: Full infrastructure-as-code (Bicep + AZD).
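
As a toy illustration of what schema sensing involves, the sketch below introspects a SQLite database using only Python's standard library. The project's connectors query each engine's own catalog views instead, so `sense_schema` here is a hypothetical stand-in, not the repository's implementation.

```python
import sqlite3

def sense_schema(conn: sqlite3.Connection) -> dict[str, list[str]]:
    """Return {table_name: [column, ...]} by reading the database catalog."""
    schema: dict[str, list[str]] = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, default, pk)
        cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
        schema[table] = [col[1] for col in cols]
    return schema

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, placed_at TEXT, total REAL)")
print(sense_schema(conn))  # {'orders': ['id', 'placed_at', 'total']}
```

The same pattern applies to the hosted databases: each connector asks the engine's catalog (e.g. `INFORMATION_SCHEMA`) rather than requiring hand-written mappings.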

## Quick Start

### Prerequisites

| Tool | Minimum version |
|---|---|
| [Azure CLI](https://docs.microsoft.com/cli/azure/install-azure-cli) | 2.50 |
| [Azure Developer CLI (azd)](https://learn.microsoft.com/azure/developer/azure-developer-cli/install-azd) | 1.5 |
| [.NET SDK](https://get.dot.net) | 8.0 (for Data API Builder CLI) |
| [Python](https://python.org) | 3.10 |
| [Docker Desktop](https://docker.com/products/docker-desktop) | 24+ |
| An [Azure subscription](https://azure.microsoft.com/free/) | — |

### 1 — Use this template

Click **"Use this template"** on GitHub, or clone directly:

```bash
git clone https://github.com/codes4humanity/NL2SQL-ENGINE.git
cd NL2SQL-ENGINE
```

### 2 — Configure environment

```bash
# Run this from the repository root
cp .env.sample .env
# Edit .env with your Azure resource values
```

### 3 — Deploy to Azure (one command)

```bash
azd auth login
azd up
```

This provisions all required Azure resources (OpenAI, AI Search, Azure SQL, Synapse, PostgreSQL, MySQL, Cosmos DB, Container Apps, Container Registry) and deploys the application.

### 4 — Run locally with Docker

```bash
docker compose up --build
```

Access the app at http://localhost:3000.

### 5 — Run locally without Docker

```bash
# Windows
.\scripts\setup.ps1

# macOS / Linux
bash scripts/setup.sh
```

The setup script installs all dependencies, creates `.env` from the template (if needed), creates the database tables, and seeds sample data. After setup finishes:

```bash
# Recommended: use the hosted DAB endpoint from deployment outputs in .env
# Optional local fallback only when needed:
# dab start --config src/api/dab-config.json

# Terminal 1 — Start the NL2SQL engine
python -m uvicorn src.nl2sql.app:api --port 8000 --reload

# Terminal 2 — Start the chat backend (run from the repo root)
python -m uvicorn src.chat.backend.app:app --port 5001 --reload

# Terminal 3 — Start the chat frontend (optional)
cd src/chat/frontend
npm install
npm run dev
```

Prerequisites for `dab start`:

1. .NET SDK 8+ and the DAB CLI (`dotnet tool install microsoft.dataapibuilder -g`)
2. A valid `SQL_CONNECTION_STRING` in your `.env` file (see `.env.sample` for the format)
3. Azure login (`az login`), required for Active Directory Default authentication
4. If `ENABLE_PRIVATE_NETWORKING=true`, run from a host with private-endpoint connectivity; if disabled, allow your source IP in the SQL firewall (for example with `python scripts/update_firewall.py`)
5. To enable request-level Microsoft Entra authorization on `/query`, set the `NL2SQL_ENTRA_*`, `NL2SQL_ROLE_POLICIES_JSON` / `NL2SQL_GROUP_POLICIES_JSON`, and `VITE_ENTRA_*` variables described in `.env.sample`, then sign in from the React UI

## Security Phases (Now Implemented)

1. **Phase 1:** Private endpoint networking for SQL/OpenAI/Search/Cosmos via Bicep.
2. **Phase 2:** Hosted-DAB-first runtime using `DAB_ENDPOINT` with optional `DAB_FALLBACK_ENDPOINT`.
3. **Phase 3:** Workload-identity role binding using `RUNTIME_PRINCIPAL_ID` for least-privilege data-plane access.

## Repository Structure

```text
NL2SQL-ENGINE/
├── README.md                     # This file
├── azure.yaml                    # Azure Developer CLI configuration
├── docker-compose.yml            # Local development orchestration
├── .env.sample                   # Environment variable template
├── .gitignore
│
├── connectors/                   # ★ Data-access and routing layer
│   ├── unified_query.py          #   Route queries to NL2SQL or Chat
│   ├── datasource_registry.py    #   Registry of all supported data sources
│   ├── dab_connector.py          #   Execute SQL via Data API Builder REST/GraphQL
│   ├── sql_connector.py          #   Azure SQL / SQL Server direct connector
│   ├── sqldw_connector.py        #   Synapse Analytics (SQL DW) connector
│   ├── pg_connector.py           #   PostgreSQL connector
│   ├── mysql_connector.py        #   MySQL connector
│   ├── cosmos_connector.py       #   Cosmos DB connector
│   └── search_connector.py       #   Document search via Azure AI Search
│
├── src/
│   ├── nl2sql/                   # NL-to-SQL engine
│   │   ├── app.py                #   Entry point (CLI + HTTP server)
│   │   ├── requirements.txt
│   │   └── plugins/
│   │       ├── nl_to_sql.py      #   Semantic Kernel plugin: NL → SQL
│   │       └── query.py          #   Semantic Kernel plugin: execute SQL
│   │
│   ├── chat/                     # Document Q&A (RAG) component
│   │   ├── backend/              #   Python FastAPI backend (RAG)
│   │   │   ├── app.py
│   │   │   └── requirements.txt
│   │   └── frontend/             #   React/TypeScript web UI
│   │       ├── package.json
│   │       └── src/
│   │
│   └── api/                      # Data API Builder configuration
│       └── dab-config.json       #   Entities, permissions, relationships
│
├── infra/                        # Infrastructure as Code (Bicep)
│   ├── main.bicep                #   Root template
│   ├── main.parameters.json      #   Default parameters
│   └── modules/
│       ├── openai.bicep          #   Azure OpenAI
│       ├── sql.bicep             #   Azure SQL Database
│       ├── synapse.bicep         #   Synapse Analytics (SQL DW)
│       ├── postgresql.bicep      #   Azure Database for PostgreSQL
│       ├── mysql.bicep           #   Azure Database for MySQL
│       ├── cosmos.bicep          #   Azure Cosmos DB
│       ├── search.bicep          #   Azure AI Search
│       ├── containerenv.bicep    #   Container Apps Environment
│       ├── containerregistry.bicep # Azure Container Registry
│       ├── network.bicep         #   VNet, private endpoints, DNS zones
│       └── identity.bicep        #   Managed Identity + RBAC
│
├── docs/
│   ├── architecture.md
│   ├── setup-guide.md
│   └── connectors.md
│
├── scripts/
│   ├── setup.sh                  # Local environment setup (macOS / Linux)
│   ├── setup.ps1                 # Local environment setup (Windows)
│   ├── init_db.py                # Create tables & seed data in Azure SQL
│   ├── update_firewall.py        # Auto-detect public IP & add SQL firewall rule
│   └── postprovision.sh          # Runs after `azd provision`
│
└── .devcontainer/
    └── devcontainer.json         # VS Code / GitHub Codespaces dev container
```

## Connector Design

The `connectors/` directory is the engine's data-access and routing layer. It contains nine modules:

### `unified_query.py` — Query Router

Decides at runtime whether a user query should be:

- Sent to the **NL2SQL path** (structured database query) — for questions like "How many orders were placed last month?"
- Sent to the **Chat/RAG path** (unstructured document search) — for questions like "Summarise the Q3 earnings report"

### `datasource_registry.py` — Data Source Registry

Central registry that maps data-source identifiers (`azure_sql`, `sql_server`, `sqldw`, `cosmos_db`, `postgresql`, `mysql`) to their connector classes, metadata, and runtime configuration.
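
The shape of such a registry can be sketched as follows; the class and field names are illustrative, not the module's actual API:

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class DataSource:
    identifier: str        # e.g. "azure_sql", "cosmos_db"
    connector: type        # connector class to instantiate at runtime
    metadata: dict[str, Any] = field(default_factory=dict)

class DatasourceRegistry:
    def __init__(self) -> None:
        self._sources: dict[str, DataSource] = {}

    def register(self, source: DataSource) -> None:
        self._sources[source.identifier] = source

    def get(self, identifier: str) -> DataSource:
        try:
            return self._sources[identifier]
        except KeyError:
            raise KeyError(f"Unknown data source: {identifier!r}") from None

registry = DatasourceRegistry()
registry.register(DataSource("azure_sql", connector=object, metadata={"dialect": "tsql"}))
print(registry.get("azure_sql").metadata["dialect"])  # tsql
```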

### `dab_connector.py` — Data API Builder Connector

After `nl_to_sql.py` generates a SQL statement, this connector:

1. Introspects the Data API Builder REST/GraphQL endpoint
2. Translates the raw SQL into a safe API call (no direct DB access)
3. Returns structured JSON results

### Database Connectors

| Module | Database |
|---|---|
| `sql_connector.py` | Azure SQL Database / SQL Server |
| `sqldw_connector.py` | Azure Synapse Analytics (SQL DW) |
| `pg_connector.py` | Azure Database for PostgreSQL |
| `mysql_connector.py` | Azure Database for MySQL |
| `cosmos_connector.py` | Azure Cosmos DB |

Each connector handles authentication (via `DefaultAzureCredential`), schema introspection, query execution, and result formatting for its respective database.
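
To make that shared contract concrete, here is a hedged sketch of the interface each connector plausibly implements, demonstrated with an in-memory SQLite stand-in (the real connectors authenticate against their Azure services; the class names here are illustrative):

```python
import sqlite3
from abc import ABC, abstractmethod

class BaseConnector(ABC):
    """Common contract: execute a query and return formatted rows."""

    @abstractmethod
    def execute(self, sql: str) -> list[dict]: ...

class SqliteDemoConnector(BaseConnector):
    """Stand-in connector; the real ones wrap Azure SQL, Synapse, etc."""

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")
        self.conn.row_factory = sqlite3.Row  # rows accessible by column name

    def execute(self, sql: str) -> list[dict]:
        # Format rows as JSON-serialisable dicts, as the engine expects
        return [dict(row) for row in self.conn.execute(sql).fetchall()]

demo = SqliteDemoConnector()
print(demo.execute("SELECT 1 AS answer"))  # [{'answer': 1}]
```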

### `search_connector.py` — Azure AI Search Connector

Wraps the Azure AI Search SDK to:

1. Index new documents uploaded by the user
2. Perform hybrid (keyword + vector) search for the RAG pipeline
3. Return ranked chunks to the Chat backend for answer generation
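
Hybrid search merges a keyword ranking and a vector ranking into a single list; Azure AI Search does this with Reciprocal Rank Fusion (RRF). The self-contained sketch below shows the fusion step on toy data (the document IDs and rankings are made up):

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc3", "doc1", "doc7"]  # BM25 ranking (toy data)
vector_hits  = ["doc1", "doc9", "doc3"]  # embedding ranking (toy data)
print(rrf([keyword_hits, vector_hits]))  # ['doc1', 'doc3', 'doc9', 'doc7']
```

Documents that rank well in both lists (here `doc1` and `doc3`) float to the top, which is why hybrid retrieval tends to ground RAG answers better than either ranking alone.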

## Configuration Reference

See `.env.sample` for all required and optional environment variables.

## Deployment

See `docs/setup-guide.md` for detailed deployment instructions, including:

- Azure subscription setup
- Resource group and naming conventions
- Role assignments
- Post-deployment validation

## Contributing

See `CONTRIBUTING.md`.

## Acknowledgements

NL2SQL-ENGINE drew inspiration from Microsoft Azure open-source projects, and we thank their contributors for advancing the ecosystem.

## License

This project is licensed under the MIT License.