DeepSQLi is a project designed to detect SQL injection (SQLi) attacks using deep learning models. It provides a Prediction API, a Flask application that tokenizes and sequences SQL queries, and then predicts whether a query is an SQL injection.
The Prediction API handles both tokenization and prediction in a single endpoint, simplifying the detection workflow. It processes incoming SQL queries and determines if they contain SQL injection patterns.
This project can be integrated with GatewayD using the GatewayD SQL IDS/IPS plugin. The plugin acts as a middleware between clients and the database, intercepting SQL queries and sending them to the Prediction API for analysis. If a query is classified as malicious by the Prediction API, the plugin blocks the query; otherwise, it forwards the query to the database.
flowchart TD
Client <-- PostgreSQL wire protocol:15432 --> GatewayD
GatewayD <--> P["Prediction API"]
P --loads--> SM["SQLi Model"]
P --threshold: 80% --> D{Malicious query?}
D -->|No: send to| Database
D -->|Yes: terminate request| GatewayD
-
LSTM Models: The first two models are LSTM-based and are trained using different datasets:
sqli_model/1
: Trained on dataset v1 (for historical purposes, will remove in the future).sqli_model/2
: Trained on dataset v2 (for historical purposes, will remove in the future).
-
CNN-LSTM Model: The third model,
sqli_model/3
, is a hybrid CNN-LSTM model with a custom SQL tokenizer to improve performance on SQL injection patterns (recommended).
The Prediction API performs tokenization internally:
- Default Tokenizer (Models 1 and 2): Uses Keras’s
Tokenizer
for general tokenization. - Custom SQL Tokenizer (Model 3): A custom tokenizer designed to handle SQL syntax and injection-specific patterns.
To start the Prediction API with Docker Compose:
docker compose up --build -d
To stop and remove the containers:
docker compose down
docker build --no-cache --tag prediction-api:latest -f Dockerfile .
docker run --rm --name prediction-api -p 8000:8000 -d prediction-api:latest
Once the Prediction API is running, use the predict
endpoint to classify SQL queries.
curl 'http://localhost:8000/predict' -X POST -H 'Content-Type: application/json' \
--data-raw '{"query":"SELECT * FROM users WHERE id=1 OR 1=1;"}'
The response includes the prediction (1
for SQL injection, 0
for legitimate query) and confidence score. Note that the confidence score is only available for the CNN-LSTM model using the Prediction API.
{
"confidence": 0.9722,
}
We welcome contributions from everyone. Just open an issue or send us a pull request.
This plugin is licensed under the Affero General Public License v3.0.
This project is inspired by the SQL Injection Dataset project by Abu Syeed Sajid Ahmed and the A CNN-BiLSTM based Approach for Detection of SQL Injection Attacks conference paper by Neel Gandhi, et al.