Evaluate student SQL submissions with the assistance of a generative large language model (LLM). This project augments traditional auto-grading with schema-aware prompting and natural-language reasoning to assess correctness, categorize mistakes, and surface rich feedback and analytics.
- Schema-aware grading: the LLM reads a database schema description to contextualize evaluations.
- Automated scoring: classify submissions as correct/incorrect and optionally assign partial credit.
- Error categorization: map common failure modes (e.g., wrong join, aggregation mistakes) to interpretable labels.
- Analytics: generate reports and plots (e.g., confusion matrices, classification metrics) to monitor performance.
- Reproducible pipeline: version inputs, cache model outputs, and export results for downstream analysis.
Ingest assessment context
- Database schema description (e.g., a PDF or text file).
- A dataset of SQL submissions stored in an SQLite database (see the ingestion sketch below).
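A minimal ingestion sketch, assuming the schema arrives as a PDF named `schema.pdf` and the submissions live in a `submissions` table (columns `student_id`, `task_id`, `query_text`) inside `submissions.db`; all of these names are illustrative placeholders rather than the project's actual layout:

```python
import sqlite3

import pdfplumber

# Illustrative file and table names -- substitute your own assessment data.
SCHEMA_PDF = "schema.pdf"
SUBMISSIONS_DB = "submissions.db"


def load_schema_text(pdf_path: str) -> str:
    """Extract the schema description from a PDF (plain-text schemas can be read directly)."""
    with pdfplumber.open(pdf_path) as pdf:
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def load_submissions(db_path: str) -> list[dict]:
    """Read student submissions from SQLite (assumed table: submissions)."""
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row  # access columns by name
    rows = conn.execute(
        "SELECT student_id, task_id, query_text FROM submissions"
    ).fetchall()
    conn.close()
    return [dict(row) for row in rows]
```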
Construct prompts
- Combine the schema summary, the task, and each submitted query into a grading prompt, as sketched below.
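A sketch of prompt construction; the wording and the requested output format are assumptions, not the project's exact rubric:

```python
def build_grading_prompt(schema_text: str, task: str, student_query: str) -> str:
    """Combine schema, task, and submission into one grading prompt (illustrative template)."""
    return (
        "You are grading a student's SQL query against the database schema below.\n\n"
        f"Schema:\n{schema_text}\n\n"
        f"Task:\n{task}\n\n"
        f"Student query:\n{student_query}\n\n"
        "Reply with a label (correct/incorrect), an error category if incorrect, "
        "and a brief rationale."
    )
```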
LLM-assisted evaluation
- Query a configurable OpenAI-compatible API to obtain judgments (labels, rationales, and optional feedback); a call sketch follows below.
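A sketch of the API call, assuming the v1+ `openai` client; the `LLM_BASE_URL` environment variable and the default model name are placeholders for whatever OpenAI-compatible endpoint you configure:

```python
import os

from openai import OpenAI

# Point the client at any OpenAI-compatible endpoint; both values are assumptions.
client = OpenAI(
    base_url=os.environ.get("LLM_BASE_URL", "https://api.openai.com/v1"),
    api_key=os.environ["OPENAI_API_KEY"],
)


def grade_with_llm(prompt: str, model: str = "gpt-4o-mini") -> str:
    """Return the raw judgment text (label, rationale, optional feedback) for one submission."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # keep judgments as stable as possible across runs
    )
    return response.choices[0].message.content
```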
Aggregation and metrics
- Compare LLM predictions with ground truth (if available), compute classification metrics, and visualize results (see the example below).
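A metrics sketch using scikit-learn; the label names and the toy `y_true`/`y_pred` lists stand in for the parsed ground truth and the LLM's predictions:

```python
from sklearn.metrics import classification_report, confusion_matrix

# Placeholder labels -- in practice these come from the ground-truth data
# and from parsing the LLM's judgments.
y_true = ["correct", "incorrect", "incorrect", "correct"]
y_pred = ["correct", "incorrect", "correct", "correct"]

labels = ["correct", "incorrect"]
print(classification_report(y_true, y_pred, labels=labels))
print(confusion_matrix(y_true, y_pred, labels=labels))
```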
Reporting
- Export predictions, rationales, and evaluation artifacts for audit and iterative improvement, as in the sketch below.
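A minimal export sketch with pandas; the record fields and the output filename are assumptions:

```python
import pandas as pd


def export_results(results: list[dict], out_path: str = "predictions.csv") -> None:
    """Write one row per graded submission (e.g., student_id, task_id, label, rationale)."""
    pd.DataFrame(results).to_csv(out_path, index=False)
```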
Requirements
- Python 3.9
- A virtual environment managed with virtualenv
- Dependencies:
  - `sqlite3` for database interaction
  - `tqdm` for progress bars
  - `pandas` for data manipulation
  - `pdfplumber` for PDF parsing (if you are using the PDF database schema descriptions)
  - `openai` Python package for interacting with OpenAI's API
  - `scikit-learn` for classification metrics and confusion matrices