Papers for database systems powered by artificial intelligence (machine learning for database)

LumingSun/ML4DB-paper-list

[Paper List] AI4DB / ML4DB / Autonomous Database / Self-driving Database / Intelligent Database

Paper list for database systems with artificial intelligence (machine learning, deep learning, reinforcement learning)

New papers keep coming; remember to Watch this repo if you are interested in this topic.

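A list of papers on applying machine learning, neural networks, reinforcement learning, and self-tuning techniques to database systems. The list is continuously updated; remember to like, share, and turn on notifications!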

PRs are welcome! Additions from everyone are appreciated.

There are so many papers emerging about Text-To-SQL! Sadly, I'm not an expert on the topic and cannot judge the quality of the papers.
Looking forward to contributions (PRs, comments, discussions) about Text-To-SQL!🫶

Table of Contents

System and Tutorial

  • SageDB: A Learned Database System (CIDR 2019)
  • Database Learning: Toward a Database that Becomes Smarter Every Time (SIGMOD 2017)
  • Self-Driving Database Management Systems (CIDR 2017)
  • Self-Driving: From General Purpose to Specialized DBMSs (PhD@PVLDB 2018)
  • Active Learning for ML Enhanced Database Systems (SIGMOD 2020)
  • Database Meets Artificial Intelligence: A Survey (TKDE 2020)
  • Self-driving database systems: a conceptual approach (Distributed and Parallel Databases 2020)
  • One Model to Rule them All: Towards Zero-Shot Learning for Databases (arXiv 2021)
  • UDO: Universal Database Optimization using Reinforcement Learning (arXiv 2021) Source Code
  • Towards a Benchmark for Learned Systems (SMDB workshop 2021)
  • A Unified Transferable Model for ML-Enhanced DBMS [Vision] (arXiv 2021)
  • AI Meets Database: AI4DB and DB4AI (SIGMOD 2021)
  • Expand your Training Limits! Generating Training Data for ML-based Data Management (SIGMOD 2021)
  • MB2: Decomposed Behavior Modeling for Self-Driving Database Management Systems (SIGMOD 2021)
  • Towards instance-optimized data systems (VLDB 2021 from Tim Kraska)
  • Make Your Database System Dream of Electric Sheep: Towards Self-Driving Operation (VLDB 2021 from Andy Pavlo)
  • openGauss: An Autonomous Database System (VLDB 2021 from Guoliang Li)
  • Experience-Enhanced Learning: One Size Still does not Fit All in Automatic Database Management (arXiv 2021)
  • Baihe: SysML Framework for AI-driven Databases (arXiv 2022)
  • Survey on Learnable Databases: A Machine Learning Perspective (Big Data Research 2021)
  • Database Optimizers in the Era of Learning (ICDE 2022)
  • Machine Learning for Data Management: A System View (ICDE 2022)
  • Tastes Great! Less Filling! High Performance and Accurate Training Data Collection for Self-Driving Database Management Systems (SIGMOD 2022)
  • SAM: Database Generation from Query Workload with Supervised Autoregressive Model (SIGMOD 2022) Source code
  • Detect, Distill and Update: Learned DB Systems Facing Out of Distribution Data (SIGMOD 2023) Source code
  • SageDB: An Instance-Optimized Data Analytics System (VLDB 2023)
  • Towards Building Autonomous Data Services on Azure (SIGMOD-Companion ’23)
  • Database Gyms (CIDR 2023)
  • Check Out the Big Brain on BRAD: Simplifying Cloud Data Processing with Learned Automated Data Meshes (VLDB 2023)
  • Machine Unlearning in Learned Databases: An Experimental Analysis (SIGMOD 2024) Source code
  • PilotScope: Steering Databases with Machine Learning Drivers (VLDB 2024) Source code
  • Machine Learning for Databases: Foundations, Paradigms, and Open problems (SIGMOD 2024)
  • NeurDB: An AI-powered Autonomous Data System (arXiv 2024)
  • GaussML: An End-to-End In-Database Machine Learning System (ICDE 2024)
  • NeurDB: On the Design and Implementation of an AI-powered Autonomous Database (arXiv 2024)
  • LLM for Data Management (VLDB 2024)
  • Blueprinting the Cloud: Unifying and Automatically Optimizing Cloud Data Infrastructures with BRAD (VLDB 2024)
  • The Holon Approach for Simultaneously Tuning Multiple Components in a Self-Driving Database Management System with Machine Learning via Synthesized Proto-Actions (VLDB 2024)

Training Data Collection

  • Expand your Training Limits! Generating Training Data for ML-based Data Management (SIGMOD 2021)
  • DataFarm: Farm Your ML-based Query Optimizer's Food! - Human-Guided Training Data Generation -. (CIDR 2022)
  • Farming Your ML-based Query Optimizer's Food. (ICDE 2022, best demo award)
  • Hit the Gym: Accelerating Query Execution to Efficiently Bootstrap Behavior Models for Self-Driving Database Management Systems (VLDB 2024)

Data Access

Configuration Tuning

  • SARD: A statistical approach for ranking database tuning parameters (ICDEW, 2008)
  • Regularized Cost-Model Oblivious Database Tuning with Reinforcement Learning (2016)
  • Automatic Database Management System Tuning Through Large-scale Machine Learning (SIGMOD 2017)
  • The Case for Automatic Database Administration using Deep Reinforcement Learning (arXiv 2018)
  • An End-to-End Automatic Cloud Database Tuning System Using Deep Reinforcement Learning (SIGMOD 2019)
  • External vs. Internal: An Essay on Machine Learning Agents for Autonomous Database Management Systems
  • QTune: A Query-Aware Database Tuning System with Deep Reinforcement Learning (VLDB 2019)
  • Optimizing Databases by Learning Hidden Parameters of Solid State Drives (VLDB 2019)
  • iBTune: Individualized Buffer Tuning for Large-scale Cloud Databases (VLDB 2019)
  • Black or White? How to Develop an AutoTuner for Memory-based Analytics (SIGMOD 2020)
  • Learning Efficient Parameter Server Synchronization Policies for Distributed SGD (ICLR 2020)
  • Too Many Knobs to Tune? Towards Faster Database Tuning by Pre-selecting Important Knobs (HotStorage 2020)
  • Dynamic Configuration Tuning of Working Database Management Systems (LifeTech 2020)
  • Adaptive Multi-Model Reinforcement Learning for Online Database Tuning (EDBT 2021)
  • An inquiry into machine learning-based automatic configuration tuning services on real-world database management systems (VLDB 2021)
  • The Case for NLP-Enhanced Database Tuning: Towards Tuning Tools that "Read the Manual" (VLDB 2021)
  • CGPTuner: a Contextual Gaussian Process Bandit Approach for the Automatic Tuning of IT Configurations Under Varying Workload Conditions (VLDB 2021)
  • ResTune: Resource Oriented Tuning Boosted by Meta-Learning for Cloud Databases (SIGMOD 2021)
  • KML: Using Machine Learning to Improve Storage Systems (arXiv 2021)
  • Database Tuning using Natural Language Processing (SIGMOD Record 2021)
  • Towards Dynamic and Safe Configuration Tuning for Cloud Databases (SIGMOD 2022)
  • Automatic Performance Tuning for Distributed Data Stream Processing Systems (ICDE 2022)
  • Adaptive Code Learning for Spark Configuration Tuning (ICDE 2022)
  • DB-BERT: A Database Tuning Tool that "Reads the Manual" (SIGMOD 2022)
  • HUNTER: An Online Cloud Database Hybrid Tuning System for Personalized Requirements (SIGMOD 2022)
  • LOCAT: Low-Overhead Online Configuration Auto-Tuning of Spark SQL Applications (SIGMOD 2022)
  • Facilitating Database Tuning with Hyper-Parameter Optimization: A Comprehensive Experimental Evaluation (VLDB 2022)
  • LlamaTune: Sample-Efficient DBMS Configuration Tuning (VLDB 2022)
  • BLUTune: Query-informed Multi-stage IBM Db2 Tuning via ML (CIKM 2022)
  • A Unified and Efficient Coordinating Framework for Autonomous DBMS Tuning (arXiv 2023)
  • Automatic Database Knob Tuning: A Survey (TKDE)
  • Deep learning based Auto Tuning for Database Management System (arXiv 2023)
  • KeenTune: Automated Tuning Tool for Cloud Application Performance Testing and Optimization (ISSTA 2023)
  • ContTune: Continuous Tuning by Conservative Bayesian Optimization for Distributed Stream Data Processing Systems (arXiv 2023)
  • GPTuner: A Manual-Reading Database Tuning System via GPT-Guided Bayesian Optimization (arXiv 2023)
  • An Efficient Transfer Learning Based Configuration Adviser for Database Tuning (VLDB 2024)
  • DB-GPT: Large Language Model Meets Database (DSE 2024)
  • A Spark Optimizer for Adaptive, Fine-Grained Parameter Tuning (arXiv 2024)
  • TIE: Fast Experiment-driven ML-based Configuration Tuning for In-memory Data Analytics (IEEE Transactions on Computers)
  • VDTuner: Automated Performance Tuning for Vector Data Management Systems (ICDE 2024) Source code
  • Nautilus: A Benchmarking Platform for DBMS Knob Tuning (DEEM 2024) Source code
  • Is Large Language Model Good at Database Knob Tuning? A Comprehensive Experimental Evaluation (arXiv 2024)
  • CTuner: Automatic NoSQL Database Tuning with Causal Reinforcement Learning (Internetware 2024)
  • KnobTree: Intelligent Database Parameter Configuration via Explainable Reinforcement Learning (arXiv 2024)
  • KnobCF: Uncertainty-aware Knob Tuning (arXiv 2024)
  • Db2une: Tuning Under Pressure via Deep Learning (VLDB 2024)

Physical Design

  • Tiresias: Enabling Predictive Autonomous Storage and Indexing (VLDB 2022)

Learned structure

  • Stacked Filters: Learning to Filter by Structure (VLDB 2021)
  • LEA: A Learned Encoding Advisor for Column Stores (aiDM 2021)
  • Learning over Sets for Databases (EDBT 2024)

Index

Index Structure
  • Learning to hash for indexing big data - A survey (2016)
  • The Case for Learned Index Structures (SIGMOD 2018)
  • A-Tree: A Bounded Approximate Index Structure (2017)
  • FITing-Tree: A Data-aware Index Structure (SIGMOD 2019)
  • Learned Indexes for Dynamic Workloads (2019)
  • SOSD: A Benchmark for Learned Indexes (2019)
  • Learning Multi-dimensional Indexes (2019)
  • ALEX: An Updatable Adaptive Learned Index (SIGMOD 2020)
  • Effectively Learning Spatial Indices (VLDB 2020) GitHub Link
  • Stable Learned Bloom Filters for Data Streams (VLDB 2020)
  • START — Self-Tuning Adaptive Radix Tree (ICDEW 2020)
  • Learned Data Structures (2020)
  • RadixSpline: a single-pass learned index (aiDM 2020)
  • The ML-Index: A Multidimensional, Learned Index for Point, Range, and Nearest-Neighbor Queries (EDBT 2020)
  • The PGM-index: a fully-dynamic compressed learned index with provable worst-case bounds (VLDB 2020)
  • A Tutorial on Learned Multi-dimensional Indexes (SIGSPATIAL 2020)
  • Why Are Learned Indexes So Effective? (ICML 2020)
  • Learned Indexes for a Google-scale Disk-based Database (arXiv 2020)
  • SIndex: A Scalable Learned Index for String Keys (APSys 2020)
  • XIndex: A Scalable Learned Index for Multicore Data Storage (PPoPP 2020)
  • Tsunami: A Learned Multi-dimensional Index for Correlated Data and Skewed Workloads (VLDB 2021)
  • A Lazy Approach for Efficient Index Learning (2021)
  • The RLR-Tree: A Reinforcement Learning Based R-Tree for Spatial Data (arXiv 2021)
  • Spatial Interpolation-based Learned Index for Range and kNN Queries (arXiv 2021)
  • APEX: A High-Performance Learned Index on Persistent Memory (arXiv 2021)
  • RUSLI: Real-time Updatable Spline Learned Index (aiDM 2021)
  • PLEX: Towards Practical Learned Indexing (arXiv 2021)
  • SPRIG: A Learned Spatial Index for Range and kNN Queries (SSTD 2021)
  • Benchmarking Learned Indexes (VLDB 2021)
  • Updatable Learned Index with Precise Positions (VLDB 2021)
  • The Case for Learned In-Memory Joins (arXiv 2021)
  • Bounding the Last Mile: Efficient Learned String Indexing (arXiv 2021)
  • FINEdex: A Fine-grained Learned Index Scheme for Scalable and Concurrent Memory Systems (VLDB 2022)
  • The next 50 Years in Database Indexing or: The Case for Automatically Generated Index Structures (VLDB 2022)
  • The Concurrent Learned Indexes for Multicore Data Storage (Transactions on Storage 2022)
  • TONE: cutting tail-latency in learned indexes (CHEOPS 2022)
  • A Learned Index for Exact Similarity Search in Metric Spaces (arXiv 2022)
  • RW-tree: A Learned Workload-aware Framework for R-tree Construction (ICDE 2022)
  • The "AI+R"-tree: An Instance-optimized R-tree (MDM 2022)
  • LHI: A Learned Hamming Space Index Framework for Efficient Similarity Search (SIGMOD 2022)
  • Entropy Learned Hashing: 10X Faster Hashing with Controllable Uniformity (SIGMOD 2022)
  • Tuning Hierarchical Learned Indexes on Disk and Beyond (SIGMOD 2022)
  • FLIRT: A Fast Learned Index for Rolling Time frames (EDBT 2022)
  • Testing the Robustness of Learned Index Structures (arXiv 2022)
  • The Case for ML-Enhanced High-Dimensional Indexes (2022)
  • PLIN: A Persistent Learned Index for Non-Volatile Memory with High Performance and Instant Recovery (VLDB 2023)
  • A Data-aware Learned Index Scheme for Efficient Writes (ICPP 2022)
  • Frequency Estimation in Data Streams: Learning the Optimal Hashing Scheme (TKDE)
  • FILM: A Fully Learned Index for Larger-Than-Memory Databases (VLDB 2023)
  • WISK: A Workload-aware Learned Index for Spatial Keyword Queries (arXiv 2023)
  • Efficiently Learning Spatial Indices (ICDE 2023)
  • Cutting Learned Index into Pieces: An In-depth Inquiry into Updatable Learned Indexes (ICDE 2023)
  • DILI: A Distribution-Driven Learned Index (arXiv 2023)
  • Learned Index: A Comprehensive Experimental Evaluation (VLDB 2023)
  • LMSFC: A Novel Multidimensional Index based on Learned Monotonic Space Filling Curves (Extended Version) (arXiv 2023)
  • One stone, two birds: A lightweight multidimensional learned index with cardinality support (arXiv 2023)
  • A Simple Yet High-Performing On-disk Learned Index: Can We Have Our Cake and Eat it Too? (arXiv 2023)
  • Fast Partitioned Learned Bloom Filter (arXiv 2023)
  • Efficient Index Learning via Model Reuse and Fine-tuning (ICDEW 2023)
  • COAX: Correlation-Aware Indexing (ICDEW 2023)
  • Learned Index with Dynamic ε (OpenReview 2023)
  • Learning to Optimize LSM-trees: Towards A Reinforcement Learning based Key-Value Store for Dynamic Workloads (arXiv 2023)
  • SALI: A Scalable Adaptive Learned Index Framework based on Probability Models (SIGMOD 2024)
  • Sieve: A Learned Data-Skipping Index for Data Analytics (VLDB 2023)
  • Demonstrating Waffle: A Self-driving Grid Index (VLDB Demo 2023)
  • Can LSH (Locality-Sensitive Hashing) Be Replaced by Neural Network? (arXiv 2023)
  • Workload-aware and Learned Z-Indexes (arXiv 2023)
  • AirIndex: Versatile Index Tuning Through Data and Storage (SIGMOD 2024)
  • A Fast Learned Key-Value Store for Concurrent and Distributed Systems (TKDE 2023)
  • When Learned Indexes Meet Persistent Memory: The Analysis and the Optimization (TKDE 2023)
  • PLATON: Top-down R-tree Packing with Learned Partition Policy (PACMMOD 2023)
  • A Learned Cuckoo Filter for Approximate Membership Queries over Variable-sized Sliding Windows on Data Streams (PACMMOD 2023)
  • WIPE: a Write-Optimized Learned Index for Persistent Memory (TACO 2023)
  • Algorithmic Complexity Attacks on Dynamic Learned Indexes (VLDB 2024)
  • A Fully On-disk Updatable Learned Index (ICDE 2024)
  • Limousine: Blending Learned and Classical Indexes to Self-Design Larger-than-Memory Cloud Storage Engines (SIGMOD 2024)
  • AStore: Uniformed Adaptive Learned Index and Cache for RDMA-enabled Key-Value Store (TKDE 2024)
  • Cabin: A Compressed Adaptive Binned Scan Index (SIGMOD 2024)
  • SWIX: A Memory-efficient Sliding Window Learned Index (SIGMOD 2024)
  • A Survey of Learned Indexes for the Multi-dimensional Space (arXiv 2024)
  • Hyper: A High-Performance and Memory-Efficient Learned Index via Hybrid Construction (Proceedings of the ACM on Management of Data 2024)
  • Predicate caching: Query-driven secondary indexing for cloud data warehouses (SIGMOD 2024)
  • Can Learned Indexes be Built Efficiently? A Deep Dive into Sampling Trade-offs (SIGMOD 2024)
  • Making In-Memory Learned Indexes Efficient on Disk (SIGMOD 2024)
  • LeaderKV: Improving Read Performance of KV Stores via Learned Index and Decoupled KV Table (ICDE 2024)
  • Chameleon: Towards Update-Efficient Learned Indexing for Locally Skewed Data (ICDE 2024)
  • Revisiting Learned Index with Byte-addressable Persistent Storage (ICPP 2024)
  • UpLIF: An Updatable Self-Tuning Learned Index Framework (arXiv 2024)
  • LITS: An Optimized Learned Index for Strings (VLDB 2024)
  • Evaluating Learned Indexes for External-Memory Joins (arXiv 2024)
  • Learned Indexes with Distribution Smoothing via Virtual Points (arXiv 2024)
LSM-tree related
  • Leaper: A Learned Prefetcher for Cache Invalidation in LSM-tree based Storage Engines (VLDB 2020)
  • From WiscKey to Bourbon: A Learned Index for Log-Structured Merge Trees (OSDI 2020)
  • TridentKV: A Read-Optimized LSM-Tree Based KV Store via Adaptive Indexing and Space-Efficient Partitioning (TPDS 2022)
  • LearnedKV: Integrating LSM and Learned Index for Superior Performance on SSD (arXiv 2024)
  • CAMAL: Optimizing LSM-trees via Active Learning (arXiv 2024)
Index Recommendation
  • Index Selection in a Self-Adaptive Data Base Management System (SIGMOD 1976)
  • AutoAdmin 'What-if' Index Analysis Utility (SIGMOD 1998)
  • Self-Tuning Database Systems: A Decade of Progress (VLDB 2007)
  • AI Meets AI: Leveraging Query Executions to Improve Index Recommendations (SIGMOD 2019)
  • Automated Database Indexing using Model-free Reinforcement Learning (ICAPS 2020)
  • DRLindex: deep reinforcement learning index advisor for a cluster database (2020 Symposium on International Database Engineering & Applications)
  • Magic mirror in my hand, which is the best in the land? An Experimental Evaluation of Index Selection Algorithms (VLDB 2020) GitHub Link
  • An Index Advisor Using Deep Reinforcement Learning (CIKM 2020) GitHub Link
  • DBA bandits: Self-driving index tuning under ad-hoc, analytical workloads with safety guarantees (ICDE 2021)
  • MANTIS: Multiple Type and Attribute Index Selection using Deep Reinforcement Learning (IDEAS 2021)
  • AutoIndex: An Incremental Index Management System for Dynamic Workloads (ICDE 2022) GitHub Link
  • SWIRL: Selection of Workload-aware Indexes using Reinforcement Learning (EDBT 2022) GitHub Link
  • Indexer++: workload-aware online index tuning with transformers and reinforcement learning (ACM SIGAPP SAC, 2022)
  • Budget-aware Index Tuning with Reinforcement Learning (SIGMOD 2022)
  • ISUM: Efficiently Compressing Large and Complex Workloads for Scalable Index Tuning (SIGMOD 2022)
  • DISTILL: Low-Overhead Data-Driven Techniques for Filtering and Costing Indexes for Scalable Index Tuning (VLDB 2022)
  • SmartIndex: An Index Advisor with Learned Cost Estimator (CIKM 2022)
  • HMAB: self-driving hierarchy of bandits for integrated physical database design tuning (VLDB 2022)
  • Learned Index Benefits: Machine Learning Based Index Performance Estimation (VLDB 2023) GitHub Link
  • AIM: A practical approach to automated index management for SQL databases (ICDE 2023)
  • Updatable Learned Indexes Meet Disk-Resident DBMS - From Evaluations to Design Choices (SIGMOD 2023)
  • Index Tuning with Machine Learning on Quantum Computers for Large-Scale Database Applications (AIDB@VLDB 2023)
  • A Data-Driven Index Recommendation System for Slow Queries (CIKM 2023)
  • ML-Powered Index Tuning: An Overview of Recent Progress and Open Challenges (arXiv 2023)
  • Robustness of Updatable Learning-based Index Advisors against Poisoning Attack (SIGMOD 2024)
  • Refactoring Index Tuning Process with Benefit Estimation (VLDB 2024) GitHub Link
  • Leveraging Dynamic and Heterogeneous Workload Knowledge to Boost the Performance of Index Advisors (VLDB 2024) GitHub Link
  • MFIX: An Efficient and Reliable Index Advisor via Multi-Fidelity Bayesian Optimization (ICDE 2024)
  • TRAP: Tailored Robustness Assessment for Index Advisors via Adversarial Perturbation (ICDE 2024)
  • Online Index Recommendation for Slow Queries (ICDE 2024)
  • Automatic Index Tuning: A Survey (TKDE)
  • Breaking It Down: An In-Depth Study of Index Advisors (VLDB 2024)
  • Can Uncertainty Quantification Enable Better Learning-based Index Tuning? (arXiv 2024)

Materialized View

  • Automatic View Generation with Deep Learning and Reinforcement Learning (ICDE 2020)
  • An Autonomous Materialized View Management System with Deep Reinforcement Learning (ICDE 2021)
  • A Technical Report on Dynamic Materialized View Management using Graph Neural Network
  • HMAB: self-driving hierarchy of bandits for integrated physical database design tuning (VLDB 2022)
  • AutoView: An Autonomous Materialized View Management System with Encoder-Reducer (TKDE 2022)
  • Dynamic Materialized View Management using Graph Neural Network (ICDE 2023)

Schema & Partition

  • Schism: a Workload-Driven Approach to Database Replication and Partitioning (VLDB 2010)
  • Skew-Aware Automatic Database Partitioning in Shared-Nothing, Parallel OLTP Systems (SIGMOD 2012)
  • Automated Data Partitioning for Highly Scalable and Strongly Consistent Transactions (2016 Transactions on Parallel and distributed systems)
  • GridFormation: Towards Self-Driven Online Data Partitioning using Reinforcement Learning (aiDM@SIGMOD 2018)
  • Learning a Partitioning Advisor with Deep Reinforcement Learning (2019)
  • Qd-tree: Learning Data Layouts for Big Data Analytics (SIGMOD 2020)
  • A Genetic Optimization Physical Planner for Big Data Warehouses (2020)
  • Lachesis: Automated Partitioning for UDF-Centric Analytics (VLDB 2021)
  • Instance-Optimized Data Layouts for Cloud Analytics Workloads (SIGMOD 2021)
  • Jigsaw: A Data Storage and Query Processing Engine for Irregular Table Partitioning (SIGMOD 2021)
  • Dalton: Learned Partitioning for Distributed Data Streams (VLDB 2023)
  • Grep: A Graph Learning Based Database Partitioning System (Management of Data 2023)
  • Learned spatial data partitioning (arXiv 2023)
  • Relax and Let the Database Do the Partitioning Online (BIRTE 2011)
  • SWORD: Scalable Workload-Aware Data Placement for Transactional Workloads (EDBT 2013)
  • Online Data Partitioning in Distributed Database Systems (EDBT 2015)
  • A Robust Partitioning Scheme for Ad-Hoc Query Workloads (SoCC 2017)
  • Automated multidimensional data layouts in Amazon Redshift (SIGMOD 2024)
  • Oasis: An Optimal Disjoint Segmented Learned Range Filter (VLDB 2024)

Cache related

  • A Learned Cache Eviction Framework with Minimal Overhead (arXiv 2023)

Workload

Resource Management and Auto-scaling

  • Automated Demand-driven Resource Scaling in Relational Database-as-a-Service (SIGMOD 2016)
  • Database Workload Capacity Planning using Time Series Analysis and Machine Learning (SIGMOD 2020)
  • Seagull: An Infrastructure for Load Prediction and Optimized Resource Allocation (VLDB 2020)
  • FIRM: An Intelligent Fine-grained Resource Management Framework for SLO-Oriented Microservices (OSDI 2020)
  • Optimal Resource Allocation for Serverless Queries (arXiv 2021)
  • Sinan: ML-Based and QoS-Aware Resource Management for Cloud Microservices (ASPLOS 2021)
  • Towards Optimal Resource Allocation for Big Data Analytics (EDBT 2022)
  • Tenant Placement in Over-subscribed Database-as-a-Service Clusters (VLDB 2022)
  • Fine-Grained Modeling and Optimization for Intelligent Resource Management in Big Data Processing (arXiv 2022)
  • SIMPPO: a scalable and incremental online learning framework for serverless resource management (SoCC 2022)
  • SUFS: A Generic Storage Usage Forecasting Service Through Adaptive Ensemble Learning (ICDE 2023)
  • Auto-WLM: Machine Learning Enhanced Workload Management in Amazon Redshift (SIGMOD-Companion ’23)
  • SeLeP: Learning Based Semantic Prefetching for Exploratory Database Workloads (arXiv 2023)
  • Intelligent scaling in Amazon Redshift (SIGMOD 2024)
  • Forecasting Algorithms for Intelligent Resource Scaling: An Experimental Analysis (SoCC 2024)

Performance Diagnosis and Modeling

  • Performance and resource modeling in highly-concurrent OLTP workloads (SIGMOD 2013)
  • DBSherlock: A Performance Diagnostic Tool for Transactional Databases (SIGMOD 2016)
  • A Top-Down Approach to Achieving Performance Predictability in Database Systems (SIGMOD 2017)
  • Diagnosing Root Causes of Intermittent Slow Queries in Cloud Databases (VLDB 2020)
  • Workload-Aware Performance Tuning for Autonomous DBMSs (ICDE 2021)
  • Sage: Practical and Scalable ML-Driven Performance Debugging in Microservices (ASPLOS 2021)
  • D-Bot: Database Diagnosis System using Large Language Models (arXiv 2023)
  • Modeling Shifting Workloads for Learned Database Systems (SIGMOD 2024)

Workload Shift Detection

  • Towards workload shift detection and prediction for autonomic databases (CIKM 2007)
  • Consistent on-line classification of dbs workload events (CIKM 2009)
  • On predictive modeling for optimizing transaction execution in parallel OLTP systems (VLDB 2011)

Workload Characterization & Forecasting

  • On Workload Characterization of Relational Database Environments (TSE 1992)
  • Workload Models for Autonomic Database Management Systems (International Conference on Autonomic and Autonomous Systems 2006)
  • Workload characterization and prediction in the cloud: A multiple time series approach (APNOMS 2012)
  • Query-based Workload Forecasting for Self-Driving Database Management Systems (SIGMOD 2018)
  • Query2Vec: An Evaluation of NLP Techniques for Generalized Workload Analytics (arXiv 2018)
  • Database Workload Characterization with Query Plan Encoders (arXiv 2021)
  • Explaining Inference Queries with Bayesian Optimization (VLDB 2021)
  • Statistical Schema Learning with Occam's Razor (SIGMOD 2022)
  • Intelligent Automated Workload Analysis for Database Replatforming (SIGMOD 2022)
  • Stitcher: Learned Workload Synthesis from Historical Performance Footprints (EDBT 2022)
  • DBAugur: An Adversarial-based Trend Forecasting System for Diversified Workloads (ICDE 2023)
  • An Efficient Online Prediction of Host Workloads Using Pruned GRU Neural Nets (arXiv 2023)
  • Uncertainty-Aware Workload Prediction in Cloud Computing (arXiv 2023)
  • Real-Time Workload Pattern Analysis for Large-Scale Cloud Databases (VLDB 2023)
  • Robust Auto-Scaling with Probabilistic Workload Forecasting for Cloud Databases (ICDE 2024)
  • QPSEncoder: A Database Workload Encoder with Deep Learning (DEXA 2024)

Query Optimization

  • Learned Query Optimizer: What is New and What is Next (SIGMOD 2024)
  • GLO: Towards Generalized Learned Query Optimization (ICDE 2024)
  • Robust Query Optimization in the Era of Machine Learning: State-of-the-Art and Future Directions (ICDE 2024)
  • Presto’s History-based Query Optimizer (VLDB 2024)

Query Rewrite

  • Sia: Optimizing Queries using Learned Predicates (SIGMOD 2021)
  • A Learned Query Rewrite System using Monte Carlo Tree Search (VLDB 2022)
  • WeTune: Automatic Discovery and Verification of Query Rewrite Rules (SIGMOD 2022)
  • A Learned Query Rewrite System (VLDB 2023)
  • Query Rewriting via Large Language Models (arXiv 2024)
  • LLM-R2: A Large Language Model Enhanced Rule-based Rewrite System for Boosting Query Efficiency (arXiv 2024) GitHub

Cardinality Estimation

  • Are We Ready For Learned Cardinality Estimation? (VLDB 2021) GitHub Link
  • A Unified Deep Model of Learning from both Data and Queries for Cardinality Estimation (SIGMOD 2021)
  • LATEST: Learning-Assisted Selectivity Estimation Over Spatio-Textual Streams (ICDE 2021)
  • Fauce: Fast and Accurate Deep Ensembles with Uncertainty for Cardinality Estimation (VLDB 2021)
  • Cardinality Estimation in DBMS: A Comprehensive Benchmark Evaluation (arXiv 2021) GitHub Link
  • Learned Cardinality Estimation: A Design Space Exploration and A Comparative Evaluation (VLDB 2022)
  • Glue: Adaptively Merging Single Table Cardinality to Estimate Join Query Size (arXiv 2021)
  • Unsupervised Selectivity Estimation by Integrating Gaussian Mixture Models and an Autoregressive Model (EDBT 2022)
  • Selectivity Functions of Range Queries are Learnable (SIGMOD 2022)
  • Prediction Intervals for Learned Cardinality Estimation: An Experimental Evaluation (ICDE 2022)
  • Learned Cardinality Estimation: An In-depth Study (SIGMOD 2022)
  • FactorJoin: A New Cardinality Estimation Framework for Join Queries (SIGMOD 2023)
  • AutoCE: An Accurate and Efficient Model Advisor for Learned Cardinality Estimation (ICDE 2023)
  • Couper: Memory-Efficient Cardinality Estimation under Unbalanced Distribution (ICDE 2023)
  • ALECE: An Attention-based Learned Cardinality Estimator for SPJ Queries on Dynamic Workloads (VLDB 2023)
  • Advanced Dataset Discovery: When Multi-Query-Dataset Cardinality Estimation Matters (arXiv 2024)
  • Sample-Efficient Cardinality Estimation Using Geometric Deep Learning (VLDB 2024)
  • PRICE: A Pretrained Model for Cross-Database Cardinality Estimation (arXiv 2024) GitHub Link
  • ByteCard: Enhancing ByteDance's Data Warehouse with Learned Cardinality Estimation (SIGMOD 2024)
  • ASM in Action: Fast and Practical Learned Cardinality Estimation (SIGMOD 2024)
  • CardBench: A Benchmark for Learned Cardinality Estimation in Relational Database (arXiv 2024)
  • Duet: efficient and scalable hybriD neUral rElation undersTanding. (ICDE 2024)

Data-based

  • Self-Tuning, GPU-Accelerated Kernel Density Models for Multidimensional Selectivity Estimation (SIGMOD 2015)
  • Estimating Join Selectivities using Bandwidth-Optimized Kernel Density Models (VLDB 2017)
  • DeepDB: Learn from Data, not from Queries! (VLDB 2020) GitHub Link
  • Deep Unsupervised Cardinality Estimation (VLDB 2019)
  • Multi-Attribute Selectivity Estimation Using Deep Learning (arXiv 2019)
  • Deep Learning Models for Selectivity Estimation of Multi-Attribute Queries (SIGMOD 2020)
  • NeuroCard: One Cardinality Estimator for All Tables (VLDB 2020) GitHub Link
  • Learning to Sample: Counting with Complex Queries (VLDB 2020)
  • Selectivity estimation using probabilistic models (SIGMOD 2001)
  • Lightweight graphical models for selectivity estimation without independence assumptions (VLDB 2011)
  • Efficiently adapting graphical models for selectivity estimation (VLDB 2013)
  • An Approach Based on Bayesian Networks for Query Selectivity Estimation (DASFAA 2019)
  • BayesCard: A Unified Bayesian Framework for Cardinality Estimation (arXiv 2020) GitHub Link
  • Online Sketch-based Query Optimization (arXiv 2021)
  • LMKG: Learned Models for Cardinality Estimation in Knowledge Graphs (arXiv 2021)
  • LHist: Towards Learning Multi-dimensional Histogram for Massive Spatial Data (ICDE 2021)
  • FLAT: Fast, Lightweight and Accurate Method for Cardinality Estimation (VLDB 2021) GitHub Link
  • Astrid: Accurate Selectivity Estimation for String Predicates using Deep Learning (VLDB 2021)
  • FACE: A Normalizing Flow based Cardinality Estimator (VLDB 2022)
  • Pre-training Summarization Models of Structured Datasets for Cardinality Estimation (VLDB 2022)
  • Cardinality Estimation of Approximate Substring Queries using Deep Learning (VLDB 2022)
  • Speeding Up End-to-end Query Execution via Learning-based Progressive Cardinality Estimation (Proceedings of the ACM on Management of Data)
  • Cardinality estimation with smoothing autoregressive models (WWW 2023)
  • Cardinality estimation using normalizing flow (VLDBJ 2023)
  • LPLM: A Neural Language Model for Cardinality Estimation of LIKE-Queries (SIGMOD 2024)
  • ASM: Harmonizing Autoregressive Model, Sampling, and Multi-dimensional Statistics Merging for Cardinality Estimation (SIGMOD 2024)
  • ASM in Action: Fast and Practical Learned Cardinality Estimation (SIGMOD 2024)
  • SAFE: Sampling-Assisted Fast Learned Cardinality Estimation for Dynamic Spatial Data (DEXA 2024)
  • Updateable Data-Driven Cardinality Estimator with Bounded Q-error (arXiv 2024)
  • Grid-AR: A Grid–based Booster for Learned Cardinality Estimation and Range Joins (arXiv 2024)

Query-based

  • Adaptive selectivity estimation using query feedback (SIGMOD 1994)
  • Selectivity Estimation in Extensible Databases - A Neural Network Approach (VLDB 1998)
  • Effective query size estimation using neural networks (Applied Intelligence 2002)
  • LEO - DB2's LEarning optimizer (VLDB 2001)
  • A Black-Box Approach to Query Cardinality Estimation (CIDR 2007)
  • Cardinality Estimation Using Neural Networks (2015)
  • Towards a learning optimizer for shared clouds (VLDB 2018)
  • Learning State Representations for Query Optimization with Deep Reinforcement Learning (DEEM@SIGMOD 2018)
  • Learned Cardinalities: Estimating Correlated Joins with Deep Learning (CIDR 2019) GitHub Link
  • Estimating Cardinalities with Deep Sketches (SIGMOD 2019) GitHub Link
  • Selectivity estimation for range predicates using lightweight models (VLDB 2019)
  • (Review) An Empirical Analysis of Deep Learning for Cardinality Estimation (arXiv 2019)
  • Flexible Operator Embeddings via Deep Learning (arXiv 2019)
  • Improved Cardinality Estimation by Learning Queries Containment Rates (EDBT 2020)
  • NN-based Transformation of Any SQL Cardinality Estimator for Handling DISTINCT, AND, OR and NOT (2020)
  • QuickSel: Quick Selectivity Learning with Mixture Models (SIGMOD 2020)
  • Efficiently Approximating Selectivity Functions using Low Overhead Regression Models (VLDB 2020)
  • Learned Cardinality Estimation for Similarity Queries (SIGMOD 2021)
  • Uncertainty-aware Cardinality Estimation by Neural Network Gaussian Process (arXiv 2021)
  • Flow-Loss: Learning Cardinality Estimates That Matter (VLDB 2021)
  • Warper: Efficiently Adapting Learned Cardinality Estimators to Data and Workload Drifts (SIGMOD 2022)
  • Lightweight and Accurate Cardinality Estimation by Neural Network Gaussian Process for Approximate Complex Event Processing (SIGMOD 2022)
  • Enhanced Featurization of Queries with Mixed Combinations of Predicates for ML-based Cardinality Estimation (EDBT 2023)
  • Speeding Up End-to-end Query Execution via Learning-based Progressive Cardinality Estimation (SIGMOD 2023)
  • Robust Query Driven Cardinality Estimation under Changing Workloads (VLDB 2023)
  • Learned Probing Cardinality Estimation for High-Dimensional Approximate NN Search (ICDE 2023)
  • CEDA: Learned Cardinality Estimation with Domain Adaptation (VLDB 2023)
  • Efficient Cardinality and Cost Estimation with Bidirectional Compressor-based Ensemble Learning (arXiv 2023)
  • Adding Domain Knowledge to Query-Driven Learned Databases (arXiv 2023)
  • PACE: Poisoning Attacks on Learned Cardinality Estimation (SIGMOD 2024)
  • Sample-Efficient Cardinality Estimation Using Geometric Deep Learning (VLDB 2024)
  • Automating localized learning for cardinality estimation based on XGBoost (Knowledge and Information Systems)

Cost Estimation

Single Query

  • Statistical learning techniques for costing XML queries (VLDB 2005)
  • Predicting multiple metrics for queries: Better decisions enabled by machine learning (ICDE 2009)
  • The Case for Predictive Database Systems : Opportunities and Challenges (CIDR 2011)
  • Learning-based query performance modeling and prediction (ICDE 2012)
  • Robust estimation of resource consumption for SQL queries using statistical techniques (VLDB 2012)
  • Learning-based SPARQL query performance modeling and prediction (WWW 2017)
  • Plan-Structured Deep Neural Network Models for Query Performance Prediction (arXiv 2019)
  • An End-to-End Learning-based Cost Estimator (arXiv 2019) (VLDB 2019)
  • Cost Models for Big Data Query Processing: Learning, Retrofitting, and Our Findings (2020)
  • DBMS Fitting: Why should we learn what we already know? (CIDR 2020)
  • A Note On Operator-Level Query Execution Cost Modeling (2020)
  • ML-based Cross-Platform Query Optimization (ICDE 2020)
  • Zero-Shot Cost Models for Out-of-the-box Learned Cost Prediction (VLDB 2022)
  • Efficient Learning with Pseudo Labels for Query Cost Estimation (CIKM 2022)
  • gCBO: A Cost-based Optimizer for Graph Databases (CIKM 2022)
  • QueryFormer: A Tree Transformer Model for Query Plan Representation (VLDB 2022)
  • BASE: Bridging the Gap between Cost and Latency for Query Optimization (VLDB 2023)
  • Rethinking Learned Cost Models: Why Start from Scratch? (PACMMOD 2023)
  • Budget-aware Query Tuning: An AutoML Perspective (arXiv 2024)
  • OS Pre-trained Transformer: Predicting Query Latencies across Changing System Contexts GitHub Link
  • Precision Meets Resilience: Cross-Database Generalization with Uncertainty Quantification for Robust Cost Estimation (CIKM 2024)

Concurrent

  • PQR: Predicting query execution times for autonomous workload management (ICAC 2008)
  • Performance Prediction for Concurrent Database Workloads (SIGMOD 2011)
  • Predicting completion times of batch query workloads using interaction-aware models and simulation (EDBT 2011)
  • Interaction-aware scheduling of report-generation workloads (VLDB 2011) (includes a scheduling strategy)
  • Towards predicting query execution time for concurrent and dynamic database workloads (not machine learning) (VLDB 2014)
  • Contender: A Resource Modeling Approach for Concurrent Query Performance Prediction (EDBT 2014)
  • Query Performance Prediction for Concurrent Queries using Graph Embedding (VLDB 2020)
  • Efficient Deep Learning Pipelines for Accurate Cost Estimations Over Large Scale Query Workload (SIGMOD 2021)
  • A Resource-Aware Deep Cost Model for Big Data Query Processing (ICDE 2022)
  • Stage: Query Execution Time Prediction in Amazon Redshift (SIGMOD 2024)

Join Optimization

  • Adaptive Optimization of Very Large Join Queries (SIGMOD 2018) (not machine learning)
  • Deep Reinforcement Learning for Join Order Enumeration (aiDM@SIGMOD 2018)
  • Learning to Optimize Join Queries With Deep Reinforcement Learning (arXiv)
  • Reinforcement Learning with Tree-LSTM for Join Order Selection (ICDE 2020)
  • Research Challenges in Deep Reinforcement Learning-based Join Query Optimization (aiDM 2020)
  • Efficient Join Order Selection Learning with Graph-based Representation (KDD 2022)
  • SOAR: A Learned Join Order Selector with Graph Attention Mechanism (IJCNN 2022)
  • Query Join Order Optimization Method Based on Dynamic Double Deep Q-Network (Electronics 2023)
  • Coral: federated query join order optimization based on deep reinforcement learning (WWW 2023)
  • JoinGym: An Efficient Query Optimization Environment for Reinforcement Learning (arXiv 2023)
  • Join Order Selection with Deep Reinforcement Learning: Fundamentals, Techniques, and Challenges (VLDB 2023)
  • Sub-optimal Join Order Identification with L1-error (SIGMOD 2024)
  • TESSM: Tree-based Selective State Space Models for Efficient Join Order Selection Learning (CIKM 2024)

Query Plan

  • Plan Selection Based on Query Clustering (VLDB 2002)
  • Cost-Based Query Optimization via AI Planning (AAAI 2014)
  • Sampling-Based Query Re-Optimization (SIGMOD 2016)
  • Learning State Representations for Query Optimization with Deep Reinforcement Learning (DEEM@SIGMOD 2018)
  • Towards a Hands-Free Query Optimizer through Deep Learning (CIDR 2019)
  • Neo: A Learned Query Optimizer (VLDB 2019)
  • Bao: Learning to Steer Query Optimizers (2020)
  • ML-based Cross-Platform Query Optimization (ICDE 2020)
  • Learning-based Declarative Query Optimization (2021)
  • Bao: Making Learned Query Optimization Practical (SIGMOD 2021 Best Paper!) Doc GitHub Link
  • Microlearner: A fine-grained Learning Optimizer for Big Data Workloads at Microsoft (2021)
  • Steering Query Optimizers: A Practical Take on Big Data Workloads (SIGMOD 2021)
  • A Unified Transferable Model for ML-Enhanced DBMS (CIDR 2021)
  • Balsa: Learning a Query Optimizer Without Expert Demonstrations (SIGMOD 2022)
  • Leveraging Query Logs and Machine Learning for Parametric Query Optimization (VLDB 2022)
  • Deploying a Steered Query Optimizer in Production at Microsoft (SIGMOD 2022)
  • Building Learned Federated Query Optimizers (VLDB 2022 PhD Workshop)
  • Cost-based or Learning-based? A Hybrid Query Optimizer for Query Plan Selection (VLDB 2022)
  • Learn What Really Matters: A Learning-to-Rank Approach for ML-based Query Optimization (BTW 2023)
  • Lero: A Learning-to-Rank Query Optimizer (VLDB 2023) GitHub Link
  • Learned Query Superoptimization (arXiv 2023)
  • Kepler: Robust Learning for Faster Parametric Query Optimization (SIGMOD 2023)
  • LOGER: A Learned Optimizer towards Generating Efficient and Robust Query Execution Plans (VLDB 2023)
  • BitE: Accelerating Learned Query Optimization in a Mixed-Workload Environment (arXiv 2023)
  • Reinforcement Learning-based SPARQL Join Ordering Optimizer
  • LEON: A New Framework for ML-Aided Query Optimization (VLDB 2023)
  • AutoSteer: Learned Query Optimization for Any SQL Database (VLDB 2023)
  • FASTgres: Making Learned Query Optimizer Hinting Effective (VLDB 2023)
  • Simple Adaptive Query Processing vs. Learned Query Optimizers: Observations and Analysis (VLDB 2023)
  • QO-Insight: Inspecting Steered Query Optimizer (VLDB Demo 2023)
  • QPSeeker: An Efficient Neural Planner combining both data and queries through Variational Inference (EDBT 2024)
  • FOSS: A Self-Learned Doctor for Query Optimizer (arXiv 2023)
  • Lemo: A Cache-Enhanced Learned Optimizer for Concurrent Queries (PACMMOD 2023)
  • A Comparative Study and Component Analysis of Query Plan Representation Techniques in ML4DB Studies (VLDB 2024)
  • Learned Optimizer for Online Approximate Query Processing in Data Exploration (TKDE 2024)
  • A learning-based framework for spatial join processing: estimation, optimization and tuning (VLDB 2024)
  • Roq: Robust Query Optimization Based on a Risk-aware Learned Cost Model (arXiv 2024)
  • PLAQUE: Automated Predicate Learning at Query Time (SIGMOD 2024)
  • Eraser: Eliminating Performance Regression on Learned Query Optimizer (VLDB 2024)
  • Low Rank Approximation for Learned Query Optimization (aiDM 2024)
  • Lero: applying learning-to-rank in query optimizer (VLDB 2024)
  • RobOpt: A Tool for Robust Workload Optimization Based on Uncertainty-Aware Machine Learning (SIGMOD 2024)
  • A Novel Technique for Query Plan Representation Based on Graph Neural (Big Data Analytics and Knowledge Discovery)
  • An Exploratory Case Study of Query Plan Representations (arXiv 2024)
  • JAPO: learning join and pushdown order for cloud-native join optimization (Frontiers of Computer Science 2024)
  • Steering the PostgreSQL query optimizer using hinting: State-Of-The-Art and open challenges (35th GI-Workshop on Foundations of Databases)
  • PARQO: Penalty-Aware Robust Plan Selection in Query Optimization (arXiv 2024)

Query Execution

Sort

  • The Case for a Learned Sorting Algorithm (SIGMOD 2020)
  • Defeating duplicates: A re-design of the LearnedSort algorithm (arXiv 2021)
  • Towards Parallel Learned Sorting (arXiv 2022)

Join

  • SkinnerDB: Regret-Bounded Query Evaluation via Reinforcement Learning (VLDB 2018)
  • The Case for Learned In-Memory Joins (arXiv 2021)

Adaptive Query Processing

  • Eddies: Continuously adaptive query processing (SIGMOD 2000)
  • Micro adaptivity in Vectorwise (SIGMOD 2013)
  • Cuttlefish: A Lightweight Primitive for Adaptive Query Processing (2018)
  • Scalable Multi-Query Execution using Reinforcement Learning (SIGMOD 2021)

Approximate Query Processing

  • DBEST: Revisiting approximate query processing engines with machine learning models (SIGMOD 2019)
  • LAQP: Learning-based Approximate Query Processing (2020)
  • Approximate Query Processing for Data Exploration using Deep Generative Models (ICDE 2020)
  • ML-AQP: Query-Driven Approximate Query Processing based on Machine Learning (2020)
  • Approximate Query Processing for Group-By Queries based on Conditional Generative Models (2021)
  • Learned Approximate Query Processing: Make it Light, Accurate and Fast (CIDR 2021)
  • NeuroSketch: Fast and Approximate Evaluation of Range Aggregate Queries with Neural Networks (SIGMOD 2023)
  • Exploiting Machine Learning Models for Approximate Query Processing (Big Data 2022)
  • Tuple Bubbles: Learned Tuple Representations for Tunable Approximate Query Processing (aiDM 2023)
  • Learning-Based Sample Tuning for Approximate Query Processing in Interactive Data Exploration (TKDE 2024)

Scheduling

  • Workload management for cloud databases via machine learning (ICDE 2016, WiSeDB)
  • A learning-based service for cost and performance management of cloud databases (ICDEW 2017) (short version of WiSeDB)
  • WiSeDB: A Learning-based Workload Management Advisor for Cloud Databases (VLDB 2016)
  • Learning Scheduling Algorithms for Data Processing Clusters (SIGCOMM 2019)
  • CrocodileDB: Efficient Database Execution through Intelligent Deferment (CIDR 2020)
  • Buffer Pool Aware Query Scheduling via Deep Reinforcement Learning (2020)
  • Self-Tuning Query Scheduling for Analytical Workloads (SIGMOD 2021)
  • LSched: A Workload-Aware Learned Query Scheduler for Analytical Database Systems (SIGMOD 2022)
  • DBMLSched: Scheduling In-database Machine Learning Jobs (AIDB@VLDB 2023)
  • Learning Interpretable Scheduling Algorithms for Data Processing Clusters (arXiv 2024)

(transaction 👇)

  • Scheduling OLTP transactions via learned abort prediction (aiDM@SIGMOD 2019)
  • Scheduling OLTP Transactions via Machine Learning (2019)
  • Polyjuice: High-Performance Transactions via Learned Concurrency Control (OSDI 2021)

Text-to-SQL

  • SQLNet: Generating Structured Queries From Natural Language Without Reinforcement Learning (arXiv 2017)
  • An End-to-end Neural Natural Language Interface for Databases (arXiv 2018)
  • SyntaxSQLNet: Syntax Tree Networks for Complex and Cross-Domain Text-to-SQL Task (EMNLP 2018)
  • Robust Text-to-SQL Generation with Execution-Guided Decoding (arXiv 2018)
  • Towards Complex Text-to-SQL in Cross-Domain Database with Intermediate Representation (ACL 2019)
  • Global Reasoning over Database Structures for Text-to-SQL Parsing (EMNLP 2019)
  • Representing Schema Structure with Graph Neural Networks for Text-to-SQL Parsing (ACL 2019)
  • Natural language to SQL: Where are we today? (VLDB 2020)
  • Bridging Textual and Tabular Data for Cross-Domain Text-to-SQL Semantic Parsing (EMNLP Findings 2020)
  • RAT-SQL: Relation-Aware Schema Encoding and Linking for Text-to-SQL Parsers (ACL 2020)
  • Exploring Unexplored Generalization Challenges for Cross-Database Semantic Parsing (ACL 2020)
  • TAPAS: Weakly Supervised Table Parsing via Pre-training (ACL 2020)
  • TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data (ACL 2020)
  • Semantic Evaluation for Text-to-SQL with Distilled Test Suites (EMNLP 2020)
  • SMBOP: Semi-autoregressive Bottom-up Semantic Parsing (NAACL-HLT 2021)
  • Natural SQL: Making SQL Easier to Infer from Natural Language Specifications (EMNLP Findings 2021)
  • LGESQL: Line Graph Enhanced Text-to-SQL Model with Mixed Local and Non-Local Relations (ACL 2021)
  • Structure-Grounded Pretraining for Text-to-SQL (NAACL-HLT 2021)
  • GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing (ICLR 2021)
  • SADGA: Structure-Aware Dual Graph Aggregation Network for Text-to-SQL (NeurIPS 2021)
  • GP: Context-free Grammar Pre-training for Text-to-SQL Parsers (arXiv 2021)
  • Relation Aware Semi-autoregressive Semantic Parsing for NL2SQL (arXiv 2021)
  • On Robustness of Neural Semantic Parsers (EACL 2021)
  • MT-Teql: Evaluating and Augmenting Neural NLIDB on Real-world Linguistic and Schema Variations (VLDB 2021)
  • PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models (EMNLP 2021)
  • Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training (AAAI 2021)
  • Towards robustness of text-to-sql models against synonym substitution (ACL 2021)
  • Exploring Underexplored Limitations of Cross-Domain Text-to-SQL Generalization (EMNLP 2021)
  • CodexDB: Generating Code for Processing SQL Queries using GPT-3 Codex (arXiv 2022)
  • S2SQL: Injecting Syntax to Question-Schema Interaction Graph Encoder for Text-to-SQL Parsers (arXiv 2022)
  • UNIFIEDSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models (EMNLP 2022)
  • RASAT: Integrating Relational Structures into Pretrained Seq2Seq Model for Text-to-SQL (EMNLP 2022)
  • UNISAR: A Unified Structure-Aware Autoregressive Language Model for Text-to-SQL (arXiv 2022)
  • N-Best Hypotheses Reranking for Text-To-SQL Systems (SLT 2022)
  • Semantic Enhanced Text-to-SQL Parsing via Iteratively Learning Schema Linking Graph (KDD 2022)
  • SeaD: End-to-end Text-to-SQL Generation with Schema-aware Denoising (NAACL-HLT Findings 2022)
  • STAR: SQL Guided Pre-Training for Context-dependent Text-to-SQL Parsing (EMNLP Findings 2022)
  • Towards Generalizable and Robust Text-to-SQL Parsing (EMNLP Findings 2022)
  • SUN: Exploring Intrinsic Uncertainties in Text-to-SQL Parsers (COLING 2022)
  • Towards robustness of text-to-sql models against natural and realistic adversarial table perturbation (ACL 2022)
  • Evaluating the Text-to-SQL Capabilities of Large Language Models (arXiv 2022)
  • A survey on deep learning approaches for text-to-SQL (VLDBJ 2023)
  • GAR: A Generate-and-Rank Approach for Natural Language to SQL Translation (ICDE 2023)
  • Exploring the Compositional Generalization in Context Dependent Text-to-SQL Parsing (arXiv 2023)
  • Improving Generalization in Language Model-Based Text-to-SQL Semantic Parsing: Two Simple Semantic Boundary-Based Techniques (arXiv 2023)
  • Exploring Chain-of-Thought Style Prompting for Text-to-SQL (arXiv 2023)
  • Few-shot Text-to-SQL Translation using Structure and Content Prompt Learning (SIGMOD 2023)
  • Multitask pretraining with structured knowledge for text-to-SQL generation (ACL 2023)
  • Demonstrating GPT-DB: Generating Query-Specific and Customizable Code for SQL Processing with GPT-4 (VLDB Demo 2023)
  • Graphix-T5: Mixing Pre-Trained Transformers with Graph-Aware Layers for Text-to-SQL Parsing (AAAI 2023)
  • SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (arXiv 2023)
  • Teaching Large Language Models to Self-Debug (arXiv 2023)
  • A comprehensive evaluation of ChatGPT's zero-shot Text-to-SQL capability (arXiv 2023)
  • DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction (arXiv 2023)
  • C3: Zero-shot Text-to-SQL with ChatGPT (arXiv 2023)
  • RESDSQL: Decoupling Schema Linking and Skeleton Parsing for Text-to-SQL (AAAI 2023)
  • Dr.Spider: A Diagnostic Evaluation Benchmark towards Text-to-SQL Robustness (ICLR 2023)
  • Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL (arXiv 2024)
  • Natural language to SQL Resource repo
  • Combining Small Language Models and Large Language Models for Zero-Shot NL2SQL (VLDB 2024)
  • Awesome-Text2SQL Resource repo

SQL Related

  • Query2Vec (arXiv)
  • Facilitating SQL Query Composition and Analysis (arXiv 2020)
  • From Natural Language Processing to Neural Databases (VLDB 2021)
  • BERT Meets Relational DB: Contextual Representations of Relational Databases
  • LearnedSQLGen: Constraint-aware SQL Generation using Reinforcement Learning (SIGMOD 2022)
  • PreQR: Pre-training Representation for SQL Understanding (SIGMOD 2022)
  • From BERT to GPT-3 Codex: Harnessing the Potential of Very Large Language Models for Data Management (VLDB 2022)
  • Query Generation based on Generative Adversarial Networks (arXiv 2023)

Stargazers over time