[RoadMap] VIDEX 0.2.0 and Some 0.3.0 Vision #15

@kr11

We created this roadmap to track upcoming features and improvements for VIDEX, the Disaggregated, Extensible Virtual Index Engine for What-If Analysis in MySQL. Version 0.2.0 focuses on MySQL 8.0 adaptation, standalone MySQL 5.7 support, and CI/CD improvements.

If you're interested in working on any of these issues, please respond to the related issue or create a new one.

🚀 Version 0.2.0

System Features

Algorithm Features

  • Sampling-based Statistic Collection (NDV by sampling instead of full table scan #12)
    Description: Develop efficient methods that use sampling or modeling to generate statistical information without full table scans of the original database. Requires knowledge of data sampling, single-column NDV estimation, histogram generation, and distribution fitting.

  • Multi-Column Cardinality Estimation Enhancement
    Description: Implement improved calculation of multi-column cardinality based on sampled data, with special optimization for correlated columns. Requires knowledge of data-driven cardinality estimation.
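
The two algorithm items above can be illustrated with a classic sample-based distinct-value estimator. This is a minimal sketch, not VIDEX's implementation: it assumes a uniform random sample of rows and uses the GEE estimator (Charikar et al.), chosen here only for illustration. The function names `gee_ndv`, `joint_ndv`, and `independent_ndv` and the demo table are hypothetical. Treating a tuple of column values as a single value lets the sample itself capture correlation between columns, which a product of per-column NDVs (the independence assumption) ignores.

```python
import math
import random
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def gee_ndv(sample: Sequence[Hashable], table_rows: int) -> float:
    """GEE estimate of the number of distinct values in a column of
    `table_rows` rows, given a uniform random sample of that column:
        NDV ~= sqrt(table_rows / len(sample)) * f1 + sum_{j>=2} f_j
    where f_j counts the values that occur exactly j times in the sample."""
    if not sample:
        return 0.0
    freq_of_freq = Counter(Counter(sample).values())  # j -> f_j
    f1 = freq_of_freq.get(1, 0)
    seen_repeatedly = sum(cnt for j, cnt in freq_of_freq.items() if j >= 2)
    estimate = math.sqrt(table_rows / len(sample)) * f1 + seen_repeatedly
    return min(estimate, float(table_rows))  # NDV cannot exceed the row count


def joint_ndv(rows: List[Dict[str, int]], cols: Sequence[str], table_rows: int) -> float:
    """Multi-column NDV: treat each tuple of column values as one value, so
    correlations present in the sample are preserved."""
    return gee_ndv([tuple(r[c] for c in cols) for r in rows], table_rows)


def independent_ndv(rows: List[Dict[str, int]], cols: Sequence[str], table_rows: int) -> float:
    """Baseline that assumes independent columns: product of per-column NDVs."""
    product = 1.0
    for c in cols:
        product *= gee_ndv([r[c] for r in rows], table_rows)
    return min(product, float(table_rows))


# Hypothetical demo table: 100,000 rows, 1,000 cities, and each city belongs
# to exactly one of 10 countries, so (city, country) has only 1,000 combinations.
TABLE_ROWS = 100_000
table = [{"city": i % 1000, "country": (i % 1000) // 100} for i in range(TABLE_ROWS)]
sample = random.Random(42).sample(table, 10_000)  # 10% sample without replacement

joint = joint_ndv(sample, ["city", "country"], TABLE_ROWS)
naive = independent_ndv(sample, ["city", "country"], TABLE_ROWS)
print(f"joint estimate: {joint:.0f}, independence estimate: {naive:.0f}")
```

On this table the tuple-based estimate lands near the true 1,000 combinations, while the independence baseline lands near 10 × 1,000 = 10,000, roughly a 10x overestimate. GEE scales the singleton count by sqrt(N/n), so it can still overestimate when the sample is small relative to the true NDV.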

Documentation

  • Protocol, Interface & AI Documentation
    Description: Create comprehensive documentation for the RESTful layer, to help system developers implement plugins or statistic servers, and for the algorithm interfaces, to help algorithm developers integrate new algorithms (Add API Documentation for VIDEX RESTful Services #25).

Benchmark and Performance

  • Comprehensive Testing on TPC-H and JOB
    Description: Use these standard benchmarks to validate how completely VIDEX for MySQL 8.0 supports MySQL 5.7.

CI/CD and Developer Productivity

Code Refactoring & Documentation Enhancement

VIDEX 0.3.0

  • ✴️ Integrate core statistics logic into the plugin, reducing dependency on the external Python server for non-AI parts. ([Initiative] Integrate Core Statistics Logic into the Plugin #72)

  • VIDEX Web Tool
    Description: Create a web interface for direct database connection, information collection, query analysis, and index management.

🔮 Long Term & Exploratory (Future Versions)

VIDEX aims to provide a virtual database engine that accurately simulates query plans without requiring real data, supporting index recommendation and join order optimization. VIDEX currently supports Percona and will expand to MySQL, MariaDB, and PostgreSQL. The algorithm layer will evolve from today's heuristic algorithms to AI-boosted solutions, extending beyond cardinality and NDV estimation to simulate all database information.

System Features

  • MariaDB Adaptation (Feature request: support MariaDB #1)
    Description: Collaborate with MariaDB to adapt VIDEX-optimizer for MariaDB. Requires MariaDB development and C++ expertise.

  • PostgreSQL Adaptation
    Description: Extend VIDEX's disaggregated, AI-extensible architecture to PostgreSQL. This is a large task requiring PostgreSQL development experience and knowledge of HypoPG and cardinality patches.

  • MySQL 5.7 Plugin Mode
    Description: Rewrite VIDEX-plugin to be compatible with MySQL 5.7 in plugin mode.

  • Alternative VIDEX-Statistic Implementations (Self-implemented Videx-stat-server With SpringBoot #8)
    Description: Develop VIDEX-Statistic-Server implementations in other languages (e.g., Java SpringBoot).

  • Mock Index_read Implementation
    Description: Address the currently unsupported index_read interface by mocking it, returning virtual data, or reading directly from the original database.

  • Virtual Histogram Support
    Description: Implement histogram mocking to simulate the impact of MySQL 8.0 histograms on query plans.
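
One plausible ingredient of histogram mocking is the ability to synthesize histogram buckets and answer range-selectivity questions from them without the real table. The sketch below is a rough illustration only, not VIDEX's or MySQL's code: it builds a simplified equi-height histogram from column values (for example, sampled ones) and estimates the selectivity of `col <= x`. The bucket tuple is loosely modeled on the equi-height bucket layout MySQL 8.0 reports (lower bound, upper bound, cumulative frequency, distinct count); `build_equi_height` and `selectivity_le` are hypothetical names, and the uniform-within-bucket interpolation is an assumption.

```python
import bisect
import math
from typing import List, Sequence, Tuple

# (lower_inclusive, upper_inclusive, cumulative_frequency, num_distinct),
# loosely modeled on the equi-height bucket layout MySQL 8.0 reports.
Bucket = Tuple[float, float, float, int]


def build_equi_height(values: Sequence[float], num_buckets: int) -> List[Bucket]:
    """Build at most `num_buckets` buckets holding roughly equal row counts.
    Copies of one value are never split across two buckets."""
    data = sorted(values)
    n = len(data)
    buckets: List[Bucket] = []
    start = 0
    while start < n:
        # Spread the remaining rows evenly over the remaining buckets.
        remaining = max(num_buckets - len(buckets), 1)
        end = start + math.ceil((n - start) / remaining)
        end = bisect.bisect_right(data, data[end - 1])  # keep duplicates together
        chunk = data[start:end]
        buckets.append((chunk[0], chunk[-1], end / n, len(set(chunk))))
        start = end
    return buckets


def selectivity_le(buckets: List[Bucket], x: float) -> float:
    """Estimated fraction of rows with value <= x. Inside a bucket, values are
    assumed to be spread uniformly between its bounds (linear interpolation)."""
    prev_cum = 0.0
    for lower, upper, cum, _ndv in buckets:
        if x < lower:
            return prev_cum
        if x < upper:
            return prev_cum + (x - lower) / (upper - lower) * (cum - prev_cum)
        prev_cum = cum
    return prev_cum


uniform = build_equi_height(list(range(1000)), num_buckets=10)
skewed = build_equi_height([1] * 900 + list(range(2, 102)), num_buckets=10)
print(selectivity_le(uniform, 549.5))  # roughly 0.55
print(selectivity_le(skewed, 1))       # the hot value 1 fills its own bucket
```

Keeping duplicates in one bucket lets a heavily repeated value surface as its own bucket, which is the kind of skew a mocked histogram would need to convey to the optimizer.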

Algorithm Features

  • Dataless NDV Estimation
    Description: Retrain PLM4NDV on commercially available datasets for NDV prediction without data access.

  • Dataless Cardinality Estimation
    Description: Implement single-table cardinality estimation without requiring data access.

  • Index Cache Percentage Estimation
    Description: Develop models to predict index_cached_pct values accurately.

  • LLM-Native Dataless Database
    Description: Generate simulated statistical information and data distributions based solely on users' natural-language descriptions of data scale, value ranges, and correlations, even before the database is created.
