YoungHypo commented Oct 30, 2025

Pull Request Summary

This PR adds proper handling for ENUM and TIMESTAMP data types in the Videx histogram module, addressing issues where these types were not correctly processed during histogram bucket generation and selectivity calculation.

Related Issues

Resolves: #55

Changes Made

  1. Type Conversion (convert_str_by_type):

    • Added enum to string-type handling list
    • Added timestamp to datetime-type handling list
  2. Histogram Position Calculation (find_nearest_key_pos):

    • Extended string comparison logic to support enum types
    • Extended datetime offset calculation to support timestamp types
  3. SQL Value Formatting (_format_value_by_type_in_sql):

    • Added ENUM handling (format with quotes and escape)
    • Added TIMESTAMP handling (format with quotes)
  4. Bucket Boundary Generation (_get_uniform_buckets):

    • Added ENUM and TIMESTAMP to sampling-based boundary generation
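The SQL value formatting change in item 3 can be illustrated with a minimal sketch. This is not the Videx implementation of `_format_value_by_type_in_sql`; the function name `format_value_in_sql`, the type sets, and the escaping rules below are assumptions chosen only to show the idea of routing ENUM through string handling and TIMESTAMP through datetime handling.

```python
from datetime import datetime

# Hypothetical type groups for illustration; not copied from the Videx source.
STRING_LIKE = {"char", "varchar", "text", "enum"}
DATETIME_LIKE = {"date", "datetime", "timestamp"}


def format_value_in_sql(value, data_type: str) -> str:
    """Render a Python value as a SQL literal for the given column type."""
    # Reduce a full column type such as "enum('a','b')" to its base name "enum".
    base = data_type.lower().split("(")[0].strip()
    if value is None:
        return "NULL"
    if base in STRING_LIKE:
        # Escape backslashes first, then double any single quotes, then quote.
        escaped = str(value).replace("\\", "\\\\").replace("'", "''")
        return f"'{escaped}'"
    if base in DATETIME_LIKE:
        # Datetime-like values are quoted; datetime objects are rendered first.
        if isinstance(value, datetime):
            value = value.strftime("%Y-%m-%d %H:%M:%S")
        return f"'{value}'"
    # Numeric and other types are emitted unquoted in this sketch.
    return str(value)


print(format_value_in_sql("active", "ENUM('active','inactive')"))  # 'active'
print(format_value_in_sql(datetime(2025, 1, 1), "timestamp"))      # '2025-01-01 00:00:00'
```

Stripping the parenthesized part of the type is one way to let a single lookup cover both `enum` and `enum('a','b')`; the real code may normalize type names differently.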

How to test ENUM and TIMESTAMP

generate_bulk_data_simple.py

source_schema_simple.sql

1. Generate test data

python3 generate_bulk_data_simple.py

2. Import the data into the database

mysql -h 127.0.0.1 -u root -p < source_schema_simple.sql
mysql -h 127.0.0.1 -u root -p < bulk_insert_data.sql

3. Start videx_server and collect data

4. Test TIMESTAMP

mysql -h127.0.0.1 -P13308 -uvidex -ppassword
USE videx_ddw_test_src;
EXPLAIN SELECT * FROM orders WHERE created_at >= '2025-01-01 00:00:00' AND created_at < '2025-04-01 00:00:00';

[Screenshot: EXPLAIN output for the TIMESTAMP query above]
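To make the link between the range predicate and the histogram concrete, here is a small sketch of how a selectivity estimate for a TIMESTAMP range could be derived from bucket boundaries. It assumes values are spread evenly inside each bucket and ignores time zones; the bucket layout, the helper names (`to_seconds`, `fraction_below`, `range_selectivity`), and the sample data are all hypothetical and do not reflect the actual Videx data structures.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def to_seconds(value: str) -> float:
    """Convert a 'YYYY-MM-DD HH:MM:SS' literal to seconds since 1970-01-01 (naive)."""
    return (datetime.strptime(value, FMT) - datetime(1970, 1, 1)).total_seconds()


def fraction_below(buckets, value: str) -> float:
    """Estimate the fraction of rows with column < value.

    `buckets` is a list of (lower, upper, row_fraction) tuples sorted by lower
    bound. Rows are assumed to be uniformly distributed inside a bucket.
    """
    v = to_seconds(value)
    total = 0.0
    for lower, upper, row_fraction in buckets:
        lo, hi = to_seconds(lower), to_seconds(upper)
        if v >= hi:
            total += row_fraction                         # whole bucket is below
        elif v > lo:
            total += row_fraction * (v - lo) / (hi - lo)  # partial bucket
    return total


def range_selectivity(buckets, start: str, end: str) -> float:
    """Estimate the fraction of rows with start <= column < end."""
    return max(0.0, fraction_below(buckets, end) - fraction_below(buckets, start))


# Hypothetical histogram: four quarterly buckets, each holding 25% of the rows.
quarters = [
    ("2025-01-01 00:00:00", "2025-04-01 00:00:00", 0.25),
    ("2025-04-01 00:00:00", "2025-07-01 00:00:00", 0.25),
    ("2025-07-01 00:00:00", "2025-10-01 00:00:00", 0.25),
    ("2025-10-01 00:00:00", "2026-01-01 00:00:00", 0.25),
]
print(range_selectivity(quarters, "2025-01-01 00:00:00", "2025-04-01 00:00:00"))
```

With this made-up histogram, the predicate from the test query covers exactly the first bucket. Multiplying the estimate by the table's row count gives a figure comparable to the `rows` column in the EXPLAIN output.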

5. Test ENUM (not yet fixed)

ENUM handling still has unresolved issues. For example, the user and product tables both contain an ENUM column named status. The query on user.status returns rows correctly, but the query on product.status fails.

[Screenshot: results of the ENUM queries on user.status and product.status]

More Results

More EXPLAIN results for these queries are available in innodb_sql.test and videx_sql.


Contribution Guidelines (Expand for Details)

We appreciate your contribution to VIDEX! To ensure a smooth review process and maintain high code quality, please adhere to the following guidelines:

Pull Request Title Format

Your PR title should start with one of these prefixes to indicate the nature of the change:

  • [Core]: Changes to core engine functionality
  • [Opt]: Changes to VIDEX-Optimizer-Plugin
  • [Stats]: Changes to VIDEX-Statistic-Server
  • [Algo]: Implementation of new algorithms for NDV, cardinality estimation, etc.
  • [Pipe]: Enhancements to the pipeline (e.g., data collection, environment setup)
  • [Bug]: Corrections to existing functionality
  • [CI]: Changes to build process or CI pipeline
  • [Docs]: Updates or additions to documentation
  • [Test]: Adding or updating tests
  • [Perf]: Performance improvements
  • [Misc]: For changes not covered above (use sparingly)

Note: For changes spanning multiple categories, use the most specific prefix or multiple prefixes in order of importance (e.g., [Algo][Stats]).

Submission Checklist

  • PR title includes appropriate prefix(es)
  • Changes are clearly explained in the PR description
  • New and existing tests pass successfully
  • Code adheres to project style and best practices
  • Documentation updated to reflect changes (if applicable)
  • Changes have been tested on both Plugin-Mode and Standalone-Mode (if applicable)
  • Statistical accuracy has been verified (for algorithm or optimizer changes)
  • No regression in query plan accuracy compared to InnoDB (if applicable)
  • Performance benchmarks conducted (for performance-sensitive changes)

By submitting this PR, you confirm that you've read these guidelines and your changes align with the project's contribution standards.

Development

Successfully merging this pull request may close these issues.

feat: Add support for TIMESTAMP data type in statistic server
