A linear programming approach to jointly optimize the sorting, compression, indexing, and tiering decision for spatio-temporal workloads.
To get all dependencies of the Table Configuration Optimizer installed, run
pip install -r requirements.txt
To start the jupyter lab, run
jupyter lab
Additionally, it is required to have a solver, such as gurobi, installed.
We used a workload consisting of six query templates to evaluate the models based on the data of a transportation network company.
ID | Query | Frequency | Skipped Chunks |
---|---|---|---|
q0 | SELECT * FROM TABLE WHERE ("driver_id" <= {selectivity of the value: 0.0001}) AND ("status" <= {0.7}) |
20% | 0, 3, 6, 7, 8 |
q1 | SELECT * FROM TABLE WHERE ("timestamp" BETWEEN {0.2}) AND ("latitude" BETWEEN {0.5}) AND ("longitude" BETWEEN {0.5}) AND ("status" <= {0.7}) |
15% | |
q2 | SELECT * FROM TABLE WHERE („driver_id" <= {0.01}) AND ("latitude" BETWEEN {0.1}) AND ("longitude" BETWEEN {0.1}) |
10% | 4, 5, 6, 7, 8, 9 |
q3 | SELECT * FROM TABLE WHERE ("timestamp" <= {0.05}) AND ("latitude" BETWEEN {0.7}) AND ("longitude" BETWEEN {0.7}) |
25% | 0, 1, 2, 6, 7, 8, 9 |
q4 | SELECT * FROM TABLE WHERE („driver_id" <= {0.01}) AND ("timestamp" <= {0.4}) |
15% | 1, 2, 3, 4, 5, 6, 7, 8, 9 |
q5 | SELECT * FROM TABLE WHERE ("latitude" BETWEEN {0.01}) AND ("longitude" BETWEEN {0.01}) AND ("timestamp" BETWEEN {0.5}) |
15% | 0, 1, 8, 9 |
Further workload examples are provided in /data/workloads
.