- An attempt to achieve faster querying by storing data column-wise rather than row-wise.
- Data is stored in three file formats:
  - .data for storage of model data
  - .meta for schema metadata
  - .idx for indexing (min/max indexing)
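The column-wise idea can be illustrated in memory before any files are involved. The sketch below is an assumption-laden illustration, not the project's code: `UserRow`, `UserColumns`, and both `sum_age_*` functions are hypothetical names. It shows that a query touching only `age` reads one contiguous vector in the columnar layout, while the row-wise layout walks every full record.

```rust
/// Hypothetical row-wise record, shown only for contrast.
struct UserRow {
    id: i32,
    name: String,
    age: i32,
}

/// Hypothetical column-wise layout: one contiguous vector per column.
#[derive(Default)]
struct UserColumns {
    id: Vec<i32>,
    name: Vec<String>,
    age: Vec<i32>,
}

impl UserColumns {
    /// Splits a row into its fields and appends each to its own column.
    fn push(&mut self, row: UserRow) {
        self.id.push(row.id);
        self.name.push(row.name);
        self.age.push(row.age);
    }
}

/// Row-wise scan: every record is visited, including fields the query ignores.
fn sum_age_rows(rows: &[UserRow]) -> i64 {
    rows.iter().map(|r| i64::from(r.age)).sum()
}

/// Column-wise scan: only the `age` vector is touched.
fn sum_age_columns(cols: &UserColumns) -> i64 {
    cols.age.iter().map(|&a| i64::from(a)).sum()
}
```

On disk, the same separation would presumably mean that a scan of one column opens only that column's data file.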
- Implement Basic Columnar Storage
  - Define Table Schema & Metadata
  - Implement Column-Wise Metadata Storage
  - Implement Metadata Loading & Table Initialization
- Implement Indexing for faster queries
  - Add Min-Max for fast filtering
  - Implement Offset-Based Index for faster reads
- Query Execution
  - Implement query execution in the columnar DB
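As an illustration of the offset-based index item above, here is a minimal sketch for a variable-length string column. It assumes values are concatenated into one byte buffer with a parallel list of start offsets; `StringColumn` and its methods are hypothetical names, and the project's actual on-disk index may look different.

```rust
/// Hypothetical string column: all values are concatenated in `data`,
/// and `offsets[i]` records the byte position where value `i` starts.
#[derive(Default)]
struct StringColumn {
    data: Vec<u8>,
    offsets: Vec<u64>,
}

impl StringColumn {
    /// Appends a value and records its starting byte offset.
    fn push(&mut self, value: &str) {
        self.offsets.push(self.data.len() as u64);
        self.data.extend_from_slice(value.as_bytes());
    }

    /// Reads row `row` directly, without scanning the values before it.
    /// Returns `None` when the row does not exist.
    fn get(&self, row: usize) -> Option<&str> {
        let start = *self.offsets.get(row)? as usize;
        // The value ends where the next one starts, or at the end of the buffer.
        let end = self
            .offsets
            .get(row + 1)
            .map(|&o| o as usize)
            .unwrap_or(self.data.len());
        std::str::from_utf8(&self.data[start..end]).ok()
    }
}
```

With offsets available, reading row `n` is a single slice of the buffer rather than a walk over the `n` preceding values.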
NOTE: A nightly Rust toolchain (installed via rustup) is required to run this project.
To see how the columnar database works, run the included test script:
chmod +x test_simd.sh
./test_simd.sh
SIMD filtering support doc - url

Create a table
cargo run -- create-table users id:int name:string age:int
Output:
Table 'users' created!

Insert a row
cargo run -- insert users 1 "Alice" 25
Output:
Inserted into 'users': ["1", "Alice", "25"]

Scan table
cargo run -- scan users age
Output:
Read value: 25
Read value: 32
Read value: 43
Read value: 54
Read value: 65
Read value: 35
Read value: 54

Filter using x86 SIMD instructions (4 * i32)
NOTE: This only supports int data types
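To make the "4 * i32" idea concrete, the following is a minimal sketch of an equality filter that compares four `i32` lanes per step with SSE2 intrinsics from `std::arch`. It is an assumption about how such a filter could be written, not the project's implementation (which requires nightly and may use a different API); `filter_eq` is a hypothetical name.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Returns the indices of all elements equal to `threshold`.
/// On x86_64, four `i32` lanes are compared per step; other targets
/// use the scalar loop for the whole slice.
fn filter_eq(values: &[i32], threshold: i32) -> Vec<usize> {
    let mut matches = Vec::new();
    let mut i = 0;

    #[cfg(target_arch = "x86_64")]
    {
        // SAFETY: SSE2 is part of the x86_64 baseline, and each unaligned
        // load reads exactly four in-bounds elements.
        unsafe {
            let needle = _mm_set1_epi32(threshold);
            while i + 4 <= values.len() {
                let chunk = _mm_loadu_si128(values.as_ptr().add(i) as *const __m128i);
                let eq = _mm_cmpeq_epi32(chunk, needle);
                // Collapse the comparison into one bit per 32-bit lane.
                let mask = _mm_movemask_ps(_mm_castsi128_ps(eq));
                for lane in 0..4 {
                    if mask & (1 << lane) != 0 {
                        matches.push(i + lane);
                    }
                }
                i += 4;
            }
        }
    }

    // Scalar tail for the elements that did not fill a full group of four.
    while i < values.len() {
        if values[i] == threshold {
            matches.push(i);
        }
        i += 1;
    }
    matches
}
```

The other comparisons follow the same pattern with a different compare step, for example `_mm_cmpgt_epi32` for greater-than.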
- for values equal to a threshold value
cargo run -- filter-simd-eq users age 54
Output:
Matched value at index 3: 54
Matched value at index 6: 54
- for values not equal to a threshold value
cargo run -- filter-simd-not-eq users age 25
Output:
Matched value at index 1: 32
Matched value at index 2: 43
Matched value at index 3: 54
Matched value at index 4: 65
Matched value at index 5: 35
Matched value at index 6: 54
- for values greater than a threshold value
cargo run -- filter-simd-gt users age 30
Output:
Matched value at index 1: 32
Matched value at index 2: 43
Matched value at index 3: 54
Matched value at index 4: 65
Matched value at index 5: 35
Matched value at index 6: 54
- for values less than a threshold value
cargo run -- filter-simd-lt users age 30
Output:
Matched value at index 0: 25
- for values less than or equal to a threshold value
cargo run -- filter-simd-lt-eq users age 30
Output:
Matched value at index 0: 25
- for values greater than or equal to a threshold value
cargo run -- filter-simd-gt-eq users age 30
Output:
Matched value at index 1: 32
Matched value at index 2: 43
Matched value at index 3: 54
Matched value at index 4: 65
Matched value at index 5: 35
Matched value at index 6: 54
- for combining two conditions with a logical operator
cargo run -- filter-simd-logical users age gt 25 age lt 54 or
Output:
Matched row at index 5: age = 35, age = 35
Matched row at index 4: age = 65, age = 65
Matched row at index 1: age = 32, age = 32
Matched row at index 2: age = 43, age = 43
Matched row at index 3: age = 54, age = 54
Matched row at index 6: age = 54, age = 54
Matched row at index 0: age = 25, age = 25

Filter using x86 SIMD instructions, AVX2 256-bit (8 * i32)
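A 256-bit AVX2 register holds eight `i32` lanes, so each step covers twice as many values as the 128-bit variant. The sketch below shows one way a greater-than filter could be written with `std::arch` intrinsics and runtime feature detection. It is an illustration under assumptions, not the project's code; `filter_gt` and `filter_gt_avx2` are hypothetical names.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// AVX2 kernel: compares eight `i32` lanes per step, pushes matching
/// indices into `out`, and returns how many leading elements it processed.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn filter_gt_avx2(values: &[i32], threshold: i32, out: &mut Vec<usize>) -> usize {
    let needle = _mm256_set1_epi32(threshold);
    let mut i = 0;
    while i + 8 <= values.len() {
        // SAFETY: the loop condition guarantees eight in-bounds elements.
        let chunk = unsafe { _mm256_loadu_si256(values.as_ptr().add(i) as *const __m256i) };
        // Signed comparison: lane is all ones where chunk > needle.
        let gt = _mm256_cmpgt_epi32(chunk, needle);
        // Collapse the comparison into one bit per 32-bit lane.
        let mask = _mm256_movemask_ps(_mm256_castsi256_ps(gt));
        for lane in 0..8 {
            if mask & (1 << lane) != 0 {
                out.push(i + lane);
            }
        }
        i += 8;
    }
    i
}

/// Returns the indices of all elements greater than `threshold`,
/// using AVX2 when the CPU supports it and a scalar loop otherwise.
fn filter_gt(values: &[i32], threshold: i32) -> Vec<usize> {
    let mut out = Vec::new();
    let mut done = 0;

    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was just confirmed at runtime.
            done = unsafe { filter_gt_avx2(values, threshold, &mut out) };
        }
    }

    // Scalar pass over whatever the SIMD kernel did not cover.
    for (i, &v) in values.iter().enumerate().skip(done) {
        if v > threshold {
            out.push(i);
        }
    }
    out
}
```

The runtime check matters because AVX2, unlike SSE2, is not guaranteed on every x86_64 CPU.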
- for values equal to a threshold value
cargo run -- filter-simd-eq-avx users age 20
Output:
Matched value at index 8: 20
Matched value at index 10: 20
- for values not equal to a threshold value
cargo run -- filter-simd-not-eq-avx users age 20
Output:
Matched value at index 0: 25
Matched value at index 1: 32
Matched value at index 2: 43
Matched value at index 3: 54
Matched value at index 4: 65
Matched value at index 5: 35
Matched value at index 6: 54
Matched value at index 7: 19
Matched value at index 9: 21
Matched value at index 11: 22
- for values greater than a threshold value
cargo run -- filter-simd-gt-avx users age 20
Output:
Matched value at index 0: 25
Matched value at index 1: 32
Matched value at index 2: 43
Matched value at index 3: 54
Matched value at index 4: 65
Matched value at index 5: 35
Matched value at index 6: 54
Matched value at index 9: 21
Matched value at index 11: 22
- for values less than a threshold value
cargo run -- filter-simd-lt-avx users age 20
Output:
Matched value at index 7: 19
- for values greater than or equal to a threshold value
cargo run -- filter-simd-gt-eq-avx users age 20
Output:
Matched value at index 0: 25
Matched value at index 1: 32
Matched value at index 2: 43
Matched value at index 3: 54
Matched value at index 4: 65
Matched value at index 5: 35
Matched value at index 6: 54
Matched value at index 8: 20
Matched value at index 9: 21
Matched value at index 10: 20
Matched value at index 11: 22
- for values less than or equal to a threshold value
cargo run -- filter-simd-lt-eq-avx users age 20
Output:
Matched value at index 7: 19
Matched value at index 8: 20
Matched value at index 10: 20

List tables
cargo run -- list-tables
Output:
Tables present in the database:
- users [id (int), name (string), age (int)]

Handles column storage and operations on individual columns.
Represents a column in a table.
pub struct Column {
    pub name: String,
    pub data_type: String,
}

Manages column-based storage operations.
pub struct ColumnStore {
    pub base_path: String,
}

Stores min-max index metadata for filtering.
pub struct MinMaxIndex {
    pub chunk_offset: u64,
    pub min_value: String,
    pub max_value: String,
}

Creates a new column store and initializes the base directory.
Inserts a row into the column store, updating min-max indexes.
Reads all values from a specified column and prints them.
ColumnStore::filter_column(&self, table: &TableSchema, column_name: &str, predicate: &str) -> Vec<String>
Filters a column based on a predicate using min-max indexes and returns matching values.
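The value of a min-max index is that whole chunks can be skipped when the target cannot fall inside their range. The sketch below illustrates that pruning step for an `i32` column held in memory. It is an assumption about the technique, not the project's `filter_column`: `ChunkStats`, `build_index`, and `filter_eq_pruned` are hypothetical names, and the real `MinMaxIndex` stores its bounds as strings with a byte offset.

```rust
/// Hypothetical per-chunk summary for an `i32` column.
struct ChunkStats {
    start: usize, // index of the chunk's first row
    len: usize,   // number of rows in the chunk
    min: i32,
    max: i32,
}

/// Builds one summary per `chunk_size` rows. Panics if `chunk_size` is 0.
fn build_index(values: &[i32], chunk_size: usize) -> Vec<ChunkStats> {
    values
        .chunks(chunk_size)
        .enumerate()
        .map(|(n, chunk)| ChunkStats {
            start: n * chunk_size,
            len: chunk.len(),
            // `chunks` never yields an empty slice, so min/max always exist.
            min: *chunk.iter().min().unwrap(),
            max: *chunk.iter().max().unwrap(),
        })
        .collect()
}

/// Finds rows equal to `target`, scanning only chunks whose [min, max]
/// range could contain it. Returns the matching row indices and the
/// number of chunks that were actually scanned.
fn filter_eq_pruned(values: &[i32], index: &[ChunkStats], target: i32) -> (Vec<usize>, usize) {
    let mut rows = Vec::new();
    let mut scanned = 0;
    for c in index {
        if target < c.min || target > c.max {
            continue; // the chunk cannot contain the target
        }
        scanned += 1;
        for (offset, &v) in values[c.start..c.start + c.len].iter().enumerate() {
            if v == target {
                rows.push(c.start + offset);
            }
        }
    }
    (rows, scanned)
}
```

Pruning helps most when values are clustered or sorted; on unordered data most chunk ranges overlap the target and few chunks are skipped.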
Handles table schema management and metadata storage.
Represents a table schema.
pub struct TableSchema {
    pub table_name: String,
    pub columns: Vec<Column>,
}

Creates a table with the name and columns inside it.
Saves the table schema metadata as a JSON file.
Loads the table schema from metadata storage.
Loads all table metadata in the base directory and prints the available tables.
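To connect the schema types with the CLI examples, here is a small sketch that parses `name:type` arguments (as passed to `create-table`) into columns and renders a schema in the style of the `list-tables` output. The struct definitions are repeated so the block is self-contained; `parse_column_spec` and `describe` are hypothetical helpers, not functions confirmed to exist in the project.

```rust
#[derive(Debug, PartialEq)]
pub struct Column {
    pub name: String,
    pub data_type: String,
}

pub struct TableSchema {
    pub table_name: String,
    pub columns: Vec<Column>,
}

/// Parses one `name:type` argument such as `age:int`.
/// Returns `None` if the separator or either part is missing.
fn parse_column_spec(spec: &str) -> Option<Column> {
    let (name, data_type) = spec.split_once(':')?;
    if name.is_empty() || data_type.is_empty() {
        return None;
    }
    Some(Column {
        name: name.to_string(),
        data_type: data_type.to_string(),
    })
}

/// Renders a schema as a single list line, e.g.
/// `- users [id (int), name (string), age (int)]`.
fn describe(schema: &TableSchema) -> String {
    let cols: Vec<String> = schema
        .columns
        .iter()
        .map(|c| format!("{} ({})", c.name, c.data_type))
        .collect();
    format!("- {} [{}]", schema.table_name, cols.join(", "))
}
```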
Module declarations for column.rs and table.rs.
pub mod column;
pub mod table;

Entrypoint for the program.
use storage::{column::{Column, ColumnStore}, table::TableSchema};
pub mod storage;

fn main() {
    let store = ColumnStore::new("./data");
    let schema = TableSchema {
        table_name: "users".to_string(),
        columns: vec![
            Column { name: "id".to_string(), data_type: "int".to_string() },
            Column { name: "name".to_string(), data_type: "string".to_string() },
        ],
    };
    // Persist the schema metadata.
    schema.save("./data");
    // Insert one row, then print every value in the `name` column.
    store.insert_row(&schema, vec!["1", "Alice"]);
    store.scan_column(&schema, "name");
    // Filter the `name` column with the predicate "Alice".
    let results = store.filter_column(&schema, "name", "Alice");
    println!("Filtered results: {:?}", results);
}

This reference provides an overview of the API structure, usage, and an example implementation.
- Implement the command-line interface using clap
- Benchmark against a PostgreSQL database