biodatageeks/sequila-native

sequila-native

A set of native implementations of common bioinformatics algorithms, for use as Arrow-DataFusion or SeQuiLa (Apache Spark) extensions.

RUSTFLAGS="-C target-cpu=native" RUST_LOG=info cargo run --release

Run a SQL file

RUST_LOG=info cargo run -p sequila-cli -- --file queries/q1-coitrees.sql

Perf

Profiling uses the flamegraph crate: https://docs.rs/crate/flamegraph/0.6.5

On Arch Linux

sudo pacman -S perf gcc-libs glibc
cargo install flamegraph
echo -1 | sudo tee /proc/sys/kernel/perf_event_paranoid

cargo build --release
flamegraph -- target/release/sequila-cli -f queries/q1-coitrees.sql

Recommended parameters

SET sequila.prefer_interval_join TO true;
SET sequila.interval_join_algorithm TO coitrees;
SET datafusion.optimizer.repartition_joins TO false;
SET datafusion.execution.coalesce_batches TO false;

-- for controlling the parallelism level (only for benchmarking purposes; otherwise use the defaults)
SET datafusion.execution.target_partitions TO 1;
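
One way to apply the recommended parameters to an existing query is to prepend them to the SQL file before passing it to the CLI. The sketch below is only an illustration: `with_settings` is a hypothetical helper that is not part of this repository, the output path is arbitrary, and it assumes the CLI executes every statement in the file given via `--file`, in order.

```shell
# Hypothetical helper (not part of sequila-native): writes the recommended
# SET statements followed by the contents of a query file into a new file.
# Assumption: the CLI runs all statements in the --file argument in order.
with_settings() {
  query_file="$1"
  out_file="$2"
  {
    cat <<'SQL'
SET sequila.prefer_interval_join TO true;
SET sequila.interval_join_algorithm TO coitrees;
SET datafusion.optimizer.repartition_joins TO false;
SET datafusion.execution.coalesce_batches TO false;
SQL
    cat "$query_file"
  } > "$out_file"
}

# Usage (the output path is illustrative):
# with_settings queries/q1-coitrees.sql /tmp/q1-with-settings.sql
# RUST_LOG=info cargo run -p sequila-cli -- --file /tmp/q1-with-settings.sql
```

If the query file already sets these parameters itself, the prepended statements are redundant and the helper is unnecessary.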

How to run the benchmark locally:

  1. Download and unpack the test dataset.
  2. Export an environment variable with the path to the root folder of the benchmark data, e.g.:
export BENCH_DATA_ROOT=/Users/mwiewior/research/databio/
  3. Run the benchmark:
RUSTFLAGS="-Ctarget-cpu=native" cargo bench --bench databio_benchmark -- --quick
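
Because the benchmark depends on `BENCH_DATA_ROOT`, a small pre-flight check can catch a missing or mistyped path before the build starts. This is a minimal sketch: `check_bench_data_root` is a hypothetical helper, not something the repository provides, and it only verifies that the variable is set and points to a directory, not that the dataset inside is complete.

```shell
# Hypothetical pre-flight check (not part of sequila-native).
# Returns non-zero if BENCH_DATA_ROOT is unset, empty, or not a directory.
check_bench_data_root() {
  if [ -z "${BENCH_DATA_ROOT:-}" ]; then
    echo "BENCH_DATA_ROOT is not set" >&2
    return 1
  fi
  if [ ! -d "$BENCH_DATA_ROOT" ]; then
    echo "BENCH_DATA_ROOT is not a directory: $BENCH_DATA_ROOT" >&2
    return 1
  fi
}

# Usage: run the benchmark only when the check passes.
# check_bench_data_root && RUSTFLAGS="-Ctarget-cpu=native" cargo bench --bench databio_benchmark -- --quick
```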
