This is a very short and simple benchmark. It is not intended to be a comprehensive analysis of suffix array construction algorithms (SACAs) in general.
The goal is to roughly compare the performance of SACAs that are available for Rust developers on crates.io.
- `libsais`: Rust bindings for the C library `libsais`, with optional multithreading support. (bias alert: I created the bindings)
- `divsufsort`: A port of the C library `divsufsort`.
- `suffix`: A library for UTF-8-encoded (`str`) texts rather than byte arrays (`[u8]`). It therefore solves a slightly different, trickier problem and is only included out of curiosity.
- `bio`: A large package of algorithms and data structures for bioinformatics applications.
- `psacak`: A stand-alone Rust implementation of a SACA with multithreading support via `rayon`.
- `sais-drum`: A Rust implementation heavily inspired by `libsais`, unfinished and not fully optimized. It now also supports `u32` construction, which is not included yet. (bias alert: I created this library)
- `sufr`: I recently stumbled across this library, and it is not yet included in the benchmark. It focuses on storing the suffix array in a file and querying it efficiently.
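For readers who want a reference point for what these libraries compute, here is a minimal sketch of a naive suffix array construction over a byte slice. It is not the API or the algorithm of any of the libraries listed above, and the function name `naive_suffix_array` is made up for illustration. It simply sorts all suffix start positions, which is far too slow for genome-sized inputs but can be handy for checking results on small texts.

```rust
/// Naive reference construction (hypothetical helper, not part of any
/// benchmarked library): sort all suffix start positions by comparing
/// the suffixes lexicographically as byte slices.
/// Worst case O(n^2 log n), so only suitable for small inputs.
fn naive_suffix_array(text: &[u8]) -> Vec<usize> {
    let mut sa: Vec<usize> = (0..text.len()).collect();
    // Slice comparison is lexicographic; a suffix that is a proper prefix
    // of another suffix sorts before it.
    sa.sort_unstable_by(|&a, &b| text[a..].cmp(&text[b..]));
    sa
}

fn main() {
    let sa = naive_suffix_array(b"banana");
    println!("{:?}", sa); // prints [5, 3, 1, 0, 4, 2]
}
```

The output of a real SACA on the same small input should match this ordering, apart from the index type and any sentinel handling a particular library might apply.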
- Input data: The first 2 GB of the human genome. This makes sure that the libraries computing `i32`-based suffix arrays can participate.
- The benchmarks were executed on a Windows laptop with an AMD Ryzen 7 PRO 8840HS (octa-core) processor and 32 GB of RAM.
- The libraries with multithreading support were instructed to use 8 threads.
The type of the indices of the returned suffix array is provided in the legend. It has a large influence on the memory usage and on the maximum text length that the library supports.
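To make the effect of the index type concrete, the following sketch computes the size of the suffix array alone for different index widths, together with the largest text length each type can address. The text length of 2 * 10^9 bytes is an assumption about what "2 GB" means here, and the helper names are hypothetical. The limits shown follow from the integer types only; individual libraries may impose stricter bounds.

```rust
use std::mem::size_of;

/// Bytes needed for the suffix array alone: one index of type `I`
/// per text position (hypothetical helper for illustration).
fn suffix_array_bytes<I>(text_len: u64) -> u64 {
    text_len * size_of::<I>() as u64
}

/// Whether every position of a text of this length can be represented
/// as a non-negative `i32`. Bound derived from the type only.
fn fits_i32_indices(text_len: u64) -> bool {
    text_len <= i32::MAX as u64
}

fn main() {
    // Assumption: "2 GB" is read as 2 * 10^9 bytes.
    let text_len: u64 = 2_000_000_000;

    println!("i32 indices: {} bytes", suffix_array_bytes::<i32>(text_len));
    println!("u32 indices: {} bytes", suffix_array_bytes::<u32>(text_len));
    println!("i64 indices: {} bytes", suffix_array_bytes::<i64>(text_len));

    println!("max length with i32: {}", i32::MAX);
    println!("max length with u32: {}", u32::MAX);
    println!("fits i32 indices: {}", fits_i32_indices(text_len));
}
```

Under this assumption, 32-bit indices need 8 * 10^9 bytes for the suffix array and 64-bit indices need twice that, which is already half of the 32 GB of RAM on the benchmark machine.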
All implementations except for `suffix` and `bio` do not need a significant amount of additional memory apart from the suffix array itself. `libsais` provides the fastest implementation and is among the most memory-efficient ones.
Only for texts that are too long for `i32` indices, `psacak` might be a more memory-efficient solution.