Codecov Report ❌ Patch coverage is |
|
This looks really neat! I'll definitely read through your implementation soon. One thing though, I noticed that your frizbee usage might be hurting the performance a bit. You'll want to use the Your micro benches use the Also, you might want to try using the parallel implementation in frizbee, as it scales much better than rayon in my testing |
|
Very interesting @LoricAndre, will try! |
And here I was, thinking I finally optimized it enough to catch up to frizbee's performance 😂 |
|
I wish I had the time to investigate this more, but I'm slammed right now. I did a quick test by replacing the frizbee code with the code below, and I've included the results below as well. I realize now the main performance issue is instantiating the
c.bench_function("micro_frizbee", |b| {
    let mut matcher = frizbee::Matcher::new("test", &Default::default());
    b.iter(|| {
        let mut count = 0u64;
        for _ in matcher.match_iter(&lines) {
            count += 1;
        }
        count
    });
});
c.bench_function("micro_typos_frizbee", |b| {
    let config = frizbee::Config {
        max_typos: Some(1),
        ..Default::default()
    };
    let mut matcher = frizbee::Matcher::new("test", &config);
    b.iter(|| {
        let mut count = 0u64;
        for _ in matcher.match_iter(&lines) {
            count += 1;
        }
        count
    });
});
c.bench_function("micro_frizbee_parallel", |b| {
    b.iter(|| {
        let mut count = 0u64;
        for _ in frizbee::match_list_parallel("test", &lines, &Default::default(), 16) {
            count += 1;
        }
        count
    });
});
|
Btw you might want to explore incremental matching in your arinae implementation as well! saghen/frizbee#65 |
|
Yeah recreating the matcher at each iteration is stupid of me. I'll spend some more time looking into this, thanks |
I checked a bit more, and the current architecture does not allow me to reuse matchers, even using some |
LoricAndre
left a comment
Initial review

Description
Arinae is designed to become skim's default algorithm in the future.
Technically, it uses Smith-Waterman and a modified Levenshtein distance with affine gaps for scoring, as well as multiple optimizations (the main ones being a loose prefilter and checks for early dismissal of paths that cannot lead to the best match). It also forbids typos on the first char of the query.
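To make the scoring model above more concrete, here is a minimal sketch of Smith-Waterman local alignment with affine gaps (Gotoh's formulation). It is not Arinae's actual code: the function name `local_affine_score` and all four constants are hypothetical, chosen only for illustration, and it leaves out the prefilter, the early-dismissal checks, and the first-character rule.

```rust
// Hypothetical scoring constants, NOT Arinae's real values.
const MATCH: i32 = 16; // reward for two equal characters
const MISMATCH: i32 = -8; // substitution ("typo") penalty
const GAP_OPEN: i32 = -6; // cost of the first character of a gap
const GAP_EXTEND: i32 = -1; // cost of each additional gap character

/// Best local alignment score of `query` against `item`
/// (Smith-Waterman with affine gaps, full O(n*m) matrices).
fn local_affine_score(query: &str, item: &str) -> i32 {
    let q: Vec<char> = query.chars().collect();
    let t: Vec<char> = item.chars().collect();
    let (n, m) = (q.len(), t.len());
    // "Minus infinity" that cannot overflow when a penalty is added.
    let neg = i32::MIN / 2;

    // h: best score of an alignment ending at (i, j)
    // e: same, but ending with a gap that skips item characters
    // f: same, but ending with a gap that skips query characters
    let mut h = vec![vec![0i32; m + 1]; n + 1];
    let mut e = vec![vec![neg; m + 1]; n + 1];
    let mut f = vec![vec![neg; m + 1]; n + 1];
    let mut best = 0;

    for i in 1..=n {
        for j in 1..=m {
            // Either open a new gap or extend the one already open.
            e[i][j] = (h[i][j - 1] + GAP_OPEN).max(e[i][j - 1] + GAP_EXTEND);
            f[i][j] = (h[i - 1][j] + GAP_OPEN).max(f[i - 1][j] + GAP_EXTEND);

            let sub = if q[i - 1] == t[j - 1] { MATCH } else { MISMATCH };
            let diag = h[i - 1][j - 1] + sub;

            // Clamping at 0 is what makes the alignment local.
            h[i][j] = diag.max(e[i][j]).max(f[i][j]).max(0);
            best = best.max(h[i][j]);
        }
    }
    best
}
```

With these constants, a one-character gap costs 6 and a two-character gap costs 7, so long gaps are cheaper per character than repeated short ones, while a substitution costs 8 plus the match reward that was missed.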
In practice, it should feel close to FZY's scoring with typos disabled, but with a more natural behavior regarding typos than Frizbee or other algorithms.
These other algorithms usually work by allowing a set number of typos using 3D matrices for computations, the max-typos value being set based on the length of the query. In practice, that meant that `tes` will match exactly, but `test` will allow one typo, meaning that typing a single character will change the filtered items completely. This algorithm will instead penalize typos, not block them completely.
This algorithm does not aim to revolutionize anything, but it aims at making typo-resistant fuzzy matching feel more like an actual alternative to the current options (namely FZF and FZY), while maintaining per-item performance at least as good as the current algorithms.
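The difference between a hard typo budget and a typo penalty can be sketched as follows. Everything here is an assumption made for illustration: the helper names, the `len / 4` budget rule, and the score weights are hypothetical and are not taken from Frizbee or Arinae. The typo count is a plain approximate-substring edit distance (Sellers' algorithm), which is much simpler than what either matcher does.

```rust
/// Minimum number of edits needed to match `query` against any
/// substring of `item` (Sellers' approximate substring matching).
fn typos_in_best_window(query: &str, item: &str) -> usize {
    let q: Vec<char> = query.chars().collect();
    let t: Vec<char> = item.chars().collect();
    // Row 0 is all zeros: a match may start anywhere in the item.
    let mut prev = vec![0usize; t.len() + 1];
    for i in 1..=q.len() {
        // cur[0] = i: matching i query chars against nothing costs i edits.
        let mut cur = vec![i; t.len() + 1];
        for j in 1..=t.len() {
            let cost = usize::from(q[i - 1] != t[j - 1]);
            cur[j] = (prev[j - 1] + cost) // match or substitution
                .min(prev[j] + 1) // query char missing from the item
                .min(cur[j - 1] + 1); // extra char in the item
        }
        prev = cur;
    }
    // A match may also end anywhere in the item.
    prev.into_iter().min().unwrap_or(0)
}

/// Hypothetical budget rule: one typo allowed per 4 query characters.
fn max_typos_for(query_len: usize) -> usize {
    query_len / 4
}

/// Hard budget: an item is either in or out.
fn hard_cutoff_matches(query: &str, item: &str) -> bool {
    typos_in_best_window(query, item) <= max_typos_for(query.chars().count())
}

/// Penalty: every item keeps a score, typos only lower it.
/// The weights 16 and 12 are hypothetical.
fn penalized_score(query: &str, item: &str) -> i32 {
    let len = query.chars().count() as i32;
    let typos = typos_in_best_window(query, item) as i32;
    16 * len - 12 * typos
}
```

Under the hypothetical budget rule, the item `tent` is rejected for the query `tes` and suddenly accepted for `test`. With the penalty, `tent` is ranked below `test` for both queries, so typing one more character reorders the results rather than replacing them.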
Checklist
README.md, comments, src/manpage.rs and/or src/options.rs if applicable)
Note: codecov runs on the PR on this repo, but feel free to ignore it.
Benches (w/ bench.sh: less precise but shows RSS usage & compatible with FZF)
Skim V2 (current default)
Frizbee (no typos)
Arinae (no typos)
FZF for comparison
Frizbee (typos)
Arinae (typos) - note the difference in the number of results compared to frizbee
Benches (w/ criterion: more precise, hooks directly into the code)
cargo bench --bench read_and_match
   Compiling skim v3.5.0 (/home/loric/src/skim)
    Finished `bench` profile [optimized] target(s) in 1m 21s
     Running benches/read_and_match.rs (target/release/deps/read_and_match-7be3557b13dbf6c1)
Gnuplot not found, using plotters backend
Benchmarking default: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 37.9s.
default                 time:   [3.6986 s 3.7315 s 3.7645 s]
                        change: [−2.0247% −0.7908% +0.4336%] (p = 0.25 > 0.05)
                        No change in performance detected.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
Benchmarking query: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 43.9s.
query                   time:   [4.5603 s 4.7987 s 5.0747 s]
                        change: [+1.0711% +7.2752% +14.625%] (p = 0.05 > 0.05)
                        No change in performance detected.
Found 3 outliers among 10 measurements (30.00%)
  1 (10.00%) low severe
  1 (10.00%) low mild
  1 (10.00%) high severe
Benchmarking query_frizbee: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 44.3s.
query_frizbee           time:   [4.6213 s 4.8387 s 5.1942 s]
                        change: [−10.587% +0.9013% +13.415%] (p = 0.90 > 0.05)
                        No change in performance detected.
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) high mild
  1 (10.00%) high severe
Benchmarking query_ari: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 49.4s.
query_ari               time:   [4.0759 s 4.4534 s 5.0057 s]
                        change: [−13.815% +1.4402% +30.916%] (p = 0.91 > 0.05)
                        No change in performance detected.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high severe
Benchmarking query_frizbee_typos: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 44.6s.
query_frizbee_typos     time:   [4.2940 s 4.3626 s 4.4337 s]
Benchmarking query_ari_typos: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 43.3s.
query_ari_typos         time:   [4.2142 s 4.4391 s 4.6898 s]
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) low severe
  1 (10.00%) high severe