feat: incremental matching by ph1losof · Pull Request #65 · saghen/frizbee

ph1losof · 2026-02-22T21:22:13Z

Hi @saghen, this is between a draft and fully ready, so feel free to share your opinion.

Note: AI was used to help write the benchmark code.

Closes #5. Related to the television discussion in #19.

What this does

When you're typing into a fuzzy finder character by character (f -> fo -> foo), each keystroke can only eliminate matches from the previous set - it can never add new ones. IncrementalMatcher takes advantage of this by remembering which haystack indices matched last time and only rescoring that shrinking subset on each prefix extension. Everything else (completely different query, haystack list change) falls back to a full rescore.

The API mirrors Matcher but takes the needle per-call instead of at construction:

let mut matcher = IncrementalMatcher::new(&config);
let matches = matcher.match_list("f", &haystacks);
let matches = matcher.match_list("fo", &haystacks);  // only rescores previous matches

Also supports match_list_indices, match_list_parallel, reset(), and handles haystack growth between calls.
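The narrowing idea can be sketched without any of frizbee's internals. The block below is a minimal illustration, not the PR's implementation: `ToyIncremental`, `match_indices`, `checked`, and the subsequence test are hypothetical stand-ins for the real prefilter and Smith-Waterman scoring. It also simplifies by doing a full pass whenever the haystack count changes, whereas the PR handles growth between calls.

```rust
/// Toy matcher: true if every char of `needle` appears in `haystack` in order.
/// This stands in for the real prefilter + scoring, which it does not imitate.
fn is_subsequence(needle: &str, haystack: &str) -> bool {
    let mut rest = haystack.chars();
    needle.chars().all(|n| rest.any(|h| h == n))
}

/// Hypothetical illustration type; not the PR's `IncrementalMatcher`.
struct ToyIncremental {
    /// Needle, matched indices, and haystack count from the previous call.
    last: Option<(String, Vec<usize>, usize)>,
    /// How many haystacks the most recent call had to check.
    checked: usize,
}

impl ToyIncremental {
    fn new() -> Self {
        Self { last: None, checked: 0 }
    }

    fn match_indices(&mut self, needle: &str, haystacks: &[&str]) -> Vec<usize> {
        // Reuse the previous result only for a prefix extension over a
        // haystack list of the same length; anything else gets a full pass.
        let candidates: Vec<usize> = match &self.last {
            Some((prev, matched, len))
                if needle.starts_with(prev.as_str()) && *len == haystacks.len() =>
            {
                matched.clone()
            }
            _ => (0..haystacks.len()).collect(),
        };
        self.checked = candidates.len();
        let matched: Vec<usize> = candidates
            .into_iter()
            .filter(|&i| is_subsequence(needle, haystacks[i]))
            .collect();
        self.last = Some((needle.to_string(), matched.clone(), haystacks.len()));
        matched
    }
}
```

Calling `match_indices("s", ..)` and then `match_indices("sm", ..)` on the same list checks only the indices that survived the first call, while an unrelated needle such as `"b"` checks the whole list again.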

Benchmark results on synthetic file-path data (500k haystacks, 8-keystroke sequence):

Overall:  one-shot 319ms  incremental 166ms  (1.92x)

Per step (query, match count, one-shot → incremental time, speedup):
 "s"        482088 matches    30ms →  30ms   1.0x
 "sr"       355063 matches    44ms →  43ms   1.0x
 "src"      188339 matches    41ms →  36ms   1.2x
 "src/"     112305 matches    43ms →  25ms   1.7x
 "src/c"     53330 matches    50ms →  18ms   2.7x
 "src/co"    42282 matches    51ms →   6ms   8.5x
 "src/com"   17301 matches    38ms →   3ms   9.8x
 "src/comp"   6293 matches    22ms →   1ms  16.2x

Early steps where almost everything matches show no overhead (1.0x). The wins kick in once selectivity increases and the narrowed subset gets meaningfully smaller.

Approaches I tested that didn't pan out

Delta prefilter - before running the full SIMD prefilter on the narrowed set, check if just the new character exists in the haystack (cheap scalar byte scan). Turns out the SIMD prefilter already runs at ~16ns/item, and the scalar check costs ~25ns/item. Adding it as a pre-pass actually made things slower since the SIMD path is already fast enough that the extra branch isn't worth it.
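For reference, the delta check described above amounts to something like the following. This is a rough sketch under assumptions: the function names are hypothetical, the comparison is case-sensitive, and it is not the code that was benchmarked.

```rust
/// Hypothetical delta check: does the haystack contain the newly typed byte?
/// A haystack without that byte cannot match the extended needle.
fn contains_new_byte(haystack: &[u8], new_byte: u8) -> bool {
    haystack.iter().any(|&b| b == new_byte)
}

/// Narrow the previously matched indices with the scalar check before the
/// survivors go on to the full prefilter and scorer.
fn delta_prefilter(prev_matched: &[usize], haystacks: &[&str], new_byte: u8) -> Vec<usize> {
    prev_matched
        .iter()
        .copied()
        .filter(|&i| contains_new_byte(haystacks[i].as_bytes(), new_byte))
        .collect()
}
```

Per the measurements above, this extra pass cost more per item than the SIMD prefilter it was meant to shortcut, which is why it was dropped.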

Incremental prefilter/SW construction - set_needle rebuilds both the prefilter and the Smith-Waterman matcher from scratch each call. Thought about making it append-only for prefix extensions. Measured it: set_needle costs 120-270ns depending on needle length. Over an 8-step sequence that's ~1.7µs total out of ~166ms. Not worth the complexity, and the SW matrix uses a variable stride that changes per haystack, making in-place growth impractical without reworking the core DP.

Partial SW reuse - cache the first N-1 rows of the score matrix from the previous needle, only compute the new row. Problem is the matrix is per-haystack (different haystack lengths = different column counts), so you'd need to store a matrix per surviving match. That's ~12KB per match. At 23k matches that's 280MB.
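The memory figure is a straight multiplication. A small sketch, using the rounded numbers quoted above (about 12KB per matrix, about 23k surviving matches) and hypothetical helper names:

```rust
/// Total bytes needed to keep one cached score matrix per surviving match.
fn matrix_cache_bytes(surviving_matches: u64, bytes_per_matrix: u64) -> u64 {
    surviving_matches * bytes_per_matrix
}

/// Convert bytes to decimal megabytes.
fn to_megabytes(bytes: u64) -> f64 {
    bytes as f64 / 1_000_000.0
}
```

With 23,000 matches at 12,000 bytes each this gives 276,000,000 bytes, in line with the roughly 280MB estimate.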

What could come next

Top-K mode where you can stop scoring early once you have enough high-quality results. Right now we score everything because the API returns all matches. A match_list_top_k(needle, haystacks, k) variant could skip SW for items that can't possibly make the cut based on their previous score.
Score threshold parameter for similar early-out behavior when the caller knows they only care about matches above a quality bar.
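One way the Top-K idea could work is sketched below. It rests on an assumption that is not established in this PR: that adding one needle character can raise an item's score by at most some known `max_gain`. The function name, its signature, and the `rescore` callback are all hypothetical.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// `candidates`: (haystack index, score under the previous needle).
/// `max_gain`: ASSUMED upper bound on the score gain from one extra needle char.
/// `rescore`: stand-in for the expensive scorer; None means "no longer matches".
/// Returns the top `k` (index, new score) pairs and how many items were rescored.
fn top_k_with_pruning<F>(
    mut candidates: Vec<(usize, u32)>,
    k: usize,
    max_gain: u32,
    mut rescore: F,
) -> (Vec<(usize, u32)>, usize)
where
    F: FnMut(usize) -> Option<u32>,
{
    if k == 0 {
        return (Vec::new(), 0);
    }
    // Highest previous score first, so the optimistic bound only decreases.
    candidates.sort_by(|a, b| b.1.cmp(&a.1));
    // Min-heap holding the best `k` new scores seen so far.
    let mut heap: BinaryHeap<Reverse<(u32, usize)>> = BinaryHeap::new();
    let mut rescored = 0;
    for (idx, prev_score) in candidates {
        if heap.len() == k {
            if let Some(&Reverse((kth_best, _))) = heap.peek() {
                // Even the most optimistic new score cannot beat the current
                // k-th best, and neither can any later candidate.
                if prev_score.saturating_add(max_gain) <= kth_best {
                    break;
                }
            }
        }
        rescored += 1;
        if let Some(score) = rescore(idx) {
            heap.push(Reverse((score, idx)));
            if heap.len() > k {
                heap.pop(); // drop the lowest score
            }
        }
    }
    let mut top: Vec<(usize, u32)> = heap.into_iter().map(|Reverse((s, i))| (i, s)).collect();
    top.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
    (top, rescored)
}
```

Items whose optimistic bound only ties the current k-th best are skipped too, so the order among equal scores is not preserved in this sketch.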

When a user types a query character by character, extending a needle can only eliminate matches, never create new ones. IncrementalMatcher stores which haystack indices matched previously and only rescores that subset on prefix extension, giving ~2x overall speedup. Supports match_list, match_list_indices, match_list_parallel, reset, and haystack growth between calls.

ph1losof added 5 commits February 22, 2026 22:13

fix: replace let chains with nested ifs in alignment_iter

7747f77

Replace `if let ... && ...` expressions with nested `if let` / `if` blocks to avoid E0658 on toolchains without stabilized let chains.
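The shape of that rewrite, shown on a made-up function rather than the actual alignment_iter code:

```rust
// Let-chain form, which needs a toolchain where let chains are stable:
//
//     if let Some(first) = values.first() && *first > 0 {
//         return Some(*first);
//     }
//
// Equivalent nested form that also compiles on older toolchains.
// `first_positive` is a hypothetical example, not a function from this PR.
fn first_positive(values: &[i32]) -> Option<i32> {
    if let Some(first) = values.first() {
        if *first > 0 {
            return Some(*first);
        }
    }
    None
}
```

The two forms behave the same; the nested one only costs an extra level of indentation.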

bench: add incremental matching benchmark

2f369f0

Compares IncrementalMatcher vs one-shot match_list with file-path datasets across multiple query patterns and dataset sizes. Shows per-step breakdowns and selectivity impact.

chore: cleanup incremental benchmark

e5f02b2

feat: restore from history on backspace instead of full rescore

80a0439
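The commit title suggests keeping earlier results around so that deleting a character can reuse them. One possible shape for that, as a guess rather than a description of the commit: a stack of (needle, matched indices) snapshots. `HistoryMatcher`, `rescored`, and the toy subsequence test are hypothetical, and the sketch assumes the haystack list does not change between calls.

```rust
/// Toy matcher: true if every char of `needle` appears in `haystack` in order.
fn is_subsequence(needle: &str, haystack: &str) -> bool {
    let mut rest = haystack.chars();
    needle.chars().all(|n| rest.any(|h| h == n))
}

/// Hypothetical illustration type; not the PR's implementation.
struct HistoryMatcher {
    /// Snapshots of (needle, matched indices); each needle extends the one below.
    history: Vec<(String, Vec<usize>)>,
    /// How many haystacks the most recent call had to check.
    rescored: usize,
}

impl HistoryMatcher {
    fn new() -> Self {
        Self { history: Vec::new(), rescored: 0 }
    }

    fn match_indices(&mut self, needle: &str, haystacks: &[&str]) -> Vec<usize> {
        // Discard snapshots whose needle is not a prefix of the new needle.
        while self
            .history
            .last()
            .map_or(false, |(top, _)| !needle.starts_with(top.as_str()))
        {
            self.history.pop();
        }
        let candidates: Vec<usize> = match self.history.last() {
            // Backspace to a needle already seen: return the stored result.
            Some((top, matched)) if top.as_str() == needle => {
                self.rescored = 0;
                return matched.clone();
            }
            // Prefix extension: only the previous matches can still match.
            Some((_, matched)) => matched.clone(),
            // Nothing reusable: full pass.
            None => (0..haystacks.len()).collect(),
        };
        self.rescored = candidates.len();
        let matched: Vec<usize> = candidates
            .into_iter()
            .filter(|&i| is_subsequence(needle, haystacks[i]))
            .collect();
        self.history.push((needle.to_string(), matched.clone()));
        matched
    }
}
```

Typing `s`, `sm`, `sma` and then deleting back to `sm` pops the `sma` snapshot and returns the stored `sm` result without checking any haystack.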

saghen force-pushed the main branch from 30d10ea to efc3f88 Compare February 23, 2026 01:37

saghen mentioned this pull request Feb 27, 2026

feat: add Arinae algorithm skim-rs/skim#990 (merged, 5 tasks)

ph1losof wants to merge 5 commits into saghen:main from ph1losof:feat/incremental-matching
