The current search algorithm implemented in SearchUtil is a bit strange:
- it supports:
- prefix matches
- word matches
- 'limited fuzzy matches': matches with missing words, so long as they're in order
("KeepJustMe" can be found with "KeepMe" and "KeMe" but not "JustKeep")
- it does not support arbitrary substring matches ("eep" does not find "Keep")
- its entry scoring system seems excessively complicated and might cause noticeable lag, but I haven't benchmarked anything (see
SearchUtil.Entry::getScoreFor)
If I missed anything the current algorithm supports or why the scoring has to be the way it is, please comment.
I started poking around in SearchUtil to see if I could use #338's StringLookup to improve or de-duplicate things.
I found that switching StringLookup would require changing the kinds of matches supported:
- remove 'limited fuzzy matches'
- add arbitrary substring matches
I could look into adding some form of fuzzy matching to StringLookup.
It would have the following advantages:
- make the complex part of the scoring system unnecessary (unless I overlooked something)
- potentially be faster
- maybe allow for more caching, esp. if we remove the 'hit count' part of the scoring
A rewrite could also lay some ground work for adding feature parity with the search-mappings command.
ofc StringLookup could only be used if it ends up being performant at MC source scale and can be effectively parallelized.
Alternatives to a substantial rewrite centered around StringLookup:
- try to improve the current algorithm using the
StringMultiTries that back lookups
- try to implement more caching in the current algorithm
- expand the current fuzzy matching
- overhaul the scoring
The current search algorithm implemented in
SearchUtilis a bit strange:("KeepJustMe" can be found with "KeepMe" and "KeMe" but not "JustKeep")
SearchUtil.Entry::getScoreFor)If I missed anything the current algorithm supports or why the scoring has to be the way it is, please comment.
I started poking around in
SearchUtilto see if I could use #338'sStringLookupto improve or de-duplicate things.I found that switching
StringLookupwould require changing the kinds of matches supported:I could look into adding some form of fuzzy matching to
StringLookup.It would have the following advantages:
A rewrite could also lay some ground work for adding feature parity with the
search-mappingscommand.ofc
StringLookupcould only be used if it ends up being performant at MC source scale and can be effectively parallelized.Alternatives to a substantial rewrite centered around
StringLookup:StringMultiTries that back lookups