Tantivy v0.20
What's Changed
Bugfixes
- Fix phrase queries with slop (slop now supports transpositions; the algorithm carries the slop accumulated so far when the number of terms is > 2) #2031 #2020 (@PSeitz)
- Handle error for exists on MMapDirectory #1988 (@PSeitz)
- Aggregation
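To make "carries the slop accumulated so far" concrete, here is a small sketch of a simplified cost model. It is an illustration under stated assumptions, not tantivy's phrase-scorer code. Each term is expected one position after its predecessor. The distance from that expectation is added to a running total, so a transposition (the second term appearing just before the first) costs 2 in this model. The function name `sloppy_phrase_matches` is hypothetical.

```rust
/// Simplified model (NOT tantivy's implementation): `positions[i]` is the
/// position at which the i-th query term was found in one candidate match.
/// Each term is expected one position after its predecessor; the distance
/// from that expectation is accumulated across all terms, and the candidate
/// is rejected as soon as the accumulated slop exceeds the allowed budget.
fn sloppy_phrase_matches(positions: &[u32], slop: u32) -> bool {
    let mut slop_so_far: u32 = 0;
    for pair in positions.windows(2) {
        let expected = pair[0] + 1;
        // abs_diff covers both "too far to the right" and
        // "before the previous term" (a transposition).
        slop_so_far += pair[1].abs_diff(expected);
        if slop_so_far > slop {
            return false;
        }
    }
    true
}
```

A real scorer has to consider every combination of term positions in a document. This sketch only scores one candidate assignment, which is enough to show why slop must be carried across terms once a phrase has more than two terms.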
Features/Improvements
- Add PhrasePrefixQuery #1842 (@trinity-1686a)
- Add coerce option for text and number types (convert the value instead of returning an error during indexing) #1904 (@PSeitz)
- Add regex tokenizer #1759 (@mkleen)
- Move tokenizer API to a separate crate. Having a separate crate with a stable API will allow us to use tokenizers with different tantivy versions. #1767 (@PSeitz)
- Columnar crate: New fast field handling (@fulmicoton @PSeitz) #1806 #1809
- Support for fast fields with optional values. Previously tantivy supported only single-valued and multi-valued fast fields. The encoding of optional fast fields is now very compact.
- Fast field support for JSON (schemaless fast fields). Support multiple types on the same column. #1876 (@fulmicoton)
- Unified access for fast fields over different cardinalities.
- Unified storage for typed and untyped fields.
- Move fastfield codecs into columnar. #1782 (@fulmicoton)
- Sparse dense index for optional values #1716 (@PSeitz)
- Switch to nanosecond precision in DateTime fast field #2016 (@PSeitz)
- Aggregation
- Add date_histogram aggregation (only fixed_interval for now) #1900 (@PSeitz)
- Add percentiles aggregation #1984 (@PSeitz)
- [breaking] Drop JSON support on intermediate aggregation results (we use postcard as the format in quickwit to send intermediate results) #1992 (@PSeitz)
- Set a memory limit in bytes for aggregations, after which they abort (previously there was only the bucket limit) #1942 #1957 (@PSeitz)
- Add support for u64, i64, f64 fields in term aggregation #1883 (@PSeitz)
- Add count, min, max, and sum aggregations #1794 (@guilload)
- Deserialize Aggregation without serde's untagged enums, which gives better deserialization errors #2003 (@PSeitz)
- Switch to milliseconds in histogram for date type (Elasticsearch compatibility) #2045 (@PSeitz)
- Reduce term aggregation memory consumption #2013 (@PSeitz)
- Reduce aggregation memory consumption: replace the generic aggregation collector (which has a high memory requirement per instance) in the aggregation tree with optimized versions behind a trait.
- Split term collection count and sub_agg (faster term aggregation with less memory consumption when there are no sub-aggregations) #1921 (@PSeitz)
- Schemaless aggregations: In combination with stacker, tantivy now supports schemaless aggregations via the JSON type.
- Perf: Fetch blocks of values in aggregations for all cardinalities #1950 (@PSeitz)
- Add Searcher with disabled scoring via EnableScoring::Disabled #1780 (@shikhar)
- Enable tokenizer on JSON fields #2053 (@PSeitz)
- Enforce consistency between "NOT" and "-" queries in UserInputAst #1609 (@bazhenov)
- Faster indexing
- Faster search
- Make BM25 scoring more flexible #1855 (@alexcole)
- Switch from fs2 to fs4, as fs2 is unmaintained and does not support illumos #1944 (@Toasterson)
- Made BooleanWeight and BoostWeight public #1991 (@fulmicoton)
- Make index compatible with virtual drives on Windows #1843 (@gyk)
- Automatically downgrade the index record option instead of returning a vint error #1857 (@PSeitz)
- Enable range queries on fast fields for u64-compatible types #1762 (@PSeitz) [#1876]
- sstable
- Isolate sstable and stacker in independent crates. #1718 (@fulmicoton)
- New sstable format #1943 #1953 (@trinity-1686a)
- Use DeltaReader directly to implement Dictionary::ord_to_term #1928 (@trinity-1686a)
- Use DeltaReader directly to implement Dictionary::term_ord #1925 (@trinity-1686a)
- Add separate tokenizer manager for fast fields #2019 (@PSeitz)
- Make construction of LevenshteinAutomatonBuilder for FuzzyTermQuery instances lazy. #1756 (@adamreichold)
- Added support for madvise when opening a memory-mapped Index #2036 (@fulmicoton)
- Rename DatePrecision to DateTimePrecision #2051 (@guilload)
- Query Parser
- Quotation mark can now be used for phrase queries. #2050 (@fulmicoton)
- PhrasePrefixQuery is supported in the query parser via field:"phrase ter"* #2044 (@adamreichold)
- Docs
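PhrasePrefixQuery (and the query parser syntax field:"phrase ter"*) matches a phrase whose last term only needs to be a prefix. The sketch below models that matching rule over an already tokenized document. It is a simplified illustration, not tantivy's implementation, which works on postings and positions. The function `phrase_prefix_matches` is hypothetical.

```rust
/// Simplified model of phrase-prefix matching over a tokenized document:
/// the leading `phrase` terms must appear consecutively, and the token
/// immediately after them only has to start with `prefix`.
fn phrase_prefix_matches(tokens: &[&str], phrase: &[&str], prefix: &str) -> bool {
    let n = phrase.len();
    // Slide a window of n + 1 tokens; windows() yields nothing if the
    // document is shorter than the window.
    tokens
        .windows(n + 1)
        .any(|window| &window[..n] == phrase && window[n].starts_with(prefix))
}
```

For the query "phrase ter"*, the document tokens ["a", "phrase", "term", "here"] match because "phrase" is directly followed by a token starting with "ter".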
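The coerce option converts a value instead of returning an error during indexing. The notes do not spell out which conversions tantivy performs, so the sketch below assumes a plausible rule set for a u64 field: strings are parsed, and floats are accepted only when integral and in range. The `InputValue` enum and `to_u64` function are hypothetical and do not reflect tantivy's types.

```rust
/// Hypothetical input value type for this sketch (not tantivy's type).
#[derive(Debug)]
enum InputValue {
    Str(String),
    U64(u64),
    F64(f64),
}

/// Assumed coerce semantics for a u64 field: without coercion only a u64
/// is accepted; with coercion, strings are parsed and floats are converted
/// only when the conversion is lossless.
fn to_u64(value: &InputValue, coerce: bool) -> Result<u64, String> {
    match value {
        InputValue::U64(v) => Ok(*v),
        // Coercion disabled: any other type is an indexing error.
        _ if !coerce => Err(format!("expected u64, got {:?}", value)),
        InputValue::Str(s) => s
            .trim()
            .parse::<u64>()
            .map_err(|e| format!("cannot coerce {s:?}: {e}")),
        // `u64::MAX as f64` rounds up to 2^64, hence the strict `<`.
        InputValue::F64(f) if f.fract() == 0.0 && *f >= 0.0 && *f < u64::MAX as f64 => {
            Ok(*f as u64)
        }
        InputValue::F64(f) => Err(format!("cannot coerce {f} to u64 without loss")),
    }
}
```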
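Optional fast fields need an index that records which documents have a value and maps a doc id to the row of its value in a compactly stored values column. The sparse/dense idea from the columnar items can be sketched as follows. This is a simplified model, not tantivy's on-disk format. The `OptionalIndex` type and its size-based heuristic are hypothetical. The sketch assumes doc ids are sorted, unique, and below `num_docs`.

```rust
/// Simplified model of an index over optional values (not tantivy's format).
enum OptionalIndex {
    /// Sorted doc ids that have a value; cheap when few docs have one.
    Sparse(Vec<u32>),
    /// One bit per document; cheap when most docs have a value.
    Dense(Vec<u64>),
}

impl OptionalIndex {
    /// Picks whichever representation is smaller (hypothetical heuristic).
    fn build(num_docs: u32, docs_with_value: &[u32]) -> Self {
        let num_words = (num_docs as usize + 63) / 64;
        let sparse_bytes = docs_with_value.len() * 4;
        let dense_bytes = num_words * 8;
        if sparse_bytes < dense_bytes {
            OptionalIndex::Sparse(docs_with_value.to_vec())
        } else {
            let mut bits = vec![0u64; num_words];
            for &doc in docs_with_value {
                bits[doc as usize / 64] |= 1u64 << (doc % 64);
            }
            OptionalIndex::Dense(bits)
        }
    }

    /// Returns the row in the values column for `doc`, or None if it has no value.
    fn rank_if_exists(&self, doc: u32) -> Option<u32> {
        match self {
            OptionalIndex::Sparse(docs) => docs.binary_search(&doc).ok().map(|i| i as u32),
            OptionalIndex::Dense(bits) => {
                let word = doc as usize / 64;
                let bit = doc % 64;
                if bits[word] & (1u64 << bit) == 0 {
                    return None;
                }
                // Rank = number of set bits strictly before `doc`.
                let before: u32 = bits[..word].iter().map(|w| w.count_ones()).sum();
                let mask = (1u64 << bit) - 1;
                Some(before + (bits[word] & mask).count_ones())
            }
        }
    }
}
```

The rank lookup is the key operation: values are stored only for documents that have one, so the doc id must be translated into a position among those documents.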
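With DateTime fast fields stored at nanosecond precision and the precision setting renamed to DateTimePrecision, indexing at a coarser precision amounts to truncating timestamps. The sketch below shows that truncation under the assumption that values are i64 nanoseconds since the Unix epoch. The `Precision` enum is a hypothetical stand-in whose variant names are not guaranteed to match tantivy's. The floor-based rounding is also a choice made for this sketch.

```rust
/// Hypothetical stand-in for a date precision setting (not tantivy's enum).
#[derive(Clone, Copy)]
enum Precision {
    Seconds,
    Milliseconds,
    Microseconds,
    Nanoseconds,
}

/// Truncates a timestamp in nanoseconds since the Unix epoch to `precision`.
/// div_euclid rounds toward negative infinity, so pre-1970 timestamps are
/// truncated to the start of their second/millisecond/microsecond.
fn truncate_nanos(ts_nanos: i64, precision: Precision) -> i64 {
    let unit: i64 = match precision {
        Precision::Seconds => 1_000_000_000,
        Precision::Milliseconds => 1_000_000,
        Precision::Microseconds => 1_000,
        Precision::Nanoseconds => 1,
    };
    ts_nanos.div_euclid(unit) * unit
}
```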
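The new date_histogram aggregation supports only fixed_interval, and date histograms now work in milliseconds. Bucketing with a fixed interval comes down to rounding each timestamp down to a multiple of the interval. The minimal sketch below assumes millisecond timestamps and omits features such as offsets, empty buckets, and sub-aggregations. The `date_histogram` function is hypothetical and is not tantivy's aggregation API.

```rust
use std::collections::BTreeMap;

/// Minimal fixed-interval date histogram over millisecond timestamps:
/// each timestamp goes into the bucket whose key is the timestamp rounded
/// down to a multiple of `interval_ms`; the map holds doc counts per key.
fn date_histogram(timestamps_ms: &[i64], interval_ms: i64) -> BTreeMap<i64, u64> {
    assert!(interval_ms > 0, "fixed_interval must be positive");
    let mut buckets = BTreeMap::new();
    for &ts in timestamps_ms {
        // div_euclid floors, so negative timestamps land in the right bucket.
        let key = ts.div_euclid(interval_ms) * interval_ms;
        *buckets.entry(key).or_insert(0) += 1;
    }
    buckets
}
```

A BTreeMap keeps the bucket keys sorted, which matches the ascending key order in which histogram buckets are usually returned.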
New Contributors
- @mhlakhani made their first contribution in #1733
- @pinkforest made their first contribution in #1746
- @DawChihLiou made their first contribution in #1737
- @mkleen made their first contribution in #1759
- @lonre made their first contribution in #1803
- @gyk made their first contribution in #1843
- @alexcole made their first contribution in #1855
- @Toasterson made their first contribution in #1944
- @vsop-479 made their first contribution in #1970
- @Tony-X made their first contribution in #1985
- @RTEnzyme made their first contribution in #1999
- @tottoto made their first contribution in #2018
- @nyurik made their first contribution in #2038
- @bazhenov made their first contribution in #1609
- @lavrd made their first contribution in #1422
- @tnxbutno made their first contribution in #2069
Full Changelog: https://github.com/quickwit-oss/tantivy/blob/main/CHANGELOG.md