Tantivy v0.20

@PSeitz released this 09 Jun 13:11
e3eacb4

What's Changed

Bugfixes

  • Fix phrase queries with slop: slop now supports transpositions, and the algorithm now carries the slop used so far when there are more than 2 terms #2031 #2020 (@PSeitz)
  • Handle error for exists on MMapDirectory #1988 (@PSeitz)
  • Aggregation
    • Fix min doc_count empty merge bug #2057 (@PSeitz)
    • Fix: Sort order for term aggregations (sort order on key was inverted) #1858 (@PSeitz)
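
To illustrate the slop fix, here is a minimal sketch of slop matching for a two-term phrase. It is not tantivy's implementation. It assumes a cost model in which each term pays one unit of slop per position it is moved, so a transposition ("b a" for the query "a b") costs 2. That cost model and the function names `min_slop_for_pair` and `matches_with_slop` are assumptions for illustration and are not taken from these notes.

```rust
/// Smallest slop under which the two-term phrase "first second" matches,
/// given the token positions of each term within one document.
/// Returns `None` if either term is absent.
/// Hypothetical helper: a simplified model, not tantivy's algorithm.
fn min_slop_for_pair(first_positions: &[u32], second_positions: &[u32]) -> Option<u32> {
    let mut best: Option<u32> = None;
    for &a in first_positions {
        for &b in second_positions {
            // In an exact phrase match, `second` sits right after `first`.
            let expected = i64::from(a) + 1;
            // Distance between where `second` is and where it should be.
            // This works in both directions, so transpositions are covered.
            let cost = (i64::from(b) - expected).unsigned_abs() as u32;
            best = Some(best.map_or(cost, |current| current.min(cost)));
        }
    }
    best
}

/// True if the phrase matches within the given slop budget.
fn matches_with_slop(first_positions: &[u32], second_positions: &[u32], slop: u32) -> bool {
    min_slop_for_pair(first_positions, second_positions).map_or(false, |cost| cost <= slop)
}
```

Under this model, "a x b" matches the query "a b" with slop 1, while "b a" needs slop 2. With more than two terms the slop acts as a budget shared across all terms, which this two-term sketch does not cover.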

Features/Improvements

  • Add PhrasePrefixQuery #1842 (@trinity-1686a)
  • Add coerce option for text and numbers types (convert the value instead of returning an error during indexing) #1904 (@PSeitz)
  • Add regex tokenizer #1759 (@mkleen)
  • Move tokenizer API to separate crate. Having a separate crate with a stable API will allow us to use tokenizers with different tantivy versions. #1767 (@PSeitz)
  • Columnar crate: New fast field handling (@fulmicoton @PSeitz) #1806 #1809
    • Support for fast fields with optional values. Previously tantivy supported only single-valued and multi-valued fast fields. The encoding of optional fast fields is now very compact.
    • Fast field support for JSON (schemaless fast fields). Support multiple types on the same column. #1876 (@fulmicoton)
    • Unified access for fast fields over different cardinalities.
    • Unified storage for typed and untyped fields.
    • Move fastfield codecs into columnar. #1782 (@fulmicoton)
    • Sparse dense index for optional values #1716 (@PSeitz)
    • Switch to nanosecond precision in DateTime fastfield #2016 (@PSeitz)
  • Aggregation
    • Add date_histogram aggregation (only fixed_interval for now) #1900 (@PSeitz)
    • Add percentiles aggregations #1984 (@PSeitz)
    • [breaking] Drop JSON support on intermediate agg result (we use postcard as format in quickwit to send intermediate results) #1992 (@PSeitz)
    • Set memory limit in bytes for aggregations after which they abort (previously there was only the bucket limit) #1942 #1957 (@PSeitz)
    • Add support for u64, i64, f64 fields in term aggregation #1883 (@PSeitz)
    • Add count, min, max, and sum aggregations #1794 (@guilload)
    • Switch to Aggregation without serde_untagged => better deserialization errors. #2003 (@PSeitz)
    • Switch to ms in histogram for date type (ES compatibility) #2045 (@PSeitz)
    • Reduce term aggregation memory consumption #2013 (@PSeitz)
    • Reduce agg memory consumption: Replace generic aggregation collector (which has a high memory requirement per instance) in aggregation tree with optimized versions behind a trait.
    • Split term collection count and sub_agg (Faster term agg with less memory consumption for cases without sub-aggs) #1921 (@PSeitz)
    • Schemaless aggregations: In combination with stacker, tantivy now supports schemaless aggregations via the JSON type.
    • Perf: Fetch blocks of vals in aggregation for all cardinalities #1950 (@PSeitz)
  • Searcher with disabled scoring via EnableScoring::Disabled #1780 (@shikhar)
  • Enable tokenizer on json fields #2053 (@PSeitz)
  • Enforcing "NOT" and "-" queries consistency in UserInputAst #1609 (@denis Bazhenov)
  • Faster indexing
  • Faster search
    • Work in batches of docs on the SegmentCollector (Only for cases without score for now) #1937 (@PSeitz)
    • Faster fast field range queries using SIMD #1954 (@fulmicoton)
    • Improve fast field range query performance #1864 (@PSeitz)
  • Make BM25 scoring more flexible #1855 (@alexcole)
  • Switch from fs2 to fs4, as fs2 is now unmaintained and does not support illumos #1944 (@Toasterson)
  • Made BooleanWeight and BoostWeight public #1991 (@fulmicoton)
  • Make index compatible with virtual drives on Windows #1843 (@yukun Guo)
  • Auto downgrade index record option, instead of vint error #1857 (@PSeitz)
  • Enable range query on fast field for u64 compatible types #1762 (@PSeitz) [#1876]
  • sstable
  • Add separate tokenizer manager for fast fields #2019 (@PSeitz)
  • Make construction of LevenshteinAutomatonBuilder for FuzzyTermQuery instances lazy. #1756 (@adamreichold)
  • Added support for madvise when opening an mmaped Index #2036 (@fulmicoton)
  • Rename DatePrecision to DateTimePrecision #2051 (@guilload)
  • Query Parser
    • Quotation mark can now be used for phrase queries. #2050 (@fulmicoton)
    • PhrasePrefixQuery is supported in the query parser via: field:"phrase ter"* #2044 (@adamreichold)
  • Docs
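
As a rough illustration of the date_histogram aggregation with a fixed_interval, the sketch below assigns millisecond timestamps to fixed-width buckets and counts the documents in each. It uses only the standard library and is not tantivy's implementation. The function names `bucket_key` and `date_histogram` are hypothetical, and keying each bucket by its start timestamp in milliseconds is an assumption based on the switch to ms noted above.

```rust
use std::collections::BTreeMap;

/// Start (in ms) of the fixed-width bucket that contains `timestamp_ms`.
/// `div_euclid` rounds toward negative infinity, so timestamps before the
/// epoch land in the correct bucket as well.
fn bucket_key(timestamp_ms: i64, interval_ms: i64) -> i64 {
    assert!(interval_ms > 0, "interval must be positive");
    timestamp_ms.div_euclid(interval_ms) * interval_ms
}

/// Counts documents per bucket. Keys are bucket start times in ms.
/// A BTreeMap keeps the buckets ordered by key.
fn date_histogram(timestamps_ms: &[i64], interval_ms: i64) -> BTreeMap<i64, u64> {
    let mut buckets = BTreeMap::new();
    for &ts in timestamps_ms {
        *buckets.entry(bucket_key(ts, interval_ms)).or_insert(0u64) += 1;
    }
    buckets
}
```

With a one-minute interval (60_000 ms), the timestamps 0 and 59_999 fall into the bucket keyed 0, and 60_000 opens the next bucket.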

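The idea behind working in batches of docs on the collector can be sketched with a small trait. This is a standalone illustration, not tantivy's SegmentCollector API: `BlockCollector`, `CountCollector`, and `drive` are hypothetical names. The point is that a default per-block method can fall back to per-document collection, while collectors that do not need scores can override it with a cheaper bulk path.

```rust
type DocId = u32;

/// Hypothetical collector trait with a per-document and a per-block entry point.
trait BlockCollector {
    /// Called once per matching document.
    fn collect(&mut self, doc: DocId);

    /// Called once per block of matching documents.
    /// The default implementation falls back to per-document collection.
    fn collect_block(&mut self, docs: &[DocId]) {
        for &doc in docs {
            self.collect(doc);
        }
    }
}

/// Counts matching documents. Only the block length is needed,
/// so the block path avoids one call per document.
struct CountCollector {
    count: usize,
}

impl BlockCollector for CountCollector {
    fn collect(&mut self, _doc: DocId) {
        self.count += 1;
    }

    fn collect_block(&mut self, docs: &[DocId]) {
        self.count += docs.len();
    }
}

/// Feeds matching doc ids to the collector in blocks of `block_size`.
fn drive<C: BlockCollector>(collector: &mut C, docs: &[DocId], block_size: usize) {
    assert!(block_size > 0, "block size must be positive");
    for block in docs.chunks(block_size) {
        collector.collect_block(block);
    }
}
```

Calling `drive` with five doc ids and a block size of 2 makes three `collect_block` calls instead of five `collect` calls.
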
New Contributors

Full Changelog: https://github.com/quickwit-oss/tantivy/blob/main/CHANGELOG.md