Is your feature request related to a problem? Please describe.
Currently, the BM25 scoring parameters k1 and b are hardcoded as module-level constants (K1 = 1.2, B = 0.75) in src/query/bm25.rs. This prevents users from tuning these parameters for domain-specific relevance optimization.
While (k1=1.2, b=0.75) works well as a general default (matching Lucene/Elasticsearch), different applications may require different values:
- Short vs. long documents: Fields with significantly different average lengths (e.g., titles vs. full-text articles) may benefit from different b values (0.3-0.9 range)
- Domain-specific corpora: Legal/medical documents or code repositories often have term frequency distributions that differ from general text, benefiting from adjusted k1 (1.0-2.0 range)
- A/B testing: Production systems need to experiment with parameter values to optimize search relevance metrics (NDCG, MRR, etc.)
- Code search optimization: In our code-context-engine project, we maintain a Tantivy fork solely to adjust these parameters, which creates an ongoing maintenance burden
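To make the effect of the two parameters concrete, here is a minimal sketch of the term-frequency part of BM25 with k1 and b passed in rather than read from constants. It uses the textbook formula with the IDF factor left out. The `Bm25Params` struct mirrors the one proposed in this issue, and `bm25_tf_component` is a hypothetical helper, not an existing Tantivy function.

```rust
/// Hypothetical parameter bundle, mirroring the `Bm25Params` struct proposed in this issue.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Bm25Params {
    pub k1: f32,
    pub b: f32,
}

/// Term-frequency component of BM25 (the IDF factor is omitted):
///   tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
///
/// `b` controls how strongly document length is normalized (0 disables it),
/// and `k1` controls how quickly repeated occurrences of a term saturate.
pub fn bm25_tf_component(
    params: Bm25Params,
    term_freq: f32,
    doc_len: f32,
    avg_doc_len: f32,
) -> f32 {
    let length_norm = 1.0 - params.b + params.b * doc_len / avg_doc_len;
    term_freq * (params.k1 + 1.0) / (term_freq + params.k1 * length_norm)
}
```

With b = 0 the document length drops out entirely, which is why short fields such as titles often prefer a lower b. As the term frequency grows, the component approaches k1 + 1, so a larger k1 lets repeated terms keep contributing for longer.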
Describe the solution you'd like
Add a Bm25Params struct to IndexSettings, allowing per-index BM25 configuration that persists across restarts.
Proposed API:
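use tantivy::{Index, IndexSettings};
use tantivy::query::Bm25Params; // proposed type, does not exist yet

let bm25_params = Bm25Params { k1: 1.5, b: 0.6 };
let settings = IndexSettings {
    bm25_params: Some(bm25_params), // proposed field
    ..Default::default()
};
// Index::create_in_dir currently takes only a path and a schema,
// so custom settings go through the index builder.
let index = Index::builder()
    .schema(schema)
    .settings(settings)
    .create_in_dir(path)?;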
Users who don't specify custom parameters continue to use the default (k1=1.2, b=0.75).
Implementation approach:
- Add Bm25Params { k1: f32, b: f32 } with Default, Serialize, and Deserialize
- Add bm25_params: Option<Bm25Params> to IndexSettings with #[serde(default)]
- Pass parameters through the query execution chain to Bm25Weight construction
- When bm25_params is None, fall back to the default constants
- Update call sites in TermQuery, PhraseQuery, BooleanQuery, and BlockWand
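The struct and the None fallback described above could look roughly like the following sketch. `IndexSettingsSketch` is a hypothetical stand-in for the relevant part of IndexSettings, and `effective_bm25_params` is an assumed helper name. The serde derives and the #[serde(default)] attribute are left out so the snippet compiles without dependencies; the real change would add them.

```rust
/// Proposed parameter struct. The actual change would also derive
/// `Serialize` and `Deserialize`; they are omitted here to avoid a serde dependency.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Bm25Params {
    pub k1: f32,
    pub b: f32,
}

impl Default for Bm25Params {
    /// Same values as the constants that are hardcoded today.
    fn default() -> Self {
        Bm25Params { k1: 1.2, b: 0.75 }
    }
}

/// Hypothetical stand-in for `IndexSettings`, reduced to the one new field.
/// In the real struct the field would carry `#[serde(default)]` so that an
/// older meta.json without it deserializes to `None`.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct IndexSettingsSketch {
    pub bm25_params: Option<Bm25Params>,
}

impl IndexSettingsSketch {
    /// Parameters to hand to `Bm25Weight` construction: the configured
    /// values if present, otherwise the defaults.
    pub fn effective_bm25_params(&self) -> Bm25Params {
        self.bm25_params.unwrap_or_default()
    }
}
```

Keeping the field an Option, rather than always storing a value, means an index that never sets it records nothing in its settings and follows whatever the library default is.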
This approach provides:
- Backward compatibility (old meta.json loads successfully)
- Persistence across restarts (stored in meta.json)
- Per-index configurability
- No index format version bump required
[Optional] describe alternatives you've considered
Other approaches (environment variables, compile-time features, maintaining a fork) either cannot provide per-index runtime configurability or create an unacceptable maintenance burden. The proposed approach follows Tantivy's existing patterns (similar to TokenizerManager).
Additional context
- Constants are defined in src/query/bm25.rs lines 8-9
- This would make Tantivy more competitive with Elasticsearch/Lucene for production use cases
- I'm willing to implement this and would appreciate feedback on the API design before starting