I'm running Tantivy alongside a SQLite database to augment it with (better) full-text search. Whenever the app alters a record in the database, it creates a new index writer, issues a corresponding `delete_term` and `add_document` against the Tantivy index, then commits and drops the writer.
Over time, although the SQLite database is only 93 MB, the Tantivy index has grown to about 2k individual segments averaging ~26 KB each, and loading them all from a cold cache takes several seconds on my ZFS drives.
After deleting the index and recreating it from the data in the database, I can get it down to 15 segments, each around 1 MB, but that's not a workaround I want to apply often. I'm not setting any custom options on the index writer, so it should be using the default merge policy.
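For a sense of scale, here is a rough back-of-the-envelope calculation of the two index sizes. It is only a sketch: it assumes "2k" means exactly 2,000 segments and that the sizes are decimal kilobytes and megabytes, and the helper name `total_kb` is made up for illustration.

```rust
/// Approximate total size, in kilobytes, of `segments` segments
/// that average `avg_kb` kilobytes each.
fn total_kb(segments: u64, avg_kb: u64) -> u64 {
    segments * avg_kb
}

fn main() {
    // Assumed readings of the figures above: "2k" = 2,000 segments,
    // 1 MB = 1,000 KB.
    let fragmented_kb = total_kb(2_000, 26); // index after many small commits
    let rebuilt_kb = total_kb(15, 1_000); // index after a full rebuild

    println!("fragmented: 2000 segments, ~{} MB", fragmented_kb / 1_000);
    println!("rebuilt:    15 segments, ~{} MB", rebuilt_kb / 1_000);
}
```

Under those assumptions the fragmented index comes to roughly 52 MB spread over 2,000 segments, versus roughly 15 MB in 15 segments after a rebuild.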
#194 mentions this problem as well, but has been open for a long time.
Is there a better way to index small batches of documents like this without having to constantly rebuild the index?