Slow performance and many small segments #2932

@ColonelThirtyTwo

I'm running Tantivy alongside a SQLite database to augment it with (better) full-text searching. When the app alters a record in the database, it creates a new index writer and issues a corresponding delete_term and add_document in the Tantivy index, followed by committing and dropping the writer.

After some time in use, the SQLite database is only 93 MB, but the Tantivy index has grown to about 2,000 individual segments averaging ~26 KB each, and loading them all from a cold cache on my ZFS drives takes several seconds.

After deleting the index and recreating it from the data in the database, I can get it down to 15 segments of around 1 MB each, but that's not a workaround I want to apply often. I'm not applying any custom options to the index writer, so it should be using the default merge policy.

#194 mentions this problem as well, but has been open for a long time.

Is there a better way to index a small number of documents at a time like this without having to constantly rebuild the index?
