46 changes: 17 additions & 29 deletions README.md
@@ -25,50 +25,43 @@ There are a couple ways to install `tantivy-cli`.
If you are a Rust programmer, you probably have `cargo` installed and you can just
run `cargo install tantivy-cli`



## Creating the index: `new`

Let's create a directory in which your index will be stored.

```bash
# create the directory
mkdir wikipedia-index
```


We will now initialize the index and create its schema.
The [schema](https://quickwit-oss.github.io/tantivy/tantivy/schema/index.html) defines
the list of your fields, and for each field:
- its name
- its type, currently `u64`, `i64` or `str`
- how it should be indexed.

You can find more information about the latter on
[tantivy's schema documentation page](https://quickwit-oss.github.io/tantivy/tantivy/schema/index.html)

In our case, our documents will contain
* a title
* a body
* a url

We want the title and the body to be tokenized and indexed. We also want
to add the term frequency and term positions to our index.

Running `tantivy new` will start a wizard that will help you
define the schema of the new index.

Like all the other commands of `tantivy`, you will have to
pass it your index directory via the `-i` or `--index`
parameter as follows:


```bash
tantivy new -i wikipedia-index
```



Answer the questions as follows:

```none
@@ -142,13 +135,10 @@ It is a fairly human readable JSON, so you can check its content.

It contains two sections:
- segments (currently empty, but we will change that soon)
- schema

# Indexing the document: `index`


Tantivy's `index` command offers a way to index a JSON file.
The file must contain one JSON object per line.
The structure of this JSON object must match that of our schema definition.
@@ -168,10 +158,9 @@ If you are in a rush you can [download 100 articles in the right format here (11

The `index` command will index your document.
By default it will use 3 threads, with a total buffer size of 1GB split
across these threads.


```bash
cat wiki-articles.json | tantivy index -i ./wikipedia-index
```
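
To try `index` without downloading the Wikipedia dump, a tiny input file can be written by hand. This is only a sketch: the file name `sample-articles.json` and the example values are made up, and the field names `title`, `body` and `url` assume that the schema was created with exactly those names.

```bash
# Write two documents, one JSON object per line (hypothetical sample data).
cat > sample-articles.json <<'EOF'
{"title": "First article", "body": "Some body text.", "url": "https://example.org/first"}
{"title": "Second article", "body": "More body text.", "url": "https://example.org/second"}
EOF

# Each line is one standalone JSON object, so the line count is the document count.
wc -l < sample-articles.json
```

Such a file could then be piped to `tantivy index` in the same way as `wiki-articles.json`.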

@@ -192,18 +181,18 @@ The main file is `meta.json`.

You should also see a lot of files with a UUID as filename, and different extensions.
Our index is in fact divided into segments. Each segment acts as an individual smaller index.
Its name is simply a UUID.

If you decided to index the complete Wikipedia, you may also see some of these files disappear.
Having too many segments can hurt search performance, so tantivy automatically starts
merging segments.

# Serve the search index: `serve`

Tantivy's cli also embeds a search server.
You can run it with the following command.

```bash
tantivy serve -i wikipedia-index
```

@@ -218,32 +207,31 @@ By default this query is treated as `barack OR obama`.
You can also search for documents that contain both terms, by adding a `+` sign before the terms in your query.

http://localhost:3000/api/?q=%2Bbarack%20%2Bobama&nhits=20

Also, `-` makes it possible to exclude the documents containing a specific term.

http://localhost:3000/api/?q=-barack%20%2Bobama&nhits=20

Finally, tantivy handles phrase queries.

http://localhost:3000/api/?q=%22barack%20obama%22&nhits=20
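
The query URLs above are percent-encoded by hand (`+` becomes `%2B`, a space `%20`, a double quote `%22`). As a sketch, a small POSIX shell helper can build them. `urlencode` and `search_url` are hypothetical names, not part of `tantivy-cli`, and the helper assumes ASCII queries and the `localhost:3000` address used above.

```bash
# Percent-encode an ASCII string for use as a URL query parameter value.
# The body runs in a subshell so the helper variables do not leak.
urlencode() (
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}        # everything after the first character
    c=${s%"$rest"}     # the first character itself
    case $c in
      [a-zA-Z0-9.~_-]) out=$out$c ;;                    # unreserved: keep as-is
      *) out=$out$(printf '%%%02X' "'$c") ;;            # anything else: %XX
    esac
    s=$rest
  done
  printf '%s\n' "$out"
)

# Build a search URL for the server; the second argument (nhits) defaults to 20,
# which is a choice made for this sketch.
search_url() {
  printf 'http://localhost:3000/api/?q=%s&nhits=%s\n' "$(urlencode "$1")" "${2:-20}"
}

search_url '+barack +obama' 20
# prints http://localhost:3000/api/?q=%2Bbarack%20%2Bobama&nhits=20
```

The resulting URL can be opened in a browser or passed to any HTTP client while `tantivy serve` is running.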


# Search the index via the command line

You may also use the `search` command to stream all documents matching a specific query.
The documents are returned in an unspecified order.

```bash
tantivy search -i wikipedia-index -q "barack obama"
tantivy search -i hdfs --query "*" --agg '{"severities":{"terms":{"field":"severity_text"}}}'
```


# Benchmark the index: `bench`

Tantivy's cli provides a simple benchmark tool.
You can run it with the following command.

```bash
tantivy bench -i wikipedia-index -n 10 -q queries.txt
```
2 changes: 1 addition & 1 deletion src/main.rs
@@ -33,6 +33,7 @@ fn main() {
.arg(Arg::new("host")
.long("host")
.value_name("host")
.default_value("localhost")
.help("host to listen to")
.default_value("localhost")
.value_parser(clap::value_parser!(String))
@@ -42,7 +43,6 @@ fn main() {
.long("port")
.value_name("port")
.help("Port")
.default_value("3000")
.value_parser(clap::value_parser!(usize))
)
)