docs: Updated README.md
Anush008 committed Feb 28, 2024
1 parent 41d40c4 commit d382438
Showing 1 changed file with 6 additions and 7 deletions.
README.md
@@ -13,11 +13,11 @@ The packaged `jar` file releases can be found [here](https://github.com/qdrant/q

### Building from source πŸ› οΈ

-To build the `jar` from source, you need [JDK@17](https://www.oracle.com/java/technologies/javase/jdk17-archive-downloads.html) and [Maven](https://maven.apache.org/) installed.
+To build the `jar` from source, you need [JDK@18](https://www.azul.com/downloads/#zulu) and [Maven](https://maven.apache.org/) installed.
Once the requirements have been satisfied, run the following command in the project root. πŸ› οΈ

```bash
-mvn package -P assembly
+mvn package
```

This will build and store the fat JAR in the `target` directory by default.
@@ -43,7 +43,7 @@ from pyspark.sql import SparkSession

spark = SparkSession.builder.config(
"spark.jars",
-    "spark-2.0-jar-with-dependencies.jar", # specify the downloaded JAR file
+    "spark-2.0.jar", # specify the downloaded JAR file
)
.master("local[*]")
.appName("qdrant")
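The hunk above truncates the session setup; a complete sketch under the same settings (the JAR filename is a placeholder assumption, use the connector JAR you built or downloaded):

```python
# Sketch of the full session setup the snippet above is taken from.
# "spark-2.0.jar" is a placeholder; point it at your connector JAR.
spark_conf = {"spark.jars": "spark-2.0.jar"}

def build_session():
    # Imported lazily so the sketch is readable without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.master("local[*]").appName("qdrant")
    for key, value in spark_conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

`master("local[*]")` is only for local testing; on a cluster, submit the JAR with `--jars` instead.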
@@ -58,7 +58,7 @@ To load data into Qdrant, a collection has to be created beforehand with the app
<pyspark.sql.DataFrame>
.write
.format("io.qdrant.spark.Qdrant")
-.option("qdrant_url", <QDRANT_URL>)
+.option("qdrant_url", <QDRANT_GRPC_URL>)
.option("collection_name", <QDRANT_COLLECTION_NAME>)
.option("embedding_field", <EMBEDDING_FIELD_NAME>) # Expected to be a field of type ArrayType(FloatType)
.option("schema", <pyspark.sql.DataFrame>.schema.json())
@@ -81,17 +81,16 @@ You can use the `qdrant-spark` connector as a library in Databricks to ingest da

## Datatype support πŸ“‹

-Qdrant supports all the Spark data types, and the appropriate types are mapped based on the provided `schema`.
+Qdrant supports all the Spark data types. The appropriate types are mapped based on the provided `schema`.

## Options and Spark types πŸ› οΈ

| Option | Description | DataType | Required |
| :---------------- | :------------------------------------------------------------------------ | :--------------------- | :------- |
-| `qdrant_url` | REST URL of the Qdrant instance | `StringType` | βœ… |
+| `qdrant_url` | GRPC URL of the Qdrant instance. Eg: <http://localhost:6334> | `StringType` | βœ… |
| `collection_name` | Name of the collection to write data into | `StringType` | βœ… |
| `embedding_field` | Name of the field holding the embeddings | `ArrayType(FloatType)` | βœ… |
| `schema` | JSON string of the dataframe schema | `StringType` | βœ… |
| `mode` | Write mode of the dataframe. Supports "append". | `StringType` | βœ… |
| `id_field` | Name of the field holding the point IDs. Default: Generates a random UUID | `StringType` | ❌ |
| `batch_size` | Max size of the upload batch. Default: 100 | `IntType` | ❌ |
| `retries` | Number of upload retries. Default: 3 | `IntType` | ❌ |
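Collecting the options from the table, a minimal end-to-end write might look like the sketch below (the URL, collection, and field names are placeholder assumptions, not connector defaults):

```python
# Placeholder option values; adjust to your own Qdrant instance and data.
qdrant_options = {
    "qdrant_url": "http://localhost:6334",  # gRPC URL (assumed local instance)
    "collection_name": "my_collection",     # must exist before writing
    "embedding_field": "embedding",         # ArrayType(FloatType) column
    "batch_size": "64",                     # optional; connector default is 100
}

def write_dataframe(df):
    """Append a PySpark DataFrame to Qdrant using the options above."""
    writer = df.write.format("io.qdrant.spark.Qdrant").mode("append")
    for key, value in qdrant_options.items():
        writer = writer.option(key, value)
    # The required schema option is derived from the DataFrame itself.
    writer.option("schema", df.schema.json()).save()
```

Passing the options from a dict keeps the required and optional settings in one place; the `schema` option is added last because it depends on the DataFrame being written.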
