Update embedders settings, hybrid search, and add tests for AI search methods #1087

Open. Wants to merge 26 commits into base: main
21 changes: 21 additions & 0 deletions README.md
@@ -143,6 +143,27 @@ JSON output:
 }
 ```

+#### Hybrid Search <!-- omit in toc -->
+
+Hybrid search combines traditional keyword search with semantic search for more relevant results. You need to have an embedder configured in your index settings to use this feature.
+
+```python
+# Using hybrid search with the search method
+index.search(
+    'action movie',
+    {
+        "hybrid": {"semanticRatio": 0.5, "embedder": "default"}
+    }
+)
+```
+
+The `semanticRatio` parameter (between 0 and 1) controls the balance between keyword search and semantic search:
+- 0: Only keyword search
+- 1: Only semantic search
+- Values in between: A mix of both approaches
+
+The `embedder` parameter specifies which configured embedder to use for the semantic search component.
+
 #### Custom Search With Filters <!-- omit in toc -->

 If you want to enable filtering, you must add your attributes to the `filterableAttributes` index setting.
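
The README text added above states that `semanticRatio` lies between 0 and 1 and that `embedder` names a configured embedder. A minimal sketch of building that `hybrid` option with a range check before calling `index.search` follows. `hybrid_params` is a hypothetical helper, not part of the meilisearch package, and whether the client or server performs a similar check is not shown in this diff.

```python
from typing import Any, Dict


def hybrid_params(semantic_ratio: float, embedder: str = "default") -> Dict[str, Any]:
    """Build the `hybrid` search option (hypothetical helper, not a library API)."""
    if not 0.0 <= semantic_ratio <= 1.0:
        raise ValueError("semanticRatio must be between 0 and 1")
    return {"hybrid": {"semanticRatio": semantic_ratio, "embedder": embedder}}


# 0 is keyword-only, 1 is semantic-only, anything in between mixes the two.
keyword_only = hybrid_params(0.0)
balanced = hybrid_params(0.5)
```

The result can be passed as the second argument of the call shown in the README, for example `index.search('action movie', hybrid_params(0.5))`.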
122 changes: 84 additions & 38 deletions meilisearch/index.py
@@ -24,19 +24,22 @@
 from meilisearch.config import Config
 from meilisearch.errors import version_error_hint_message
 from meilisearch.models.document import Document, DocumentsResults
-from meilisearch.models.index import (
+from meilisearch.models.embedders import (
     Embedders,
-    Faceting,
+    EmbedderType,
     HuggingFaceEmbedder,
-    IndexStats,
-    LocalizedAttributes,
     OllamaEmbedder,
     OpenAiEmbedder,
+    RestEmbedder,
+    UserProvidedEmbedder,
+)
+from meilisearch.models.index import (
+    Faceting,
+    IndexStats,
+    LocalizedAttributes,
     Pagination,
     ProximityPrecision,
-    RestEmbedder,
     TypoTolerance,
-    UserProvidedEmbedder,
 )
 from meilisearch.models.task import Task, TaskInfo, TaskResults
 from meilisearch.task import TaskHandler
@@ -277,14 +280,21 @@ def get_stats(self) -> IndexStats:
     def search(self, query: str, opt_params: Optional[Mapping[str, Any]] = None) -> Dict[str, Any]:
         """Search in the index.

+        https://www.meilisearch.com/docs/reference/api/search
+
         Parameters
         ----------
         query:
             String containing the searched word(s)
         opt_params (optional):
             Dictionary containing optional query parameters.
-            Note: The vector parameter is only available in Meilisearch >= v1.13.0
-            https://www.meilisearch.com/docs/reference/api/search#search-in-an-index
+            Common parameters include:
+            - hybrid: Dict with 'semanticRatio' and 'embedder' fields for hybrid search
+            - vector: Array of numbers for vector search
+            - retrieveVectors: Boolean to include vector data in search results
+            - filter: Filter queries by an attribute's value
+            - limit: Maximum number of documents returned
+            - offset: Number of documents to skip

         Returns
         -------
@@ -298,7 +308,9 @@ def search(self, query: str, opt_params: Optional[Mapping[str, Any]] = None) ->
         """
         if opt_params is None:
             opt_params = {}
+
         body = {"q": query, **opt_params}
+
         return self.http.post(
             f"{self.config.paths.index}/{self.uid}/{self.config.paths.search}",
             body=body,
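
As the hunk above shows, `search` builds the request body by spreading `opt_params` after `"q"`. The standalone sketch below reproduces only that merge (the name `build_search_body` is illustrative and does not exist in the library). It also shows a consequence of the spread order: a `"q"` key inside `opt_params` replaces the `query` argument, because later keys win in a dict display.

```python
from typing import Any, Dict, Mapping, Optional


def build_search_body(
    query: str, opt_params: Optional[Mapping[str, Any]] = None
) -> Dict[str, Any]:
    """Stand-in for the body construction in `search` (illustrative only)."""
    if opt_params is None:
        opt_params = {}
    # Keys from opt_params are merged after "q", so they take precedence.
    return {"q": query, **opt_params}


body = build_search_body(
    "action movie",
    {"hybrid": {"semanticRatio": 0.5, "embedder": "default"}, "limit": 5},
)
overridden = build_search_body("action movie", {"q": "drama"})
```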
@@ -955,14 +967,7 @@ def get_settings(self) -> Dict[str, Any]:
         )

         if settings.get("embedders"):
-            embedders: dict[
-                str,
-                OpenAiEmbedder
-                | HuggingFaceEmbedder
-                | OllamaEmbedder
-                | RestEmbedder
-                | UserProvidedEmbedder,
-            ] = {}
+            embedders: dict[str, EmbedderType] = {}
             for k, v in settings["embedders"].items():
                 if v.get("source") == "openAi":
                     embedders[k] = OpenAiEmbedder(**v)
@@ -988,6 +993,26 @@ def update_settings(self, body: MutableMapping[str, Any]) -> TaskInfo:
         ----------
         body:
             Dictionary containing the settings of the index.
+            Supported settings include:
+            - 'rankingRules': List of ranking rules
+            - 'distinctAttribute': Attribute for deduplication
+            - 'searchableAttributes': Attributes that can be searched
+            - 'displayedAttributes': Attributes to display in search results
+            - 'stopWords': Words ignored in search queries
+            - 'synonyms': Dictionary of synonyms
+            - 'filterableAttributes': Attributes that can be used for filtering
+            - 'sortableAttributes': Attributes that can be used for sorting
+            - 'typoTolerance': Settings for typo tolerance
+            - 'pagination': Settings for pagination
+            - 'faceting': Settings for faceting
+            - 'dictionary': List of custom dictionary words
+            - 'separatorTokens': List of separator tokens
+            - 'nonSeparatorTokens': List of non-separator tokens
+            - 'embedders': Dictionary of embedder configurations for AI-powered search
+            - 'searchCutoffMs': Maximum search time in milliseconds
+            - 'proximityPrecision': Precision for proximity ranking
+            - 'localizedAttributes': Settings for localized attributes
+
             More information:
             https://www.meilisearch.com/docs/reference/api/settings#update-settings

@@ -1000,7 +1025,8 @@ def update_settings(self, body: MutableMapping[str, Any]) -> TaskInfo:
         Raises
         ------
         MeilisearchApiError
-            An error containing details about why Meilisearch can't process your request. Meilisearch error codes are described here: https://www.meilisearch.com/docs/reference/errors/error_codes#meilisearch-errors
+            An error containing details about why Meilisearch can't process your request.
+            Meilisearch error codes are described here: https://www.meilisearch.com/docs/reference/errors/error_codes#meilisearch-errors
         """
         if body.get("embedders"):
             for _, v in body["embedders"].items():
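
The `update_settings` docstring above lists `embedders` among the supported settings, and the README change in this PR says hybrid search needs a configured embedder. A small sketch of checking that the embedder named in a hybrid query is present in a settings body follows. `hybrid_embedder_is_configured` is a hypothetical helper, and the `userProvided` embedder with `dimensions` set to 3 is only an example value.

```python
from typing import Any, Mapping


def hybrid_embedder_is_configured(
    settings_body: Mapping[str, Any], search_params: Mapping[str, Any]
) -> bool:
    """Return True if the hybrid query's embedder name appears in the settings body."""
    hybrid = search_params.get("hybrid")
    if hybrid is None:
        return True  # not a hybrid query, so there is nothing to check
    configured = settings_body.get("embedders") or {}
    return hybrid.get("embedder") in configured


# Example settings body of the shape accepted by update_settings (values are illustrative).
settings_body = {
    "filterableAttributes": ["genres"],
    "embedders": {"default": {"source": "userProvided", "dimensions": 3}},
}
```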
@@ -1879,10 +1905,13 @@ def reset_non_separator_tokens(self) -> TaskInfo:
     def get_embedders(self) -> Embedders | None:
         """Get embedders of the index.

+        Retrieves the current embedder configuration from Meilisearch.
+
         Returns
         -------
-        settings:
-            The embedders settings of the index.
+        Embedders:
+            The embedders settings of the index, or None if no embedders are configured.
+            Contains a dictionary of embedder configurations, where keys are embedder names.

         Raises
         ------
@@ -1894,35 +1923,35 @@ def get_embedders(self) -> Embedders | None:
         if not response:
             return None

-        embedders: dict[
-            str,
-            OpenAiEmbedder
-            | HuggingFaceEmbedder
-            | OllamaEmbedder
-            | RestEmbedder
-            | UserProvidedEmbedder,
-        ] = {}
+        embedders: dict[str, EmbedderType] = {}
         for k, v in response.items():
-            if v.get("source") == "openAi":
+            source = v.get("source")
+            if source == "openAi":
                 embedders[k] = OpenAiEmbedder(**v)
-            elif v.get("source") == "ollama":
-                embedders[k] = OllamaEmbedder(**v)
-            elif v.get("source") == "huggingFace":
+            elif source == "huggingFace":
                 embedders[k] = HuggingFaceEmbedder(**v)
-            elif v.get("source") == "rest":
+            elif source == "ollama":
+                embedders[k] = OllamaEmbedder(**v)
+            elif source == "rest":
                 embedders[k] = RestEmbedder(**v)
+            elif source == "userProvided":
+                embedders[k] = UserProvidedEmbedder(**v)
             else:
+                # Default to UserProvidedEmbedder for unknown sources
                 embedders[k] = UserProvidedEmbedder(**v)

         return Embedders(embedders=embedders)

     def update_embedders(self, body: Union[MutableMapping[str, Any], None]) -> TaskInfo:
         """Update embedders of the index.

+        Updates the embedder configuration for the index. The embedder configuration
+        determines how Meilisearch generates vector embeddings for documents.
+
         Parameters
         ----------
         body: dict
-            Dictionary containing the embedders.
+            Dictionary containing the embedders configuration.

         Returns
         -------
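
The `source` dispatch chain in `get_embedders` above also appears in `get_settings` and `update_embedders`. One possible way to express it once is a lookup table with a fallback. The sketch below uses plain stand-in dataclasses rather than the real models in `meilisearch.models.embedders`, so it runs without the library. `StandInEmbedder`, `EMBEDDER_BY_SOURCE`, and `parse_embedders` are hypothetical names, and the stand-ins store the raw config instead of accepting `**v` as the real models do.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Mapping, Type


@dataclass
class StandInEmbedder:
    """Stand-in for the real embedder models; it only stores the raw config."""

    config: Dict[str, Any] = field(default_factory=dict)


class OpenAiEmbedder(StandInEmbedder):
    pass


class HuggingFaceEmbedder(StandInEmbedder):
    pass


class OllamaEmbedder(StandInEmbedder):
    pass


class RestEmbedder(StandInEmbedder):
    pass


class UserProvidedEmbedder(StandInEmbedder):
    pass


# Maps the "source" field of an embedder config to the class that models it.
EMBEDDER_BY_SOURCE: Dict[str, Type[StandInEmbedder]] = {
    "openAi": OpenAiEmbedder,
    "huggingFace": HuggingFaceEmbedder,
    "ollama": OllamaEmbedder,
    "rest": RestEmbedder,
    "userProvided": UserProvidedEmbedder,
}


def parse_embedders(raw: Mapping[str, Mapping[str, Any]]) -> Dict[str, StandInEmbedder]:
    """Convert raw embedder configs to model objects, keyed by embedder name."""
    parsed: Dict[str, StandInEmbedder] = {}
    for name, config in raw.items():
        # Unknown or missing sources fall back to UserProvidedEmbedder, as in the diff.
        cls = EMBEDDER_BY_SOURCE.get(config.get("source"), UserProvidedEmbedder)
        parsed[name] = cls(config=dict(config))
    return parsed
```

A single helper of this shape would keep the three call sites in step if a new source is added later.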
@@ -1933,13 +1962,28 @@ def update_embedders(self, body: Union[MutableMapping[str, Any], None]) -> TaskI
         Raises
         ------
         MeilisearchApiError
-            An error containing details about why Meilisearch can't process your request. Meilisearch error codes are described here: https://www.meilisearch.com/docs/reference/errors/error_codes#meilisearch-errors
+            An error containing details about why Meilisearch can't process your request.
+            Meilisearch error codes are described here: https://www.meilisearch.com/docs/reference/errors/error_codes#meilisearch-errors
         """
+        if body is not None and body.get("embedders"):
+            embedders: dict[str, EmbedderType] = {}
+            for k, v in body["embedders"].items():
+                source = v.get("source")
+                if source == "openAi":
+                    embedders[k] = OpenAiEmbedder(**v)
+                elif source == "huggingFace":
+                    embedders[k] = HuggingFaceEmbedder(**v)
+                elif source == "ollama":
+                    embedders[k] = OllamaEmbedder(**v)
+                elif source == "rest":
+                    embedders[k] = RestEmbedder(**v)
+                elif source == "userProvided":
+                    embedders[k] = UserProvidedEmbedder(**v)
+                else:
+                    # Default to UserProvidedEmbedder for unknown sources
+                    embedders[k] = UserProvidedEmbedder(**v)

-        if body:
-            for _, v in body.items():
-                if "documentTemplateMaxBytes" in v and v["documentTemplateMaxBytes"] is None:

Review comment (Member): Is this handling done by Meili now?

Reply (Contributor Author): Removing it did not trigger any test failure but it might simply be untested, so I added it back to avoid any unwanted side effects.

-                    del v["documentTemplateMaxBytes"]
+            body = {"embedders": {k: v.model_dump(by_alias=True) for k, v in embedders.items()}}

         task = self.http.patch(self.__settings_url_for(self.config.paths.embedders), body)
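
The review thread above asks whether dropping a null `documentTemplateMaxBytes` is still needed, and the author notes that the path may simply be untested. A minimal sketch of that step as a pure function, with example data that could seed such a test, is given below. `drop_null_template_max_bytes` is a hypothetical name, and unlike the lines in the diff it returns a copy instead of mutating the caller's dictionary.

```python
from typing import Any, Dict, Mapping


def drop_null_template_max_bytes(
    embedders: Mapping[str, Mapping[str, Any]],
) -> Dict[str, Dict[str, Any]]:
    """Return a copy of the embedder configs without null `documentTemplateMaxBytes`."""
    cleaned: Dict[str, Dict[str, Any]] = {}
    for name, config in embedders.items():
        cleaned[name] = {
            key: value
            for key, value in config.items()
            if not (key == "documentTemplateMaxBytes" and value is None)
        }
    return cleaned


# Example input: one null value that should be dropped, one real value that should stay.
original = {
    "default": {"source": "userProvided", "dimensions": 3, "documentTemplateMaxBytes": None},
    "other": {"source": "rest", "documentTemplateMaxBytes": 400},
}
cleaned = drop_null_template_max_bytes(original)
```

Because the function has no HTTP dependency, the behavior can be asserted directly without a running Meilisearch instance.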

@@ -1948,6 +1992,8 @@ def update_embedders(self, body: Union[MutableMapping[str, Any], None]) -> TaskI
     def reset_embedders(self) -> TaskInfo:
         """Reset embedders of the index to default values.

+        Removes all embedder configurations from the index.
+
         Returns
         -------
         task_info: