[BUG] Filter on Parent Doc fields inside Nested knn query fails for many Query types #2222

Open
krishy91 opened this issue Oct 21, 2024 · 3 comments
Assignees
Labels
bug Something isn't working

Comments

@krishy91

What is the bug?

When a document contains vectors in nested documents and we run a nested knn query with a filter on the parent document's fields, only specific Query types (such as TermQuery) work as filters. If, for example, a phrase query is specified as the filter, the knn query returns no results at all. Several other Query types (such as exists and range) fail in the same way.

How can one reproduce the bug?
Steps to reproduce the behavior:

  1. Create a simple index with nested vector objects & text fields on the parent document to apply filters over
  2. Index a couple of example documents
  3. Perform a nested neural search or nested knn search with a filter, set inside the neural search query / knn query, on a parent document field (let the phrase be an exact match with one of the documents in the index):
{
    "query_string": {
        "query": "field_standard: \"Hello World\""
    }
}
  4. You can see that no results are returned
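
For readers who want to script step 3, the following is a minimal Python sketch that assembles such a request body. Only the `query_string` filter comes from the report above; the nested path and vector field names are borrowed from the mapping posted later in this thread, and the query vector, `k`, and the helper `build_nested_knn_query` are hypothetical. The sketch only builds a dictionary and does not contact a cluster.

```python
import json


def build_nested_knn_query(path, vector_field, vector, k, filter_query):
    """Build a nested knn query body with the filter placed inside the knn clause."""
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {
                    "knn": {
                        vector_field: {
                            "vector": vector,
                            "k": k,
                            "filter": filter_query,
                        }
                    }
                },
            }
        }
    }


# Phrase filter on a parent-level field, as given in step 3 of the report.
phrase_filter = {"query_string": {"query": 'field_standard: "Hello World"'}}

body = build_nested_knn_query(
    path="nested_field",                     # assumed nested path
    vector_field="nested_field.my_vector1",  # assumed vector field
    vector=[1, 1, 1],                        # assumed query vector
    k=2,                                     # assumed k
    filter_query=phrase_filter,
)

# Serialize for use as the body of GET <index>/_search.
print(json.dumps(body, indent=2))
```

Swapping `phrase_filter` for a plain `term` filter on the same parent field gives a request of identical shape, which makes it easy to compare the working and failing cases side by side.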

What is the expected behavior?

Even when the filter is a phrase query, range query, etc., it should be applied and any matching results should be returned.

What is your host/environment?

  • OS: Windows
  • Version: OpenSearch 2.17
  • Plugins: Neural Search & knn plugin

Do you have any additional context?

On analysis, we found that @navneet1v added the functionality to support applying filters on parent documents here: #1356

The code uses the NestedHelper.mightMatchNestedDocs method to determine whether the filter applies to the parent document or to the nested documents. Unfortunately, mightMatchNestedDocs checks for specific Query types individually to see whether they contain a "field", and then checks whether that field is present in the parent or the nested doc. This list of Query types is not complete. Many commonly used Query types that have a "field" are missing, such as phrase query and range query.

https://github.com/opensearch-project/OpenSearch/blob/f1c98a4da0cf6583212eecc9ed8ebc3cd426a918/server/src/main/java/org/opensearch/index/search/NestedHelper.java#L65
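
To make the failure mode concrete, here is a toy Python model of a per-type check. It is not the OpenSearch implementation: the classes `TermQuery` and `PhraseQuery`, the functions `might_match_nested_docs` and `route_filter`, and in particular the conservative `True` fallback for unrecognized types are assumptions made for this sketch.

```python
from dataclasses import dataclass

NESTED_PATHS = ("nested_field",)  # assumed nested path in the mapping


@dataclass
class TermQuery:
    field: str
    value: str


@dataclass
class PhraseQuery:
    field: str
    terms: tuple


def field_is_nested(field, nested_paths=NESTED_PATHS):
    """True when the field lives under one of the nested paths."""
    return any(field == p or field.startswith(p + ".") for p in nested_paths)


def might_match_nested_docs(query):
    """Toy per-type check: only query types listed here are inspected."""
    if isinstance(query, TermQuery):
        return field_is_nested(query.field)
    # Assumption: any type missing from the list is treated conservatively
    # as possibly matching nested docs, even if its field is on the parent.
    return True


def route_filter(query):
    """Decide which level the filter is evaluated against in this toy model."""
    return "nested" if might_match_nested_docs(query) else "parent"


print(route_filter(TermQuery("parking", "false")))                       # parent
print(route_filter(PhraseQuery("field_standard", ("hello", "world"))))   # nested
```

In this model both filters target parent-level fields, yet only the recognized type is routed to the parent. A filter evaluated against nested documents that do not carry the field would match nothing, which is consistent with the empty result described above.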

krishy91 added the bug (Something isn't working) and untriaged labels Oct 21, 2024
@krishy91
Author

Although this issue might have to be resolved directly in NestedHelper, I wanted to hear others' opinions on it and on how to go about it. It directly affects knn search and hence Neural Search (for nested documents).

@jmazanec15
Member

@heemin32 could you take a look at this?

@brianjyee

brianjyee commented Nov 6, 2024

I am also finding that must_not does not work.

Create index

PUT /knn
{
	"settings": {
		"index": {
			"knn": true,
			"knn.algo_param.ef_search": 100
		}
	},
	"mappings": {
		"properties": {
			"nested_field": {
				"type": "nested",
				"properties": {
					"my_vector1": {
						"type": "knn_vector",
						"dimension": 3,
						"method": {
							"name": "hnsw",
							"space_type": "l2",
							"engine": "faiss",
							"parameters": {
								"ef_construction": 128,
								"m": 24
							}
						}
					}
				}
			}
		}
	}
}

Index documents

PUT /_bulk?refresh=true
{ "index": { "_index": "knn", "_id": "1" } }
{"nested_field":[{"my_vector1":[1,1,1]},{"my_vector1":[2,2,2]},{"my_vector1":[3,3,3]}], "parking": "false"}
{ "index": { "_index": "knn", "_id": "2" } }
{"nested_field":[{"my_vector1":[10,10,10]},{"my_vector1":[11,11,11]},{"my_vector1":[12,12,12]}], "parking": "true"}
{ "index": { "_index": "knn", "_id": "3" } }
{"nested_field":[{"my_vector1":[1,1,1], "parking": "false"},{"my_vector1":[2,2,2]},{"my_vector1":[3,3,3]}]}
{ "index": { "_index": "knn", "_id": "4" } }
{"nested_field":[{"my_vector1":[10,10,10], "parking": "true"},{"my_vector1":[11,11,11]},{"my_vector1":[12,12,12]}]}

Query using must_not

GET knn/_search
{
	"query": {
		"nested": {
			"path": "nested_field",
			"query": {
				"knn": {
					"nested_field.my_vector1": {
						"vector": [
							1,
							1,
							1
						],
						"k": 2,
						"filter": {
							"bool": {
								"must_not": [
									{
										"term": {
											"parking": "false"
										}
									}
								]
							}
						}
					}
				}
			}
		}
	}
}

This should exclude id 1, but it does not.
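
As a sanity check on that expectation, the following Python sketch applies the must_not condition to the four documents indexed above, looking only at the parent-level `parking` field. It assumes that `parking` inside a `nested_field` object is a separate field (`nested_field.parking`) that the filter does not see, and it does not model the knn ranking or `k`. The helper `passes_must_not` is hypothetical.

```python
# The four documents from the bulk request above, keyed by _id.
docs = {
    "1": {"nested_field": [{"my_vector1": [1, 1, 1]}, {"my_vector1": [2, 2, 2]},
                           {"my_vector1": [3, 3, 3]}], "parking": "false"},
    "2": {"nested_field": [{"my_vector1": [10, 10, 10]}, {"my_vector1": [11, 11, 11]},
                           {"my_vector1": [12, 12, 12]}], "parking": "true"},
    "3": {"nested_field": [{"my_vector1": [1, 1, 1], "parking": "false"},
                           {"my_vector1": [2, 2, 2]}, {"my_vector1": [3, 3, 3]}]},
    "4": {"nested_field": [{"my_vector1": [10, 10, 10], "parking": "true"},
                           {"my_vector1": [11, 11, 11]}, {"my_vector1": [12, 12, 12]}]},
}


def passes_must_not(doc, field, value):
    """True when the parent-level field is absent or differs from the excluded value."""
    return doc.get(field) != value


# Parent documents that remain eligible before any knn ranking is applied.
eligible_ids = sorted(
    doc_id for doc_id, doc in docs.items()
    if passes_must_not(doc, "parking", "false")
)
print(eligible_ids)  # ['2', '3', '4']
```

Under these assumptions, id 1 is the only parent that the filter removes, so it should never appear in the hits regardless of how close its vectors are to the query vector.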

Projects
Status: Backlog (Hot)
Development

No branches or pull requests

4 participants