Adding support to exclude semantic_text subfields #127664

Draft
wants to merge 4 commits into
base: main

Samiul-TheSoccerFan
Contributor

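Update the field capabilities API (_field_caps) to exclude semantic_text subfields, such as test_field.inference.chunks.embeddings, for indices using either the legacy or the new semantic_text format.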
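
To make the intended behavior concrete, here is a minimal client-side sketch in Python of the filtering rule the examples below imply: any entry named `<semantic_text field>.inference`, or nested under it, is dropped. This is not the code in this PR; the function name and the assumption that subfields always live under `.inference` are illustrative and taken only from the sample responses.

```python
from typing import Any, Dict, Iterable


def exclude_semantic_text_subfields(
    fields: Dict[str, Any], semantic_text_fields: Iterable[str]
) -> Dict[str, Any]:
    """Return a copy of `fields` without semantic_text subfield entries.

    Assumption (from the sample responses only): subfields are named
    '<field>.inference' or start with '<field>.inference.'.
    """
    prefixes = tuple(f"{name}.inference" for name in semantic_text_fields)
    kept: Dict[str, Any] = {}
    for name, caps in fields.items():
        is_subfield = any(
            name == prefix or name.startswith(prefix + ".") for prefix in prefixes
        )
        if not is_subfield:
            kept[name] = caps
    return kept
```

Matching on the full `.inference` path segment, rather than a bare string prefix, keeps an unrelated field such as `test_field.inference_other` in the result.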

Legacy format:

Setup:


PUT test-field-caps-with-legacy
{
    "settings": {
        "index.mapping.semantic_text.use_legacy_format": true
    },
    "mappings": {
        "properties": {
            "test_field_legacy": {
                "type": "semantic_text",
                "inference_id": ".elser-2-elasticsearch"
            },
            "non_infer_field_legacy": {
                "type": "text"
            },
            "sparse_vector_legacy": {
                "type": "sparse_vector"
            },
            "dense_vector_legacy": {
                "type": "dense_vector",
                "dims": 3,
                "similarity": "l2_norm"
            }
        }
    }
}

PUT test-field-caps-with-legacy/_doc/doc1
{
    "test_field_legacy": "these are not the droids you're looking for. He's free to go around",
    "sparse_vector_legacy": {
        "these": 1,
        "are": 2,
        "not": 3
    },
    "dense_vector_legacy": [1, 2, 3]
}

Query:

GET /_field_caps?allow_no_indices=true&fields=*&index=test*&ignore_unavailable=true&expand_wildcards=open

Response before update (trimmed):

{
  "indices": [
    "test-field-caps-with-legacy"
  ],
  "fields": {
    "non_infer_field_legacy": {
      "text": {
        "type": "text",
        "metadata_field": false,
        "searchable": true,
        "aggregatable": false
      }
    },
    "test_field_legacy.inference.chunks.text": {
      "keyword": {
        "type": "keyword",
        "metadata_field": false,
        "searchable": false,
        "aggregatable": false
      }
    },
    "test_field_legacy.inference": {
      "object": {
        "type": "object",
        "metadata_field": false,
        "searchable": false,
        "aggregatable": false
      }
    },
    "sparse_vector_legacy": {
      "sparse_vector": {
        "type": "sparse_vector",
        "metadata_field": false,
        "searchable": true,
        "aggregatable": false
      }
    },
    "test_field_legacy": {
      "text": {
        "type": "text",
        "metadata_field": false,
        "searchable": true,
        "aggregatable": false
      }
    },
    "test_field_legacy.inference.chunks.embeddings": {
      "sparse_vector": {
        "type": "sparse_vector",
        "metadata_field": false,
        "searchable": true,
        "aggregatable": false
      }
    },
    "dense_vector_legacy": {
      "dense_vector": {
        "type": "dense_vector",
        "metadata_field": false,
        "searchable": true,
        "aggregatable": false
      }
    },
    "test_field_legacy.inference.chunks": {
      "nested": {
        "type": "nested",
        "metadata_field": false,
        "searchable": false,
        "aggregatable": false
      }
    }
  }
}

Response after update (trimmed):

{
  "indices": [
    "test-field-caps-with-legacy"
  ],
  "fields": {
    "non_infer_field_legacy": {
      "text": {
        "type": "text",
        "metadata_field": false,
        "searchable": true,
        "aggregatable": false
      }
    },
    "sparse_vector_legacy": {
      "sparse_vector": {
        "type": "sparse_vector",
        "metadata_field": false,
        "searchable": true,
        "aggregatable": false
      }
    },
    "test_field_legacy": {
      "text": {
        "type": "text",
        "metadata_field": false,
        "searchable": true,
        "aggregatable": false
      }
    },
    "dense_vector_legacy": {
      "dense_vector": {
        "type": "dense_vector",
        "metadata_field": false,
        "searchable": true,
        "aggregatable": false
      }
    }
  }
}
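
As a quick way to read the two legacy responses side by side, the following Python sketch lists the field names that appear before the update but not after it. The helper name `removed_fields` is hypothetical, and the field names are copied from the trimmed responses above.

```python
from typing import Any, Dict, List


def removed_fields(before: Dict[str, Any], after: Dict[str, Any]) -> List[str]:
    """Field names present in `before` but missing from `after`, sorted."""
    return sorted(set(before) - set(after))


# Keys of the "fields" object in each trimmed legacy response above.
before = dict.fromkeys([
    "non_infer_field_legacy",
    "test_field_legacy.inference.chunks.text",
    "test_field_legacy.inference",
    "sparse_vector_legacy",
    "test_field_legacy",
    "test_field_legacy.inference.chunks.embeddings",
    "dense_vector_legacy",
    "test_field_legacy.inference.chunks",
])
after = dict.fromkeys([
    "non_infer_field_legacy",
    "sparse_vector_legacy",
    "test_field_legacy",
    "dense_vector_legacy",
])

for name in removed_fields(before, after):
    print(name)
# test_field_legacy.inference
# test_field_legacy.inference.chunks
# test_field_legacy.inference.chunks.embeddings
# test_field_legacy.inference.chunks.text
```

Only the four entries under test_field_legacy.inference disappear; the parent test_field_legacy entry and the standalone vector fields remain.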

New format:

Setup:

PUT test-field-caps
{
    "mappings": {
        "properties": {
            "test_field": {
                "type": "semantic_text",
                "inference_id": ".elser-2-elasticsearch"
            },
            "non_infer_field": {
                "type": "text"
            },
            "sparse_vector": {
                "type": "sparse_vector"
            },
            "dense_vector": {
                "type": "dense_vector",
                "dims": 3,
                "similarity": "l2_norm"
            }
        }
    }
}

PUT test-field-caps/_doc/doc1
{
    "test_field": "these are not the droids you're looking for. He's free to go around",
    "sparse_vector": {
        "these": 1,
        "are": 2,
        "not": 3
    },
    "dense_vector": [1, 2, 3]
}

Query:

GET /_field_caps?allow_no_indices=true&fields=*&index=test*&ignore_unavailable=true&expand_wildcards=open

Response before update (trimmed):

{
  "indices": [
    "test-field-caps"
  ],
  "fields": {
    "_ignored_source": {
      "_ignored_source": {
        "type": "_ignored_source",
        "metadata_field": true,
        "searchable": false,
        "aggregatable": false
      }
    },
    "non_infer_field": {
      "text": {
        "type": "text",
        "metadata_field": false,
        "searchable": true,
        "aggregatable": false
      }
    },
    "_index": {
      "_index": {
        "type": "_index",
        "metadata_field": true,
        "searchable": true,
        "aggregatable": true
      }
    },
    "_feature": {
      "_feature": {
        "type": "_feature",
        "metadata_field": true,
        "searchable": false,
        "aggregatable": false
      }
    },
    "sparse_vector": {
      "sparse_vector": {
        "type": "sparse_vector",
        "metadata_field": false,
        "searchable": true,
        "aggregatable": false
      }
    },
    "test_field.inference.chunks.embeddings": {
      "sparse_vector": {
        "type": "sparse_vector",
        "metadata_field": false,
        "searchable": true,
        "aggregatable": false
      }
    },
    "test_field.inference.chunks.offset": {
      "offset_source": {
        "type": "offset_source",
        "metadata_field": false,
        "searchable": false,
        "aggregatable": false
      }
    },
    "test_field": {
      "text": {
        "type": "text",
        "metadata_field": false,
        "searchable": true,
        "aggregatable": false
      }
    },
    "_inference_fields": {
      "_inference_fields": {
        "type": "_inference_fields",
        "metadata_field": true,
        "searchable": false,
        "aggregatable": false
      }
    },
    "test_field.inference": {
      "object": {
        "type": "object",
        "metadata_field": false,
        "searchable": false,
        "aggregatable": false
      }
    },
    "dense_vector": {
      "dense_vector": {
        "type": "dense_vector",
        "metadata_field": false,
        "searchable": true,
        "aggregatable": false
      }
    },
    "test_field.inference.chunks": {
      "nested": {
        "type": "nested",
        "metadata_field": false,
        "searchable": false,
        "aggregatable": false
      }
    }
  }
}

Response after update (trimmed):

{
  "indices": [
    "test-field-caps"
  ],
  "fields": {
    "test_field": {
      "text": {
        "type": "text",
        "metadata_field": false,
        "searchable": true,
        "aggregatable": false
      }
    },
    "_inference_fields": {
      "_inference_fields": {
        "type": "_inference_fields",
        "metadata_field": true,
        "searchable": false,
        "aggregatable": false
      }
    },
    "non_infer_field": {
      "text": {
        "type": "text",
        "metadata_field": false,
        "searchable": true,
        "aggregatable": false
      }
    },
    "sparse_vector": {
      "sparse_vector": {
        "type": "sparse_vector",
        "metadata_field": false,
        "searchable": true,
        "aggregatable": false
      }
    },
    "dense_vector": {
      "dense_vector": {
        "type": "dense_vector",
        "metadata_field": false,
        "searchable": true,
        "aggregatable": false
      }
    }
  }
}
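
One way to check a response like the one above against the index mapping is sketched below in Python. It reads the semantic_text field names from the mapping body used in the setup and reports any response entries that still sit under `<field>.inference`. Both function names are hypothetical, and the sketch assumes semantic_text fields are declared at the top level of `mappings.properties`, as in the setup; fields nested inside objects are not handled.

```python
from typing import Any, Dict, List


def semantic_text_field_names(mapping: Dict[str, Any]) -> List[str]:
    """Names of top-level properties whose type is semantic_text."""
    props = mapping.get("mappings", {}).get("properties", {})
    return [name for name, cfg in props.items() if cfg.get("type") == "semantic_text"]


def leaked_subfields(
    mapping: Dict[str, Any], field_caps_response: Dict[str, Any]
) -> List[str]:
    """Response entries that are still semantic_text subfields, sorted."""
    prefixes = [f"{name}.inference" for name in semantic_text_field_names(mapping)]
    return sorted(
        field
        for field in field_caps_response.get("fields", {})
        if any(field == p or field.startswith(p + ".") for p in prefixes)
    )


mapping = {
    "mappings": {
        "properties": {
            "test_field": {"type": "semantic_text", "inference_id": ".elser-2-elasticsearch"},
            "non_infer_field": {"type": "text"},
        }
    }
}
response = {"fields": {"test_field": {}, "non_infer_field": {}, "_inference_fields": {}}}

# An empty list means no semantic_text subfields are exposed.
assert leaked_subfields(mapping, response) == []
```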

Samiul-TheSoccerFan added the >enhancement, v9.1.0, :Search Foundations/Mapping, :Search Relevance/Vectors, and :SearchOrg/Relevance labels on May 2, 2025
@elasticsearchmachine
Collaborator

Hi @Samiul-TheSoccerFan, I've created a changelog YAML for you.

Comment on lines +365 to +367
  - requires:
      cluster_features: "gte_v8.16.0"
      reason: field_caps support for semantic_text added in 8.16.0
Contributor Author

Do we need to define a new cluster feature? As I understand it, consumers do not expect these subfields from the field_caps API, so excluding them should not affect the API or Discover. Backward compatibility is also covered in another YAML file.
