Remove legacy search indicators logic and config #166
Merged
Changes from all commits (5):

- b9b25a7 Remove use_search_indicators and legacy search configs from settings.py (keyurva)
- 39560a9 Remove legacy search indicators logic and parameters from DCClient (keyurva)
- e6ad519 Clean up mock setups and tests for legacy search (keyurva)
- b210d3e Fix lint errors (keyurva)
- 753f268 Cleanup .env.sample (keyurva)
```diff
@@ -21,7 +21,6 @@
 import re
 from pathlib import Path

-import requests
 from datacommons_client.client import DataCommonsClient

 from datacommons_mcp._constrained_vars import place_statvar_constraint_mapping
```
```diff
@@ -60,35 +59,22 @@ def __init__(
         self,
         dc: DataCommonsClient,
         search_scope: SearchScope = SearchScope.BASE_ONLY,
-        base_index: str = "base_uae_mem",
-        custom_index: str | None = None,
-        sv_search_base_url: str = "https://datacommons.org",
         topic_store: TopicStore | None = None,
         _place_like_constraints: list[str] | None = None,
-        *,
-        use_search_indicators: bool = False,
     ) -> None:
         """
         Initialize the DCClient with a DataCommonsClient and search configuration.

         Args:
             dc: DataCommonsClient instance
             search_scope: SearchScope enum controlling search behavior
-            base_index: Index to use for base DC searches
-            custom_index: Index to use for custom DC searches (None for base DC)
-            sv_search_base_url: Base URL for SV search endpoint
             topic_store: Optional TopicStore for caching

             # TODO(@jm-rivera): Remove this parameter once new endpoint is live.
             _place_like_constraints: Optional list of place-like constraints
         """
         self.dc = dc
         self.search_scope = search_scope
-        self.base_index = base_index
-        self.custom_index = custom_index
-        # Precompute search indices to validate configuration at instantiation time
-        self.search_indices = self._compute_search_indices()
-        self.sv_search_base_url = sv_search_base_url
         self.variable_cache = LruCache(128)

         if topic_store is None:
```
```diff
@@ -100,31 +86,9 @@ def __init__(
         else:
             self._place_like_statvar_store = {}

-        self.use_search_indicators = use_search_indicators
-
     #
     # Initialization & Configuration
     #
-    def _compute_search_indices(self) -> list[str]:
-        """Compute and validate search indices based on the configured search_scope.
-
-        Raises a ValueError immediately for invalid configurations (e.g., CUSTOM_ONLY
-        without a custom_index).
-        """
-        indices: list[str] = []
-
-        if self.search_scope in [SearchScope.CUSTOM_ONLY, SearchScope.BASE_AND_CUSTOM]:
-            if self.custom_index is not None and self.custom_index != "":
-                indices.append(self.custom_index)
-            elif self.search_scope == SearchScope.CUSTOM_ONLY:
-                raise ValueError(
-                    "Custom index not configured but CUSTOM_ONLY search scope requested"
-                )
-
-        if self.search_scope in [SearchScope.BASE_ONLY, SearchScope.BASE_AND_CUSTOM]:
-            indices.append(self.base_index)
-
-        return indices
-
     def _compute_place_like_statvar_store(self, constraints: list[str]) -> None:
         """Compute and cache place-like to statistical variable mappings.
```
```diff
@@ -275,10 +239,6 @@ def _get_topic_places_with_data(
         return places_with_data

-    #
-    # New Search Indicators Endpoint (/api/nl/search-indicators)
-    #
-
     def _check_topic_exists_recursive(
         self, topic_dcid: str, place_dcids: list[str]
     ) -> bool:
```
```diff
@@ -359,60 +319,6 @@ def _expand_topics_to_variables(
         return list(expanded_variables.values())

-    async def _call_search_indicators_temp(
-        self, queries: list[str], *, max_results: int = 10
-    ) -> dict:
-        """
-        Temporary method that mirrors search_svs but calls the new search-indicators endpoint.
-
-        This method takes the same arguments and returns the same structure as search_svs,
-        but uses the new /api/nl/search-indicators endpoint instead of /api/nl/search-vector.
-
-        This method is temporary to create a minimal delta between the two endpoints to minimize the impact of the change.
-        After the 1.0 release, this method should be removed in favor of a more complete implementation.
-
-        Returns:
-            Dictionary mapping query strings to lists of results with 'SV' and 'CosineScore' keys
-        """
-        results_map = {}
-        endpoint_url = f"{self.sv_search_base_url}/api/nl/search-indicators"
-        headers = {"Content-Type": "application/json", **SURFACE_HEADER}
-
-        # Use precomputed indices based on configured search scope
-        indices = self.search_indices
-
-        for query in queries:
-            # Prepare parameters for the new endpoint
-            params = {
-                "queries": [query],
-                "limit_per_index": max_results,
-                "index": indices,
-            }
-
-            try:
-                response = await asyncio.to_thread(
-                    requests.get,
-                    endpoint_url,
-                    params=params,
-                    headers=headers,  # noqa: S113
-                )
-                response.raise_for_status()
-                api_response = response.json()
-
-                # Transform the response to match search_svs format
-                transformed_results = self._transform_search_indicators_to_svs_format(
-                    api_response, max_results=max_results
-                )
-                results_map[query] = transformed_results
-
-            except Exception as e:  # noqa: BLE001
-                logger.error(
-                    "An unexpected error occurred for query '%s': %s", query, e
-                )
-                results_map[query] = []
-
-        return results_map
-
     def _call_fetch_indicators(self, queries: list[str]) -> dict:
         """
         Helper method to call the datacommons-client fetch_indicators and transform the response.
```
```diff
@@ -473,43 +379,6 @@ def _call_fetch_indicators(self, queries: list[str]) -> dict:
         return results_map

-    def _transform_search_indicators_to_svs_format(
-        self, api_response: dict, *, max_results: int = 10
-    ) -> list[dict]:
-        """
-        Transform search-indicators response to match search_svs format.
-
-        Returns:
-            List of dictionaries with 'SV' and 'CosineScore' keys
-        """
-        results = []
-        query_results = api_response.get("queryResults", [])
-
-        for query_result in query_results:
-            for index_result in query_result.get("indexResults", []):
-                for indicator in index_result.get("results", []):
-                    dcid = indicator.get("dcid")
-                    if not dcid:
-                        continue
-
-                    # Extract score (default to 0.0 if not present)
-                    score = indicator.get("score", 0.0)
-
-                    results.append(
-                        {
-                            "SV": dcid,
-                            "CosineScore": score,
-                            "description": indicator.get("description"),
-                            "alternate_descriptions": indicator.get(
-                                "search_descriptions"
-                            ),
-                        }
-                    )
-
-        # Sort by score descending, then limit results
-        results.sort(key=lambda x: x["CosineScore"], reverse=True)
-        return results[:max_results]
-
     async def fetch_indicators(
         self,
         query: str,
```
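For the record, the deleted transformation is easy to demonstrate standalone. A sketch as a free function, fed a made-up payload that mirrors the `queryResults` / `indexResults` / `results` nesting of the legacy endpoint (the field values here are invented for illustration):

```python
def transform(api_response: dict, *, max_results: int = 10) -> list[dict]:
    # Standalone version of the deleted _transform_search_indicators_to_svs_format:
    # flatten nested indicator results into search_svs-style records.
    results = []
    for query_result in api_response.get("queryResults", []):
        for index_result in query_result.get("indexResults", []):
            for indicator in index_result.get("results", []):
                dcid = indicator.get("dcid")
                if not dcid:  # skip entries without an identifier
                    continue
                results.append(
                    {
                        "SV": dcid,
                        "CosineScore": indicator.get("score", 0.0),
                        "description": indicator.get("description"),
                        "alternate_descriptions": indicator.get("search_descriptions"),
                    }
                )
    # Sort by score descending, then cap the result count.
    results.sort(key=lambda x: x["CosineScore"], reverse=True)
    return results[:max_results]


# Hypothetical payload shaped like the legacy search-indicators response.
payload = {
    "queryResults": [
        {
            "indexResults": [
                {
                    "results": [
                        {"dcid": "Count_Person", "score": 0.92},
                        {"dcid": "Count_Household", "score": 0.85},
                        {"score": 0.99},  # no dcid: dropped
                    ]
                }
            ]
        }
    ]
}
print([r["SV"] for r in transform(payload)])  # → ['Count_Person', 'Count_Household']
```

With the library's `fetch_indicators` now the single entry point, this adapter layer (and the score-defaulting it did) is no longer needed.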
```diff
@@ -616,25 +485,24 @@ async def fetch_indicators(
         }

     async def _search_vector(
-        self, query: str, max_results: int = 10, *, include_topics: bool = True
+        self,
+        query: str,
+        # TODO(keyurs): Use max_results once it's supported by the underlying client.
+        # The noqa: ARG002 is to suppress the unused argument error.
+        max_results: int = 10,  # noqa: ARG002
+        *,
+        include_topics: bool = True,
     ) -> dict:
         """
-        Search for topics and variables using the search-indicators or search-vector endpoint.
+        Search for topics and variables using the fetch_indicators library method.
         """
-        # Always include topics since we need to expand topics to variables.
-        if self.use_search_indicators:
-            logger.info("Calling legacy search-indicators endpoint for: '%s'", query)
-            search_results = await self._call_search_indicators_temp(
-                queries=[query],
-                max_results=max_results,
-            )
-        else:
-            logger.info("Calling client library fetch_indicators for: '%s'", query)
-            # Run the synchronous client method in a thread
-            search_results = await asyncio.to_thread(
-                self._call_fetch_indicators,
-                queries=[query],
-            )
+        logger.info("Calling client library fetch_indicators for: '%s'", query)
+        # Run the synchronous client method in a thread
+        search_results = await asyncio.to_thread(
```
Contributor review comment (on the `asyncio.to_thread` call above): does this need to be in a thread, and does this overall method (`clients._search_vector`) need to be async?
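For context on that question: `asyncio.to_thread` exists to offload a blocking (here, network-bound) call to a worker thread so the event loop stays responsive; if `_search_vector` is awaited from an async server, calling the synchronous client directly would stall every other in-flight request. A minimal, self-contained sketch of the pattern, where `slow_fetch` is a hypothetical stand-in for the synchronous `fetch_indicators` client call:

```python
import asyncio
import time


def slow_fetch(queries: list[str]) -> dict:
    # Hypothetical stand-in for a synchronous, network-bound client call.
    time.sleep(0.1)  # simulate blocking I/O
    return {q: [{"SV": "Count_Person", "CosineScore": 0.9}] for q in queries}


async def search(query: str) -> list[dict]:
    # Offload the blocking call to a worker thread; to_thread forwards
    # positional and keyword arguments to the wrapped function.
    search_results = await asyncio.to_thread(slow_fetch, queries=[query])
    return search_results.get(query, [])


if __name__ == "__main__":
    hits = asyncio.run(search("population"))
    print(hits[0]["SV"])  # → Count_Person
```

Whether the method needs to stay async at all depends on its callers; if everything upstream is synchronous, the thread hop and the `async` wrapper could both go.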
```diff
+            self._call_fetch_indicators,
+            queries=[query],
+        )

         results = search_results.get(query, [])
```
```diff
@@ -839,19 +707,14 @@ def _create_base_dc_client(settings: BaseDCSettings) -> DCClient:
     }
     if settings.api_root:
         logger.info("Using API root for base DC: %s", settings.api_root)
-        logger.info("Using search root for base DC: %s", settings.search_root)
         dc_client_args["url"] = settings.api_root
     dc = DataCommonsClient(**dc_client_args)

     # Create DCClient
     return DCClient(
         dc=dc,
         search_scope=SearchScope.BASE_ONLY,
-        base_index=settings.base_index,
-        custom_index=None,
-        sv_search_base_url=settings.search_root,
         topic_store=topic_store,
-        use_search_indicators=settings.use_search_indicators,
     )
```
```diff
@@ -884,11 +747,7 @@ def _create_custom_dc_client(settings: CustomDCSettings) -> DCClient:
     return DCClient(
         dc=dc,
         search_scope=search_scope,
-        base_index=settings.base_index,
-        custom_index=settings.custom_index,
-        sv_search_base_url=settings.custom_dc_url,  # Use custom_dc_url as sv_search_base_url
         topic_store=topic_store,
         # TODO (@jm-rivera): Remove place-like parameter new search endpoint is live.
         _place_like_constraints=settings.place_like_constraints,
-        use_search_indicators=settings.use_search_indicators,
     )
```
Reviewer comment: Can we refactor some of these methods and/or rename them? The current flow is `services.search_indicators` -> `services._search_vector` -> `client.fetch_indicators` -> `client._search_vector` -> `client._call_fetch_indicators`, which is hard to follow; it is not clear what each step is actually doing or why it is grouped the way it is. This could be a follow-up as well.

Reply: Agree. Will do this in a follow-up, and will look into the async question in the other comment as well.