Add waterdata-client dataframe transformers #307

Merged
jarq6c merged 20 commits into NOAA-OWP:main from jarq6c:waterdata-transformers
May 1, 2026

Collaborator

@jarq6c jarq6c commented May 1, 2026

This PR adds the ability to transform deserialized GeoJSON into alternative formats. Previously, every class defined in clients.py could only return dict. Results can now be processed further into formats that are more useful to data scientists. The PR also adds a column label mapping from WaterData labels to HydroTools canonical labels, for partial backwards compatibility with hydrotools.nwis_client.

Changes

  • pyproject.toml: Added geopandas as a dependency.
  • base_client.py: Added a new transformer attribute (a Callable type) and a _handle_response method that calls the transformer.
  • clients.py: Regenerated from template.
  • constants.py: Regenerated from template.
  • transformers.py: New module that defines a ResponseTransformer protocol and a TransformedResponse_co TypeVar. Also includes three example transformers: check_features, to_geodataframe, and to_dataframe.
  • clients.py.j2: Updated the Jinja2 template for clients.py to apply the optional transformer.
  • constants.py.j2: Updated the Jinja2 template with column enums and a mapping from WaterData labels to HydroTools canonical columns.
  • README.md: Added an example using the new to_geodataframe transformer.
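
The label mapping described in the list above can be illustrated with a small standard-library sketch. This is not the code from constants.py or transformers.py: the enum, the mapping, the to_records helper, and the sample feature are all hypothetical, and the label names are assumptions chosen only for illustration.

```python
from enum import Enum
from typing import Any


class WaterDataColumn(str, Enum):
    """Hypothetical subset of WaterData property labels (assumed names)."""
    MONITORING_LOCATION_ID = "monitoring_location_id"
    TIME = "time"
    VALUE = "value"
    UNIT_OF_MEASURE = "unit_of_measure"


# Hypothetical mapping from WaterData labels to HydroTools canonical labels.
CANONICAL_LABELS: dict[str, str] = {
    WaterDataColumn.MONITORING_LOCATION_ID.value: "usgs_site_code",
    WaterDataColumn.TIME.value: "value_time",
    WaterDataColumn.VALUE.value: "value",
    WaterDataColumn.UNIT_OF_MEASURE.value: "measurement_unit",
}


def to_records(features: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Flatten GeoJSON features into rows that use canonical column labels."""
    records = []
    for feature in features:
        properties = feature.get("properties") or {}
        # Labels without a canonical equivalent pass through unchanged.
        records.append(
            {CANONICAL_LABELS.get(key, key): val for key, val in properties.items()}
        )
    return records


# Made-up sample feature, shaped like a deserialized GeoJSON Feature.
features = [
    {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [0.0, 0.0]},
        "properties": {
            "monitoring_location_id": "USGS-02146470",
            "time": "2026-05-01T15:00:00+00:00",
            "value": "12.3",
            "unit_of_measure": "ft^3/s",
        },
    }
]

records = to_records(features)
print(sorted(records[0]))
```

Rows of this shape are the kind of input that a dataframe constructor can consume directly, which is presumably where a transformer such as to_dataframe would take over.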

Testing

  • test_clients.py: Updated to account for the new transformer kwarg, which is set to None to keep the existing tests unchanged.
  • test_transformers.py: Added tests for the transformers.

Checklist

  • PR has an informative and human-readable title
  • PR is well outlined and documented. See #12 for an example
  • Changes are limited to a single goal (no scope creep)
  • Code can be automatically merged (no conflicts)
  • Code follows project standards (see CONTRIBUTING.md)
  • Passes all existing automated tests
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output) using numpy docstring formatting
  • Placeholder code is flagged / future todos are captured in comments
  • Reviewers requested with the Reviewers tool ➡️

@jarq6c jarq6c self-assigned this May 1, 2026
@jarq6c jarq6c added the enhancement New feature or request label May 1, 2026
@jarq6c jarq6c requested a review from christophertubbs May 1, 2026 15:57
Comment thread python/waterdata_client/src/hydrotools/waterdata_client/base_client.py Outdated
Contributor

@christophertubbs christophertubbs left a comment

It might have been better to set TransformedResponse_co to something like:

TransformedResponse = Sequence[Mapping[str, Any]]

in order to keep the interface uniform where the transformer's duties are just to modify what will eventually get passed into other constructors. May also be worth coming up with an alias for list[dict[str, Any]] anyways since it sounds like it fits Rows or something.

The only thing I consider a tad shaky is the vagueness of what TransformedResponse_co might be, but I'm not looking at this in an editor and the possibility of IDE confusion doesn't qualify as a blocker.
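
A minimal sketch of the alternative suggested in this comment, assuming the alias is named Rows as floated above. The RowsTransformer alias and the drop_missing_values function are hypothetical and exist only to show the rows-in, rows-out contract; nothing here is taken from the PR.

```python
from collections.abc import Callable, Mapping, Sequence
from typing import Any

# Alias suggested in the review: one uniform "rows" shape.
Rows = Sequence[Mapping[str, Any]]

# Under this design a transformer only reshapes rows; it never changes
# the container type handed to downstream constructors.
RowsTransformer = Callable[[Rows], Rows]


def drop_missing_values(rows: Rows) -> Rows:
    """Hypothetical transformer: keep only rows with a non-null 'value'."""
    return [row for row in rows if row.get("value") is not None]


rows: Rows = [
    {"monitoring_location_id": "USGS-02146470", "value": "12.3"},
    {"monitoring_location_id": "USGS-02146470", "value": None},
]

transformer: RowsTransformer = drop_missing_values
cleaned = transformer(rows)
print(len(cleaned))  # 1
```

The trade-off is that the return type is always known, but a caller who wants a dataframe has to build it from the rows in a separate step.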

Collaborator Author

jarq6c commented May 1, 2026

> It might have been better to set TransformedResponse_co to something like:
>
> TransformedResponse = Sequence[Mapping[str, Any]]
>
> in order to keep the interface uniform where the transformer's duties are just to modify what will eventually get passed into other constructors. May also be worth coming up with an alias for list[dict[str, Any]] anyways since it sounds like it fits Rows or something.
>
> The only thing I consider a tad shaky is the vagueness of what TransformedResponse_co might be, but I'm not looking at this in an editor and the possibility of IDE confusion doesn't qualify as a blocker.

TransformedResponse_co is a template type variable. It could literally be anything. Here's a (rather silly) motivating example of what the type engine does with the TypeVar. In VSCode, the linter correctly shows that client.get returns int.

from typing import Any
from hydrotools.waterdata_client import LatestContinuousClient

# Example custom transformer
def my_custom_transformer(data: list[dict[str, Any]]) -> int:
    return 42

# Instantiate client with custom transformer
client = LatestContinuousClient(
    transformer=my_custom_transformer
)

# Call client.get
value = client.get(
    monitoring_location_id="USGS-02146470"
)

# Look at result
print(value) # 42
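
The example above needs the installed package and a live service. The same type propagation can be sketched offline with a stand-in. FakeClient and count_features below are hypothetical and are not part of hydrotools, and the Protocol here is only an assumed shape for ResponseTransformer, not the definition in transformers.py.

```python
from typing import Any, Generic, Protocol, TypeVar

# Covariant type variable: it appears only in return position.
TransformedResponse_co = TypeVar("TransformedResponse_co", covariant=True)

# Ordinary (invariant) type variable used to parameterize the client.
T = TypeVar("T")


class ResponseTransformer(Protocol[TransformedResponse_co]):
    """Assumed shape: any callable from a list of features to some result."""

    def __call__(self, data: list[dict[str, Any]]) -> TransformedResponse_co: ...


class FakeClient(Generic[T]):
    """Hypothetical offline stand-in for a generated client."""

    def __init__(self, transformer: ResponseTransformer[T]) -> None:
        self.transformer = transformer

    def get(self, **query: str) -> T:
        # Fabricate one feature instead of calling a web service.
        features = [{"type": "Feature", "properties": dict(query)}]
        return self.transformer(features)


def count_features(data: list[dict[str, Any]]) -> int:
    """Hypothetical transformer that returns the number of features."""
    return len(data)


client = FakeClient(transformer=count_features)
value = client.get(monitoring_location_id="USGS-02146470")
print(value)  # 1
```

Because count_features is annotated to return int, a type checker can infer FakeClient[int] at construction and report int as the return type of client.get, which is the behavior described for the real client.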

Collaborator Author

jarq6c commented May 1, 2026

Actually, I think I'll rename the type variable to TransformedResponseT_co to make it more explicit.

Collaborator Author

jarq6c commented May 1, 2026

WRT a type alias for list[dict[str, Any]], we might look into using geojson-pydantic. I'm considering introducing pydantic for query validation eventually.

https://developmentseed.org/geojson-pydantic/intro/
https://github.com/developmentseed/geojson-pydantic

@jarq6c jarq6c merged commit 321b4a2 into NOAA-OWP:main May 1, 2026
3 checks passed
@jarq6c jarq6c deleted the waterdata-transformers branch May 1, 2026 19:12