-
Notifications
You must be signed in to change notification settings - Fork 952
Used TypeDict for CachingVisitor.state #19135
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: branch-25.08
Are you sure you want to change the base?
Used TypeDict for CachingVisitor.state #19135
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
35f9909
to
1f6bcac
Compare
This changes `CachingVisitor.state` to used a `TypedDict`. Previously, we used a `Mapping[str, Any]`, which had two problems: 1. Risks typos in the key names causing unexpected KeyErrors 2. The `Any` type for the values of `state` mean that, at least by default, you won't get any type checking on things using values looked up from `state`. Unit tests should catch all these, but this can provide some earlier feedback on any issues. The `total=False` keyword is used in the typeddict to indicate that these keys are all optional. `mypy` *doesn't* error if you do a lookup on an optional field of a TypedDict, so we do still need to handle missing keys like we have been.
1f6bcac
to
cf2a77b
Compare
Hmm, failing on 3.10: https://github.com/rapidsai/cudf/actions/runs/15594634303/job/43923137155?pr=19135#step:10:7551
This might need to wait until we drop Python 3.10. |
Just seeing if CI passes with |
I'd like to add a dependency on diff --git a/conda/recipes/cudf-polars/recipe.yaml b/conda/recipes/cudf-polars/recipe.yaml
index 85740e5930..8133c7e8a5 100644
--- a/conda/recipes/cudf-polars/recipe.yaml
+++ b/conda/recipes/cudf-polars/recipe.yaml
@@ -50,6 +50,7 @@ requirements:
- python
- pylibcudf =${{ version }}
- polars >=1.24,<1.31
+ - typing-extensions # py==310
- ${{ pin_compatible("cuda-version", upper_bound="x", lower_bound="x") }}
ignore_run_exports:
by_name:
diff --git a/python/cudf_polars/pyproject.toml b/python/cudf_polars/pyproject.toml
index b3d28e027d..9589a8a3f9 100644
--- a/python/cudf_polars/pyproject.toml
+++ b/python/cudf_polars/pyproject.toml
@@ -21,6 +21,7 @@ requires-python = ">=3.10"
dependencies = [
"polars>=1.25,<1.31",
"pylibcudf==25.8.*,>=0.0.0a0",
+ "typing-extensions; python_version < '3.11'",
] # This list was generated by `rapids-dependency-file-generator`. To make changes, edit ../../dependencies.yaml and run `rapids-dependency-file-generator`.
classifiers = [
"Intended Audience :: Developers", I'd hoped that 4ed5168 would do the trick, but that gives the same output as what's on 25.08 (no change). |
Hey @TomAugspurger -- I'm not sure if the dependency file generator knows how to handle version-specific dependencies in a I think it's reasonable to do something like: @@ -834,13 +834,11 @@ dependencies:
packages:
- polars>=1.25,<1.31
specific:
- - output_types: [conda, requirements, pyproject]
+ - output_types: pyproject
matrices:
- - matrix: {py: "3.10"}
- packages:
- - typing-extensions
- matrix: null
- packages: null
+ packages:
+ - "typing-extensions; python_version < '3.11'"
run_cudf_polars_experimental:
common:
- output_types: [conda, requirements, pyproject] The other spot where As for the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Packaging and CI looks good!
@rjzamora do you have any concerns with this? This adds a bit more friction to putting something in |
Sorry I missed this earlier. This PR probably moves things in the right direction, so I think the changes are fine. With that said, I do think it's pretty confusing to use the same |
I felt that awkwardness too. It's identical to what we have today, it's just that declaring the types makes it obvious that we're putting all kinds of things in there. I'll see what I can do. |
I pushed some changes that seem to do what we want. It's a bit complicated... but I think worth it. Now we have a
and we know that inside If folks are OK with this overall I'll clean up a couple things (docs, where we define these |
|
||
CachingVisitorState: TypeAlias = ( | ||
ExprExprState | ExprDecomposerState | LowerIRState | GenericState | ASTTransformer | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty negative on this change. I would be OK if one could define the protocol for the state near the point of use and one was just generic (I think invariantly) over the parameter. But this is not great. I need to now have globally unique names for the state dictionary, and I feel like you were probably struggling for good ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like you were probably struggling for good ones
What gave that away? :)
Just to make sure we're on the same page about what we want:
- Visitor-specific state is a good thing. It lets users of the state know precisely what's available to them.
- We'd like to define the state variables locally to where they're inserted into the
CachingVisitor
(TBD whether that type needs to be visible to the user of the state as well, but these are often close to each other). - Anything that requires enumerating the list of states somewhere is a non-starter.
I think this ugly union came about from trying to get GenericTransformer
to type check successfully. I'll look into this a bit more (I haven't ever worked with covariance / contravariance before)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
729b0b8 is hopefully closer to what we want. At a minimum, it's gotten rid of this Union
type.
If that looks OK, then I can move things to where we want them. We just need to figure out where to define the transformers (ExprTransformer
, LowerIRTransformer
, ...) and their associated states. LMK if you have a preference on where things end up at (in cudf_polars.typing
/ where they're used / somewhere else).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll look into this a bit more (I haven't ever worked with covariance / contravariance before)
I wrote a thing (#17016 (comment)), I should probably make a web page...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, reading through that now
73fae84 has the proposed reorganization. The basic rules are to 1.) put the GenericTransformer
subtype next to where they're used, 2.) put the state
declaration next to the CachingVistor / GenericTransformer subtype, and 3.) prefer the name State
for the state dict, unless you need to avoid a name collision (which happens in to_ast.py
)
I'll defer to you on whether the extra effort is worth it, and I'll be happy to review. |
StateT_co = TypeVar("StateT_co", covariant=True) | ||
NodeT = TypeVar("NodeT", bound="nodebase.Node[Any]") | ||
StateT = TypeVar("StateT") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if we have StateT_co
(and the type parameter for GenericTransformer
should be covariant), then we don't need StateT
(all the specialisations should use StateT_co
), otherwise if you annotate something with StateT
you can provide a GenericTransformer
that has a subtype of the bound StateT
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in a19936a. I thought I had tried that, but apparently not. And I really am reading through #17016 (comment) now to understand the difference :)
- Move State dicts to where they're used - Move Transformer subclasses to where they're used
This changes
CachingVisitor
to used aTypedDict
for state. Previously, we used aMapping[str, Any]
, which had two problems:Any
type for the values ofstate
means that, at least by default, you won't get any type checking on things using values looked up fromstate
.Unit tests should catch all these, but this can provide some earlier feedback on any issues.
Now, we'll declare a
State
TypedDict
with some specific fields for each transformer that usesCachingVisitor
.