Make if_any() and if_all() consistent in all contexts
#7747
| @@ -0,0 +1,76 @@ | |||
| # CLAUDE.md | |||
Plucked from ellmer, with modifications. I'm just trying it out.
|
|
||
| * When called with one input, both now return logical vectors rather than the original column. | ||
|
|
||
| * The result of applying `.fns` now must be a logical vector. |
This is a breaking change, but hopefully not a common issue? We really do want the result of .fns to be logical vectors, particularly for #7746 to work right.
The vctrs functions for pany/pall will probably also be pretty strict about requiring logical inputs.
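To illustrate why a strict logical requirement matters, here is a minimal base R sketch. `check_all_logical()` is a hypothetical helper written for this illustration; it is not dplyr's or vctrs's actual check, and the error wording is made up.

```r
# Hypothetical helper (not part of dplyr): require every element of `results`
# to be a logical vector, mirroring the new requirement on `.fns` output.
check_all_logical <- function(results) {
  for (i in seq_along(results)) {
    if (!is.logical(results[[i]])) {
      stop(
        sprintf(
          "Result %d must be a logical vector, not <%s>.",
          i,
          class(results[[i]])[[1]]
        ),
        call. = FALSE
      )
    }
  }
  invisible(results)
}

# Without a check, base R's `|` silently coerces numerics to logical
c(0, 2) | FALSE

# Logical results pass silently
check_all_logical(list(c(TRUE, NA), FALSE))

# A numeric result is rejected instead of being coerced
try(check_all_logical(list(c(TRUE, NA), c(0, 2))))
```

Rejecting non-logical results up front keeps the combined result from depending on implicit coercion rules.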
| dplyr_list_pany_pall(x, "any", ..., size = size, error_call = error_call) | ||
| } | ||
|
|
||
| dplyr_list_pany_pall <- function( |
I'm hoping to just remove this in favor of the vctrs versions soon, but I needed to get its semantics right enough to be able to add all the tests here.
| init <- vec_rep(init, times = size) | ||
|
|
||
| reduce(x, op, .init = init) |
Having an initial value is an important reason that the 0 and 1 input cases now "just work" correctly. This was NULL before.
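The effect of the initial value can be sketched in base R. This is only an approximation of the idea, using `Reduce()` and `rep()` in place of `reduce()` and `vec_rep()`; `sketch_pany_pall()` is a hypothetical name and not the function in this PR.

```r
# Hypothetical sketch (not dplyr's implementation): parallel "any" / "all"
# over a list of logical vectors, seeded with an initial value of length `size`.
sketch_pany_pall <- function(x, op = c("any", "all"), size) {
  op <- match.arg(op)

  # Identity element of the operation: FALSE for `|`, TRUE for `&`
  init <- rep(op == "all", times = size)
  fn <- if (op == "any") `|` else `&`

  # With zero inputs `Reduce()` returns `init` itself, and with one input it
  # returns `fn(init, x[[1]])`, so the result is always a logical vector.
  Reduce(fn, x, init)
}

# Zero inputs: the identity, repeated to `size`
sketch_pany_pall(list(), "any", size = 3)
sketch_pany_pall(list(), "all", size = 3)

# One input: combined with the identity, so the values are unchanged
sketch_pany_pall(list(c(TRUE, NA, FALSE)), "any", size = 3)
```

Seeding with the identity element means the zero and one input cases need no special handling.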
| expr <- expr({ | ||
| ns <- asNamespace("dplyr") | ||
|
|
||
| combine <- function(x, y) { | ||
| if (is_null(x)) { | ||
| y | ||
| } else { | ||
| call(op, x, y) | ||
| } | ||
| } | ||
| expr <- reduce(quos, combine, .init = NULL) | ||
| x <- list(!!!quos) | ||
|
|
||
| # In the evaluation path, `across()` automatically recycles to common size, | ||
| # so we must here as well for compatibility. `across()` also returns a 0 | ||
| # col, 1 row data frame in the case of no inputs so that it will recycle to | ||
| # the group size, which we also do here. | ||
| size <- ns[["dplyr_list_size_common"]](x, absent = 1L, call = call(!!if_fn)) | ||
| x <- ns[["dplyr_list_recycle_common"]](x, size = size, call = call(!!if_fn)) | ||
|
|
||
| ns[[!!dplyr_fn]](x, size = size, error_call = call(!!if_fn)) | ||
| }) |
This is the same kind of trick we use in as_pick_expansion() for the expansion path. I've carefully tried to match it to the evaluation path.
Basically we replace if_any(c(x, y), fn) with something like
x <- list(x = x, y = y)
ns <- asNamespace("dplyr")
size <- ns[["dplyr_list_size_common"]](x, absent = 1L, call = call(if_any()))
x <- ns[["dplyr_list_recycle_common"]](x, size = size, call = call(if_any()))
ns[["dplyr_list_pany"]](x, size = size, error_call = call(if_any()))
| # Version of `vec_size_common()` that takes a list. | ||
| # Useful for delaying `!!!` when used within an `expr()` call. | ||
| dplyr_list_size_common <- function( |
I'm somewhat convinced that in vctrs we should have exported list_size_common() rather than vec_size_common(), i.e.:
obj_check_vector(x)
list_check_all_vectors(xs)
vec_size(x)
list_size_common(xs)
vec_recycle(x)
list_recycle_common(xs)
vec_cast(x, to)
list_cast_common(xs, to)
| ) | ||
| }) | ||
|
|
||
| test_that("`across()` recycle `.fns` results to common size", { |
Along the way I nearly changed across() to recycle inputs to the group size rather than recycling them to their common size. I think that would have been a mistake so I've added a test to prevent us from ever thinking of doing this.
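As a rough illustration of what "recycling to their common size" means, here is a base R sketch. `sketch_size_common()` and `sketch_recycle_common()` are hypothetical helpers that only approximate vctrs-style recycling rules (size 1 recycles, any other mismatch is an error); they are not the functions added in this PR.

```r
# Hypothetical sketch of common-size computation over a list of vectors.
# `absent` is the size to report when there are no inputs at all.
sketch_size_common <- function(x, absent = 1L) {
  sizes <- unique(lengths(x))
  if (length(sizes) == 0L) {
    return(absent)
  }

  # Size 1 inputs recycle to anything, so only the other sizes must agree
  sizes <- sizes[sizes != 1L]
  if (length(sizes) == 0L) {
    return(1L)
  }
  if (length(sizes) > 1L) {
    stop("Can't recycle inputs to a common size.", call. = FALSE)
  }
  sizes
}

# Hypothetical sketch: recycle every size 1 element up to `size`
sketch_recycle_common <- function(x, size = sketch_size_common(x)) {
  lapply(x, function(elt) {
    if (length(elt) == 1L) rep(elt, times = size) else elt
  })
}

# Results of size 3 and size 1 have a common size of 3, whatever the group size
results <- list(a = 1:3, b = 0L)
sketch_size_common(results)
sketch_recycle_common(results)
```

Under this sketch the common size depends only on the results themselves, so two results of size 1 stay at size 1 rather than being expanded to the number of rows in the group.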
| }) | ||
| }) | ||
|
|
||
| test_that("`if_any()` and `if_all()` have consistent behavior across `filter()` and `mutate()`", { |
This is a mega test to make sure things are consistent everywhere.
It's obviously a lot of tests, but I think we really do need them all to be sure we aren't missing an edge case. These are all very hard to reason about since so many dimensions intersect (filter vs mutate, expansion vs evaluation, groups vs no groups, etc).
Let's put it this way: I feel way more confident about this now that we have a test that hits every edge case.
And add a battery of tests to ensure we don't regress on this consistency
5ad9c62 to 21023cb
Closes #7746
Closes #7077
It turns out that our application of `if_any()` and `if_all()` was fairly inconsistent. They are tricky to get right, and we have 2 different implementations of them: one for the expansion case and one for the evaluation case. I've now tried to unify these to use the same underlying implementation, `dplyr_list_pany()` or `dplyr_list_pall()` depending on the scenario. I'll probably look into reviving these (r-lib/vctrs#1675) because I think they would be useful for this.

In addition to greater consistency across the board, you'll also note that in errors the `In argument:` label now reports the original expression pre-expansion in the `filter()` cases, which is a much better error.

In the examples below, for `filter()`, note that adding `()` around the `if_any()` or `if_all()` calls triggers the evaluation case rather than the expansion case.