@DavisVaughan DavisVaughan commented Oct 28, 2025

Closes #7746
Closes #7077

It turns out that our application of if_any() and if_all() was fairly inconsistent. They are tricky to get right, and we had two different implementations of them: one for the expansion case and one for the evaluation case. I've now tried to unify these so that both use the same underlying implementation, dplyr_list_pany() or dplyr_list_pall() depending on the scenario. I'll probably look into reviving the vctrs versions in r-lib/vctrs#1675, because I think they would be useful for this.

In addition to greater consistency across the board, note that in the filter() cases the In argument: label in errors now reports the original expression pre-expansion, which makes for a much better error message.

In the examples below, for filter(), note that adding () around the if_any() or if_all() calls triggers the evaluation case rather than the expansion case.

library(dplyr)

# With zero inputs, if_any

# Before
df <- tibble(x = 1:2)
filter(df, if_any(c(), identity))
#> # A tibble: 0 × 1
#> # ℹ 1 variable: x <int>
filter(df, (if_any(c(), identity)))
#> # A tibble: 2 × 1
#>       x
#>   <int>
#> 1     1
#> 2     2
filter(df, any())
#> # A tibble: 0 × 1
#> # ℹ 1 variable: x <int>

# After
df <- tibble(x = 1:2)
filter(df, if_any(c(), identity))
#> # A tibble: 0 × 1
#> # ℹ 1 variable: x <int>
filter(df, (if_any(c(), identity)))
#> # A tibble: 0 × 1
#> # ℹ 1 variable: x <int>
filter(df, any())
#> # A tibble: 0 × 1
#> # ℹ 1 variable: x <int>

# With one non-logical input

# Before
df <- tibble(x = 1:2)
filter(df, if_any(x, identity))
#> Error in `filter()`:
#> ℹ In argument: `(function (x) ...`.
#> Caused by error:
#> ! `..1` must be a logical vector, not an integer vector.
filter(df, (if_any(x, identity)))
#> Error in `filter()`:
#> ℹ In argument: `(if_any(x, identity))`.
#> Caused by error:
#> ! `..1` must be a logical vector, not an integer vector.
mutate(df, a = if_any(x, identity))
#> # A tibble: 2 × 2
#>       x     a
#>   <int> <int>
#> 1     1     1
#> 2     2     2

# After
df <- tibble(x = 1:2)
filter(df, if_any(x, identity))
#> Error in `filter()`:
#> ℹ In argument: `if_any(x, identity)`.
#> Caused by error in `if_any()`:
#> ! `x` must be a logical vector, not an integer vector.
filter(df, (if_any(x, identity)))
#> Error in `filter()`:
#> ℹ In argument: `(if_any(x, identity))`.
#> Caused by error in `if_any()`:
#> ! `x` must be a logical vector, not an integer vector.
mutate(df, a = if_any(x, identity))
#> Error in `mutate()`:
#> ℹ In argument: `a = if_any(x, identity)`.
#> Caused by error in `if_any()`:
#> ! `x` must be a logical vector, not an integer vector.

# In general, with non-logical types resulting from applying `.fns` we now error
# more appropriately

# Before
df <- tibble(x = c(TRUE, FALSE), y = c("a", "b"))
filter(df, if_any(c(x, y), identity))
#> Error in `filter()`:
#> ℹ In argument: `|...`.
#> Caused by error in `<function(x) x>(x) | <function(x) x>(y)`:
#> ! operations are possible only for numeric, logical or complex types
filter(df, (if_any(c(x, y), identity)))
#> Error in `filter()`:
#> ℹ In argument: `(if_any(c(x, y), identity))`.
#> Caused by error in `op()`:
#> ! operations are possible only for numeric, logical or complex types

# After
df <- tibble(x = c(TRUE, FALSE), y = c("a", "b"))
filter(df, if_any(c(x, y), identity))
#> Error in `filter()`:
#> ℹ In argument: `if_any(c(x, y), identity)`.
#> Caused by error in `if_any()`:
#> ! `y` must be a logical vector, not a character vector.
filter(df, (if_any(c(x, y), identity)))
#> Error in `filter()`:
#> ℹ In argument: `(if_any(c(x, y), identity))`.
#> Caused by error in `if_any()`:
#> ! `y` must be a logical vector, not a character vector.

@DavisVaughan DavisVaughan changed the title from "Fix evaluation paths of if_any() and if_all() with zero or one inputs" to "Make if_any() and if_all() consistent in all contexts" Oct 29, 2025
@@ -0,0 +1,76 @@
# CLAUDE.md

Plucked from ellmer, with modifications. I'm just trying it out.


* When called with one input, both now return logical vectors rather than the original column.

* The result of applying `.fns` now must be a logical vector.

This is a breaking change, but hopefully not a common issue? We really do want the result of .fns to be logical vectors, particularly for #7746 to work right.

The vctrs functions for pany/pall will probably also be pretty strict about requiring logical inputs.
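
For code affected by this stricter requirement, the usual fix is to make `.fns` return a logical vector per column rather than the column itself. A minimal sketch, assuming made-up data and a made-up threshold (neither comes from this PR):

```r
library(dplyr)

# Illustrative data only.
df <- tibble(x = c(1, 5, 0), y = c(0, 0, 3))

# `.fns` returns a logical vector for each selected column, so it satisfies
# the "must be a logical vector" check described above.
out <- filter(df, if_any(c(x, y), function(col) col > 1))

# By contrast, a `.fns` that returns the column unchanged (such as `identity`
# on a non-logical column) is the pattern the PR description shows erroring.
```

Here a row is kept when at least one of `x` or `y` exceeds 1.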

dplyr_list_pany_pall(x, "any", ..., size = size, error_call = error_call)
}

dplyr_list_pany_pall <- function(

I'm hoping to just remove this in favor of the vctrs versions soon, but I needed to get the semantics of it right enough to be able to add all the tests here.

Comment on lines +430 to +432
init <- vec_rep(init, times = size)

reduce(x, op, .init = init)

Having an initial value is an important reason that the 0 and 1 input cases now "just work" correctly. This was NULL before.
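
To see why the initial value matters, here is a minimal base R sketch of a parallel any/all reduction. It is not dplyr's internal code: `list_pany_pall_sketch()` is a hypothetical name, base `Reduce()` stands in for the `reduce(x, op, .init = init)` call in the diff, and the `TRUE` initial value for "all" is assumed by analogy with `all()`.

```r
# Hypothetical stand-in for the internal helper; for illustration only.
list_pany_pall_sketch <- function(x, op = c("any", "all"), size) {
  op <- match.arg(op)

  # Be strict about inputs: every element must already be logical.
  for (i in seq_along(x)) {
    if (!is.logical(x[[i]])) {
      stop(sprintf("Element %d must be a logical vector.", i), call. = FALSE)
    }
  }

  # The identity element of the operation, repeated to the target size.
  # With zero inputs this is returned as is; with one input it is combined
  # with that input, so the result is always a logical vector of length `size`.
  init <- rep(op == "all", times = size)
  fn <- if (op == "any") `|` else `&`

  Reduce(fn, x, init)
}

zero_any <- list_pany_pall_sketch(list(), "any", size = 2)
zero_all <- list_pany_pall_sketch(list(), "all", size = 2)
one_any <- list_pany_pall_sketch(list(c(TRUE, NA)), "any", size = 2)
```

With a `NULL` starting value, the zero-input case has nothing to return and the one-input case hands back the input untouched, which is how a non-logical column could slip through before.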

Comment on lines 713 to 725
expr <- expr({
ns <- asNamespace("dplyr")

combine <- function(x, y) {
if (is_null(x)) {
y
} else {
call(op, x, y)
}
}
expr <- reduce(quos, combine, .init = NULL)
x <- list(!!!quos)

# In the evaluation path, `across()` automatically recycles to common size,
# so we must here as well for compatibility. `across()` also returns a 0
# col, 1 row data frame in the case of no inputs so that it will recycle to
# the group size, which we also do here.
size <- ns[["dplyr_list_size_common"]](x, absent = 1L, call = call(!!if_fn))
x <- ns[["dplyr_list_recycle_common"]](x, size = size, call = call(!!if_fn))

ns[[!!dplyr_fn]](x, size = size, error_call = call(!!if_fn))
})

This is the same kind of trick we use in as_pick_expansion() for the expansion path. I've carefully tried to match it to the evaluation path.

Basically we replace if_any(c(x, y), fn) with something like:

x <- list(x = x, y = y)
ns <- asNamespace("dplyr")
size <- ns[["dplyr_list_size_common"]](x, absent = 1L, call = call(if_any()))
x <- ns[["dplyr_list_recycle_common"]](x, size = size, call = call(if_any()))
ns[["dplyr_list_pany"]](x, size = size, error_call = call(if_any()))

Comment on lines +25 to +27
# Version of `vec_size_common()` that takes a list.
# Useful for delaying `!!!` when used within an `expr()` call.
dplyr_list_size_common <- function(

I'm somewhat convinced that in vctrs we should have exported list_size_common() rather than vec_size_common(), i.e.:

obj_check_vector(x)
list_check_all_vectors(xs)

vec_size(x)
list_size_common(xs)

vec_recycle(x)
list_recycle_common(xs)

vec_cast(x, to)
list_cast_common(xs, to)
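
As a rough illustration of the list-taking shape, the wrappers below forward a list to the existing dots-based vctrs functions. The `*_sketch` names are hypothetical and this is not the code added in this PR; only `vec_size_common()` and `vec_recycle_common()` are real vctrs functions.

```r
library(vctrs)

# Hypothetical list-taking wrappers around the dots-based vctrs functions.
list_size_common_sketch <- function(xs, absent = 0L) {
  # `.absent` is the size reported when there are no inputs at all.
  do.call(vec_size_common, c(xs, list(.absent = absent)))
}

list_recycle_common_sketch <- function(xs, size) {
  do.call(vec_recycle_common, c(xs, list(.size = size)))
}

xs <- list(x = 1:3, y = TRUE)
size <- list_size_common_sketch(xs, absent = 1L)
recycled <- list_recycle_common_sketch(xs, size = size)

# With no inputs, the common size falls back to `absent`.
empty_size <- list_size_common_sketch(list(), absent = 1L)
```

Taking a list directly avoids splicing with `!!!` at the call site, which is the motivation given in the code comment above.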

)
})

test_that("`across()` recycle `.fns` results to common size", {

Along the way I nearly changed across() to recycle inputs to the group size rather than recycling them to their common size. I think that would have been a mistake, so I've added a test to prevent us from ever thinking of doing this.

})
})

test_that("`if_any()` and `if_all()` have consistent behavior across `filter()` and `mutate()`", {

This is a mega test to make sure things are consistent everywhere.

It's obviously a lot of tests, but I think we really do need them all to be sure we aren't missing an edge case. These are all very hard to reason about since there are so many dimensions that intersect (filter vs mutate, expansion vs evaluation, groups vs no groups, etc.).

Let's put it this way: I feel way more confident about this now that we have this test that hits every edge case.

@DavisVaughan DavisVaughan marked this pull request as ready for review October 29, 2025 20:14
@DavisVaughan DavisVaughan requested a review from lionel- October 29, 2025 20:32
Development

Successfully merging this pull request may close these issues.

if_any inside mutate, unexpected return on single columns
if_any() does not work as expected inside mutate when no inputs are provided
