Implement `vec_pall()` and `vec_pany()` #2092

DavisVaughan · 2025-11-06T18:25:58Z

Closes #1675, 3 years later!

If you find yourself questioning whether we need all 3 options for missing = NULL / FALSE / TRUE, the answer is yes:

| vctrs | Data frame | Vector |
| --- | --- | --- |
| `vec_pall(missing = NULL)` | | `when_all(na_rm = FALSE)` |
| `vec_pall(missing = FALSE)` | `filter()` / `filter_out()` | |
| `vec_pall(missing = TRUE)` | | `when_all(na_rm = TRUE)` |
| `vec_pany(missing = NULL)` | | `when_any(na_rm = FALSE)` |
| `vec_pany(missing = FALSE)` | | `when_any(na_rm = TRUE)` |
| `vec_pany(missing = TRUE)` | | |

We could have gotten away with `na_rm` in `vec_pall()` and `vec_pany()`, except that `filter()` and `filter_out()` need the `vec_pall(.missing = FALSE)` case, which would be lost if we simplified to a binary `na_rm`.

I think the important thing is that the user-facing `when_all()` and `when_any()` get the simple `na_rm`, while vctrs gets the more flexible and holistic `.missing` argument, at the cost of more mental overhead.
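To make the three options concrete, here is a minimal base R sketch of the semantics implied by the table above. It is not the vctrs implementation: `pall_sketch()` and `pany_sketch()` are hypothetical helpers, and the sketch assumes that `missing = TRUE` / `missing = FALSE` means "treat `NA` as that value" while `missing = NULL` propagates `NA` the way `&` and `|` do.

```r
# Hypothetical helpers, not part of vctrs. They only illustrate the assumed
# semantics of the three `missing` options on a list of logical vectors.
replace_missing <- function(xs, missing) {
  if (is.null(missing)) {
    # NULL: leave NAs alone so they propagate like `&` / `|`
    return(xs)
  }
  # TRUE / FALSE: treat every NA as that value before combining
  lapply(xs, function(x) replace(x, is.na(x), missing))
}

pall_sketch <- function(xs, missing = NULL) {
  Reduce(`&`, replace_missing(xs, missing))
}

pany_sketch <- function(xs, missing = NULL) {
  Reduce(`|`, replace_missing(xs, missing))
}

x <- c(TRUE, TRUE, NA, NA)
y <- c(TRUE, NA, FALSE, NA)

pall_sketch(list(x, y))                   # TRUE NA FALSE NA
pall_sketch(list(x, y), missing = FALSE)  # TRUE FALSE FALSE FALSE
pall_sketch(list(x, y), missing = TRUE)   # TRUE TRUE FALSE TRUE

pany_sketch(list(x, y))                   # TRUE TRUE NA NA
pany_sketch(list(x, y), missing = FALSE)  # TRUE TRUE FALSE FALSE
pany_sketch(list(x, y), missing = TRUE)   # TRUE TRUE TRUE TRUE
```

Under these assumptions, the second call is the `filter()`-style case: a row is kept only when every condition is definitely `TRUE`. A binary `na_rm` could express the first and third calls, but not that one.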

They are very fast due to a rather clever NA propagation algorithm that uses C-level arithmetic rather than if/else branching, making us immune to bad branch prediction. Much faster than base R! Enough so that it might be interesting to see if base R would take a patch.

Here's the branchy code in base R:
https://github.com/wch/r-source/blob/3e507c3364b779e42bc06a6bb28867ec4a3a082e/src/main/logic.c#L361-L367

Here are some benchmarks with an equal distribution of TRUE, FALSE, and NA values (so, bad for branch prediction, which affects base R but not us).

(Ignore the list_* names, this is before I switched back to vec_* names)

library(vctrs)

set.seed(123)

x <- sample(c(TRUE, FALSE, NA), size = 1e8, replace = TRUE)
y <- sample(c(TRUE, FALSE, NA), size = 1e8, replace = TRUE)
z <- sample(c(TRUE, FALSE, NA), size = 1e8, replace = TRUE)

bench::mark(
  x | y,
  list_pany(list(x, y)),
  iterations = 10
)
#> # A tibble: 2 × 6
#>   expression                 min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>            <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 x | y                    517ms  519.4ms      1.92     381MB     1.28
#> 2 list_pany(list(x, y))   42.1ms   43.1ms     23.2      381MB     5.79

bench::mark(
  x & y,
  list_pall(list(x, y)),
  iterations = 10
)
#> # A tibble: 2 × 6
#>   expression                 min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>            <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 x & y                  463.3ms  467.2ms      2.12     381MB    0.531
#> 2 list_pall(list(x, y))   40.9ms   41.4ms     23.2      381MB    5.79

bench::mark(
  x | y | z,
  list_pany(list(x, y, z)),
  iterations = 10
)
#> # A tibble: 2 × 6
#>   expression                    min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>               <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 x | y | z                   1.01s    1.02s     0.974     763MB    0.974
#> 2 list_pany(list(x, y, z))  61.94ms  67.52ms    14.4       381MB    3.60

bench::mark(
  x & y & z,
  list_pall(list(x, y, z)),
  iterations = 10
)
#> # A tibble: 2 × 6
#>   expression                    min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>               <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 x & y & z                   781ms  784.3ms      1.27     763MB    0.849
#> 2 list_pall(list(x, y, z))     58ms   58.6ms     17.0      381MB    4.26

bench::mark(
  list_pall(list(x, y, z), missing = FALSE),
  list_pall(list(x, y, z), missing = TRUE),
  list_pall(list(x, y, z), missing = NULL),
  list_pany(list(x, y, z), missing = FALSE),
  list_pany(list(x, y, z), missing = TRUE),
  list_pany(list(x, y, z), missing = NULL),
  check = FALSE,
  iterations = 50
)
#> # A tibble: 6 × 6
#>   expression                             min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                          <bch:> <bch:>     <dbl> <bch:byt>    <dbl>
#> 1 list_pall(list(x, y, z), missing =… 56.2ms 57.3ms      17.4     381MB     3.31
#> 2 list_pall(list(x, y, z), missing =… 56.3ms 57.2ms      17.3     381MB     1.92
#> 3 list_pall(list(x, y, z), missing =… 57.3ms 58.3ms      17.0     381MB     2.32
#> 4 list_pany(list(x, y, z), missing =… 56.3ms 58.3ms      16.7     381MB     2.28
#> 5 list_pany(list(x, y, z), missing =…   56ms 58.2ms      16.8     381MB     1.87
#> 6 list_pany(list(x, y, z), missing =… 61.1ms 62.7ms      15.8     381MB     2.15

Base R gets faster if you remove the "jumpiness" in `x`, i.e. if you remove all NAs and heavily skew towards TRUE (1000:1). It's still slower than us, but not by as much.

library(vctrs)

set.seed(123)

x <- sample(c(rep(TRUE, 1000), FALSE), size = 1e8, replace = TRUE)
y <- sample(c(rep(TRUE, 1000), FALSE), size = 1e8, replace = TRUE)
z <- sample(c(rep(TRUE, 1000), FALSE), size = 1e8, replace = TRUE)

bench::mark(
  x | y,
  list_pany(list(x, y)),
  iterations = 10
)
#> # A tibble: 2 × 6
#>   expression                 min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>            <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 x | y                  122.9ms  124.5ms      7.82     381MB     5.21
#> 2 list_pany(list(x, y))   41.9ms   42.6ms     23.1      381MB     5.78

bench::mark(
  x & y,
  list_pall(list(x, y)),
  iterations = 10
)
#> # A tibble: 2 × 6
#>   expression                 min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>            <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 x & y                    156ms  161.6ms      6.14     381MB     1.53
#> 2 list_pall(list(x, y))   40.6ms   41.2ms     24.1      381MB     6.03

bench::mark(
  x | y | z,
  list_pany(list(x, y, z)),
  iterations = 10
)
#> # A tibble: 2 × 6
#>   expression                    min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>               <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 x | y | z                 236.1ms  237.8ms      4.13     763MB     4.13
#> 2 list_pany(list(x, y, z))   60.9ms   62.4ms     15.9      381MB     3.98

bench::mark(
  x & y & z,
  list_pall(list(x, y, z)),
  iterations = 10
)
#> # A tibble: 2 × 6
#>   expression                    min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>               <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 x & y & z                 312.4ms    315ms      3.15     763MB     2.10
#> 2 list_pall(list(x, y, z))   58.1ms     59ms     16.9      381MB     4.23

bench::mark(
  list_pall(list(x, y, z), missing = FALSE),
  list_pall(list(x, y, z), missing = TRUE),
  list_pall(list(x, y, z), missing = NULL),
  list_pany(list(x, y, z), missing = FALSE),
  list_pany(list(x, y, z), missing = TRUE),
  list_pany(list(x, y, z), missing = NULL),
  check = FALSE,
  iterations = 50
)
#> # A tibble: 6 × 6
#>   expression                             min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                          <bch:> <bch:>     <dbl> <bch:byt>    <dbl>
#> 1 list_pall(list(x, y, z), missing =… 56.6ms 58.3ms      17.1     381MB     3.26
#> 2 list_pall(list(x, y, z), missing =… 56.6ms 57.9ms      17.0     381MB     1.89
#> 3 list_pall(list(x, y, z), missing =… 58.2ms 59.8ms      16.2     381MB     2.21
#> 4 list_pany(list(x, y, z), missing =… 56.7ms 58.3ms      17.0     381MB     2.32
#> 5 list_pany(list(x, y, z), missing =… 56.6ms 58.3ms      17.1     381MB     1.90
#> 6 list_pany(list(x, y, z), missing =… 61.7ms 63.1ms      15.7     381MB     2.14

Implement list_pall() and list_pany()

1aa0d0e

DavisVaughan force-pushed the feature/pany-pall-2 branch from 486904b to 8976484 November 12, 2025 17:54

Change to vec_pany() and vec_pall()

9c2ecca

DavisVaughan force-pushed the feature/pany-pall-2 branch from 8976484 to 9c2ecca November 12, 2025 17:54

DavisVaughan changed the title ~~Implement list_pall() and list_pany()~~ Implement vec_pall() and vec_pany() Nov 12, 2025

DavisVaughan requested a review from lionel- November 12, 2025 18:18
