Implement vec_pall() and vec_pany()
#2092
Open
+1,024
−0
Closes #1675, 3 years later!
If you find yourself questioning whether we need all 3 options for `missing = NULL / FALSE / TRUE`, the answer is yes:

| vctrs call | Used by |
| --- | --- |
| `vec_pall(missing = NULL)` | `when_all(na_rm = FALSE)` |
| `vec_pall(missing = FALSE)` | `filter()` / `filter_out()` |
| `vec_pall(missing = TRUE)` | `when_all(na_rm = TRUE)` |
| `vec_pany(missing = NULL)` | `when_any(na_rm = FALSE)` |
| `vec_pany(missing = FALSE)` | `when_any(na_rm = TRUE)` |
| `vec_pany(missing = TRUE)` | |

We could have gotten away with `na_rm` in `vec_pall()` and `vec_pany()` except for the fact that `filter()` and `filter_out()` need the `vec_pall(.missing = FALSE)` case, which would be lost if we simplified to the binary `na_rm`.

I think the important thing is that the user-facing `when_all()` and `when_any()` get the simple `na_rm`, and vctrs gets the more flexible/holistic (but more mental overhead) `.missing` argument.

They are very fast due to a rather clever `NA` propagation algorithm that uses C-level arithmetic rather than if/else branching, making us immune to bad branch prediction. Much faster than base R! Enough so that it might be interesting to see if they'd take a patch.

Here's base R having branchiness:
https://github.com/wch/r-source/blob/3e507c3364b779e42bc06a6bb28867ec4a3a082e/src/main/logic.c#L361-L367
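
For readers who want a feel for what "arithmetic rather than if/else" can look like, here is a minimal sketch. It is not the algorithm from this PR. It assumes R's logical encoding (`TRUE` = 1, `FALSE` = 0, `NA` = `INT_MIN`), and the names `pall_sketch()`, `missing_mode`, and `pall_result` are hypothetical. Each element is folded into two flags using comparisons and bitwise OR, and the result is read from a small lookup table selected by the `missing` mode, so there is no data-dependent if/else in the loop body.

```c
#include <limits.h>
#include <stddef.h>

/* Assumption: mirrors R's logical encoding (TRUE = 1, FALSE = 0, NA = INT_MIN). */
#define LGL_NA INT_MIN

/* Hypothetical stand-ins for missing = NULL / FALSE / TRUE. */
enum missing_mode {
  MISSING_PROPAGATE = 0,
  MISSING_AS_FALSE = 1,
  MISSING_AS_TRUE = 2
};

/* Rows: missing mode. Columns: (any_false << 1) | any_na. */
static const int pall_result[3][4] = {
  { 1, LGL_NA, 0, 0 }, /* propagate: NA unless some input is FALSE */
  { 1, 0,      0, 0 }, /* NA treated as FALSE */
  { 1, 1,      0, 0 }  /* NA treated as TRUE */
};

/* Element-wise "all" across `n_inputs` logical vectors of length `size`. */
void pall_sketch(const int *const *inputs, size_t n_inputs, size_t size,
                 enum missing_mode missing, int *out) {
  for (size_t i = 0; i < size; ++i) {
    int any_false = 0;
    int any_na = 0;

    for (size_t j = 0; j < n_inputs; ++j) {
      const int elt = inputs[j][i];
      /* Comparisons yield 0 or 1; OR them in rather than branching on them. */
      any_false |= (elt == 0);
      any_na |= (elt == LGL_NA);
    }

    out[i] = pall_result[missing][(any_false << 1) | any_na];
  }
}
```

The only branches left are the loop conditions, which do not depend on the data and so predict well. An "any" variant would follow the same shape with an `any_true` flag and different table rows.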
Here are some benchmarks with an equal distribution of `TRUE`, `FALSE`, and `NA`s (so, bad for branch prediction, which affects R but not us).

(Ignore the `list_*` names, this is before I switched back to `vec_*` names.)

Base R gets faster if you remove the "jumpiness" in `x`, i.e. if you remove any `NA`s and heavily skew towards `TRUE` (1000:1), then it's still slower than us but not by as much.