Add `@subset` #263

pdeffebach · 2021-06-25T00:58:37Z

This is an initial attempt to add @subset.

I was able to entirely replace @where by passing skipmissing=true as a keyword argument, which is great.

We are in a real bind with regard to keyword arguments. With the move to :block syntax, we can't simply support keyword arguments in the form @subset(df, ...; skipmissing = true). So I added the flag @skipmissing and refactored the macro flags a little.

julia> df = DataFrame(a = [1, missing], b = [3, 4]);

julia> @subset df @skipmissing begin 
           :a .== 1
           :b .== 3
       end
1×2 DataFrame
 Row │ a       b     
     │ Int64?  Int64 
─────┼───────────────
   1 │      1      3

But adding tests is still a pain: the tests contain missing values, so we would have to add @skipmissing everywhere. Is this a case where we should break with the DataFrames API and make skipmissing=true the default? That would make people's lives a bit easier when upgrading from @where to @subset.

@nalimilan This is a design decision, so I would appreciate your input.

pdeffebach · 2021-06-26T19:50:04Z

Okay, I've made skipmissing=true the default for @subset and removed the @skipmissing flag. This seems like the easiest way forward. Unlike DataFrames.jl, we will treat missing as false in @subset.

Docs and tests are added, and the tests pass, so this could be merged.
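For readers comparing the two behaviours discussed above, here is a minimal sketch using only DataFrames.jl's `subset` function. It assumes DataFrames.jl 1.0 or later and reuses the example data from the first comment; it illustrates the semantics and is not code from this PR.

```julia
using DataFrames

df = DataFrame(a = [1, missing], b = [3, 4])

# Without skipmissing, DataFrames.subset refuses a condition that
# evaluates to missing and throws an error.
strict_call_errors = try
    subset(df, :a => ByRow(==(1)))
    false
catch
    true
end

# With skipmissing=true, a missing condition is treated as false,
# so only the first row is kept. This is the behaviour the comment
# above describes as the new default for @subset.
kept = subset(df, :a => ByRow(==(1)), :b => ByRow(==(3)); skipmissing = true)
```

Here `strict_call_errors` records whether the strict call failed, and `kept` holds the rows where both conditions are true.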

nalimilan

Thanks! That makes the API much more consistent with DataFrames.

Regarding skipmissing, I think it would be good to outline a general plan. Should DataFramesMeta automatically skip/propagate missing values everywhere? We discussed adding a keyword argument to do that in DataFrames at JuliaData/DataFrames.jl#2314. It hasn't been implemented at this point, but it would make sense to decide whether we would like to enable it by default eventually in DataFramesMeta.

docs/src/index.md

src/macros.jl

src/parsing.jl

nalimilan · 2021-06-27T10:09:46Z

src/parsing.jl

-   create_args_vector(arg) -> vec, wrap_byrow
-
-Normalize a single input to a vector of expressions,
-with a `wrap_byrow` flag indicating that the
-expressions should operate by row.
-
-If `arg` is a single `:block`, it is unnested.
-Otherwise, return a single-element array.
-Also removes line numbers.
-
-If `arg` is of the form `@byrow ...`, then
-`wrap_byrow` is returned as `true`.
+   create_args_vector(arg) -> vec, outer_flags


Why remove the contents of the docstring?

I will add a correct docstring.

nalimilan · 2021-06-27T10:10:47Z

test/dataframes.jl


    @test nrow(d) == 1

    d = @where df begin


Move this to deprecated.jl?

nalimilan · 2021-06-27T10:11:52Z

test/subset.jl

@@ -0,0 +1,143 @@
+module TestSubset


Can you add tests with GroupedDataFrame?

added, ported from @where.

pdeffebach · 2021-06-27T23:43:21Z

With respect to missings:

I think that adding transform! with SubDataFrame goes a long way to emulating Stata's if syntax. But you are right that it doesn't help with missings.

I think something along the lines of this PR in Missings.jl is the solution. Since we are constructing anonymous functions, we can just wrap them as spreadmissing(anon) when we need to. I hope we can make it performant. I don't know if it should be the default, in case people compare the speed to data.table, but I am open to the idea. It may also help people who don't like row-wise operations, since a lot of the benefit of row-wise functions is dealing with missings.

That's a long-term strategy. Maybe in the meantime we should just continue to treat missings as false, since that is the current default behavior with @where.
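As a rough illustration of the two ideas in this comment (wrapping an anonymous function so it handles missings, and treating missing as false when filtering), here is a Base-only sketch. The helper `propagate_missing` is hypothetical and written for this example. It is similar in spirit to `passmissing` from Missings.jl, and it does not reproduce the `spreadmissing` proposal mentioned above.

```julia
# Hypothetical helper (not the Missings.jl API): wrap `f` so that it
# returns `missing` whenever any of its arguments is missing.
propagate_missing(f) = (args...) -> any(ismissing, args) ? missing : f(args...)

# An anonymous row-wise condition, wrapped to propagate missings.
is_one = propagate_missing(x -> x == 1)

a = [1, missing, 2]
raw_mask = is_one.(a)              # elements: true, missing, false

# Treat missing as false, as @where did and @subset does after this PR.
mask = coalesce.(raw_mask, false)  # elements: true, false, false
```

The wrapper handles propagation, while `coalesce` handles the final decision to drop rows with a missing condition. Keeping the two steps separate makes it easier to change the default later.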

src/parsing.jl

nalimilan · 2021-06-28T07:17:52Z

test/subset.jl

 end

-end # module
+@testset "@subset with a grouped data frame" begin


Also test @subset! with GroupedDataFrame?

pdeffebach · 2021-06-28T22:51:51Z

Okay ready for merging.

pdeffebach · 2021-06-29T13:25:05Z

Thank you!

pdeffebach added 5 commits June 24, 2021 16:53

inital commit

4514eb0

fix tests

2e5de90

no more skipmissing

eca50d4

tests

2ddee38

update index.md

c815d1a

nalimilan reviewed Jun 27, 2021

pdeffebach and others added 5 commits June 27, 2021 15:32

add docstring

f2dde1e

Apply suggestions from code review

7c14a20

Co-authored-by: Milan Bouchet-Valat <[email protected]>

Apply suggestions from code review

f83161f

Co-authored-by: Milan Bouchet-Valat <[email protected]>

Merge branch 'add_subset' of https://github.com/pdeffebach/DataFrames…

430d398

…Meta.jl into add_subset

Update test/subset.jl

6f956dc

Co-authored-by: Milan Bouchet-Valat <[email protected]>

nalimilan reviewed Jun 28, 2021

pdeffebach and others added 5 commits June 28, 2021 14:36

switching

a75d817

@subset! with gd

7991b8c

Update src/parsing.jl

fa09626

Co-authored-by: Milan Bouchet-Valat <[email protected]>

Update src/parsing.jl

1c485c8

Co-authored-by: Milan Bouchet-Valat <[email protected]>

Merge branch 'add_subset' of https://github.com/pdeffebach/DataFrames…

8d55422

…Meta.jl into add_subset

nalimilan approved these changes Jun 29, 2021

pdeffebach merged commit fea3dee into JuliaData:master Jun 29, 2021

pdeffebach deleted the add_subset branch June 29, 2021 13:25

etpinard mentioned this pull request Jul 23, 2021

Fix @where deprecation warning #271

Merged
