Initial work on handling missing values #196
I think that if you create the right methods for
Thanks, that's the type of pointer I need.
So it would be nice to define the usual
Codecov Report
@@ Coverage Diff @@
## master #196 +/- ##
==========================================
- Coverage 94.55% 94.49% -0.07%
==========================================
Files 18 18
Lines 1158 1162 +4
==========================================
+ Hits 1095 1098 +3
- Misses 63 64 +1
@@ -6,6 +6,10 @@ end
Base.show(io::IO, t::RandomEffectsTerm) = print(io, "($(t.lhs) | $(t.rhs))")
StatsModels.is_matrix_term(::Type{RandomEffectsTerm}) = false

function StatsModels.termvars(t::RandomEffectsTerm)
    vcat(StatsModels.termvars(t.lhs), StatsModels.termvars(t.rhs))
I was going to say you might want to use union (like StatsModels does), but there shouldn't be any duplication between the lhs and rhs, so there's really no need.
Yeah, duplication on the lhs and rhs would be very bad.
may be worth checking that in the constructor actually...
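A minimal sketch of what such a constructor check could look like. The helper name assert_disjoint_vars is hypothetical and not part of this PR; it works on plain vectors of Symbols, which in the PR would presumably come from StatsModels.termvars(t.lhs) and StatsModels.termvars(t.rhs).

```julia
# Hypothetical helper (not from the PR): reject variables that appear on both
# sides of the `|` in a random-effects term, otherwise return all variables.
function assert_disjoint_vars(lhsvars, rhsvars)
    shared = intersect(lhsvars, rhsvars)
    isempty(shared) ||
        throw(ArgumentError("variables on both sides of |: $(join(shared, ", "))"))
    # With no overlap, union and vcat agree for duplicate-free inputs.
    return union(lhsvars, rhsvars)
end

assert_disjoint_vars([:U], [:G])      # returns [:U, :G]
# assert_disjoint_vars([:U], [:U])    # would throw an ArgumentError
```

With a check like this in place, the choice between vcat and union in termvars stops mattering, because overlap is rejected before termvars is ever called.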
@kleinschmidt are there any other important Terms methods that other StatsModels functionality depends on? It should be fairly straightforward to implement as many as possible while I'm thinking about this.
Not that I can think of off the top of my head; most of them have frankly terrible, confusing names (I can say that because I named them) and are "internal" anyway and can be implemented as needed.
Actually, better add tests for this too.
Working on tests for all this led me to something interesting:
slp = deepcopy(dat[:sleepstudy])
slp[!,:U] = Array{Union{Missing, Float64},1}(slp[!,:U])
slp[1,:U] = missing
julia> apply_schema(@formula(Y ~ 1 + U), schema(@formula(Y ~ 1 + U), dat[:sleepstudy]))
FormulaTerm
Response:
Y(continuous)
Predictors:
1
U(continuous)
julia> apply_schema(@formula(Y ~ 1 + U), schema(@formula(Y ~ 1 + U), slp))
FormulaTerm
Response:
Y(continuous)
Predictors:
1
U(DummyCoding:10→9)
It seems the union datatype and not the presence of any
julia> apply_schema(@formula(Y ~ 1 + U), schema(@formula(Y ~ 1 + U), dropmissing(slp,disallowmissing=false)))
FormulaTerm
Response:
Y(continuous)
Predictors:
1
U(DummyCoding:10→9)
Or using an example from the StatsModels documentation:
julia> concrete_term(term(:a), [1, 2, 3], nothing)
a(continuous)
julia> concrete_term(term(:a), [1, 2, missing], nothing)
a(DummyCoding:2→1)
I suspect this hasn't come up in StatsModels because
There's an open issue for this actually, JuliaStats/StatsModels.jl#145. We've discussed treating columns whose element type is a union of Missing and a continuous type as continuous, which I think is reasonable, but there are some tweaks necessary to get it to work. If you want to work up a PR there I'll happily review it, or otherwise it's at the top of my list for later this week when I'll have some Julia cycles...
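To make the behaviour above concrete, here is a small Base-only sketch. It assumes the continuous/categorical decision is driven by the column's element type (for example, a method restricted to AbstractVector{<:Number}); that is an assumption about StatsModels internals, not something confirmed in this thread. The predicate iscontinuous_column is hypothetical and only illustrates the "treat Union{Missing, T<:Number} as continuous" idea. nonmissingtype is in Base from Julia 1.3 on (earlier it lived in Missings.jl).

```julia
# Hypothetical predicate (not StatsModels API): look through the Missing part
# of the element type before deciding whether a column is numeric.
iscontinuous_column(x::AbstractVector) = nonmissingtype(eltype(x)) <: Number

a = [1, 2, 3]                           # eltype Int
b = [1, 2, missing]                     # eltype Union{Missing, Int}
c = Union{Missing, Float64}[1.0, 2.0]   # Union eltype, but no missing values

# A signature restricted to numeric element types only matches `a`:
a isa AbstractVector{<:Number}   # true
b isa AbstractVector{<:Number}   # false
c isa AbstractVector{<:Number}   # false, even though nothing is missing

# The hypothetical predicate accepts all three:
iscontinuous_column(a), iscontinuous_column(b), iscontinuous_column(c)
```

This matches the observation that dropmissing(slp, disallowmissing=false) does not help: the values are fine, but the element type is still a Union.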
In general, we need a better way to handle missings in StatsModels. Things like
Travis failed because of an issue in downloading a package; now everything looks good. Some parts will work better once some of the changes I'm making in JuliaStats/StatsModels.jl#153 land.
Because StatsModels.ModelFrame is never called directly, StatsModels.missing_omit is never invoked either. If you extract the fixed effects as well as the blocking variables, then you can use that to create a FormulaTerm that missing_omit() understands and not lose any columns you need in the next step.
The extraction used here is rather kludgy at the moment.
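For illustration only, the idea of dropping incomplete rows based on just the variables the model uses, while keeping every column, can be sketched in plain Julia on a column table (a NamedTuple of vectors). This is a stand-in for the technique, not the PR's code and not the StatsModels.missing_omit implementation; complete_rows and the toy column names are hypothetical.

```julia
# Hypothetical helper: keep rows that are complete in the columns `cols`
# (e.g. fixed-effects plus blocking variables); all other columns are carried
# along untouched. Returns the filtered table and the row mask.
function complete_rows(tbl::NamedTuple, cols)
    keep = trues(length(first(tbl)))
    for c in cols
        keep .&= .!ismissing.(tbl[c])
    end
    return map(v -> v[keep], tbl), keep
end

tbl = (Y = [1.0, 2.0, 3.0],
       U = [missing, 0.5, 1.5],
       G = ["a", missing, "c"])   # G is not used by the model in this example

out, keep = complete_rows(tbl, (:Y, :U))
# keep == [false, true, true]; out.G still holds its missing value
```

Note that the filtered columns keep their Union{Missing, T} element type, so the continuous-versus-categorical issue discussed above would still apply unless the element type is narrowed afterwards.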