Should coeffnames be Symbols? #113
Comments
Another thing to consider is whether we want the type to change depending on the inputs,
|
I really like the idea of using |
Sure, why not? |
As noted by @kleinschmidt at #169, it's not clear what the advantage of doing this would be, yet it's quite disruptive. Can you elaborate? |
Just to add to what @oxinabox said in the OP: terms also use symbols as names, so it seems very odd that the only place we use strings to refer to variables is the coefficients. The actual reason I started looking at this was that it's easier to speed up operations on symbols than on strings. I was looking at ways to query coefficients within models and formulas. I think that if we were discussing this in the absence of other package dependencies, the proposed solution would be a pretty obvious improvement. I consider statistical modeling a pretty important feature of Julia, so I understand that disrupting the existing ecosystem isn't a trivial thing. |
To repeat my original point: it puts us in line with how Tables.jl represents things. |
We should probably also address what we'd miss out on if we no longer used strings.
I see the point of consistency with Tables. OTOH, coefficient names are not exactly like terms: you won't type them in Julia-like syntax, and they are often not valid Julia identifiers. This is clearly visible in the PR's tests: very often coefficient names can be neither typed nor printed directly using the
julia> ["Intercept", "x1p: 6", "x1p: 7", "x1p: 8"]
4-element Array{String,1}:
"Intercept"
"x1p: 6"
"x1p: 7"
"x1p: 8"
julia> [:Intercept, Symbol("x1p: 6"), Symbol("x1p: 7"), Symbol("x1p: 8")]
4-element Array{Symbol,1}:
:Intercept
Symbol("x1p: 6")
Symbol("x1p: 7")
Symbol("x1p: 8")
Can you explain in what context the performance of symbols compared to strings really matters? |
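The sketch below makes the identifier point concrete. It reuses the coefficient names from the PR's tests quoted above. It relies on `Base.isidentifier`, which reports whether a name parses as an ordinary Julia identifier and could therefore be written as a `:name` literal.

```julia
# Coefficient names from the PR's tests, quoted above.
names_str = ["Intercept", "x1p: 6", "x1p: 7", "x1p: 8"]

# Symbol(...) accepts any string, so the conversion itself never fails.
names_sym = Symbol.(names_str)

# Base.isidentifier tells whether a name parses as an ordinary identifier,
# i.e. whether it could be typed as a `:name` literal.
typable = Base.isidentifier.(names_str)

for (s, ok) in zip(names_sym, typable)
    # repr shows :Intercept for identifiers and Symbol("...") otherwise.
    println(rpad(repr(s), 20), ok ? "literal ok" : "needs Symbol(...)")
end
```

Only `"Intercept"` survives as a literal. The rest have to go through `Symbol("...")`, which is the inconvenience being pointed out. Converting back with `String.(names_sym)` is lossless either way.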
Aye. I would not be opposed to it if they mapped to the table, but something like
has both the response and the predictor, so it wouldn't map to a table. An even simpler case,
|
My point about performance was that indexing with a string instead of a symbol should incur a performance hit similar to this one:
julia> using BenchmarkTools
julia> x = [:a, :b, :c, :d, :e, :f, :g];
julia> y = string.(x);
julia> @btime findfirst(==(:e), x)
82.534 ns (1 allocation: 16 bytes)
5
julia> @btime findfirst(==("e"), y)
111.964 ns (1 allocation: 16 bytes)
5 |
Well, if we're not using the traditional table interface, then why do we need to use a |
But in practice, in what concrete cases does the 30 ns overhead really matter?
Sorry, I don't get what you mean. |
I imagine it would only matter if a particular algorithm required frequently looking up coefficients via a
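If frequent lookups by name ever did matter, one common alternative to the element type is to build a name-to-position dictionary once. That is an assumption on my part, not something proposed in this thread. After that, each lookup is a hash lookup instead of the linear `findfirst` scan benchmarked above. All names and values below are hypothetical and only illustrative.

```julia
# Hypothetical coefficient names and estimates (illustrative values only).
coef_names = ["(Intercept)", "x1", "x2"]
estimates = [1.5, -0.2, 3.0]

# Linear scan on every lookup, as in the findfirst benchmark above.
# (findfirst returns `nothing` for an unknown name, so indexing would throw.)
coef_linear(name) = estimates[findfirst(==(name), coef_names)]

# Build a name => position index once; later lookups are hash lookups.
# The same approach works with String keys or Symbol keys.
idx_str = Dict(n => i for (i, n) in enumerate(coef_names))
idx_sym = Dict(Symbol(n) => i for (i, n) in enumerate(coef_names))

coef_indexed(idx, name) = estimates[idx[name]]
```

For a handful of coefficients the linear scan is likely fine. An index like this would only pay off when an algorithm repeats many lookups, and it works regardless of whether the keys are Strings or Symbols.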
This was in reference to comments about the
I understand these may seem like a lot of trivial points, but I still don't understand the benefit of using a string at all. Perhaps using a |
But coefficient names aren't terms. There's a one-to-one relationship between coefficients and terms only for simple continuous terms, not for categorical terms, splines, etc. We could introduce another type which would also store a reference to the term if that's useful, but
The performance advantage doesn't sound like a strong motivation to me, given that nobody seems to have a real use case for it. Compared to the cost of fitting a model, the overhead is probably not that high. |
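The following toy sketch shows why coefficient names and terms don't line up one-to-one. The helpers are hypothetical and are not the StatsModels.jl implementation. They only mimic the `"term: level"` naming seen in the test output quoted earlier, and they assume dummy coding with the first level as the base level.

```julia
# Hypothetical helpers, NOT the StatsModels.jl implementation.
# A continuous term contributes exactly one coefficient, named after the term.
continuous_coefnames(term::AbstractString) = [term]

# A dummy-coded categorical term contributes one coefficient per non-base
# level, named "term: level". Assumes levels[1] is the base (reference) level.
categorical_coefnames(term::AbstractString, levels) =
    ["$term: $l" for l in levels[2:end]]

# Illustrative levels; treating 5 as the base level is an assumption.
cn = vcat(categorical_coefnames("x1p", [5, 6, 7, 8]), continuous_coefnames("x2"))
# Two terms yield four coefficient names: the mapping is not one-to-one.
```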
The performance thing was more of an example that there is some benefit to using symbols. I apologize for taking too much time on that.
This might actually be very useful. My original interest in this started with trying to get plots for statistical models (JuliaPlots/StatsPlots.jl#290). |
In practice, it is very common for me to make actual tables where the column names at the
We don't provide a good interface for that yet. |
We've run into a very similar use case recently: we wanted to generate table-like things from the outputs of simulations, where there's one entry in a vector per coef, and collect the results in a table. It's easy enough to do |
I guess that's a tradeoff depending on the use case. But in many instances strings will be more convenient than symbols, since (as I noted) there's no direct way to enter symbols that are not valid identifiers using
Maybe we could make it easier to create tables from names given as strings, automatically converting them to symbols? |
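Here is a minimal sketch of the "convert automatically" idea, applied to the simulation use case mentioned above. It keeps String coefficient names, converts them with `Symbol.(...)`, and builds a column table, meaning a NamedTuple of vectors with one column per coefficient. The helper `coef_table`, the names, and the data are all hypothetical.

```julia
# Hypothetical coefficient names, e.g. as returned (as Strings) by coefnames.
coef_names = ["(Intercept)", "x1", "x1 & x2"]

# Illustrative simulation output: one vector per run, one entry per coefficient.
runs = [[1.0, 2.0, 3.0], [1.1, 2.1, 3.1], [0.9, 1.9, 2.9]]

# Hypothetical helper: convert the String names to Symbols and build a
# column table, i.e. a NamedTuple of vectors, one column per coefficient.
function coef_table(names::AbstractVector{<:AbstractString}, rows)
    syms = Tuple(Symbol.(names))
    cols = Tuple([row[j] for row in rows] for j in eachindex(names))
    return NamedTuple{syms}(cols)
end

tbl = coef_table(coef_names, runs)

# Valid identifiers work with dot syntax; other names need Symbol(...).
tbl.x1
tbl[Symbol("x1 & x2")]
```

A NamedTuple of vectors is a Tables.jl column table, so something like `DataFrame(tbl)` should accept it directly. Users would still only need `Symbol(...)` when accessing columns whose names aren't valid identifiers.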
It seems like the greatest friction here is caused by the extra syntax that using symbols would require? This is really only a problem for categorical and interaction terms, which the user never has to type in themselves unless searching for them using coefnames. Perhaps this could be solved with some special syntax for terms like |
Not really. Any FunctionTerm would have that issue (e.g., |
Sorry, for some reason I thought you could do |
It might be a bit breaking, but Symbol is the type generally used for The Name of a Thing, and DataFrame column names are Symbols, as are ColumnTables.