make InterceptTerms generate Bools to avoid unnecessary promotion of other columns#294
make InterceptTerms generate Bools to avoid unnecessary promotion of other columns#294kleinschmidt wants to merge 5 commits intomasterfrom
InterceptTerms generate Bools to avoid unnecessary promotion of other columns#294Conversation
|
I'm generally onboard with this, but we should probably do something like a pkgeval to see if we would actually break anything. But we're also not so near the bottom of the dependency tree as StatsBase so a formally breaking release wouldn't be as bad as watching e.g. the StatsBase 0.34 changes propagate. |
test/modelmatrix.jl
Outdated
| contr = DummyCoding(; base=4, levels=1:4) | ||
| d = (; y=rand(Float32, 4), x=rand(Float32, 4), z=[1:4; ]) |
There was a problem hiding this comment.
this isn't yet used below, right?
There was a problem hiding this comment.
it is used but it's redundant with the definition above...
There was a problem hiding this comment.
oh wait the contr part? yeah that's right.
|
+1 We could tag 1.0, as it was the plan already (removing |
| modelcols(t::InterceptTerm{true}, d::NamedTuple) = ones(Bool, size(first(d))) | ||
| modelcols(t::InterceptTerm{false}, d) = Matrix{Bool}(undef, size(first(d),1), 0) |
There was a problem hiding this comment.
Would it make sense to generate BitArrays here? I guess it doesn't matter much, but it would save some memory.
There was a problem hiding this comment.
I'd kinda hoped to be able to just use a scalar but hcat doesn't work with that
|
Is there anything holding this one back at this point? |
This is one possible fix for #293. As described in that thread, the bottleneck here is actually that
InterceptTerms generateArray{Float64}s, which then gethcatwith other columns and promote them to Float64. With this change, normal julia promotion rules apply, and values will only be promoted as needed.First adding tests to confirm that this behavior is currently broken, then will push teh fix and open for review when CI is green.
I will add that this is potentially breaking: users may end up seeing eltypes that are not even floats come out of
modelcolsif none of the columns in their input tables are floats. I think a lot of this can be remediated by having downstream dependents make sure they, xarrays returned bymodelcols(::FormulaTerm)are actually float arrays with the appropriate eltype...