Conversation
Codecov Report
@@ Coverage Diff @@
## master #322 +/- ##
==========================================
+ Coverage 94.83% 96.36% +1.53%
==========================================
Files 53 56 +3
Lines 968 991 +23
Branches 9 9
==========================================
+ Hits 918 955 +37
+ Misses 50 36 -14
Continue to review full report at Codecov.
|
|
@manuzhang thanks for the PR! Will be looking at this shortly (I hope!). |
| } | ||
|
|
||
| test("create() compiles only with correct inputs") { | ||
| illTyped("TypedOneHotEncoder.create[Double]()") |
There was a problem hiding this comment.
I guess you meant to use .apply[] instead of .create[]?
|
@manuzhang I had a quick look at the diff and it looks good to me, but this is an area for frameless I'm not very familiar with. @atamborrino Would you have time for a review? |
|
@manuzhang I am taking a closer look at the PR. I think this implementation can be made better and more type-safe. A nice API would be: val ds: TypedDataset[(String, Int, Long)] = ...
ds.transformOneHot(ds('_1)): TypedDataset[(String, Int, Long, Vector)]The above will not compile if you try to do one-hot encoding to any column that is not String or Char, so you can add further type-safety restriction on the type of column you want to transform. Also, note how the type of the resulting dataset correctly has a new column of type vector appended to the end. |
|
@imarios |
|
@manuzhang I totally dropped the ball on this PR. Let me go over this one more time and merge the PR. |
ml/src/main/scala/frameless/ml/feature/TypedOneHotEncoder.scala
Outdated
Show resolved
Hide resolved
ml/src/main/scala/frameless/ml/feature/TypedOneHotEncoder.scala
Outdated
Show resolved
Hide resolved
|
|
||
| case class Outputs(output: Vector) | ||
|
|
||
| sealed abstract class HandleInvalid(val sparkValue: String) |
There was a problem hiding this comment.
| sealed abstract class HandleInvalid(val sparkValue: String) | |
| final class HandleInvalid private(val sparkValue: String) extends AnyVal |
This adds a typed API over Spark ML's
OneHotEncoderEstimatorsince Spark 2.3.0