Add String API page, update access API overview and EP API#1664
AndreiKingsley merged 17 commits into master from
The String API is the simplest and least safe of them all. Its main advantage is that it can be
used at any time, including when accessing new columns in chain calls. So we can write something like:
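For illustration, here is a minimal sketch of such a chain call. It is not the example from this PR, the sample data and the `ageNextYear` column are hypothetical, and it assumes the `org.jetbrains.kotlinx:dataframe` library is on the classpath. The new column is referenced by its name in the very next call, with no schema update in between:

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data with a "name" and an "age" column
val df = dataFrameOf("name", "age")(
    "Alice", 15,
    "Bob", 45,
)

// "ageNextYear" is not one of df's original columns, but the String API
// can refer to it right away in the next call of the chain.
val olderThan18NextYear = df
    .add("ageNextYear") { "age"<Int>() + 1 }
    .filter { "ageNextYear"<Int>() > 18 }

fun main() {
    println(olderThan18NextYear)
}
```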
Also, when using in the [IntelliJ IDEA](https://www.jetbrains.com/idea/) in the
in the IntelliJ in the Gradle... the code completion?... please check this entire sentence again XD
However, note that after operations in which resulting columns cannot be inferred
by the Compiler Plugin (for example, [`pivot`](pivot.md)),
extension properties can't be inferred automatically either. For such operations,
you can use [`cast`](cast.md) to define a new data schema or use the String API.
# String API
The String API is the most basic and straightforward API
for selecting columns in the Kotlin DataFrame [operations](operations.md).
"The String API is the most basic and straightforward API for selecting columns in Kotlin DataFrame operations."
Maybe it reads better if you reorder the sentence like this:
"In Kotlin DataFrame operations, the String API is the most basic and straightforward API for selecting columns."
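A minimal sketch of the String API in a few common operations, assuming the `org.jetbrains.kotlinx:dataframe` dependency; the sample data and column names are hypothetical. Columns are referred to only by their `String` names, so a misspelled name is not caught by the compiler and only surfaces at runtime:

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data
val df = dataFrameOf("name", "age", "city")(
    "Alice", 15, "London",
    "Bob", 45, "Dubai",
)

// Create a new dataframe with the "name" and "age" columns from df
val nameAndAge = df.select("name", "age")

// Get a single column by name; its value type is not known statically
val ages = df["age"]

// Rename the "name" column into "fullName"
val renamed = df.rename("name").into("fullName")

fun main() {
    println(nameAndAge)
    println(ages.values().toList())
    println(renamed.columnNames())
}
```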
The String API can also be used in any operation with a [row expression](DataRow.md#row-expressions).
You can access row values in specific columns by invoking `String` values with their names:
the way you wrote this, it sounds like the String invocation method only works in row expressions; however, it also works fine in the Columns Selection DSL
OK, I'll add a mention of its usage in the CS DSL as well!
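A minimal sketch of `String` invocation in both contexts side by side, assuming the `org.jetbrains.kotlinx:dataframe` dependency and hypothetical data. In a row expression, `"age"<Int>()` yields the value of the "age" column in the current row; in the Columns Selection DSL, the same syntax refers to the whole column:

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data
val df = dataFrameOf("name", "age")(
    "Alice", 15,
    "Bob", 45,
)

// Row expression: "age"<Int>() is the Int value of "age" in the current row
val adults = df.filter { "age"<Int>() >= 18 }

// Columns Selection DSL: "name"<String>() and "age"<Int>() select whole columns
val selected = df.select { "name"<String>() and "age"<Int>() }

fun main() {
    println(adults)
    println(selected)
}
```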
How about we make 2.5 groups of String "accessors"?
Their DataRow counterpart:
Compact operator syntax for the "and" operation (but to specify the type, you need to fall back to invoke):
.. and some other operations, right?
Function-style syntax for String column accessors:
Their DataRow counterpart:
Or maybe we should prefer
Hopefully it comprehensively covers how one might use the String API in the CS DSL to achieve the most likely desired results, and gives the option to choose a preference.
I recalled another useful style that goes hand in hand with the String API: Int indexes :) Check this out:
// Select the "firstName" column from the "fullName" column group
// and the "age" column
df.select { "fullName"["firstName"]<String>() and "age"<Int>() }
// Takes only rows where the
could you add an extra newline in between these?
}
// Get "fullName" column
df.fullName
// Rename "fullName" column into "name"
same here, a blank line after expressions makes them easier to read and group
<!---FUN simpleSelect-->
// Select a sub-dataframe with the "name" and "info" columns
a "sub-dataframe"? I'm not sure about that name. It might draw incorrect parallels with dataframe views from pandas and the like. Maybe "Create a new dataframe with the "name" and "info" columns from df"?
<!---END-->
Select the "age" subcolumn of the "info" column group and the "name" column
of → from. I like "from" better because it hints that you're "taking it out".
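A minimal sketch of selecting from a column group with the String API, assuming the `org.jetbrains.kotlinx:dataframe` dependency. The flat sample data is hypothetical and is first turned into an "info" column group with `group(...).into(...)`; the `"info"["age"]<Int>()` syntax is the one used in the diff, and the `mean` call illustrates the ("info"->"age") mean discussed nearby:

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical flat data; "age" and "city" are moved into an "info" column group
val df = dataFrameOf("name", "age", "city")(
    "Alice", 15, "London",
    "Bob", 45, "Dubai",
).group("age", "city").into("info")

// Select the "age" subcolumn from the "info" column group and the "name" column
val selected = df.select { "info"["age"]<Int>() and "name"<String>() }

// Mean value of the "age" subcolumn of the "info" column group
val meanAge = df.mean { "info"["age"]<Int>() }

fun main() {
    println(selected)
    println(meanAge)
}
```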
<!---END-->
Calculate the mean value of the ("info"->"age") column; specify the column type as a `col` type argument
I'm not sure about the "->". I think in logs we use a different notation for subcolumns, maybe with a "/", but I'm not sure
### Invoked String API
> This API is outdated and may be hard to read and refactor;
> it may be changed in the future.

Alternatively, you can use `String` invocation (with an optional type argument) to create column accessors.
It creates the same column accessors as the Columns Selection DSL.
You can't specify the column kind in this case, but you can access nested columns using the
You can, with `<DataRow<*>>` or `<DataFrame<*>>`, but it's not as obvious.
// Calculate the mean value of the ("info"->"age") column;
// specify the column type as an invocation type argument
df.mean { "info" { "age"<Int>() } }
I'd only use {} if you have more than 1 column :)
but then it would be the same example as the first... okay, you can keep it like this
Closes #1685