Add String API page, update access API overview and EP API #1664

Merged
AndreiKingsley merged 17 commits into master from string-api-docs on Feb 23, 2026

AndreiKingsley (Collaborator) commented Jan 21, 2026

Closes #1685

AndreiKingsley changed the title from "Add String API page and update access API overview + EP API" to "Add String API page, update access API overview and EP API" on Jan 21, 2026
Comment thread docs/StardustDocs/topics/extensionPropertiesApi.md Outdated
Comment thread docs/StardustDocs/topics/extensionPropertiesApi.md Outdated
Comment thread docs/StardustDocs/topics/extensionPropertiesApi.md Outdated
Comment thread docs/StardustDocs/topics/concepts/apiLevels.md Outdated
Comment thread docs/StardustDocs/topics/concepts/apiLevels.md Outdated
Comment thread docs/StardustDocs/topics/concepts/apiLevels.md Outdated

The String API is the simplest and unsafest of them all. The main advantage of it is that it can be
used at any time, including when accessing new columns in chain calls. So we can write something like:
Also, when using in the [IntelliJ IDEA](https://www.jetbrains.com/idea/) in the
Collaborator

in the IntelliJ in the Gradle... the code completion?... please check this entire sentence again XD

However, note that after operations in which resulting columns cannot be inferred
by the Compiler Plugin (for example, [`pivot`](pivot.md)),
extension properties can't be inferred automatically either. For such operations,
you can use [`cast`](cast.md) to define a new data schema or use the String API.
Collaborator

same note as before

# String API

The String API is the most basic and straightforward API
for selecting columns in the Kotlin DataFrame [operations](operations.md).
Collaborator

no "the"

Collaborator Author

Why?

Collaborator

"The String API is the most basic and straightforward API for selecting columns in Kotlin DataFrame operations."

Maybe you can feel it better if you reorder the sentence like this:
"In Kotlin DataFrame operations, the String API is the most basic and straightforward API for selecting columns."

```

The String API can also be used in any operation with a [row expression](DataRow.md#row-expressions).
You can access row values in specific columns by invoking `String` values with their names:
Collaborator

The way you've written this, it sounds like the String invocation method only works for row expressions; however, it also works fine in the Columns Selection DSL.

Collaborator Author

OK, I'll add a mention of its usage in the CS DSL as well!

koperagen (Collaborator) commented Jan 23, 2026

How about we make 2.5 groups of String "accessors"?
Operator style syntax for String column accessor

df.select { "col"() }, df.select { "col"<String>() }
df.select { "col"["nestedCol"]() }, df.select { "col"["nestedCol"]<String>() }
df.select { "col"() and "anotherCol"<String>() }

Their DataRow counterpart:

df.filter { "col"() } - looks like we don't/can't have "untyped" invoke, in which case type is inferred as Boolean here. Not like in CS DSL

df.filter { "col"<String>() == "abc" }
df.filter { "col"["nestedCol"]<String>() == "abc" }

Compact operator syntax for the "and" operation (but to specify a type you need to fall back to invoke):

df.select { "col" and "anotherCol" }
df.select { "col"["nestedCol"] and "anotherCol" }

.. and some other operations, right?

Function style syntax for String column accessor

df.select { col("col") }, df.select { col<String>("col") }
df.select { colGroup("col").col("nested") }, df.select { colGroup("col").col<String>("nested") }
df.select { cols("a", "b") }

Their DataRow counterpart:

df.filter { get("col") == "abc" }, or df.filter { it["col"] == "abc" }, df.filter { getValue<String>("col") == "abc" }
df.filter { getColumnGroup("col").getValue<String>("nestedCol") }

Or maybe we should prefer pathOf here? We should decide! Its syntax seems incomplete though, not fully thought through:

df.select { col(pathOf("col", "nestedCol")) }
df.filter {
    pathOf("abc").getValue(it) == "abc"
}

Hopefully this comprehensively covers how one might use the String API in the CS DSL to achieve the most likely desired results, and gives the option to choose a preference.
@Jolanrensen @AndreiKingsley

koperagen (Collaborator) commented Jan 23, 2026

I recalled another useful style that goes hand in hand with the String API: Int indexes :) Check this out:
#1666

zaleslaw self-requested a review January 26, 2026 14:23
AndreiKingsley marked this pull request as draft February 16, 2026 11:58
Comment thread docs/StardustDocs/topics/extensionPropertiesApi.md Outdated
AndreiKingsley marked this pull request as ready for review February 19, 2026 11:29
// Select the "firstName" column from the "fullName" column group
// and the "age" column
df.select { "fullName"["firstName"]<String>() and "age"<Int>() }
// Takes only rows where the
Collaborator

could you add an extra newline in between these?

}
// Get "fullName" column
df.fullName
// Rename "fullName" column into "name"
Collaborator

same here, a blank line after expressions makes them easier to read and group

<!---FUN simpleSelect-->

```kotlin
// Select a sub-dataframe with the "name" and "info" columns
Collaborator

a "sub-dataframe"? I'm not sure about that name. It might draw incorrect parallels with dataframe views from pandas and the like. Maybe "Create a new dataframe with the "name" and "info" columns from df"?


<!---END-->

Select the "age" subcolumn of the "info" column group and the "name" column
Jolanrensen (Collaborator), Feb 20, 2026

"of" -> "from". I like "from" better because it hints you're "taking it out"


<!---END-->

Calculate the mean value of the ("info"->"age") column; specify the column type as a `col` type argument
Collaborator

I'm not sure about the "->". I think in logs we use a different notation for subcolumns; I think it was with a "/", but I'm not sure.


### Invoked String API

> This API is outdated and may be hard to read and refactor;
Collaborator

That's an opinion ;P

### Invoked String API

> This API is outdated and may be hard to read and refactor;
> it may be changed in the future.
Collaborator

this is fine


Alternatively, you can use the `String` invocation (optional typed argument) for column accessor creation.
It will create the same column accessors as in the Columns Selection DSL.
You can't specify the column kind in this case, but you can access nested columns using the
Collaborator

You can, with `<DataRow<*>>` or `<DataFrame<*>>`. But it's not as obvious


// Calculate the mean value of the ("info"->"age") column;
// specify the column type as an invocation type argument
df.mean { "info" { "age"<Int>() } }
Collaborator

I'd only use {} if you have more than 1 column :)

Collaborator

but then it would be the same example as the first... okay, you can keep it like this

AndreiKingsley merged commit fcb738c into master Feb 23, 2026
3 checks passed
AndreiKingsley deleted the string-api-docs branch February 23, 2026 12:36
AndreiKingsley restored the string-api-docs branch February 23, 2026 13:57
Linked issue: Add String API page

3 participants