Spark: Validate Z-order rewrite does not conflict with internal ICEZVALUE column name by YanivZalach · Pull Request #15706 · apache/iceberg

YanivZalach · 2026-03-20T21:38:19Z

Tables with a column named ICEZVALUE fail with a misleading error during
Z-order rewrite because SparkZOrderFileRewriteRunner internally uses a
column with that name.

Adds an early Preconditions.checkArgument in SparkZOrderFileRewriteRunner
before any DataFrame operation, throwing a clear exception
if the table schema already contains a column named ICEZVALUE.

Closes #15708
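
For illustration, the kind of early check described above can be sketched in plain Java. This is a minimal sketch, not the code from this PR: the class `ZOrderColumnCheck`, the method `validateNoConflict`, the error message wording, and the case-insensitive comparison are all assumptions made for the example. It throws `IllegalArgumentException`, the same exception type that `Preconditions.checkArgument` raises.

```java
import java.util.List;

// Hypothetical stand-in for the validation described in the PR description.
final class ZOrderColumnCheck {
  // Internal column name, as stated in the PR description.
  static final String Z_COLUMN = "ICEZVALUE";

  private ZOrderColumnCheck() {}

  // Fails fast, before any rewrite work, if a table column collides with the
  // internal name. Case-insensitive matching is an assumption of this sketch.
  static void validateNoConflict(List<String> tableColumns) {
    for (String column : tableColumns) {
      if (column.equalsIgnoreCase(Z_COLUMN)) {
        throw new IllegalArgumentException(
            String.format(
                "Cannot Z-order rewrite: table already has a column named '%s', "
                    + "which conflicts with the internal column %s",
                column, Z_COLUMN));
      }
    }
  }
}
```

Calling `ZOrderColumnCheck.validateNoConflict(List.of("id", "ts"))` returns normally, while a list containing `icezvalue` is rejected with a message that names the conflicting column.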

spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java

wypoon · 2026-03-21T01:15:35Z

I think that it would be better to use a configurable column name than a hardcoded "ICEZVALUE" in SparkZOrderFileRewriteRunner.
We could add an option to https://iceberg.apache.org/docs/latest/spark-procedures/#options-for-sort-strategy-with-zorder-sort_order, for example, for configuring this column name. What do you think, @RussellSpitzer?
We should still throw the error as proposed here if the column coincides with an existing column in the table.

wypoon · 2026-03-21T01:20:11Z

For ease of review, it is better to make the changes only to the latest supported version of Spark, and then backport the changes to earlier versions once the PR is merged.

RussellSpitzer · 2026-03-21T02:30:53Z

> I think that it would be better to use a configurable column name than a hardcoded "ICEZVALUE" in SparkZOrderFileRewriteRunner.
>
> We could add an option to https://iceberg.apache.org/docs/latest/spark-procedures/#options-for-sort-strategy-with-zorder-sort_order, for example, for configuring this column name. What do you think, @RussellSpitzer?
>
> We should still throw the error as proposed here if the column coincides with an existing column in the table.

I really don't think we need customization here. Is this really that likely to conflict?

If we are really worried, I would just randomly change the name if the column exists. My hesitation on all of this is that it doesn't seem like a very likely situation. While it is true that it could happen, is it worth adding code for the very rare chance that it does?

So in terms of solutions I would consider:

1. Randomly assign the column name (gen_zorder_xx), where if for some reason that exists we randomize the xx to something else
2. Just fail if the chosen column name is already in use
3. Let users customize it

The order here is:

1. No user intervention needed, we hide the problem
2. We expose that it's a problem
3. We expose that it's a problem and have to support a custom new parameter just for this special case

For all of these we still have the question: "Is it really worth adding more code to the project to avoid this?"
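
To make option 1 concrete, here is a rough sketch of picking a generated column name and re-randomizing on collision. It is only an illustration of the idea under stated assumptions, not a proposal for the actual implementation: the class `ZOrderColumnNames`, the method `pickUnusedName`, the 8-hex-digit suffix, and the attempt limit are all hypothetical, and the caller is assumed to pass column names already lower-cased.

```java
import java.util.Locale;
import java.util.Random;
import java.util.Set;

// Hypothetical helper sketching option 1 (random name, retry on collision).
final class ZOrderColumnNames {
  static final String PREFIX = "gen_zorder_";
  static final int MAX_ATTEMPTS = 100; // arbitrary bound so the loop always terminates

  private ZOrderColumnNames() {}

  // existingLowerCase: the table's column names, lower-cased by the caller.
  static String pickUnusedName(Set<String> existingLowerCase, Random random) {
    for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
      // Random 8-hex-digit suffix in place of the "xx" from the comment above.
      String candidate = PREFIX + String.format(Locale.ROOT, "%08x", random.nextInt());
      if (!existingLowerCase.contains(candidate)) {
        return candidate;
      }
    }
    throw new IllegalStateException(
        "Could not find an unused Z-order column name after " + MAX_ATTEMPTS + " attempts");
  }
}
```

Compared with option 2, this hides the collision from the user entirely, at the cost of a column name that is no longer fixed across runs.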

YanivZalach · 2026-03-21T09:27:08Z

I agree that option 2 (fail with a clear error) is the right tradeoff here.
The name ICEZVALUE is unlikely to conflict in practice, but when it does,
the current behavior is silent data modification followed by a completely
misleading error that gives the user no indication of what went wrong or how
to fix it. When the rewrite is skipped because of the thresholds, there is no error at all.

The value of this PR is not in preventing a common case; it is in making a rare but
confusing failure mode diagnosable.

RussellSpitzer · 2026-03-21T12:33:33Z

Explain why just auto-fixing it, or choosing a unique name for the column, isn't better.

Express your opinion in a haiku

YanivZalach · 2026-03-21T15:44:16Z

I agree with you, and see a lot of value in consistent errors myself.

The haiku:

Fixed names shall stay true,

Clear errors guide the user,

Keep the system clean

RussellSpitzer · 2026-03-21T15:54:27Z

Iambic pentameter would also be clearer for end users.

RussellSpitzer · 2026-03-21T15:56:17Z

More seriously, I think the clearer message is probably fine. In reality there is almost zero chance this code path is ever executed, but if "allow any schema" is set and the user-defined icezvalue column is optional, this could silently erase that value. I doubt this will ever happen, but I'll accept a precondition and a test. It shouldn't add that much burden.

github-actions bot added spark build labels Mar 20, 2026

YanivZalach force-pushed the fix/zorder-icezvalue-column-collision branch 4 times, most recently from a6adec9 to d4e22e5 Compare March 20, 2026 22:33

Spark: Validate Z-order rewrite does not conflict with internal ICEZV…

48cabea

…ALUE column name

YanivZalach force-pushed the fix/zorder-icezvalue-column-collision branch from d4e22e5 to 48cabea Compare March 21, 2026 00:38

wypoon reviewed Mar 21, 2026

spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java (outdated)

YanivZalach requested a review from wypoon March 21, 2026 18:01

YanivZalach wants to merge 1 commit into apache:main from YanivZalach:fix/zorder-icezvalue-column-collision

3 participants