[SPARK-56113][PS] Improve pandas 3 string restoration in pandas-on-Spark by ueshin · Pull Request #54926 · apache/spark

ueshin · 2026-03-20T19:15:51Z

What changes were proposed in this pull request?

This PR updates string restoration in python/pyspark/pandas/data_type_ops/string_ops.py so that, in pandas 3 environments, string columns converted back to pandas are restored with the pandas dtype carried in the internal field.

This improves pandas 3 compatibility for string round-trips and also fixes downstream cases where restored string-related metadata could differ from pandas behavior.

Why are the changes needed?

pandas 3 is stricter about string dtype restoration and missing-value handling.

In pandas-on-Spark, converting string data back to pandas should preserve the intended pandas dtype instead of falling back to less precise restoration behavior. Without that, pandas 3 comparisons can fail even when the underlying values match.
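
As a rough illustration of the failure mode described above, the sketch below uses plain pandas only. It does not reproduce the pandas-on-Spark code path; `expected` and `restored` are hypothetical stand-ins for a pandas result and a round-tripped column that fell back to object dtype, and the final `astype` call merely mimics the idea of restoring with a recorded dtype.

```python
import pandas as pd

# Hypothetical stand-ins: `expected` is what plain pandas would hold,
# `restored` is a round-tripped column that lost its string dtype.
expected = pd.Series(["a", "b"], dtype="string")
restored = pd.Series(["a", "b"], dtype=object)

# The underlying values are identical ...
values_match = restored.tolist() == expected.tolist()

# ... but the default dtype check still rejects the pair.
try:
    pd.testing.assert_series_equal(restored, expected)
    dtype_check_failed = False
except AssertionError:
    dtype_check_failed = True

# Restoring with the recorded dtype (here simply taken from `expected`,
# as an assumption for illustration) makes the comparison pass.
fixed = restored.astype(expected.dtype)
pd.testing.assert_series_equal(fixed, expected)
```

The point of the sketch is only that a dtype mismatch is enough to fail a comparison, which is why the restored dtype has to be the intended one rather than a fallback.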

Does this PR introduce any user-facing change?

Yes. String columns converted back to pandas will behave more like pandas 3.

How was this patch tested?

Added related tests; the existing tests should continue to pass.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Codex GPT-5

ueshin · 2026-03-20T19:15:58Z

cc @gaogaotiantian @HyukjinKwon @zhengruifeng

Improve pandas 3 string restoration in pandas-on-Spark

57fc792

Fix.

5aef000

ueshin wants to merge 2 commits into apache:master from ueshin:issues/SPARK-56113/string_restoration
