fix: Inconsistent schemas when converting to pyarrow by nuno-faria · Pull Request #1315 · apache/datafusion-python

nuno-faria · 2025-12-01T18:22:10Z

Which issue does this PR close?

Closes #1314.

Rationale for this change

Allow the conversion of batches to pyarrow when there are inconsistencies with the DataFrame's schema (i.e., on the nullability of columns).

What changes are included in this PR?

The conversion to pyarrow now uses the RecordBatch's own schema instead of the DataFrame's (when possible): https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.from_batches.
Added a test that would previously fail.

Are there any user-facing changes?

No.

kosiew · 2025-12-03T09:10:25Z

python/tests/test_dataframe.py

+def test_parquet_non_null_column_to_pyarrow(ctx, tmp_path):
+    path = tmp_path.joinpath("t.parquet")
+
+    ctx.sql("create table t_(a int not null)").collect()
+    ctx.sql("insert into t_ values (1), (2), (3)").collect()
+    ctx.sql(f"copy (select * from t_) to '{path}'").collect()
+
+    ctx.register_parquet("t", path)
+    pyarrow_table = ctx.sql("select max(a) as m from t").to_arrow_table()
+    assert pyarrow_table.to_pydict() == {"m": [3]}
+
+


I think we should also add:

a regression test for the empty-batch case, to ensure that when there are zero record batches the DataFrame schema is still applied (because that’s the behavior the code has preserved).

a test covering an edge case where an aggregation yields a NULL (e.g., max on an empty input), to ensure to_arrow_table represents it correctly and that we didn't hide other nullability issues.

Good call, I've added those two new tests.

kosiew

LGTM

Fix inconsistent schemas when converting to pyarrow

1c2c6f6

kosiew reviewed Dec 3, 2025

Add extra tests

c3eb8dc

IshaGudewar mentioned this pull request Dec 4, 2025

Inconsistencies between RecordBatch and DataFrame schemas cause to_arrow_table to fail #1314

Closed

kosiew approved these changes Dec 4, 2025

timsaucer added 2 commits January 5, 2026 08:29

Merge branch 'main' into fix_nullable_to_arrow_table

961ba16

Change deprecated type

5487208

timsaucer merged commit 1df6db2 into apache:main Jan 5, 2026
17 checks passed

nuno-faria deleted the fix_nullable_to_arrow_table branch January 5, 2026 17:25

timsaucer merged 4 commits into apache:main from nuno-faria:fix_nullable_to_arrow_table

3 participants
