fix: Disallows dropping duplicate keys when using full outer join by renato2099 · Pull Request #1320 · apache/datafusion-python

renato2099 · 2025-12-13T22:26:56Z

Which issue does this PR close?

Rationale for this change

datafusion-python should follow the DataFusion implementation and disallow dropping duplicate keys in a full outer join. The left and right key columns are not equivalent (for unmatched rows, one side's key is null), so neither can simply be dropped. Users can then decide how to proceed based on their use case.
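To make the non-equivalence concrete, here is a pure-Python sketch of a full outer join on a single `id` key. It does not use DataFusion, and the helper `full_outer_join_keys` is hypothetical. It shows what the two key columns look like after the join, what is lost when the right key column is dropped, and what coalescing the two columns would give instead.

```python
def full_outer_join_keys(left_keys, right_keys):
    """Return (left.id, right.id) pairs for a naive full outer join on id.

    Hypothetical helper for illustration only; assumes no null keys in the inputs.
    """
    rows = []
    for lk in left_keys:
        matches = [rk for rk in right_keys if rk == lk]
        if matches:
            rows.extend((lk, rk) for rk in matches)
        else:
            rows.append((lk, None))  # unmatched left row: right key is null
    for rk in right_keys:
        if rk not in left_keys:
            rows.append((None, rk))  # unmatched right row: left key is null
    return rows


left_ids = [1, 2]
right_ids = [2, 3]
joined = full_outer_join_keys(left_ids, right_ids)
# joined == [(1, None), (2, 2), (None, 3)]

# Dropping the right key column keeps only left.id, so the key 3 disappears.
dropped = [lk for lk, _ in joined]
# dropped == [1, 2, None]

# Coalescing keeps whichever side is non-null.
coalesced = [lk if lk is not None else rk for lk, rk in joined]
# coalesced == [1, 2, 3]
```

Only the matched row has identical keys on both sides; every unmatched row carries its key on one side only, which is why neither column is redundant.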

What changes are included in this PR?

fix + unit test

Are there any user-facing changes?

Yes: dropping duplicate keys is now disallowed for full outer joins, since it is not semantically correct there.
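A minimal sketch of the kind of guard this change implies. The helper name `validate_join_options`, the `ValueError` type, and the message are assumptions for illustration, not the PR's actual code; it also assumes the full outer join is selected with `how="full"`.

```python
def validate_join_options(how: str, drop_duplicate_keys: bool) -> None:
    """Reject drop_duplicate_keys=True for full outer joins (hypothetical helper).

    In a full join the left and right key columns can each be null for
    unmatched rows, so neither column is redundant and neither can be dropped.
    """
    if how == "full" and drop_duplicate_keys:
        raise ValueError(
            "drop_duplicate_keys=True is not supported for full outer joins; "
            "pass drop_duplicate_keys=False and combine the key columns explicitly"
        )


# Other join types, or a full join that keeps both key columns, pass through.
validate_join_options("inner", True)
validate_join_options("full", False)
```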

kosiew

@renato2099

Thanks for working on this.

Documentation gaps:

The doc string for join in python/datafusion/dataframe.py states:

drop_duplicate_keys: When True, the columns from the right DataFrame
    that have identical names in the ``on`` fields to the left DataFrame
    will be dropped.

It does not mention the full join exception. Users reading this may assume the parameter works the same for all join types.

Similarly, docs/source/user-guide/common-operations/joins.rst should also document the full join and drop_duplicate_keys behaviour.

renato2099 · 2026-01-04T18:21:41Z

Hi @kosiew, thanks for taking a look at the PR!

I have added some documentation notes. Let me know if this is sufficient; otherwise I can add more explanation.

renato2099 · 2026-01-04T18:51:02Z

I am thinking we could have a follow-up on this to make the API more ergonomic and future-proof, via a non-breaking path. Basically, we could introduce an enum-like parameter alongside the boolean and deprecate the boolean later on. Something like:

join_key_behavior: Literal[
    "drop_right",     # current drop_duplicate_keys=True
    "keep_both",      # current drop_duplicate_keys=False
    "coalesce",       # coalesce both columns, if that is really what the user wants
] | None = None

Then:

- If join_key_behavior is provided, drop_duplicate_keys would be ignored.
- FULL JOIN would allow only "keep_both".
- INNER / LEFT / RIGHT would allow "drop_right" or "keep_both".

But we could do that in a follow-up PR. wdyt @kosiew?
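A sketch of how the proposed resolution rules could be checked, following the rules listed in the proposal. Everything here is hypothetical (`resolve_join_key_behavior`, `ALLOWED`, the error messages). When `join_key_behavior` is absent it maps the legacy boolean onto the new values. The proposal does not say which join types accept "coalesce", so this sketch rejects it everywhere until that is decided.

```python
from typing import Literal, Optional

JoinKeyBehavior = Literal["drop_right", "keep_both", "coalesce"]

# Allowed behaviours per join type, taken from the proposal. "coalesce" is
# deliberately absent: the proposal does not yet say where it is accepted.
ALLOWED = {
    "full": {"keep_both"},
    "inner": {"drop_right", "keep_both"},
    "left": {"drop_right", "keep_both"},
    "right": {"drop_right", "keep_both"},
}


def resolve_join_key_behavior(
    how: str,
    drop_duplicate_keys: bool,
    join_key_behavior: Optional[JoinKeyBehavior] = None,
) -> str:
    """Return the effective key behaviour for a join (hypothetical helper)."""
    if how not in ALLOWED:
        raise ValueError(f"join type {how!r} is not covered by this sketch")
    if join_key_behavior is None:
        # Legacy path: map the boolean onto the new values.
        behavior = "drop_right" if drop_duplicate_keys else "keep_both"
    else:
        # An explicit join_key_behavior takes precedence; the boolean is ignored.
        behavior = join_key_behavior
    if behavior not in ALLOWED[how]:
        raise ValueError(
            f"join_key_behavior={behavior!r} is not allowed for how={how!r}; "
            f"expected one of {sorted(ALLOWED[how])}"
        )
    return behavior
```

Keeping the rules in a single table means that deciding where "coalesce" is allowed later only requires adding it to the relevant entries.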

timsaucer · 2026-01-05T13:25:05Z

Closing this PR since the consensus has landed on using the coalesce approach instead. Thank you for the PR and helpful discussions!

Disallows dropping duplicate keys when using full outer join

290c62e

renato2099 mentioned this pull request Dec 13, 2025

Full join on dataframe with only index yields dropped rows #1305

Closed

renato2099 added 2 commits December 14, 2025 13:58

Making ruff happy

30aab18

Making ruff happy

4f88065

kosiew reviewed Dec 28, 2025


Improving docs

9c71d0e

timsaucer closed this Jan 5, 2026

renato2099 wants to merge 4 commits into apache:main from renato2099:renato2099/1305