docs/source/user-guide/faq.md: 13 additions, 0 deletions
@@ -63,3 +63,16 @@ when DataFusion might be suitable or unsuitable for your needs:
database system. Like DataFusion it is also written in Rust and
utilizes the Apache Arrow memory model, but unlike DataFusion it
targets end-users rather than developers of other database systems.

## Why do my query results come back in a different order between runs?

DataFusion only guarantees row order when the query includes an `ORDER BY`
clause.

Without `ORDER BY`, operators such as joins, `GROUP BY`, `UNION`, and
parallel file scans may emit the same rows in a different order across runs.
This is normal for a parallel execution engine and does not mean the query
result is incorrect.

If you need stable output ordering, add an explicit `ORDER BY` clause to the
outermost query.
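A small, self-contained Rust model can illustrate this behavior. It is a simplified sketch, not DataFusion's execution code. The hypothetical `parallel_scan` function reads each partition on its own thread and forwards the rows through a channel, so the order in which rows arrive depends on thread scheduling. Sorting after all partitions have finished plays the role of an outermost `ORDER BY`.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical model of a parallel scan: each partition is read on its own
/// thread and sends its rows to a shared channel. The arrival order depends
/// on thread scheduling, so it can differ from run to run.
fn parallel_scan(partitions: Vec<Vec<i64>>) -> Vec<i64> {
    let (tx, rx) = mpsc::channel();
    let handles: Vec<_> = partitions
        .into_iter()
        .map(|rows| {
            let tx = tx.clone();
            thread::spawn(move || {
                for row in rows {
                    tx.send(row).unwrap();
                }
            })
        })
        .collect();
    // Drop the original sender so the receiver stops once every thread is done.
    drop(tx);
    // Collect rows in whatever order they arrive.
    let out: Vec<i64> = rx.iter().collect();
    for handle in handles {
        handle.join().unwrap();
    }
    out
}

/// Equivalent of an outermost `ORDER BY`: sort after the parallel work is done.
fn parallel_scan_ordered(partitions: Vec<Vec<i64>>) -> Vec<i64> {
    let mut rows = parallel_scan(partitions);
    rows.sort();
    rows
}
```

Both functions return the same set of rows on every run; only `parallel_scan_ordered` also fixes their order.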
docs/source/user-guide/sql/select.md: 16 additions, 1 deletion
@@ -86,7 +86,13 @@ SELECT a FROM table WHERE a > 10

## JOIN clause

DataFusion supports `INNER JOIN`, `LEFT OUTER JOIN`, `RIGHT OUTER JOIN`,
`FULL OUTER JOIN`, `NATURAL JOIN`, `CROSS JOIN`, `LEFT SEMI JOIN`,
`RIGHT SEMI JOIN`, `LEFT ANTI JOIN`, and `RIGHT ANTI JOIN`.

Unless you add an `ORDER BY` clause, joins do not guarantee the order of
the returned rows. DataFusion executes queries in parallel, so the same
join query may produce the same rows in a different order across runs.
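One reason for this can be sketched with a simplified inner hash join in Rust. This is hypothetical code, not DataFusion's actual join operator. A hash join builds a hash table on one input (the build side) and scans the other (the probe side), so its output follows the probe side's order. Because a query planner may pick either input as the build side, the same join can return the same rows in a different order:

```rust
use std::collections::HashMap;

/// A joined row: (join key, left value, right value).
type Joined = (i32, &'static str, &'static str);

/// Hypothetical inner hash join: hash `build`, then scan `probe` in order.
/// Output rows follow the probe side's order. `build_is_left` only controls
/// how each output row is assembled so columns stay (key, left, right).
fn hash_join(
    build: &[(i32, &'static str)],
    probe: &[(i32, &'static str)],
    build_is_left: bool,
) -> Vec<Joined> {
    let mut table: HashMap<i32, Vec<&'static str>> = HashMap::new();
    for &(key, value) in build {
        table.entry(key).or_default().push(value);
    }
    let mut out = Vec::new();
    for &(key, probe_value) in probe {
        if let Some(matches) = table.get(&key) {
            for &build_value in matches {
                out.push(if build_is_left {
                    (key, build_value, probe_value)
                } else {
                    (key, probe_value, build_value)
                });
            }
        }
    }
    out
}
```

With `left = [(1, "a"), (2, "b"), (3, "c")]` and `right = [(3, "x"), (1, "y"), (2, "z")]`, `hash_join(&left, &right, true)` returns keys in the order 3, 1, 2, while `hash_join(&right, &left, false)` returns 1, 2, 3. Both results contain exactly the same rows.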

The following examples are based on this table:

@@ -246,6 +252,10 @@ Example:
SELECT a, b, MAX(c) FROM table GROUP BY a, b
```

`GROUP BY` determines how rows are grouped for aggregation, but it does not
determine the order of the output rows. If you need a stable row order, add
an `ORDER BY` clause to the outer query.
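As a rough illustration, hash-based grouping keeps groups in a hash table and emits them in the table's iteration order. The following is a hypothetical Rust model, not DataFusion's aggregation code. Rust's `std::collections::HashMap` leaves its iteration order unspecified, and with the default randomized hasher that order can change between runs. Sorting the groups afterwards corresponds to adding `ORDER BY a, b`:

```rust
use std::collections::HashMap;

/// Hypothetical model of `SELECT a, b, MAX(c) FROM table GROUP BY a, b`.
/// Groups live in a `HashMap`, whose iteration order is unspecified,
/// so the groups come out in no particular order.
fn group_by_max(rows: &[(i32, i32, i32)]) -> Vec<((i32, i32), i32)> {
    let mut groups: HashMap<(i32, i32), i32> = HashMap::new();
    for &(a, b, c) in rows {
        groups
            .entry((a, b))
            .and_modify(|current| *current = (*current).max(c))
            .or_insert(c);
    }
    groups.into_iter().collect()
}

/// The same query with `ORDER BY a, b` added: sort the groups by their key.
fn group_by_max_ordered(rows: &[(i32, i32, i32)]) -> Vec<((i32, i32), i32)> {
    let mut out = group_by_max(rows);
    out.sort_by_key(|&(key, _)| key);
    out
}
```

Both functions compute the same groups and maximums; only the second one returns them in a defined order.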

Some aggregate functions, such as `ARRAY_AGG`, accept an optional ordering requirement. If one is given,
the aggregate is computed over the input rows in that order.
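The Rust sketch below models an ordering requirement for a query along the lines of `SELECT a, ARRAY_AGG(c ORDER BY b) FROM table GROUP BY a`. It is a hypothetical illustration, not DataFusion's implementation: each group's `(b, c)` pairs are sorted by `b` before the `c` values are collected, so the array contents follow `b` regardless of the order in which rows arrive. A `BTreeMap` is used only to make the groups easy to read.

```rust
use std::collections::BTreeMap;

/// Hypothetical model of `ARRAY_AGG(c ORDER BY b)` grouped by `a`.
fn array_agg_ordered(rows: &[(&str, i32, i32)]) -> BTreeMap<String, Vec<i32>> {
    // Gather each group's (b, c) pairs in arrival order.
    let mut groups: BTreeMap<String, Vec<(i32, i32)>> = BTreeMap::new();
    for &(a, b, c) in rows {
        groups.entry(a.to_string()).or_default().push((b, c));
    }
    groups
        .into_iter()
        .map(|(a, mut pairs)| {
            // Apply the ordering requirement: sort by `b`, then keep only `c`.
            // Rows that tie on `b` keep their arrival order, which an engine
            // does not guarantee in the first place.
            pairs.sort_by_key(|&(b, _)| b);
            (a, pairs.into_iter().map(|(_, c)| c).collect())
        })
        .collect()
}
```

For input rows `("x", 3, 30)`, `("y", 1, 100)`, `("x", 1, 10)`, `("x", 2, 20)`, group `x` aggregates to `[10, 20, 30]` and group `y` to `[100]`.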

@@ -294,6 +304,11 @@ FROM table2
Orders the results by the referenced expression. By default it uses ascending order (`ASC`).
To sort by an expression in descending order instead, add `DESC` after that order-by expression.

Without `ORDER BY`, DataFusion does not guarantee the order of result rows.
This is especially important for queries involving joins, `GROUP BY`,
`UNION`, or parallel file scans, where rows may be returned in a different
order between runs even when the data itself has not changed.
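A sort makes the output reproducible only when the sort keys fully determine the order. The hypothetical Rust function below models `ORDER BY a DESC, b`, where `b` uses the default ascending order. It is a sketch of the ordering semantics, not DataFusion's sort operator. When no two rows share the same `(a, b)` values, every permutation of the input sorts to the same result.

```rust
/// Hypothetical model of `ORDER BY a DESC, b`: compare `a` in descending
/// order, then break ties on `b` in ascending order (the default direction).
fn order_by_a_desc_b(mut rows: Vec<(i32, i32)>) -> Vec<(i32, i32)> {
    rows.sort_by(|x, y| y.0.cmp(&x.0).then_with(|| x.1.cmp(&y.1)));
    rows
}
```

For example, both `vec![(1, 2), (2, 1), (1, 1), (2, 3)]` and `vec![(2, 3), (1, 1), (2, 1), (1, 2)]` sort to `[(2, 1), (2, 3), (1, 1), (1, 2)]`.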

Examples:

```sql