diff --git a/docs/source/user-guide/faq.md b/docs/source/user-guide/faq.md
index d803b11333f0e..018fbba42d19b 100644
--- a/docs/source/user-guide/faq.md
+++ b/docs/source/user-guide/faq.md
@@ -63,3 +63,16 @@ when DataFusion might be suitable or unsuitable for your needs:
 database system. Like DataFusion it is also written in Rust and
 utilizes the Apache Arrow memory model, but unlike DataFusion it
 targets end-users rather than developers of other database systems.
+
+## Why do my query results come back in a different order between runs?
+
+DataFusion only guarantees row order when the query includes an `ORDER BY`
+clause.
+
+Without `ORDER BY`, operators such as joins, `GROUP BY`, `UNION`, and
+parallel file scans may emit the same rows in a different order across runs.
+This is normal for a parallel execution engine and does not mean the query
+result is incorrect.
+
+If you need stable output ordering, add an explicit `ORDER BY` clause to the
+outermost query.
diff --git a/docs/source/user-guide/sql/select.md b/docs/source/user-guide/sql/select.md
index baacf432f5fde..6987b0f2b02ec 100644
--- a/docs/source/user-guide/sql/select.md
+++ b/docs/source/user-guide/sql/select.md
@@ -86,7 +86,13 @@ SELECT a FROM table WHERE a > 10

 ## JOIN clause

-DataFusion supports `INNER JOIN`, `LEFT OUTER JOIN`, `RIGHT OUTER JOIN`, `FULL OUTER JOIN`, `NATURAL JOIN`, `CROSS JOIN`, `LEFT SEMI JOIN`, `RIGHT SEMI JOIN`, `LEFT ANTI JOIN`, and `RIGHT ANTI JOIN`.
+DataFusion supports `INNER JOIN`, `LEFT OUTER JOIN`, `RIGHT OUTER JOIN`,
+`FULL OUTER JOIN`, `NATURAL JOIN`, `CROSS JOIN`, `LEFT SEMI JOIN`,
+`RIGHT SEMI JOIN`, `LEFT ANTI JOIN`, and `RIGHT ANTI JOIN`.
+
+Unless you add an `ORDER BY` clause, joins do not guarantee the order of
+the returned rows. DataFusion executes queries in parallel, so the same
+join query may produce the same rows in a different order across runs.

 The following examples are based on this table:

@@ -246,6 +252,10 @@ Example:
 SELECT a, b, MAX(c) FROM table GROUP BY a, b
 ```

+`GROUP BY` determines how rows are grouped for aggregation, but it does not
+determine the order of the output rows. If you need a stable row order, add
+an `ORDER BY` clause to the outer query.
+
 Some aggregation functions accept optional ordering requirement, such as `ARRAY_AGG`. If a requirement is given,
 aggregation is calculated in the order of the requirement.

@@ -294,6 +304,11 @@ FROM table2
 Orders the results by the referenced expression. By default it uses ascending order (`ASC`).
 This order can be changed to descending by adding `DESC` after the order-by expressions.

+Without `ORDER BY`, DataFusion does not guarantee the order of result rows.
+This is especially important for queries involving joins, `GROUP BY`,
+`UNION`, or parallel file scans, where rows may be returned in a different
+order between runs even when the data itself has not changed.
+
 Examples:

 ```sql
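The ordering behavior these doc changes describe (same rows, different order, and `ORDER BY` restoring a deterministic order) can be illustrated with a small simulation. This is a Python sketch under stated assumptions, not DataFusion code and not its API: `PARTITIONS`, `run`, and the completion orders are hypothetical stand-ins for the partitions of a parallel plan finishing in a different order between runs. The last check also shows a general SQL caveat: if the `ORDER BY` keys do not uniquely identify each row, tied rows can still come back in a different order.

```python
from collections import Counter
from operator import itemgetter

# Hypothetical output of two partitions of a parallel plan (not DataFusion API).
# Each row is (key, value); key "b" appears twice on purpose.
PARTITIONS = [
    [("a", 1), ("b", 2)],
    [("c", 3), ("b", 4)],
]


def run(completion_order):
    """Simulate one query run without ORDER BY.

    Rows are emitted partition by partition in the order the partitions
    happen to finish, which can change from run to run.
    """
    rows = []
    for partition_index in completion_order:
        rows.extend(PARTITIONS[partition_index])
    return rows


run1 = run([0, 1])  # partition 0 finished first
run2 = run([1, 0])  # partition 1 finished first

# The row sequences differ between runs...
print(run1 == run2)  # → False
# ...but both runs return exactly the same rows (same multiset).
print(Counter(run1) == Counter(run2))  # → True

# Sorting on every column imposes a total order, so the runs now match.
print(sorted(run1) == sorted(run2))  # → True

# Sorting only on the non-unique key leaves the two "b" rows tied.
# Python's sorted() is stable, so ties keep their arrival order and the
# runs still differ; SQL does not specify any order for tied rows.
by_key = itemgetter(0)
print(sorted(run1, key=by_key) == sorted(run2, key=by_key))  # → False
```

When comparing results of queries without a fully determining `ORDER BY`, comparing them as multisets, or sorting on every output column, avoids treating a reordering as a wrong answer.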