Summary
Add translation support for common SQL window functions into MongoDB’s $setWindowFields (MongoDB ≥ 5.0). Cover ROW_NUMBER(), RANK(), DENSE_RANK(), NTILE(n) (best-effort), running/tumbling SUM/AVG/MIN/MAX/COUNT, and partitioned order-aware windows. Provide graceful fallback/errors for unsupported server versions.
Motivation
- Window functions are standard in analytics SQL; many migrations stall when they hit
OVER (PARTITION BY … ORDER BY …).
- MongoDB’s
$setWindowFields is the canonical equivalent; supporting it unlocks powerful reporting in-place without ETL.
Scope (v1)
Supported SQL patterns:
ROW_NUMBER() OVER (PARTITION BY … ORDER BY …)
RANK() / DENSE_RANK() over partition+order
- Running aggregates:
SUM/AVG/MIN/MAX/COUNT with ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
- Basic frame variants:
CURRENT ROW, UNBOUNDED PRECEDING; (optional) N PRECEDING where feasible
NTILE(n) (emit bucket index via computed percentile approximation)
Out of scope (follow-ups):
- Window frames with
FOLLOWING, RANGE (value-based) frames, and complex expressions
LAG/LEAD (separate issue)
API
from sql_mongo_converter import sql_to_pipeline
pipeline = sql_to_pipeline("""
SELECT
u.country,
u.user_id,
ROW_NUMBER() OVER (PARTITION BY u.country ORDER BY u.created_at) AS rn,
RANK() OVER (PARTITION BY u.country ORDER BY u.total_spend DESC) AS rnk,
SUM(u.total_spend) OVER (
PARTITION BY u.country
ORDER BY u.created_at
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_spend
FROM users u
WHERE u.status = 'ACTIVE';
""", options={
"mongoServerVersion": "6.0", # enable $setWindowFields
"emitCompatibilityNotes": True
})
Example Output (abridged)
{
"collection": "users",
"pipeline": [
{"$match": {"status": "ACTIVE"}},
{"$setWindowFields": {
"partitionBy": "$country",
"sortBy": {"created_at": 1},
"output": {
"rn": {"$documentNumber": {}}, // ROW_NUMBER
"running_spend": {
"$sum": "$total_spend",
"window": {"documents": ["unbounded","current"]}
}
}
}},
{"$setWindowFields": {
"partitionBy": "$country",
"sortBy": {"total_spend": -1},
"output": {
"rnk": {"$rank": {}} // RANK (DENSE_RANK → $denseRank)
}
}},
{"$project": {"country": 1, "user_id": 1, "rn": 1, "rnk": 1, "running_spend": 1, "_id": 0}}
]
}
Note: multiple $setWindowFields stages may be emitted to accommodate different ORDER BY specs per window.
Translation Rules
- PARTITION BY →
partitionBy: <expr>
- ORDER BY →
sortBy: { … }
- ROW_NUMBER →
"$documentNumber": {}
- RANK →
"$rank": {}, DENSE_RANK → "$denseRank": {}
- Running aggregates → accumulator +
window.documents: ["unbounded","current"]
N PRECEDING (best-effort) → window.documents: [-N, "current"] (warn when not supported)
- NTILE(n) (approx): compute bucket index using
$rank/$denseRank + derived math; document approximation caveats.
Dialect & Validation
- Gate feature behind
options.mongoServerVersion >= 5.0; otherwise raise a clear, actionable error with a suggestion to precompute or upgrade.
- Validate that each window has a single
ORDER BY and compatible frame; emit warnings for unsupported frames.
Tests
- Row number by partition (stable order)
- Rank vs dense_rank with ties
- Running sum/avg by time order
- Mixed windows (different ORDER BYs) on same query
- Version guard (4.4 → friendly error)
- NTILE(n) sanity checks on uniform partitions
README / Docs
- New section: “Window Functions → $setWindowFields”
- Matrix table (SQL function → Mongo expression)
- Two full copy-paste examples + caveats (frames, server version)
- Performance note: ensure supporting indexes on
partitionBy and sortBy keys
Acceptance Criteria
Summary
Add translation support for common SQL window functions into MongoDB’s
$setWindowFields(MongoDB ≥ 5.0). CoverROW_NUMBER(),RANK(),DENSE_RANK(),NTILE(n)(best-effort), running/tumblingSUM/AVG/MIN/MAX/COUNT, and partitioned order-aware windows. Provide graceful fallback/errors for unsupported server versions.Motivation
OVER (PARTITION BY … ORDER BY …).$setWindowFieldsis the canonical equivalent; supporting it unlocks powerful reporting in-place without ETL.Scope (v1)
Supported SQL patterns:
ROW_NUMBER() OVER (PARTITION BY … ORDER BY …)RANK()/DENSE_RANK()over partition+orderSUM/AVG/MIN/MAX/COUNTwithROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROWCURRENT ROW,UNBOUNDED PRECEDING; (optional)N PRECEDINGwhere feasibleNTILE(n)(emit bucket index via computed percentile approximation)Out of scope (follow-ups):
FOLLOWING,RANGE(value-based) frames, and complex expressionsLAG/LEAD(separate issue)API
Example Output (abridged)
{ "collection": "users", "pipeline": [ {"$match": {"status": "ACTIVE"}}, {"$setWindowFields": { "partitionBy": "$country", "sortBy": {"created_at": 1}, "output": { "rn": {"$documentNumber": {}}, // ROW_NUMBER "running_spend": { "$sum": "$total_spend", "window": {"documents": ["unbounded","current"]} } } }}, {"$setWindowFields": { "partitionBy": "$country", "sortBy": {"total_spend": -1}, "output": { "rnk": {"$rank": {}} // RANK (DENSE_RANK → $denseRank) } }}, {"$project": {"country": 1, "user_id": 1, "rn": 1, "rnk": 1, "running_spend": 1, "_id": 0}} ] }Translation Rules
partitionBy: <expr>sortBy: { … }"$documentNumber": {}"$rank": {}, DENSE_RANK →"$denseRank": {}window.documents: ["unbounded","current"]N PRECEDING(best-effort) →window.documents: [-N, "current"](warn when not supported)$rank/$denseRank+ derived math; document approximation caveats.Dialect & Validation
options.mongoServerVersion >= 5.0; otherwise raise a clear, actionable error with a suggestion to precompute or upgrade.ORDER BYand compatible frame; emit warnings for unsupported frames.Tests
README / Docs
partitionByandsortBykeysAcceptance Criteria
sql_to_pipeline()translates listed window functions into valid$setWindowFieldsstages.ORDER BYare supported (multiple stages emitted).