Rectify strategy skeleton PR (#212) #214
Conversation
Pull request overview
Adds initial scaffolding for a “rectify strategy” utility module in utilsforecast, including a working residual-computation helper and a test suite covering pandas/polars compatibility.
Changes:
- Added `utilsforecast/rectify.py` with `compute_rectify_residuals` implemented and `align_rectify_features`/`rectify` stubbed.
- Added `tests/test_rectify.py` with passing coverage for residual computation and skipped tests for the stubbed APIs.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `utilsforecast/rectify.py` | Introduces the rectify utilities module and implements residual computation; other APIs are currently placeholders. |
| `tests/test_rectify.py` | Adds tests validating residual computation for pandas/polars and outlines (skipped) expectations for upcoming functionality. |
```python
horizon_expr = (
    nw.col(time_col).rank().over(id_col).cast(nw.Int32).alias("horizon")
)
result = merged.select(
    [id_col, time_col, horizon_expr, *residual_exprs]
).sort(id_col, time_col)
```
horizon is computed via rank() without specifying a deterministic tie-breaking method. In pandas/polars the default rank method can yield non-integer values when there are duplicate time_col values within an id_col, and the subsequent cast(Int32) would silently truncate/duplicate horizons. Consider generating horizons using a deterministic per-group row index after sorting by time_col (or explicitly set an ordinal/dense rank method) so horizon is always 1..n with no ambiguity.
Implemented a deterministic horizon using cum_count instead of the rank method.
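The same deterministic idea can be shown in plain pandas (a sketch with assumed `unique_id`/`ds` column names, not the module's narwhals code): sort first in a separate step, then take a per-group row index, so the horizon is always 1..n even when a series contains duplicate timestamps that would make rank-based values ambiguous.

```python
import pandas as pd

merged = pd.DataFrame({
    "unique_id": ["a", "a", "a", "b", "b"],
    "ds": [3, 1, 1, 2, 1],  # note the duplicate ds=1 within series "a"
})

# Sort happens as its own step, before the per-group index is assigned
merged = merged.sort_values(["unique_id", "ds"], kind="stable").reset_index(drop=True)
# Row index within each series, 1-based: immune to ties in ds
merged["horizon"] = merged.groupby("unique_id").cumcount() + 1
print(merged["horizon"].tolist())  # [1, 2, 3, 1, 2]
```

A rank-based version would assign the duplicated `ds=1` rows the same (or fractional) rank, which the `Int32` cast would then silently truncate.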
```python
actuals = nw.from_native(df)
forecasts = nw.from_native(forecasts_df)
merged = actuals.select([id_col, time_col, target_col]).join(
    forecasts.select([id_col, time_col, *models]),
    on=[id_col, time_col],
```
compute_rectify_residuals doesn’t validate input schemas/dtypes (required columns, time dtype) the way other public helpers do (e.g., validate_format usage in utilsforecast/processing.py:673). Adding validate_format checks for df and forecasts_df (and verifying models columns exist) would give clearer error messages and keep behavior consistent across the library.
Input validation will be added as part of the logic implementation.
```python
def rectify(
    df: IntoDataFrameT,
    models: List[str],
    correction_models: Dict[int, Dict[str, Any]],
```
The correction_models type annotation (Dict[int, Dict[str, Any]]) conflicts with the docstring, which says that in horizon_aware mode it should be {model_name: fitted_model}. Consider updating the annotation to a union that matches both modes (or splitting the API) so static type checkers and users aren’t misled.
```diff
-    correction_models: Dict[int, Dict[str, Any]],
+    correction_models: Union[Dict[int, Dict[str, Any]], Dict[str, Any]],
```
The type annotation fix will be implemented along with the actual logic.
```python
    Returns:
        dict or tuple: Training data for rectification models.
    """
    raise NotImplementedError
```
align_rectify_features is exported in __all__ but currently raises NotImplementedError unconditionally. If this is meant to be a skeleton only, consider not exporting it (or making it private) until implemented, or raise a more informative error indicating it’s experimental/unavailable in this release.
```diff
-    raise NotImplementedError
+    raise NotImplementedError(
+        "align_rectify_features is not implemented in this version and is "
+        "currently experimental/unavailable."
+    )
```
```python
    Returns:
        pandas or polars DataFrame: Corrected forecasts with same schema
        as input df.
    """
    raise NotImplementedError
```
rectify is exported in __all__ but currently raises NotImplementedError unconditionally. Shipping a public API entry point in this state can break user code at runtime; consider implementing the function before release, or avoid exporting it / clearly mark it as experimental to prevent accidental use.
PR Review: feat/rectify-strategy — Rectify Skeleton
Overall: Good foundation. The module structure, docstrings, and test contracts are well-designed and worth building on. The one implemented function (compute_rectify_residuals) has a correctness risk that should be fixed before the stubs are filled in, as everything else will build on top of it.
Issues to Fix
1. Horizon computation has a window-function ordering risk (rectify.py:50-57)
The horizon_expr using cum_count().over(id_col) is applied in the same select as the outer sort. In Polars, over does not guarantee it sees rows in the sorted order of the enclosing frame — the sort and the window function are evaluated independently. This means horizons could be assigned non-deterministically per series.
Fix: compute the horizon in a separate step after the sort, or use order_by inside the window expression (Polars ≥ 0.20 supports over(..., order_by=time_col)):
```python
sorted_merged = merged.sort(id_col, time_col)
horizon_expr = (
    nw.col(time_col)
    .cum_count()
    .over(id_col, order_by=time_col)
    .cast(nw.Int32)
    .alias("horizon")
)
```
2. No handling for multi-fold CV inputs (rectify.py:43-48)
The join on [id_col, time_col] produces duplicate or mismatched rows when passed cross-validation output from mlforecast/statsforecast/neuralforecast, since those stack all folds and a (unique_id, ds) pair appears once per fold.
Suggest adding an optional cutoff_col parameter. When provided, include it in the join key and compute horizon within each (unique_id, cutoff) group. Without this, the primary real-world use case (training correction models from CV output) is broken.
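A small pandas sketch of why the cutoff matters (column names follow the Nixtla CV output convention; the frames here are toy data, not real CV output): with stacked folds, the same `(unique_id, ds)` pair exists once per fold, so joining on only those two keys is a many-to-many merge.

```python
import pandas as pd

cv_actuals = pd.DataFrame({
    "unique_id": ["a", "a"],
    "ds": [3, 3],            # same (unique_id, ds) appears in two folds
    "cutoff": [1, 2],
    "y": [10.0, 10.0],
})
cv_forecasts = pd.DataFrame({
    "unique_id": ["a", "a"],
    "ds": [3, 3],
    "cutoff": [1, 2],
    "model1": [9.0, 11.0],
})

# Without cutoff in the key: 2 x 2 = 4 rows, silently duplicated
bad = cv_actuals.merge(
    cv_forecasts[["unique_id", "ds", "model1"]], on=["unique_id", "ds"]
)
# With cutoff in the key: one row per (series, cutoff, step)
good = cv_actuals.merge(cv_forecasts, on=["unique_id", "ds", "cutoff"])
print(len(bad), len(good))  # 4 2
```

The duplicated rows in `bad` would also corrupt the horizon numbering, since both fold copies land in the same `(unique_id,)` window.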
3. Row-order bug in test_residual_values (test_rectify.py:57-62)
```python
expected = actuals_nw["y"].to_numpy() - forecasts_nw[model].to_numpy()
actual = result_nw.sort("unique_id", "ds")[model].to_numpy()
```
result_nw is sorted but actuals_nw and forecasts_nw are not, so expected can be in a different order than actual. The test passes today only because generate_series happens to return sorted data. Fix: sort all three frames before comparison.
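Sketch of that fix with pandas stand-ins (the frames here are hypothetical toy data, deliberately in different row orders): sort every frame on the same keys before comparing, so expected and actual are row-aligned regardless of input order.

```python
import numpy as np
import pandas as pd

model = "model1"
actuals = pd.DataFrame({"unique_id": ["b", "a"], "ds": [1, 1], "y": [5.0, 2.0]})
forecasts = pd.DataFrame({"unique_id": ["a", "b"], "ds": [1, 1], model: [1.5, 4.0]})
result = pd.DataFrame({"unique_id": ["b", "a"], "ds": [1, 1], model: [1.0, 0.5]})

keys = ["unique_id", "ds"]
# Sort all three frames on the same keys, not just the result
a = actuals.sort_values(keys).reset_index(drop=True)
f = forecasts.sort_values(keys).reset_index(drop=True)
r = result.sort_values(keys).reset_index(drop=True)

expected = a["y"].to_numpy() - f[model].to_numpy()
np.testing.assert_allclose(r[model].to_numpy(), expected)
```

Comparing the unsorted frames here would have paired series "b" actuals with series "a" forecasts and failed spuriously.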
Design Suggestions
4. align_rectify_features return type is hard to consume
Union[Dict[int, Tuple[np.ndarray, Dict[str, np.ndarray]]], Tuple[np.ndarray, Dict[str, np.ndarray]]] means callers must introspect the return type based on the mode argument. Consider two separate functions (align_per_horizon / align_horizon_aware) or a typed result dataclass. This also makes the mode parameter in rectify cleaner.
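One possible shape for the suggested typed result (the class and field names are hypothetical, not part of the current API): a small dataclass gives callers one stable return type instead of a mode-dependent union they must introspect.

```python
from dataclasses import dataclass, field
from typing import Dict

import numpy as np

@dataclass
class RectifyFeatures:
    """Aligned training data for correction models."""
    X: np.ndarray  # feature matrix (rows aligned with residuals)
    residuals: Dict[str, np.ndarray] = field(default_factory=dict)  # per base model

feats = RectifyFeatures(X=np.zeros((3, 2)), residuals={"model1": np.ones(3)})
```

Per-horizon alignment could then return `Dict[int, RectifyFeatures]` while horizon-aware returns a single `RectifyFeatures`, which keeps the two modes distinguishable without a `Tuple`-heavy union.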
5. correction_models type hint doesn't reflect horizon_aware mode
The docstring says {horizon: {model_name: fitted_model}} for per_horizon mode, but horizon_aware mode would need {model_name: fitted_model}. The current type hint Dict[int, Dict[str, Any]] is wrong for one of the two modes. This should be clarified before implementing.
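If the single-function API is kept, a union alias covering both modes could look like this sketch (the alias name is hypothetical):

```python
from typing import Any, Dict, Union

CorrectionModels = Union[
    Dict[int, Dict[str, Any]],  # per_horizon: {horizon: {model_name: fitted_model}}
    Dict[str, Any],             # horizon_aware: {model_name: fitted_model}
]

# Both mode-specific shapes now satisfy the same annotation
per_horizon: CorrectionModels = {1: {"model1": object()}}
horizon_aware: CorrectionModels = {"model1": object()}
```

The remaining drawback is that type checkers cannot tie the union branch to the `mode` argument, which is another point in favor of splitting the API.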
Out of Scope
6. User-facing documentation
The "bring your own model" design (user trains correction models externally, passes fitted objects with a .predict(X) interface) is a meaningful API contract that warrants a dedicated documentation page covering the full workflow — from CV output to correction model training to inference. This is best tracked as a follow-up issue rather than blocking this PR.
Verdict: fix items 1–3 before building further on this, and address 4–5 while implementing the stubs. The skeleton is otherwise ready to build on.
force-pushed from 3e49ca3 to 96dfed9
Thanks for the detailed review @nasaul. I addressed items 1-3:
Items 4-5 (split align function, fix correction_models type) - I will
Review

The implementation is clean, narwhals-idiomatic, and the CV …

Design Issues

1. Both …

The user can pass …

2. Row-aligned …

Passing …

3. Dead parameters in …

4. …

Test Gaps (confirmed by coverage — …
… added, methods stubbed for review. Tests included
force-pushed from f94f263 to 0c054a6
Thanks for the detailed review @nasaul. I addressed the latest points:
Also rebased the branch on latest …

Validation:


Description
Skeleton PR for rectify strategy utilities as discussed in #212.
Code changes
- `utilsforecast/rectify.py` with three functions:
  - `compute_rectify_residuals` - computes per-horizon residuals (actual - forecast) - implemented
  - `align_rectify_features` - aligns feature matrices with horizon-indexed residuals - stubbed
  - `rectify` - applies fitted correction models to base forecasts - stubbed
- `tests/test_rectify.py` - 5 passing tests (pandas + polars), 8 skipped for stubbed functions

Design decisions

- Separate `rectify.py` module rather than include within `postprocessing.py`
- Any object with a `.predict(X)` interface works as a correction model
- Two modes: `per_horizon` (separate model per h) and `horizon_aware` (single model with horizon as feature)
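The "any object with `.predict(X)`" contract can be illustrated with a toy corrector (this class is purely illustrative, not part of the PR; nothing here depends on sklearn):

```python
import numpy as np

class MeanResidualCorrector:
    """Toy correction model: predicts the mean training residual."""

    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        # Duck-typed interface: anything with .predict(X) qualifies
        return np.full(len(X), self.mean_)

corrector = MeanResidualCorrector().fit(
    np.zeros((4, 1)), np.array([1.0, 2.0, 3.0, 2.0])
)
print(corrector.predict(np.zeros((2, 1))))  # [2. 2.]
```

Any sklearn regressor, LightGBM model, or hand-rolled class satisfying the same duck type would slot in identically.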
Feedback on the API shape before filling in `align_rectify_features` and `rectify`. Specifically: does a separate `rectify.py` module make sense vs another location?
utilsforecast: Rectify strategy #212