Fix MAG L1C gap-fill bugs for validation tests T015 and T016#2899
Fix MAG L1C gap-fill bugs for validation tests T015 and T016#2899sapols wants to merge 3 commits intoIMAP-Science-Operations-Center:devfrom
Conversation
Six localized bug fixes to the existing MAG L1C gap-filling algorithm, plus targeted regression tests. Keeps the current production algorithm shape — no architectural rewrite. Fixes: - Rate-aware gap cadence in generate_missing_timestamps() - 2D array indexing in interpolate_gaps() (column 0 explicitly) - Off-by-one burst slice endpoint - CIC filter edge clipping via extrapolate=True in linear_filtered() - Spurious micro-gaps at Config transitions via _find_rate_segments() - Burst-only trailing-edge trim for T024 exact row count - Remove dead np.delete() no-op Refs IMAP-Science-Operations-Center#1993 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-Authored-By: Codex <noreply@openai.com>
|
One scope clarification: My read is that this PR is enough to close the issue, but we haven't completed every historical MAG L1C TODO that came up in the discussion. Relative to @maxinelasp’s earlier checklist:
Relative to @alastairtree’s broader “L1C should” list:
|
|
Another clarification: we're probably sweeping a lot under the rug by not counting day-boundary / inherited-timeline work (T017-T019) as in-scope for this ticket. E.g. Alastair's comment about gaps in the Dec 5-6 data appears to be directly related to L1C not treating leading/trailing burst-only intervals on mixed norm+burst days as fillable gaps. In those windows the parent burst L1B data does exist, but the current mixed-input L1C path collapses the output window to the existing normal-mode coverage instead of extending it across those burst-only spans. |
|
@sapols Agreed, would you be able to open a follow-up ticket for the day boundary coverage if one does not exist? |
|
Initial thoughts from reading through the PR description only:
Nice work! I'll review the code more thoroughly in a bit. |
Summary
Six localized bug fixes to the existing MAG L1C gap-filling code. These fixes un-skip and pass validation tests T015 and T016 (both mago and magi) while preserving the existing passes on T013, T014, and T024. All 10 L1C validation cases now pass with exact row-count matches against the checked-in expected outputs.
This PR intentionally keeps the current production algorithm shape. It does not rewrite the gap-fill pipeline, add new dataclasses, or replace the simplified L1C flow that was developed collaboratively between SDC and the MAG team. It only patches the specific correctness bugs that were causing T015 and T016 to fail.
Background
What was failing and why
Current dev skips T015 and T016 because they fail. The root causes are six bugs in the gap-fill path, each described below. T013, T014, and T024 were already passing on dev.
The validation tests exercise increasingly complex mode transitions:
T015 and T016 were the failing cases: they test mixed-rate transitions where the instrument changes its normal-mode sample rate across a burst window, passing through Config mode. These are the cases that exposed all six bugs.
How this was investigated
This issue was investigated using two AI coding agents (Claude Code and Codex) in parallel, guided by manual review. The process was:
Relationship to the algorithm document
The MAG Science Algorithm Document (section 7.3.4) describes a six-step L1C process. The current code implements a simplified version of these steps that was developed collaboratively between SDC and the MAG team. This PR preserves that simplified implementation. Specifically:
Step 3 (Generate a new NM time sequence) is the area with the largest gap between the document and the code. The document describes a burst-driven approach: find the BM anchor time tC closest to the gap start tA, decimate BM timestamps, shift by (tA - tC), and trim. The current code instead generates gap timestamps arithmetically from the gap boundaries at the appropriate NM cadence, then interpolates burst data onto those timestamps.
Both approaches produce a regular NM time sequence that bridges the gap. The current approach is simpler and was explicitly approved by the MAG team. The key fix in this PR is that the cadence used for gap-fill timestamps now respects the actual NM rate declared in the gap metadata, rather than always using 0.5s (2 Hz). This aligns with the document's intent that "the rate depends on the rate of the NM data before the gap."
Step 5 (Identify remaining gaps with >1.1s threshold) is not changed by this PR. The current implementation uses a per-rate tolerance check rather than a fixed 1.1s threshold, which is arguably more correct for non-2 Hz rates.
The six bugs and their fixes
Fix 1: Rate-aware gap cadence (
generate_missing_timestamps)Bug: Gap-fill timestamps were always generated at 0.5s intervals (2 Hz) regardless of the actual normal mode rate. For non-2 Hz gaps, this was wrong by construction.
Fix: Read the rate from
gap[2](the third element of each gap tuple) and compute the cadence asint(1e9 / rate). Fall back to 0.5s if no rate is provided. Use integer arithmetic throughout to avoid float precision loss at TTJ2000-scale nanosecond timestamps.Which tests this fixed: Covered directly by
test_generate_missing_timestamps_uses_gap_rate()and the updated rate-aware case intest_generate_timeline(). In isolated validation replay, reverting this fix alone did not reintroduce a row-count delta once the other transition fixes were present, so it is best understood as a direct correctness fix for non-2 Hz gaps rather than the sole driver of a specific T015/T016 failure.Spec alignment: This directly implements the document's instruction that the interpolated rate should depend on "the rate of the NM data before the gap."
Fix 2: 2D array indexing in
interpolate_gapsBug:
filled_norm_timelineis an(n, 8)array, but the gap-timeline extraction compared the entire 2D array against scalar timestamps without specifying a column. The boolean mask broadcasted across all 8 columns instead of just the epoch column (index 0).Fix: Extract
norm_epochs = filled_norm_timeline[:, 0]once at the top of the function and filter against it explicitly.Which tests this fixed: No isolated validation delta reproduced when this line was reverted in the current patch set. This is still a real correctness fix: it makes the gap-timeline mask operate on the epoch column by construction instead of relying on non-epoch columns being numerically unrelated to TTJ2000 timestamps.
Fix 3: Off-by-one burst slice endpoint
Bug:
burst_endwas computed as an inclusive index (viamin(len-1, ...)), but then used in a Python sliceburst_epochs[burst_start:burst_end]which is end-exclusive, silently dropping the last intended burst sample from the interpolation window.Fix: Make
burst_endexclusive from the start:burst_end = min(len(burst_epochs), burst_gap_end + burst_buffer + 1).Which tests this fixed: T016. In isolated replay, reverting this fix preserved row counts but changed the first mismatched interpolated vector at the burst-resume boundary for both mago and magi.
Fix 4: CIC filter edge clipping (
linear_filtered)Bug: After the CIC filter compensates for group delay, the filtered timestamp range is shorter than the input by delay samples at the tail.
linear()then callsremove_invalid_output_timestamps(), which clips output timestamps to this shortened filtered range, dropping valid gap timestamps near the burst boundary.Fix: Pass
extrapolate=Truetolinear()insidelinear_filtered(). This matches the behavior of the spec's reference CIC implementation, which usesInterpolatedUnivariateSpline(extrapolates by default).Which tests this fixed: T016. In isolated replay, reverting this fix alone causes both T016-mago and T016-magi to lose the final interpolated row.
Interaction with T024: Adding
extrapolate=Truealone would cause a +1 row regression on T024 (burst-only). Fix 6 below compensates for this.Fix 5: Spurious micro-gaps at Config mode transitions (
_find_rate_segments)Bug: When MAG transitions between modes via Config mode, the
vectors_per_secondmetadata records a transition timestamp that does not perfectly align with where the cadence actually changes in the data. The existingfind_all_gaps()split the timeline at the declared metadata timestamps, andfind_gaps()misidentified the boundary misalignment as 1-sample phantom gaps. These caused extra rows in the T015 output.Fix: New
_find_rate_segments()function that walks each configured transition backward while the observed cadence already matches the new rate. This aligns rate segment boundaries to the actual cadence in the data, preventing spurious micro-gaps from being generated in the first place. Also extracts the existing 0.075 tolerance magic number into a sharedGAP_TOLERANCEconstant used byfind_gaps().Which tests this fixed: T015-magi (had 5 extra rows due to phantom gaps at the Config boundary). Reverting the walkback logic alone reintroduces that exact +5 row failure.
Why not a post-hoc filter: An earlier approach filtered out gaps shorter than 2 cadence periods after detection. This is simpler (around 5 lines vs around 35 lines) but cannot distinguish Config-transition artifacts from genuine single-sample data loss. The root-cause fix via
_find_rate_segments()preserves detection of legitimate short gaps while eliminating only the artifacts.Fix 6: Burst-only trailing-edge trim for T024
Bug: Fix 4 (
extrapolate=True) allows interpolation slightly past the CIC-filtered range, which is correct for T016 but causes T024 (burst-only, no normal data) to produce one extra trailing sample beyond the expected output.Fix: Detect the burst-only case by checking whether any samples in the filled timeline have
ModeFlags.NORMstatus (has_norm_context). When there is no normal-mode context, shorten the burst coverage upper bound by one output cadence to compensate for the CIC filter delay. This removes the extra trailing sample without affecting the normal+burst path.Which tests this fixed: T024-mago and T024-magi (regressed to 687 rows with Fix 4 alone, expected 686). Reverting this trim alone reproduces that exact +1 row failure.
Additional cleanup: Dead
np.deleteremovalThe existing code called
np.delete(filled_norm_timeline, timeline_index)without assigning the result. Sincenp.delete()returns a new array, this was a no-op. Removed.Defensive additions
find_gaps(): Returns empty whentimeline_datahas fewer than 2 elements, preventing crashes on single-timestamp rate segments.dtype=np.int64throughout to prevent silent float precision loss on nanosecond timestamps.generate_timeline()bug fix: Updatelast_indexwhen generated timestamps are empty, preventing duplicate epoch data copies with adjacent gaps.Test changes
pytest.skip()guard intest_mag_l1c_validation.assertcheckinglen(expected_output.index) == len(l1c["epoch"].data)before the per-row value checks. This catches off-by-one regressions immediately rather than silently passing when the output has extra rows.test_process_mag_l1cexpectations: The existing unit test now expects rate-aware cadence (17 timestamps instead of 15 in the 4 Hz gap region, with updated flag indices).test_find_all_gaps_uses_observed_transition_boundary(): Regression test for the_find_rate_segments()micro-gap fix.test_generate_missing_timestamps_uses_gap_rate(): Regression test for rate-aware cadence, including the 2 Hz fallback.test_generate_timeline(): Verifies 4 Hz gap timestamps are spaced at 0.25s.test_generate_timeline(): Verifies no duplicate timestamps when adjacent gaps share a real epoch boundary, with a mixed-rate transition.Files changed
imap_processing/mag/l1c/mag_l1c.py+131/-32numpydocimap_processing/mag/l1c/interpolation_methods.py+1/-1extrapolate=True)imap_processing/tests/mag/test_mag_l1c.py+73/-6imap_processing/tests/mag/test_mag_validation.py+1/-4poetry.lock+10/-0poetry-lockhook refreshWhat this PR does NOT change
GapFillPlandataclass orBM-anchor/shift/trimstep-3 implementationValidation results
Rerun locally on
mag-l1c-fix:Exact row-count validation (previously skipped T015/T016 now pass):