Fix hi goodtimes CDF to txt convertor #2888
subagonsouth wants to merge 2 commits into IMAP-Science-Operations-Center:dev
Conversation
Pull request overview
This PR updates IMAP-Hi goodtimes interval extraction and TXT export so the generated TXT represents good time/bin regions (rather than writing out culled regions), matching the requested downstream “IBEX-style” goodtimes text format conversion workflow.
Changes:
- Reworked `GoodtimesAccessor.get_good_intervals()` to group by ESA sweeps and emit intervals only for good bin regions.
- Updated `write_txt()` output formatting and updated unit tests to reflect sweep-based grouping and "good-only" intervals.
- Adjusted tests to use `xarray.Dataset.sizes` where appropriate.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `imap_processing/hi/hi_goodtimes.py` | Changes interval extraction to sweep-grouping and updates TXT writer formatting to output good-only intervals. |
| `imap_processing/tests/hi/test_hi_goodtimes.py` | Updates expectations for the new sweep-based grouping and good-only interval output, plus minor xarray API usage tweaks. |
imap_processing/hi/hi_goodtimes.py
Outdated
```python
where cull_pattern is a tuple representation of the cull flags array.
This signature is hashable and can be compared for equality.
"""
return tuple((esa, tuple(cull.tolist())) for esa, cull, _ in sorted(sweep))
```
get_sweep_signature() uses sorted(sweep) where sweep elements are (esa_step, np.ndarray, met). If an ESA step occurs more than once within a sweep (common if multiple histogram packets share the same ESA step), tuple sorting will compare the np.ndarray elements when esa_step ties, raising a TypeError/ValueError. Use an explicit key= (e.g., sort by (esa_step, met) or preserve original order), or pre-aggregate per-ESA-step patterns before building the signature so it stays hashable and robust to duplicates.
```diff
-return tuple((esa, tuple(cull.tolist())) for esa, cull, _ in sorted(sweep))
+return tuple(
+    (esa, tuple(cull.tolist()))
+    for esa, cull, _ in sorted(sweep, key=lambda entry: (entry[0], entry[2]))
+)
```
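To make the failure mode concrete, here is a minimal, self-contained sketch of the keyed sort. The `(esa_step, cull_array, met)` tuple layout is assumed from the PR; the sweep values below are made up for illustration:

```python
import numpy as np

def get_sweep_signature(sweep):
    """Hashable signature for a sweep of (esa_step, cull_array, met) entries.

    Sorting with an explicit key of (esa_step, met) means the ndarray
    elements are never compared, so duplicate ESA steps cannot raise.
    """
    return tuple(
        (esa, tuple(cull.tolist()))
        for esa, cull, _ in sorted(sweep, key=lambda entry: (entry[0], entry[2]))
    )

# Duplicate ESA step 1: plain sorted() would tie on esa_step and fall
# through to comparing the ndarrays, raising a ValueError.
sweep = [(1, np.array([0, 1]), 20.0), (1, np.array([1, 0]), 10.0)]
signature = get_sweep_signature(sweep)
# Sorted by MET within the tied ESA step, and hashable.
```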
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
```python
if len(bad_vals) > 0:
    bad_cull_value |= int(np.bitwise_or.reduce(bad_vals))
    region_cull = int(bad_vals[0])
else:
    region_cull = 0
```
In _generate_intervals_for_pattern, region_cull is derived from bad_vals[0], which can drop cull bits when different bins within the same ESA step have different non-zero cull codes (or combined bit flags). Since mark_bad_times() combines flags via bitwise OR, cull_pattern can legitimately contain multiple distinct non-zero values. Consider computing region_cull as the bitwise-OR of all bad_vals (or otherwise aggregating all non-zero codes for that ESA step) so the interval’s cull_value accurately reflects everything that was culled.
```diff
 if len(bad_vals) > 0:
-    bad_cull_value |= int(np.bitwise_or.reduce(bad_vals))
-    region_cull = int(bad_vals[0])
+    # Aggregate all non-zero cull codes for this ESA step so that
+    # the region cull value reflects every flag that contributed.
+    region_cull = int(np.bitwise_or.reduce(bad_vals))
+    bad_cull_value |= region_cull
 else:
     region_cull = 0
```
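As a quick illustration (the `cull_pattern` values here are hypothetical), bitwise-OR aggregation preserves flags that taking only `bad_vals[0]` would drop:

```python
import numpy as np

# Hypothetical cull codes for one ESA step: three distinct bit flags
# (mark_bad_times() combines flags via bitwise OR upstream).
cull_pattern = np.array([0, 1, 2, 0, 4])
bad_vals = cull_pattern[cull_pattern != 0]

if len(bad_vals) > 0:
    # OR every code together so no contributing flag is lost.
    region_cull = int(np.bitwise_or.reduce(bad_vals))
else:
    region_cull = 0

# bad_vals[0] alone would report 1; the reduction reports 1 | 2 | 4 == 7.
```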
```python
f"{interval['met_start']:0.1f} "
f"{interval['met_end']:0.1f} "
```
write_txt() formats met_start/met_end with :0.1f, which rounds METs to 0.1 s. But esa_step_met is computed using milliseconds (seconds + milliseconds), so real MET values can have 0.001 s resolution; this output will lose timing precision in the exported txt. Consider writing METs with sufficient precision (e.g., at least 3 decimal places, or preserving full float precision) to avoid information loss during CDF→txt conversion.
```diff
-f"{interval['met_start']:0.1f} "
-f"{interval['met_end']:0.1f} "
+f"{interval['met_start']:.3f} "
+f"{interval['met_end']:.3f} "
```
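A small sketch of the precision loss (the MET value is illustrative):

```python
# A MET with millisecond resolution, as produced from seconds + milliseconds.
met = 315532800.123

one_decimal = f"{met:0.1f}"    # rounds to 0.1 s, dropping the milliseconds
three_decimal = f"{met:.3f}"   # preserves the 0.001 s resolution
```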
| f"{pointing:05d} " | ||
| f"{int(interval['met_start'])} " | ||
| f"{int(interval['met_end'])}\t" | ||
| f"{interval['spin_bin_low']} " | ||
| f"{interval['spin_bin_high']} " | ||
| f"{sensor}\t" | ||
| f"{esa_step_flags}\t" | ||
| f"{interval['cull_value']}" | ||
| f"{interval['met_start']:0.1f} " | ||
| f"{interval['met_end']:0.1f} " | ||
| f"{interval['spin_bin_low']:2d} " | ||
| f"{interval['spin_bin_high']:2d} " | ||
| f"{sensor} " | ||
| f"{esa_step_flags} " | ||
| f"{interval['cull_value']:3d}" |
The write_txt() docstring specifies tab-delimited output ("MET_end\tspin_bin_low..."), but the updated line formatting now emits only spaces (no \t). Either update the docstring to match the actual delimiter behavior or restore tab separators if downstream tools rely on the documented format.
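For comparison, a hedged sketch of the two delimiter conventions (the field values are hypothetical):

```python
# Hypothetical interval fields.
pointing = 42
met_start, met_end = 100.0, 200.5

# Tab-delimited, as the write_txt() docstring documents:
tabbed = f"{pointing:05d}\t{met_start:.3f}\t{met_end:.3f}"

# Space-delimited, as the updated formatting actually emits:
spaced = f"{pointing:05d} {met_start:.3f} {met_end:.3f}"
```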
```python
# First interval should start at first MET
assert float(met_start) == goodtimes_instance.coords["met"].values[0]
assert int(bin_low) == 0
assert int(bin_high) == 89
assert sensor == "45"
assert cull_value == "0"
```

```python
# Check ESA step flags - should have 1s for all unique ESA steps
assert cull_value == "0"  # All good, no culled ESA steps
```
The updated write_txt assertions only exercise integer MET values from the goodtimes_instance fixture. In production, esa_step_met includes millisecond fractions; adding a test case with fractional METs would catch rounding/formatting regressions (and validate whatever precision policy the txt output is supposed to follow).
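A minimal example of the kind of assertion such a test could add (the fixture value is hypothetical):

```python
# A fractional MET like esa_step_met can produce in production.
met_start = 483091201.250

# Formatting at 3 decimal places round-trips the millisecond component...
assert float(f"{met_start:.3f}") == met_start

# ...whereas 1 decimal place does not.
assert float(f"{met_start:0.1f}") != met_start
```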
Change Summary
This PR only changes the logic used to write CDF files to txt; that logic is not used in producing goodtimes products. It is only used by the IT to convert the CDF to the txt format used on IBEX, so that Paul's tools will work with the goodtimes txt file.
The previous logic was not what Paul wanted: he wants the list to indicate the good times only, whereas I was previously writing out the culled times.