Fix BinaryParameter truncation in xarray dataset creation#244
Fix BinaryParameter truncation in xarray dataset creation#244mwatwood-cu wants to merge 5 commits intomainfrom
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #244 +/- ##
==========================================
+ Coverage 94.82% 94.84% +0.01%
==========================================
Files 48 48
Lines 3868 3880 +12
==========================================
+ Hits 3668 3680 +12
Misses 200 200 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
Fixes a data-loss issue in the xarray “extras” path where BinaryParameter (a bytes subclass) values could lead NumPy to build undersized fixed-width byte arrays, truncating binary payloads during dataset creation.
Changes:
- Normalize parsed
BinaryParametervalues to plainbytesbefore appending intodata_dictfor dataset construction. - Add a focused regression test asserting binary field width/content are preserved in the resulting dataset.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
space_packet_parser/xarr.py |
Converts common.BinaryParameter values to bytes prior to np.asarray/xarray.Dataset construction to prevent truncation. |
tests/unit/test_xarr.py |
Adds a regression test ensuring a fixed-size binary parameter remains full-width and unmodified in the dataset. |
| # Convert BinaryParameter to plain bytes to prevent numpy truncation | ||
| if isinstance(val, common.BinaryParameter): | ||
| val = bytes(val) | ||
|
|
||
| data_dict[apid][key].append(val) |
There was a problem hiding this comment.
bytes(val) will allocate a new bytes object for every BinaryParameter (it’s a bytes subclass), which can add significant extra copying for large/streamed datasets. Consider avoiding per-element conversion by (a) choosing an explicit fixed-width NumPy dtype for fixed-size binary encodings (e.g., S{nbytes}) or (b) deferring conversion to a single pass right before np.asarray so the list only gets copied once.
| assert dataset["BIN_FIELD"].values.dtype.itemsize == 8 | ||
| assert dataset["BIN_FIELD"].values.tolist() == [packet_data] |
There was a problem hiding this comment.
The assertion dataset["BIN_FIELD"].values.dtype.itemsize == 8 can produce false positives if the array ends up as object dtype (its itemsize is pointer-size, often 8 on 64-bit). To make this regression test robust across platforms and dtype outcomes, also assert the dtype kind/type is a fixed-width bytes dtype (e.g., dtype.kind == "S" or dtype == np.dtype("S8")).
| assert dataset["BIN_FIELD"].values.dtype.itemsize == 8 | |
| assert dataset["BIN_FIELD"].values.tolist() == [packet_data] | |
| values = dataset["BIN_FIELD"].values | |
| # Ensure we truly have a fixed-width bytes dtype (not an object array whose itemsize matches pointer size). | |
| assert values.dtype.kind == "S" | |
| assert values.dtype.itemsize == 8 | |
| assert values.tolist() == [packet_data] |
* Fix Github Release Workflow ...for the billionth time. Maybe we'll get it right some day. * Add node_modules to .gitignore
Bumps [actions/download-artifact](https://github.com/actions/download-artifact) from 7 to 8. - [Release notes](https://github.com/actions/download-artifact/releases) - [Commits](actions/download-artifact@v7...v8) --- updated-dependencies: - dependency-name: actions/download-artifact dependency-version: '8' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com>
Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 6 to 7. - [Release notes](https://github.com/actions/upload-artifact/releases) - [Commits](actions/upload-artifact@v6...v7) --- updated-dependencies: - dependency-name: actions/upload-artifact dependency-version: '7' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com>
Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 5 to 6. - [Release notes](https://github.com/codecov/codecov-action/releases) - [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md) - [Commits](codecov/codecov-action@v5...v6) --- updated-dependencies: - dependency-name: codecov/codecov-action dependency-version: '6' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com>
Convert BinaryParameter values to plain bytes in xarr.py:240 Add a focused regression test in test_xarr.py:181
4f31614 to
4dac600
Compare
|
Re-opening a new PR from this branch to merge into release/6.1 for a patch release 6.1.2 |
Checklist
Check these items and delete this block:
Summary
This PR fixes a data-loss bug in xarray dataset creation where binary parameter values could be truncated to 4 bytes during NumPy array construction.
When a parsed value is a BinaryParameter subclass, NumPy can infer an incorrect fixed-width byte dtype when building arrays. The fix converts BinaryParameter values to plain bytes before they are appended for dataset construction, preserving the full payload width.
Changes
Convert BinaryParameter values to plain bytes in xarr.py:240
Add a focused regression test in test_xarr.py:181
Details
In xarr.py:233, values are now normalized so BinaryParameter instances are converted with bytes(...) before storing in data_dict.
New regression test test_xarr.py:181 builds a minimal fixed-length binary container and verifies:
The resulting dtype itemsize matches the expected binary width (8 bytes)
The stored value content is unchanged
Why this is needed
Without this conversion, downstream arrays may be created with an undersized byte dtype, causing silent truncation and data loss for binary fields.
Validation
Ran: pytest test_xarr.py -q
Result: 19 passed
Risk / Compatibility
Change is scoped to BinaryParameter handling only
No behavior change for non-binary parameter types