Skip to content

Fix BinaryParameter truncation in xarray dataset creation#244

Closed
mwatwood-cu wants to merge 5 commits intomainfrom
bugfix/BinaryParameter-Truncation-fix
Closed

Fix BinaryParameter truncation in xarray dataset creation#244
mwatwood-cu wants to merge 5 commits intomainfrom
bugfix/BinaryParameter-Truncation-fix

Conversation

@mwatwood-cu
Copy link
Copy Markdown
Member

Checklist

Check these items and delete this block:

  • The changelog.md has been updated if necessary (Is this needed for this PR?)

Summary

This PR fixes a data-loss bug in xarray dataset creation where binary parameter values could be truncated to 4 bytes during NumPy array construction.

When a parsed value is a BinaryParameter subclass, NumPy can infer an incorrect fixed-width byte dtype when building arrays. The fix converts BinaryParameter values to plain bytes before they are appended for dataset construction, preserving the full payload width.

Changes

Convert BinaryParameter values to plain bytes in xarr.py:240
Add a focused regression test in test_xarr.py:181
Details

In xarr.py:233, values are now normalized so BinaryParameter instances are converted with bytes(...) before storing in data_dict.
New regression test test_xarr.py:181 builds a minimal fixed-length binary container and verifies:
The resulting dtype itemsize matches the expected binary width (8 bytes)
The stored value content is unchanged
Why this is needed
Without this conversion, downstream arrays may be created with an undersized byte dtype, causing silent truncation and data loss for binary fields.

Validation

Ran: pytest test_xarr.py -q
Result: 19 passed

Risk / Compatibility

Change is scoped to BinaryParameter handling only
No behavior change for non-binary parameter types

@mwatwood-cu mwatwood-cu requested a review from medley56 as a code owner March 30, 2026 16:03
Copilot AI review requested due to automatic review settings March 30, 2026 16:03
@codecov
Copy link
Copy Markdown

codecov bot commented Mar 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 94.84%. Comparing base (84d2a76) to head (4dac600).
⚠️ Report is 9 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #244      +/-   ##
==========================================
+ Coverage   94.82%   94.84%   +0.01%     
==========================================
  Files          48       48              
  Lines        3868     3880      +12     
==========================================
+ Hits         3668     3680      +12     
  Misses        200      200              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a data-loss issue in the xarray “extras” path where BinaryParameter (a bytes subclass) values could lead NumPy to build undersized fixed-width byte arrays, truncating binary payloads during dataset creation.

Changes:

  • Normalize parsed BinaryParameter values to plain bytes before appending into data_dict for dataset construction.
  • Add a focused regression test asserting binary field width/content are preserved in the resulting dataset.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
space_packet_parser/xarr.py Converts common.BinaryParameter values to bytes prior to np.asarray/xarray.Dataset construction to prevent truncation.
tests/unit/test_xarr.py Adds a regression test ensuring a fixed-size binary parameter remains full-width and unmodified in the dataset.

Comment on lines +240 to 244
# Convert BinaryParameter to plain bytes to prevent numpy truncation
if isinstance(val, common.BinaryParameter):
val = bytes(val)

data_dict[apid][key].append(val)
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bytes(val) will allocate a new bytes object for every BinaryParameter (it’s a bytes subclass), which can add significant extra copying for large/streamed datasets. Consider avoiding per-element conversion by (a) choosing an explicit fixed-width NumPy dtype for fixed-size binary encodings (e.g., S{nbytes}) or (b) deferring conversion to a single pass right before np.asarray so the list only gets copied once.

Copilot uses AI. Check for mistakes.
Comment on lines +212 to +213
assert dataset["BIN_FIELD"].values.dtype.itemsize == 8
assert dataset["BIN_FIELD"].values.tolist() == [packet_data]
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assertion dataset["BIN_FIELD"].values.dtype.itemsize == 8 can produce false positives if the array ends up as object dtype (its itemsize is pointer-size, often 8 on 64-bit). To make this regression test robust across platforms and dtype outcomes, also assert the dtype kind/type is a fixed-width bytes dtype (e.g., dtype.kind == "S" or dtype == np.dtype("S8")).

Suggested change
assert dataset["BIN_FIELD"].values.dtype.itemsize == 8
assert dataset["BIN_FIELD"].values.tolist() == [packet_data]
values = dataset["BIN_FIELD"].values
# Ensure we truly have a fixed-width bytes dtype (not an object array whose itemsize matches pointer size).
assert values.dtype.kind == "S"
assert values.dtype.itemsize == 8
assert values.tolist() == [packet_data]

Copilot uses AI. Check for mistakes.
@medley56 medley56 self-assigned this Apr 1, 2026
medley56 and others added 5 commits April 1, 2026 16:41
* Fix Github Release Workflow

...for the billionth time. Maybe we'll get it right some day.

* Add node_modules to .gitignore
Bumps [actions/download-artifact](https://github.com/actions/download-artifact) from 7 to 8.
- [Release notes](https://github.com/actions/download-artifact/releases)
- [Commits](actions/download-artifact@v7...v8)

---
updated-dependencies:
- dependency-name: actions/download-artifact
  dependency-version: '8'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 6 to 7.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](actions/upload-artifact@v6...v7)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-version: '7'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 5 to 6.
- [Release notes](https://github.com/codecov/codecov-action/releases)
- [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md)
- [Commits](codecov/codecov-action@v5...v6)

---
updated-dependencies:
- dependency-name: codecov/codecov-action
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Convert BinaryParameter values to plain bytes in xarr.py:240
Add a focused regression test in test_xarr.py:181
@medley56 medley56 force-pushed the bugfix/BinaryParameter-Truncation-fix branch from 4f31614 to 4dac600 Compare April 1, 2026 16:41
@medley56
Copy link
Copy Markdown
Member

medley56 commented Apr 1, 2026

Re-opening a new PR from this branch to merge into release/6.1 for a patch release 6.1.2

@medley56 medley56 closed this Apr 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants