Fix sugar::unique for signed zeroes#1404
Conversation
eddelbuettel
left a comment
There was a problem hiding this comment.
That looks pretty good at a glance, chapeau!
The "default" approach of squash-merging one at a time, and then rebasing whatever is outstanding or pending, works fine. The other open one may also take longer as we probably need to create and watch over a bunch of (simple) PRs in other repos. |
|
Reverse-depends are running, slowly as always. Currently just over 700 packages out of 3100 done, and the four failures are packages with known "expected" issues or in one case a spurious network issue. Fingers crossed. |
There was a problem hiding this comment.
Pull Request Overview
This PR fixes a bug where signed zeros (-0.0 and +0.0) in doubles were not being treated as equal during hash operations. The fix refactors the normalization logic into a dedicated normalize method and ensures it is consistently applied during value insertion and comparison.
- Introduced a
normalizemethod to centralize double normalization for signed zeros, NA, and NaN - Updated hash insertion logic to normalize values before hashing and comparison
- Added test coverage for signed zero handling in the unique function
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| inst/include/Rcpp/hash/SelfHash.h | Refactored normalization into a dedicated method and applied it consistently in hash operations |
| inst/include/Rcpp/hash/IndexHash.h | Applied the same normalization refactoring as in SelfHash.h |
| inst/tinytest/test_sugar.R | Added test case to verify correct handling of signed zeros |
| ChangeLog | Documented the changes |
Comments suppressed due to low confidence (2)
inst/include/Rcpp/hash/SelfHash.h:94
- The
get_indexmethod does not normalize the inputvaluebefore callingget_addr, and uses direct equality comparison instead ofnot_equal. This creates inconsistency withadd_value_get_indexwhich normalizes values. For doubles, this means looking up -0.0 may fail to find +0.0 that was stored. The value should be normalized at the start:value = normalize(value);and the comparison should use thenot_equalhelper for consistency.
unsigned int get_index(STORAGE value) const {
unsigned int addr = get_addr(value) ;
while (data[addr]) {
if (src[data[addr] - 1] == value)
return data[addr];
addr++;
if (addr == static_cast<unsigned int>(m)) addr = 0;
}
return NA_INTEGER;
}
inst/include/Rcpp/hash/IndexHash.h:199
- The
get_indexmethod does not normalize the inputvaluebefore callingget_addr, and uses direct equality comparison instead ofnot_equal. This creates inconsistency withadd_valuewhich normalizes values. For doubles, this means looking up -0.0 may fail to find +0.0 that was stored. The value should be normalized at the start:value = normalize(value);and the comparison should use thenot_equalhelper for consistency.
inline uint32_t get_index(STORAGE value) const {
uint32_t addr = get_addr(value) ;
while (data[addr]) {
if (src[data[addr] - 1] == value)
return data[addr];
addr++;
if (addr == static_cast<uint32_t>(m)) addr = 0;
}
return NA_INTEGER;
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
I quickly turned copilot on here [as a one-off, for this PR] as I am starting to find it useful even just for typos like the ones it found here. I have so far been cheeky and then committed by hand, feel free to either ignore, or commits its suggestions and quickly drop a new commit. Hope you don't mind. 😉 |
|
Very useful indeed, thanks! |
|
Yep. So far it's always been one-offs from the 'review' box but it's generally been on point. Not bad. |
|
This one looks good per the commit just made containing the results from a full run (see RcppCore/rcpp-logs@aa8cc60). I think we can squash merge this and then rebase the other open PR. Which I need to re-run incrementally to get the install and check logs refreshed. |
Closes #1340. Not the cleanest of solutions, but minimal disruption of current code.
I can rebase this or the other one later depending on what you want to merge first.
Checklist
R CMD checkstill passes all tests