fix: is_nullword("-") now correctly returns True #173
Open
leonhandreke wants to merge 1 commit into main from
When `normalize=False`, `is_nullword` was loading the nullwords set with
the slug normalizer, which strips all punctuation — so entries like
`-`, `---`, `?` were never inserted and could never match.
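To make the failure mode concrete, here is a minimal, self-contained Python sketch. `slug_like_normalize`, `build_lookup` and `NULLWORDS_SAMPLE` are hypothetical stand-ins, not rigour's API: the normalizer only approximates a slug normalizer by casefolding, turning Unicode punctuation (`P*`) and separator (`Z*`) characters into spaces, and squashing whitespace. Entries that normalize to nothing are skipped, which is how `-`, `---` and `?` vanish from the set.

```python
import unicodedata
from typing import Callable, Iterable, Optional, Set

# Hypothetical sample word list; the real NULLWORDS live in rigour.
NULLWORDS_SAMPLE = ["-", "---", "?", "none", "n/a"]


def slug_like_normalize(text: str) -> Optional[str]:
    """Approximation of a slug-style normalizer (not rigour's normalize_text).

    Casefolds, replaces punctuation and separators with spaces, then squashes
    whitespace. Returns None when nothing is left.
    """
    chars = []
    for ch in text.casefold():
        category = unicodedata.category(ch)
        chars.append(" " if category[0] in ("P", "Z") else ch)
    squashed = " ".join("".join(chars).split())
    return squashed or None


def build_lookup(words: Iterable[str], normalizer: Callable[[str], Optional[str]]) -> Set[str]:
    lookup = set()
    for word in words:
        norm = normalizer(word)
        if norm:  # empty results are silently skipped
            lookup.add(norm)
    return lookup


lookup = build_lookup(NULLWORDS_SAMPLE, slug_like_normalize)
print(sorted(lookup))  # ['n a', 'none']
print("-" in lookup)   # False: "-" never made it into the set
```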
The fix distinguishes the two lookup paths: when `normalize=True`, both
the form and the wordlist go through the caller-supplied normalizer as
before; when `normalize=False`, the wordlist is loaded with `noop_normalizer`
so non-normalized entries survive intact. This makes the canonical case
`is_nullword("-")` return True as expected.
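A rough sketch of the two lookup paths described above, again in plain Python with hypothetical names (`slug_like_normalize`, `noop_normalize`, `lookup_for`, `NULLWORDS_SAMPLE`). It mirrors the behaviour the commit message describes and is not rigour's actual code: the word list is normalized with whichever normalizer applies to the form, and the resulting set is cached per normalizer.

```python
import unicodedata
from functools import lru_cache
from typing import Callable, FrozenSet, Optional

Normalizer = Callable[[str], Optional[str]]

# Hypothetical sample word list; the real NULLWORDS live in rigour.
NULLWORDS_SAMPLE = ("-", "---", "?", "none", "n/a")


def slug_like_normalize(text: str) -> Optional[str]:
    """Approximation of a slug-style normalizer (not rigour's normalize_text)."""
    chars = [" " if unicodedata.category(ch)[0] in ("P", "Z") else ch for ch in text.casefold()]
    squashed = " ".join("".join(chars).split())
    return squashed or None


def noop_normalize(text: str) -> Optional[str]:
    """Stand-in for a no-op normalizer: returns the text unchanged."""
    return text


@lru_cache(maxsize=None)
def lookup_for(normalizer: Normalizer) -> FrozenSet[str]:
    # One lookup set per normalizer; entries that normalize to nothing are dropped.
    return frozenset(n for n in map(normalizer, NULLWORDS_SAMPLE) if n)


def is_nullword(form: str, normalize: bool = False,
                normalizer: Normalizer = slug_like_normalize) -> bool:
    if normalize:
        # normalize=True: form and word list share the caller's normalizer.
        norm = normalizer(form)
        return norm is not None and norm in lookup_for(normalizer)
    # normalize=False: the form is taken as final, so the word list is kept verbatim.
    return form in lookup_for(noop_normalize)


print(is_nullword("-"))                     # True
print(is_nullword("None"))                  # False: no normalization, case differs
print(is_nullword("None", normalize=True))  # True
print(is_nullword(" - ", normalize=True))   # False: the slug-like normalizer strips "-"
```

The last call still returns `False`, which is the remaining gap the PR's follow-up note points out for the `normalize=True` path.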
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
is_nullword("-") now correctly returns True
What
`is_nullword("-")` was returning `False` even though `"-"` is listed in `NULLWORDS`. Same for `"---"`, `"?"`, and any other punctuation-only entry.

Why
When `normalize=False`, `is_nullword` was still loading the lookup set by running every `NULLWORDS` entry through the default `normalize_text` normalizer, which calls `category_replace` with `SLUG_CATEGORIES`. That maps all punctuation to whitespace, so `"-"` becomes `""` and is silently dropped from the set. The lookup then always misses.

Nobody noticed in the real world because the deprecated
`rigour.names.is_nullword` passthrough always explicitly passes `normalizer=normalize_name`, and `normalize_name` doesn't normalize away punctuation, so both the form and the set survived consistently. The mismatch only bites with the newer `rigour.text.is_nullword` when using the default `normalize_text`.

The existing tests were actually asserting the broken behavior (`assert not is_nullword("---")`), though I'm not quite sure what reasoning was behind that.

Fix
Distinguish the two paths in `is_nullword`: when `normalize=False` (the caller says the form is already in its final state), load the lookup set with `noop_normalizer` so non-normalized entries survive intact. When `normalize=True`, both the form and the set go through the same normalizer as before.

Follow-up worth considering
The `normalize=True` path still uses `normalize_text` as its default, which means `is_nullword(" - ", normalize=True)` would still return `False`: the slug normalizer strips the `-` before the lookup. A lax normalizer variant that only casefolds and squashes whitespace (without `category_replace`) would fix this and make a natural default for `is_nullword` specifically. Let me know if I should do that follow-up.

🤖 Generated with Claude Code
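The lax normalizer proposed in the follow-up could be as small as the sketch below. `lax_normalize`, `is_nullword_lax` and `NULLWORDS_SAMPLE` are hypothetical names; the sketch only illustrates the idea (casefold plus whitespace squashing, no `category_replace`) and is not part of this PR's change.

```python
from typing import Optional

# Hypothetical sample word list; the real NULLWORDS live in rigour.
NULLWORDS_SAMPLE = ("-", "---", "?", "none", "n/a")


def lax_normalize(text: str) -> Optional[str]:
    # Casefold and squash whitespace only; punctuation is left untouched.
    squashed = " ".join(text.casefold().split())
    return squashed or None


# Word list normalized with the same lax normalizer as the form.
LAX_LOOKUP = frozenset(n for n in map(lax_normalize, NULLWORDS_SAMPLE) if n)


def is_nullword_lax(form: str) -> bool:
    norm = lax_normalize(form)
    return norm is not None and norm in LAX_LOOKUP


print(is_nullword_lax(" - "))    # True
print(is_nullword_lax(" N/A "))  # True
print(is_nullword_lax("n a"))    # False
```

Because punctuation survives, `n/a` and `n a` no longer map to the same key in this sketch; that is the trade-off of leaving out `category_replace`.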