CASSPYTHON-10 Update cassandra.util.Version to better support Cassandra version strings#1273
CASSPYTHON-10 Update cassandra.util.Version to better support Cassandra version strings#1273absurdfarce wants to merge 5 commits intoapache:trunkfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Updates cassandra.util.Version parsing/comparison to better align with Cassandra version string formats (based on the Java driver), and adjusts schema parser selection logic accordingly.
Changes:
- Replaced the
Versionparsing/comparison implementation with a regex-based approach supporting Cassandra-style prerelease/build formats. - Updated unit tests to reflect the new parsing/comparison rules and added coverage for 4.0-alpha schema threshold behavior.
- Updated schema parser selection threshold from
4-ato4.0-alphain metadata.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
cassandra/util.py |
Reworked Version parsing and ordering logic using a regex-based parser. |
tests/unit/test_util_types.py |
Updated Version parsing/comparison test cases to match the new semantics. |
cassandra/metadata.py |
Adjusted v4 schema parser cutoff version string to match the new parser expectations. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @total_ordering | ||
| class Version(object): | ||
| """ | ||
| Internal minimalist class to compare versions. | ||
| A valid version is: <int>.<int>.<int>.<int or str>. | ||
|
|
||
| TODO: when python2 support is removed, use packaging.version. | ||
| """ | ||
|
|
||
| _version = None | ||
| major = None | ||
| minor = 0 | ||
| patch = 0 | ||
| build = 0 | ||
| prerelease = 0 | ||
|
|
||
| def __init__(self, version): | ||
| self._version = version |
There was a problem hiding this comment.
The new Version implementation removed the class docstring entirely. Since cassandra.util.Version is used outside this module (e.g. in cassandra.metadata and unit tests), it should keep an up-to-date docstring describing the supported version formats and comparison semantics.
| VERSION_REGEX = re.compile("(\\d+)\\.(\\d+)(\\.\\d+)?(\\.\\d+)?([~\\-]\\w[.\\w]*(?:-\\w[.\\w]*)*)?(\\+[.\\w]+)?") | ||
|
|
||
| @total_ordering | ||
| class Version(object): | ||
| """ | ||
| Internal minimalist class to compare versions. | ||
| A valid version is: <int>.<int>.<int>.<int or str>. | ||
|
|
||
| TODO: when python2 support is removed, use packaging.version. | ||
| """ | ||
|
|
||
| _version = None | ||
| major = None | ||
| minor = 0 | ||
| patch = 0 | ||
| build = 0 | ||
| prerelease = 0 | ||
|
|
||
| def __init__(self, version): | ||
| self._version = version | ||
| if '-' in version: | ||
| version_without_prerelease, self.prerelease = version.split('-', 1) | ||
| else: | ||
| version_without_prerelease = version | ||
| parts = list(reversed(version_without_prerelease.split('.'))) | ||
| if len(parts) > 4: | ||
| prerelease_string = "-{}".format(self.prerelease) if self.prerelease else "" | ||
| log.warning("Unrecognized version: {}. Only 4 components plus prerelease are supported. " | ||
| "Assuming version as {}{}".format(version, '.'.join(parts[:-5:-1]), prerelease_string)) | ||
|
|
||
| match = VERSION_REGEX.match(version) | ||
| if not match: | ||
| raise ValueError("Version string did not match expected format") |
There was a problem hiding this comment.
VERSION_REGEX.match(version) is not anchored, so it can accept only a prefix of the string and silently ignore trailing characters (e.g. 3.55.1.build12 will match 3.55.1). If the intent is strict validation, use re.fullmatch (or ^...$) so malformed versions raise. If some legacy/lenient parsing is desired, it would be safer to make that behavior explicit (e.g., a separate fallback parse with a warning) rather than relying on partial regex matches.
|
|
||
| match = VERSION_REGEX.match(version) | ||
| if not match: | ||
| raise ValueError("Version string did not match expected format") |
There was a problem hiding this comment.
The ValueError raised on parse failure is generic ("Version string did not match expected format") and omits the offending input. Including the actual version string in the message would make troubleshooting much easier.
| raise ValueError("Version string did not match expected format") | |
| raise ValueError("Version string did not match expected format: {!r}".format(version)) |
| try: | ||
| self.major = int(parts.pop()) | ||
| except ValueError as e: | ||
| raise ValueError( | ||
| "Couldn't parse version {}. Version should start with a number".format(version))\ | ||
| .with_traceback(e.__traceback__) | ||
| self.patch = self._cleanup_int(match[3]) | ||
| except: | ||
| self.patch = 0 | ||
|
|
||
| try: | ||
| self.minor = int(parts.pop()) if parts else 0 | ||
| self.patch = int(parts.pop()) if parts else 0 | ||
| self.build = self._cleanup_int(match[4]) | ||
| except: | ||
| self.build = 0 | ||
|
|
||
| if parts: # we have a build version | ||
| build = parts.pop() | ||
| try: | ||
| self.build = int(build) | ||
| except ValueError: | ||
| self.build = build | ||
| except ValueError: | ||
| assumed_version = "{}.{}.{}.{}-{}".format(self.major, self.minor, self.patch, self.build, self.prerelease) | ||
| log.warning("Unrecognized version {}. Assuming version as {}".format(version, assumed_version)) | ||
| try: | ||
| self.prerelease = self._cleanup_str(match[5]) | ||
| except: | ||
| self.prerelease = 0 |
There was a problem hiding this comment.
Avoid bare except: here. These blocks will also swallow unexpected exceptions and make debugging harder. Since _cleanup_int/_cleanup_str already handle None, the try/except may be unnecessary; otherwise, catch specific exceptions (e.g. ValueError, TypeError) and consider whether an invalid numeric component should raise vs. default to 0.
| def _cleanup_str(self, str): | ||
| return str[1:] if str else 0 |
There was a problem hiding this comment.
_cleanup_str uses a parameter named str, which shadows Python’s built-in str type. Rename the parameter to avoid shadowing.
| def _cleanup_str(self, str): | |
| return str[1:] if str else 0 | |
| def _cleanup_str(self, s): | |
| return s[1:] if s else 0 |
| ] | ||
|
|
||
| for v in unsupported_versions: | ||
| print(v) |
There was a problem hiding this comment.
Remove the print(v) statement inside the unsupported-version loop; tests should not emit output in normal operation.
| print(v) |
| return str[1:] if str else 0 | ||
|
|
||
| def __hash__(self): | ||
| return self._version |
There was a problem hiding this comment.
Version.__hash__ returns self._version (a str). In Python 3, __hash__ must return an int, and it also needs to be consistent with __eq__. As written, calling hash(Version(...)) will raise TypeError, and even if fixed to return a string hash, it would still violate the hash/eq contract for versions that compare equal but have different original strings (e.g. 4.0-SNAPSHOT vs 4.0.0-SNAPSHOT).
| return self._version | |
| # Hash based on the same components used for equality, to satisfy the hash/eq contract. | |
| return hash((self.major, self.minor, self.patch, self.build, self.prerelease)) |
| for str_version, expected_result in versions: | ||
| print(str_version) | ||
| v = Version(str_version) |
There was a problem hiding this comment.
Remove the print(...) statements from this unit test. They add noise to test output and can slow down/obscure CI logs; the asserts already provide sufficient failure context.
| self.assertTrue(Version('4.0') > Version('4.0-SNAPSHOT')) | ||
| self.assertTrue(Version('4.0-SNAPSHOT') == Version('4.0-SNAPSHOT')) | ||
| self.assertTrue(Version('4.0.0-SNAPSHOT') == Version('4.0-SNAPSHOT')) | ||
| self.assertTrue(Version('4.0.0-SNAPSHOT') == Version('4.0.0-SNAPSHOT')) | ||
| self.assertTrue(Version('4.0.0.build5-SNAPSHOT') == Version('4.0.0.build5-SNAPSHOT')) | ||
| self.assertTrue(Version('4.0.0.5-SNAPSHOT') == Version('4.0.0.5-SNAPSHOT')) |
There was a problem hiding this comment.
Given the new normalization behavior (e.g. Version('4.0.0-SNAPSHOT') == Version('4.0-SNAPSHOT')), add a unit test that hash() works and is consistent with equality for equivalent versions. This would catch regressions where __hash__ raises or returns inconsistent values.
| elif self.build != other.build: | ||
| return self.build > other.build | ||
| elif self.prerelease and not other.prerelease: | ||
| return False |
There was a problem hiding this comment.
Python tuples can be compared directly, so something like this should work
def __gt__(self, other):
# Compare as tuples
return (self.major, self.minor, self.patch, self.build, not self.prerelease) <
(other.major, other.minor, other.patch, other.build, not other.prerelease)
Code here is largely derived from the equivalent Java driver code.