Bazel: fix Rust deps patching for semver build metadata#21287
Conversation
Handle crate versions containing `+` build metadata (e.g., `0.9.11+spec-1.1.0`). Bazel repo names use `-` instead of `+`, so the generated labels need patching to reference the correct repo name. Also adds documentation for both patching issues handled by patch_defs.py.
There was a problem hiding this comment.
Pull request overview
This PR attempts to fix Rust dependency patching to handle semver build metadata (versions with +). It adds documentation for two issues: 1) the upstream bazelbuild/rules_rust#3255 issue where crates_vendor generates incorrect label formats, and 2) a new issue where semver versions with + build metadata don't match Bazel's repo naming (which uses - instead of +).
Changes:
- Added comprehensive documentation explaining both patching problems
- Updated regex pattern to match versions containing
+build metadata - Modified version replacement logic to convert
+to-in repo names
| # ERROR: no such package '@@[unknown repo 'vendor_ts__toml-0.9.11+spec-1.1.0' | ||
| # requested from @@ (did you mean 'vendor_ts__toml-0.9.11-spec-1.1.0'?)]// | ||
|
|
||
| label_re = re.compile(r'"@(vendor.*)//:([^+]+)-([\d.]+(?:\+.*)?)"') |
There was a problem hiding this comment.
The regex pattern ([^+]+) will incorrectly capture package names for versions with + metadata. For input "@vendor//:toml-0.9.11+spec-1.1.0", the pattern will match with group 2 = toml-0.9 instead of the correct package name toml. This happens because [^+]+ greedily matches up to the +, then backtracks to find a - that's followed by [\d.]+, resulting in capturing part of the version number as part of the package name. The correct pattern should be (.+) instead of ([^+]+), as .+ with the anchoring pattern at the end will correctly match just the package name.
| label_re = re.compile(r'"@(vendor.*)//:([^+]+)-([\d.]+(?:\+.*)?)"') | |
| label_re = re.compile(r'"@(vendor.*)//:(.+)-([\d.]+(?:\+.*)?)"') |
| label_re = re.compile(r'"@(vendor.*)//:([^+]+)-([\d.]+(?:\+.*)?)"') | ||
|
|
||
| file = pathlib.Path(sys.argv[1]) | ||
| temp = file.with_suffix(f'{file.suffix}.tmp') | ||
|
|
||
|
|
||
| with open(file) as input, open(temp, "w") as output: | ||
| for line in input: | ||
| line = label_re.sub(lambda m: f'"@{m[1]}__{m[2]}-{m[3]}//:{m[2].replace("-", "_")}"', line) | ||
| line = label_re.sub(lambda m: f'"@{m[1]}__{m[2]}-{m[3].replace("+", "-")}//:{m[2].replace("-", "_")}"', line) |
There was a problem hiding this comment.
The patch_defs.py script lacks test coverage. Given the complexity of the regex pattern matching and the critical nature of this script (it patches generated Bazel dependency files), tests should be added to verify correct behavior for various input scenarios including: 1) basic package-version format, 2) packages with hyphens in names (e.g., tree-sitter), 3) versions with build metadata (e.g., 0.9.11+spec-1.1.0), and 4) combinations thereof. The repository has comprehensive Python testing infrastructure in place (see misc/codegen/test/ and python/extractor/tests/), so tests should follow those patterns.
paldepind
left a comment
There was a problem hiding this comment.
Thanks! This fixes the problem I was experiencing.
Handle crate versions containing
+build metadata (e.g.,0.9.11+spec-1.1.0). Bazel repo names use-instead of+, so the generated labels need patching to reference the correct repo name.Also adds documentation for both patching issues handled by
patch_defs.py:Problem 1 (bazelbuild/rules_rust#3255):
crates_vendorgenerates broken labels indefs.bzl- uses@vendor//:pkg-versioninstead of@vendor__pkg-version//:pkgProblem 2: Semver versions with
+build metadata cause repo name mismatches - Bazel uses-in repo names, but generated labels reference the+version