feat(keras): detect StringLookup external vocabulary paths#727
feat(keras): detect StringLookup external vocabulary paths#727mldangelo-oai merged 6 commits intomainfrom
Conversation
Add CVE-2025-12058 coverage for .keras archives by flagging StringLookup layers whose vocabulary is a scalar external path or URL, while keeping inline vocabulary lists clean. Include a maintainer PR plan for the remaining major model-format CVE gaps. Co-authored-by: Codex <noreply@openai.com>
|
Warning Rate limit exceeded
⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Repository UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (4)
WalkthroughAdds CVE-2025-12058 coverage: changelog entry and maintainer plan, a new explanation helper, scanner logic detecting external vocabulary references in Keras StringLookup with version gating, and regression tests covering vulnerable, fixed, benign, and Windows/path variants. Changes
Sequence Diagram(s)sequenceDiagram
participant Runner as Scanner Runner
participant KZ as KerasZipScanner
participant Parser as Model Config Parser
participant Verifier as VersionChecker
participant Reporter as Findings Reporter
Runner->>KZ: provide model archive + metadata
KZ->>Parser: parse model_config.json
Parser->>KZ: return layers list
KZ->>KZ: identify layers where class_name == "StringLookup"
KZ->>KZ: extract layer.config.vocabulary
KZ->>Verifier: does vocabulary look like URL/absolute/traversal?
Verifier->>KZ: yes/no
KZ->>Verifier: check keras_version vulnerable (<3.12.0)?
Verifier->>KZ: vulnerable / fixed / unknown
KZ->>Reporter: emit CVE-2025-12058 finding (FAILED / INFO / WARNING) with explanation
Reporter->>Runner: aggregated findings
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 651aa97ce9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| patch_digits = "".join(ch for ch in parts[2] if ch.isdigit()) | ||
| if patch_digits: | ||
| patch = int(patch_digits) | ||
| return (major, minor, patch) < (3, 12, 0) |
There was a problem hiding this comment.
Parse Keras versions with PEP 440 semantics
_is_vulnerable_to_cve_2025_12058 concatenates all digits from the patch segment, so prereleases like 3.12.0rc1/3.12.0.dev1 are parsed as patch 1 and treated as non-vulnerable. _check_stringlookup_vocabulary_path then emits a passing version check, creating a security false negative for vulnerable prerelease builds.
Useful? React with 👍 / 👎.
Co-authored-by: Codex <noreply@openai.com>
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
CHANGELOG.md (1)
74-82:⚠️ Potential issue | 🟡 MinorRun the markdown formatter before merge.
The docs check is already failing on this file, so this entry should be included in the repo’s Prettier pass before merge. As per coding guidelines, "
**/*.{md,yaml,yml,json}: When formatting markdown/json/yaml, usenpm ci --ignore-scripts && npx prettier --write \"**/*.{md,yaml,yml,json}\"if instructed or if formatting drifts`."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@CHANGELOG.md` around lines 74 - 82, The CHANGELOG.md entry under "## [Unreleased]" is not formatted per repo style; run the markdown formatter on CHANGELOG.md before merging by following the repo's markdown/json/yaml formatting step (run npm ci --ignore-scripts then npx prettier --write "**/*.{md,yaml,yml,json}") so the added bullets (tests/security/keras lines) conform to Prettier rules and the docs check passes; re-run the docs check and commit the formatted CHANGELOG.md.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/maintainers/cve-gap-pr-plan-2026-03-20.md`:
- Around line 9-13: The QA recipe currently uses a host-specific example
(`/etc/passwd`) when demonstrating a malicious `.keras` with `StringLookup` and
config.vocabulary; replace that with a deterministic temporary-file created via
a tmp_path-style fixture (e.g., create a temp file under tmp_path and point
config.vocabulary to that file) so tests remain portable and CI-safe, update the
benign example note to explicitly mention inline vocabulary lists/tuples remain
allowed, and ensure the doc text around `StringLookup`, `config.json`, and
`fix/keras-stringlookup-cve-2025-12058` reflects the tmp_path usage.
In `@modelaudit/scanners/keras_zip_scanner.py`:
- Around line 986-994: The home-relative check currently uses
candidate.startswith("~/") which misses Windows-style "~\\..." paths; use the
already-computed normalized (normalized = candidate.replace("\\", "/")) for that
check instead—i.e., change the condition to normalized.startswith("~/") in the
boolean expression that also references _URL_SCHEME_PATTERN,
_WINDOWS_ABSOLUTE_PATH_PATTERN and the "../" checks so "~\\vocab.txt" is
detected correctly.
- Around line 951-962: The StringLookup CVE check currently emits passed=True
when keras_version is a string; change it to passed=False and update the message
to indicate "metadata-only assessment is inconclusive" (mirroring the
TorchModuleWrapper behavior) so archive-supplied metadata cannot produce a false
pass; keep details keys (layer_name, layer_class, keras_version) and location
but flip the verdict and message in the result.add_check call where
keras_version is a str. Also harden the external vocabulary path detection by
normalizing the candidate path before the tilde check (e.g., replace backslashes
with forward slashes or otherwise canonicalize) so candidate.startswith("~/")
correctly catches Windows-style "~\\..." variants in the external path check
that currently uses candidate.startswith("~/").
---
Outside diff comments:
In `@CHANGELOG.md`:
- Around line 74-82: The CHANGELOG.md entry under "## [Unreleased]" is not
formatted per repo style; run the markdown formatter on CHANGELOG.md before
merging by following the repo's markdown/json/yaml formatting step (run npm ci
--ignore-scripts then npx prettier --write "**/*.{md,yaml,yml,json}") so the
added bullets (tests/security/keras lines) conform to Prettier rules and the
docs check passes; re-run the docs check and commit the formatted CHANGELOG.md.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 5da1a73a-3a86-48e3-a8c0-0036c092c777
📒 Files selected for processing (5)
CHANGELOG.mddocs/maintainers/cve-gap-pr-plan-2026-03-20.mdmodelaudit/config/explanations.pymodelaudit/scanners/keras_zip_scanner.pytests/scanners/test_keras_zip_scanner.py
There was a problem hiding this comment.
♻️ Duplicate comments (1)
docs/maintainers/cve-gap-pr-plan-2026-03-20.md (1)
13-13:⚠️ Potential issue | 🟡 MinorReplace host-specific
/etc/passwdexample with a deterministic temp-path fixture.Please update this QA example to use a portable fixture path (e.g.,
tmp_path / "vocab.txt") instead of/etc/passwd, so guidance stays CI-safe and deterministic.Suggested wording
- - QA: malicious `.keras` with `StringLookup` + `/etc/passwd`; benign `.keras` with inline vocabulary list + - QA: malicious `.keras` with `StringLookup` + a `tmp_path / "vocab.txt"` external fixture; benign `.keras` with inline vocabulary list/tupleBased on learnings: "Use deterministic fixtures only in tests; never reference host paths like
/etc/passwd; create all targets undertmp_path."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/maintainers/cve-gap-pr-plan-2026-03-20.md` at line 13, Update the QA example that references a host-specific path by replacing the `/etc/passwd` usage with a deterministic test fixture path (e.g., use tmp_path / "vocab.txt") in both the "malicious `.keras` with `StringLookup`" and the "benign `.keras` with inline vocabulary list" examples; ensure the prose and any sample commands mention creating the vocab file under tmp_path (or similar CI-safe temp fixture) so the example is portable and deterministic across environments.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@docs/maintainers/cve-gap-pr-plan-2026-03-20.md`:
- Line 13: Update the QA example that references a host-specific path by
replacing the `/etc/passwd` usage with a deterministic test fixture path (e.g.,
use tmp_path / "vocab.txt") in both the "malicious `.keras` with `StringLookup`"
and the "benign `.keras` with inline vocabulary list" examples; ensure the prose
and any sample commands mention creating the vocab file under tmp_path (or
similar CI-safe temp fixture) so the example is portable and deterministic
across environments.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 88221257-e71a-4f94-9fbc-5cb71e10e476
📒 Files selected for processing (1)
docs/maintainers/cve-gap-pr-plan-2026-03-20.md
Use normalized paths for Windows-style home-relative vocabulary detection, replace the host-specific plan example with a tmp_path fixture, and downgrade fixed-version archive metadata to an inconclusive INFO finding instead of a false pass. Co-authored-by: Codex <noreply@openai.com>
|
Addressed the actionable review points in b849edd:
Validation:
........................................................................ [ 2%]
|
|
Corrected summary for the latest push in
Validation:
|
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelaudit/scanners/keras_zip_scanner.py`:
- Around line 1236-1252: The _is_vulnerable_to_cve_2025_12058 function currently
strips non-digits from the patch and therefore treats prerelease tags like
"3.12.0rc1" as a released 3.12.0 (non-vulnerable); update its parsing to detect
PEP 440 prerelease suffixes (a, b, rc, dev) in parts[2] similar to
_is_vulnerable_keras_3_11_x: extract numeric patch and also capture any alpha
suffix, and if the numeric (major, minor, patch) equals (3, 12, 0) but a
prerelease suffix exists (or if a prerelease makes the version logically
earlier), return True (vulnerable); otherwise retain the existing numeric
comparison logic and keep the try/except ValueError behavior.
In `@tests/scanners/test_keras_zip_scanner.py`:
- Around line 289-320: Add a prerelease-version test for CVE-2025-12058 that
mirrors the existing fixed-version test: create a KerasZipScanner instance and
use create_configured_keras_zip with prerelease keras_version values (e.g.,
"3.12.0a0", "3.12.0rc1", "3.12.0.dev0") containing a StringLookup layer with an
external vocabulary path, call scanner.scan(...) and assert the CVE-2025-12058
check is present and has severity IssueSeverity.WARNING (and that
vulnerable-version tests still produce a failed check); place this as a new test
function named like test_stringlookup_prerelease_versions_treated_as_vulnerable
so it follows the existing pattern and references the same check/filter logic
used in test_stringlookup_external_vocabulary_path_is_passing_on_fixed_keras.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 49f8a326-e1a9-4905-bdce-20c1916b71f3
📒 Files selected for processing (3)
docs/maintainers/cve-gap-pr-plan-2026-03-20.mdmodelaudit/scanners/keras_zip_scanner.pytests/scanners/test_keras_zip_scanner.py
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b849eddcca
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| return ( | ||
| bool(_URL_SCHEME_PATTERN.match(candidate)) | ||
| or candidate.startswith("/") | ||
| or normalized.startswith("~/") | ||
| or bool(_WINDOWS_ABSOLUTE_PATH_PATTERN.match(candidate)) | ||
| or normalized.startswith("../") | ||
| or "/../" in normalized | ||
| ) |
There was a problem hiding this comment.
Detect relative StringLookup vocabulary paths
_is_external_stringlookup_vocabulary only matches URL/absolute/home/traversal forms and ignores plain relative filenames like "vocab.txt". StringLookup still treats those as filesystem paths, so a crafted vulnerable model can bypass CVE-2025-12058 detection by using a non-traversal relative path.
Useful? React with 👍 / 👎.
Co-authored-by: Codex <noreply@openai.com>
… CVE Handle prerelease suffixes like rc, a, b, and dev in the CVE-2025-12058 version gate so 3.12.0 prereleases are not misclassified as the final fixed release, and add regression coverage for prerelease metadata. Co-authored-by: Codex <noreply@openai.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3fff611eba
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if not version_match: | ||
| return False |
There was a problem hiding this comment.
Treat unparseable Keras versions as unknown risk
_is_vulnerable_to_cve_2025_12058() returns False when version parsing fails, and _check_stringlookup_vocabulary_path() interprets that as “outside vulnerable range”. Non-canonical metadata like keras-3.11.3/v3.11.3 would therefore be downgraded to an INFO metadata check instead of a CVE warning, creating a false negative path for vulnerable archives.
Useful? React with 👍 / 👎.
…cve-2025-12058 Co-authored-by: Codex <noreply@openai.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9ed2f31aba
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| or bool(_WINDOWS_ABSOLUTE_PATH_PATTERN.match(candidate)) | ||
| or normalized.startswith("../") |
There was a problem hiding this comment.
Flag rooted Windows StringLookup paths as external
_is_external_stringlookup_vocabulary only recognizes Windows drive-letter/UNC prefixes. Rooted paths like \vocab.txt are not matched, but on Windows they are absolute paths, so a malicious .keras can still trigger local file reads while bypassing CVE-2025-12058 detection.
Useful? React with 👍 / 👎.
Summary
docs/maintainers/cve-gap-pr-plan-2026-03-20.mdCVE-2025-12058in.kerasarchives whenStringLookupuses a scalar external vocabulary path or URLFalse-positive controls
vocabularystrings that are clearly external: URL-like, absolute path, home-relative, or traversal-likeQA
uv run pytest tests/scanners/test_keras_zip_scanner.py -quv run ruff check modelaudit/ tests/uv run ruff format --check modelaudit/ tests/uv run mypy modelaudit/.keras: exit 1, severities['warning'], CVEs['CVE-2025-12058'].keras: exit 0, no issuesuv run pytest -n auto -m "not slow and not integration" --maxfail=1Summary by CodeRabbit
New Features
Tests
Documentation