ENG-2488: Validate user_geography to prevent malicious data persistence#7889
ENG-2488: Validate user_geography to prevent malicious data persistence#7889mikeGarifullin wants to merge 1 commit intomainfrom
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub. 2 Skipped Deployments
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #7889 +/- ##
=======================================
Coverage 85.08% 85.08%
=======================================
Files 627 627
Lines 40794 40802 +8
Branches 4742 4744 +2
=======================================
+ Hits 34708 34716 +8
Misses 5017 5017
Partials 1069 1069 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
8b97824 to
ba1685f
Compare
There was a problem hiding this comment.
Code Review: ENG-2488 — Validate user_geography
Summary
Solid, well-scoped security fix. The regex is correct, the test coverage is thorough (including SQL injection and XSS payloads), and the changelog entry is accurate. No blockers — three minor items worth addressing.
Findings
Nit — regex intent undocumented (custom_types.py:191)
The eea special-case in the regex is non-obvious. A one-line comment explaining that it's a synthetic (non-ISO) region code would help maintainers understand the intent and boundary cases (e.g. why eea_fr is invalid but ie_eea is valid).
Minor — redundant str() cast (custom_types.py:204)
After the None guard, v is guaranteed to be a str. The str(v) cast is harmless but could silently coerce non-string types rather than surfacing a type error. Worth removing or replacing with an explicit isinstance check since this is a BeforeValidator that can receive arbitrary input.
Minor — type alias / validator contract mismatch (custom_types.py:211)
UserGeography = Annotated[str, ...] declares a non-optional string, but the underlying validator accepts and returns None. This works fine in the Optional[UserGeography] usage pattern, but a future field declared as UserGeography (without Optional) would silently allow None through Pydantic's non-null check. Consider either making the type Annotated[Optional[str], ...] to match the validator signature, or removing None handling from the validator and pushing nullability to the field declaration.
What's not in fides — informational only
UserGeography is defined here but not applied to any Pydantic schema within the fides codebase (the user_geography column in privacy_preference.py is a plain SQLAlchemy Column(String)). Per the PR description, enforcement happens in fidesplus. That's a fine architecture, but worth keeping in mind: any path that writes user_geography outside of fidesplus's API boundary would bypass this validation.
Tests
Test coverage is excellent — VALID_USER_GEOGRAPHIES and INVALID_USER_GEOGRAPHIES lists are comprehensive, and both function-level and model-level tests are included. The malicious string entries ('; DROP TABLE users; --', <script>alert(1)</script>) are a nice explicit signal of what the fix is guarding against.
🔬 Codegraph: connected (42469 nodes)
💡 Write /code-review in a comment to re-run this review.
2696b0c to
32123e1
Compare
rayharnett
left a comment
There was a problem hiding this comment.
🚨 Critical Issues (Must Fix)
No critical issues found.
⚠️ Warnings (Should Fix)
No warnings found.
💡 Suggestions (Nitpicks/Improvements)
1. Regex Robustness - Locale code pattern
File: src/fides/api/custom_types.py
Description: The regex ^([a-z]{2}(_[a-z0-9]{1,3})?|eea)$ is quite restrictive on the sub-region part ([a-z0-9]{1,3}). While this might be intentional for your specific use case, ensure that standard locale formats like en_US (uppercase) or longer sub-regions are intentionally excluded.
Rationale: The test suite shows rejection of "us_CA" (uppercase), which is correct per the regex but might be unexpected if the system eventually needs to support standard ISO/IETF tags. Since this is a security fix to prevent "malicious data", being strict is good, but ensure it doesn't break legitimate future use cases.
Suggested Solution: If you want to allow more flexibility later, consider making the regex case-insensitive or adding an explicit .lower() call in the validator before matching.
def validate_user_geography(v: Optional[str]) -> Optional[str]:
if v is None:
return None
# Normalize to lowercase if you want to support 'US_CA'
normalized_v = v.lower()
if not _USER_GEOGRAPHY_RE.match(normalized_v):
raise ValueError(...)
return normalized_v2. Explicit Error Message Detail
File: src/fides/api/custom_types.py
Description: The error message includes the raw regex string: "user_geography must be a locale code matching '([a-z]{2}(_[a-z0-9]{1,3})?|eea)'".
Rationale: While helpful for developers, exposing internal regex patterns in production error messages (if these reach an API consumer) is generally considered bad practice. It's better to provide a descriptive error message about the format rather than the implementation details.
Suggested Solution:
raise ValueError(
"user_geography must be a valid locale code (e.g., 'fr', 'us_ca', or 'eea')"
)Summary of Review
The change effectively implements a whitelist-based validation for the user_geography field, which significantly reduces the attack surface for injection attacks via this field. The test coverage is excellent, covering both valid and invalid edge cases (including SQLi and XSS attempts).
Ticket ENG-2488
Description Of Changes
Reported by SurveyMonkey: their
privacypreferencehistorytable contained malicious injection strings in theuser_geographycolumn. This PR adds aUserGeographycustom Pydantic type that validates values against a locale-code pattern before they can be persisted to the DB.The
UserGeographyannotated type is already imported and applied at the Pydantic schema level in fidesplus (BasePrivacyPreferencesRequest), so this validation fires at the API boundary on every consent preferences endpoint.Code Changes
_USER_GEOGRAPHY_REregex andvalidate_user_geography()validator tocustom_types.pyUserGeography = Annotated[str, BeforeValidator(validate_user_geography)]custom typeTestValidateUserGeographyandTestUserGeographytest classes covering valid locale codes,Nonepassthrough, and 15 invalid/malicious inputs (uppercase, wrong separators, too-long regions, SQL injection, XSS payloads)Steps to Confirm
/privacy-preferenceswithuser_geographyset to a malicious string (e.g.,"'; DROP TABLE users; --") — expect a 422 validation error"us_ca","fr","eea") — expect it to be accepted and persisted normallyuser_geographyentirely — expectnullto be acceptedPre-Merge Checklist
CHANGELOG.mdupdated