Skip to content

ENG-2488: Validate user_geography to prevent malicious data persistence#7889

Open
mikeGarifullin wants to merge 1 commit intomainfrom
ENG-2488
Open

ENG-2488: Validate user_geography to prevent malicious data persistence#7889
mikeGarifullin wants to merge 1 commit intomainfrom
ENG-2488

Conversation

@mikeGarifullin
Copy link
Copy Markdown

@mikeGarifullin mikeGarifullin commented Apr 10, 2026

Ticket ENG-2488

Description Of Changes

Reported by SurveyMonkey: their privacypreferencehistory table contained malicious injection strings in the user_geography column. This PR adds a UserGeography custom Pydantic type that validates values against a locale-code pattern before they can be persisted to the DB.

The UserGeography annotated type is already imported and applied at the Pydantic schema level in fidesplus (BasePrivacyPreferencesRequest), so this validation fires at the API boundary on every consent preferences endpoint.

Code Changes

  • Added _USER_GEOGRAPHY_RE regex and validate_user_geography() validator to custom_types.py
  • Added UserGeography = Annotated[str, BeforeValidator(validate_user_geography)] custom type
  • Added TestValidateUserGeography and TestUserGeography test classes covering valid locale codes, None passthrough, and 15 invalid/malicious inputs (uppercase, wrong separators, too-long regions, SQL injection, XSS payloads)

Steps to Confirm

  1. Send a PATCH to /privacy-preferences with user_geography set to a malicious string (e.g., "'; DROP TABLE users; --") — expect a 422 validation error
  2. Send with a valid locale code (e.g., "us_ca", "fr", "eea") — expect it to be accepted and persisted normally
  3. Omit user_geography entirely — expect null to be accepted

Pre-Merge Checklist

  • Issue requirements met
  • All CI pipelines succeeded
  • CHANGELOG.md updated
    • Add a db-migration This indicates that a change includes a database migration label to the entry if your change includes a DB migration
    • Add a high-risk This issue suggests changes that have a high-probability of breaking existing code label to the entry if your change includes a high-risk change (i.e. potential for performance impact or unexpected regression) that should be flagged
    • Updates unreleased work already in Changelog, no new entry necessary
  • UX feedback:
    • No UX review needed
  • Followup issues:
    • No followup issues
  • Database migrations:
    • No migrations
  • Documentation:
    • No documentation updates required

@vercel
Copy link
Copy Markdown
Contributor

vercel bot commented Apr 10, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

2 Skipped Deployments
Project Deployment Actions Updated (UTC)
fides-plus-nightly Ignored Ignored Preview Apr 13, 2026 7:35am
fides-privacy-center Ignored Ignored Apr 13, 2026 7:35am

Request Review

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 85.08%. Comparing base (e07701b) to head (32123e1).
⚠️ Report is 9 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #7889   +/-   ##
=======================================
  Coverage   85.08%   85.08%           
=======================================
  Files         627      627           
  Lines       40794    40802    +8     
  Branches     4742     4744    +2     
=======================================
+ Hits        34708    34716    +8     
  Misses       5017     5017           
  Partials     1069     1069           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

mikeGarifullin added a commit that referenced this pull request Apr 10, 2026
@mikeGarifullin mikeGarifullin marked this pull request as ready for review April 10, 2026 10:32
@mikeGarifullin mikeGarifullin requested a review from a team as a code owner April 10, 2026 10:32
@mikeGarifullin mikeGarifullin requested review from johnewart and removed request for a team April 10, 2026 10:32
Copy link
Copy Markdown

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: ENG-2488 — Validate user_geography

Summary

Solid, well-scoped security fix. The regex is correct, the test coverage is thorough (including SQL injection and XSS payloads), and the changelog entry is accurate. No blockers — three minor items worth addressing.


Findings

Nit — regex intent undocumented (custom_types.py:191)
The eea special-case in the regex is non-obvious. A one-line comment explaining that it's a synthetic (non-ISO) region code would help maintainers understand the intent and boundary cases (e.g. why eea_fr is invalid but ie_eea is valid).

Minor — redundant str() cast (custom_types.py:204)
After the None guard, v is guaranteed to be a str. The str(v) cast is harmless but could silently coerce non-string types rather than surfacing a type error. Worth removing or replacing with an explicit isinstance check since this is a BeforeValidator that can receive arbitrary input.

Minor — type alias / validator contract mismatch (custom_types.py:211)
UserGeography = Annotated[str, ...] declares a non-optional string, but the underlying validator accepts and returns None. This works fine in the Optional[UserGeography] usage pattern, but a future field declared as UserGeography (without Optional) would silently allow None through Pydantic's non-null check. Consider either making the type Annotated[Optional[str], ...] to match the validator signature, or removing None handling from the validator and pushing nullability to the field declaration.


What's not in fides — informational only

UserGeography is defined here but not applied to any Pydantic schema within the fides codebase (the user_geography column in privacy_preference.py is a plain SQLAlchemy Column(String)). Per the PR description, enforcement happens in fidesplus. That's a fine architecture, but worth keeping in mind: any path that writes user_geography outside of fidesplus's API boundary would bypass this validation.


Tests

Test coverage is excellent — VALID_USER_GEOGRAPHIES and INVALID_USER_GEOGRAPHIES lists are comprehensive, and both function-level and model-level tests are included. The malicious string entries ('; DROP TABLE users; --', <script>alert(1)</script>) are a nice explicit signal of what the fix is guarding against.

🔬 Codegraph: connected (42469 nodes)


💡 Write /code-review in a comment to re-run this review.

Comment thread src/fides/api/custom_types.py
Comment thread src/fides/api/custom_types.py Outdated
Comment thread src/fides/api/custom_types.py Outdated
Copy link
Copy Markdown

@rayharnett rayharnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚨 Critical Issues (Must Fix)

No critical issues found.

⚠️ Warnings (Should Fix)

No warnings found.

💡 Suggestions (Nitpicks/Improvements)

1. Regex Robustness - Locale code pattern

File: src/fides/api/custom_types.py

Description: The regex ^([a-z]{2}(_[a-z0-9]{1,3})?|eea)$ is quite restrictive on the sub-region part ([a-z0-9]{1,3}). While this might be intentional for your specific use case, ensure that standard locale formats like en_US (uppercase) or longer sub-regions are intentionally excluded.

Rationale: The test suite shows rejection of "us_CA" (uppercase), which is correct per the regex but might be unexpected if the system eventually needs to support standard ISO/IETF tags. Since this is a security fix to prevent "malicious data", being strict is good, but ensure it doesn't break legitimate future use cases.

Suggested Solution: If you want to allow more flexibility later, consider making the regex case-insensitive or adding an explicit .lower() call in the validator before matching.

def validate_user_geography(v: Optional[str]) -> Optional[str]:
    if v is None:
        return None
    
    # Normalize to lowercase if you want to support 'US_CA'
    normalized_v = v.lower() 
    if not _USER_GEOGRAPHY_RE.match(normalized_v):
         raise ValueError(...)
    return normalized_v

2. Explicit Error Message Detail

File: src/fides/api/custom_types.py

Description: The error message includes the raw regex string: "user_geography must be a locale code matching '([a-z]{2}(_[a-z0-9]{1,3})?|eea)'".

Rationale: While helpful for developers, exposing internal regex patterns in production error messages (if these reach an API consumer) is generally considered bad practice. It's better to provide a descriptive error message about the format rather than the implementation details.

Suggested Solution:

raise ValueError(
    "user_geography must be a valid locale code (e.g., 'fr', 'us_ca', or 'eea')"
)

Summary of Review
The change effectively implements a whitelist-based validation for the user_geography field, which significantly reduces the attack surface for injection attacks via this field. The test coverage is excellent, covering both valid and invalid edge cases (including SQLi and XSS attempts).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants