Skip to content

feat: DSAR, AI anonymization, MCP and search disclosures (#545-552)#743

Merged
2witstudios merged 10 commits intomasterfrom
ppg/data-compliance
Mar 11, 2026
Merged

feat: DSAR, AI anonymization, MCP and search disclosures (#545-552)#743
2witstudios merged 10 commits intomasterfrom
ppg/data-compliance

Conversation

@2witstudios
Copy link
Copy Markdown
Owner

@2witstudios 2witstudios commented Feb 28, 2026

Summary

  • DSAR endpoints ([Compliance] DSAR controls #545): Admin API to export all user data (GET /api/admin/users/[userId]/export) and delete/anonymize user data (DELETE /api/admin/users/[userId]/data) with withAdminAuth, CSRF protection, and audit logging
  • AI anonymization ([Compliance] AI usage anonymization misses metadata-stored prompts #546): Removed raw prompt/completion logging from AI usage logs; added PII scrubber utility for defense-in-depth redaction of emails, SSNs, credit cards, phone numbers
  • MCP trust disclosure ([Compliance] MCP trust disclosure #551): Security disclosure documenting MCP token trust boundaries, hash-only storage, fail-closed scoping, known hybrid route gaps, and user recommendations
  • Search disclosure ([Compliance] Search disclosure #552): Documented plaintext content storage tradeoff; added excludeFromSearch boolean to pages schema with migration; updated all search queries (multi-drive, regex, glob, main) to respect the flag

Review feedback addressed

  • TOCTOU race condition: Replaced separate check-then-delete with atomic checkAndDeleteSoloDrives() DB transaction
  • PII in audit logs: Added maskEmail() helper to avoid logging raw email in activity metadata
  • Cache-Control header: Added no-store to DSAR export response to prevent browser/proxy caching
  • Credit card detection: Broadened from 4x4 16-digit regex to 13-19 digit PAN candidate + Luhn validation
  • Plaintext disclosure: Added file processing and activity digest to server-side access list; clarified activity log snapshots retain unredacted content
  • Test coverage: Added phone number and AmEx card regression tests for PII scrubber

Test plan

  • DSAR export endpoint returns user data JSON (3 tests)
  • DSAR deletion endpoint anonymizes and deletes (5 tests)
  • PII scrubber redacts emails, SSNs, credit cards, phone numbers (11 tests)
  • No raw prompt/completion in AI usage logs
  • excludeFromSearch field exists on pages schema (4 tests)
  • All 246 test files pass (2 pre-existing DB integration failures unrelated)

How to validate

  1. Run pnpm test:unit — all unit tests pass
  2. Check DSAR deletion test: npx vitest run apps/web/src/app/api/admin/users/\[userId\]/data/__tests__/route.test.ts
  3. Check PII scrubber tests: npx vitest run packages/lib/src/compliance/__tests__/pii-scrubber.test.ts

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Admins can now export and delete user data with automatic anonymization for compliance
    • Pages can opt out of search indexing with per-page exclusion controls
  • Documentation

    • Added security guidance on MCP token trust boundaries and plaintext search data storage considerations
  • Tests

    • Expanded test coverage for admin data operations and search exclusion functionality
  • Chores

    • Database schema updated to support per-page search exclusion
    • Anonymization utilities refactored into shared library

2witstudios and others added 4 commits February 27, 2026 22:56
Admin endpoints for data subject access requests:
- GET /api/admin/users/[userId]/export — exports all user data as JSON
- DELETE /api/admin/users/[userId]/data — anonymizes and deletes user data

Both endpoints use withAdminAuth wrapper with CSRF protection,
audit logging, and proper GDPR compliance patterns.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Stop logging raw prompt/completion content to aiUsageLogs
- Remove prompt/completion from metadata JSONB in ai-monitoring
- Add PII scrubber utility for defense-in-depth redaction of
  emails, SSNs, credit cards, and phone numbers
- 8 unit tests for PII scrubber

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Documents the MCP token security model including:
- Hash-only token storage and fail-closed scoping
- Trust boundaries between MCP clients and PageSpace
- Known scope enforcement gaps on hybrid routes
- Recommendations for users creating MCP tokens

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add docs/security/search-plaintext-disclosure.md documenting the
  plaintext content storage tradeoff
- Add excludeFromSearch boolean to pages schema (defaults to false)
- Update multi-drive, regex, glob, and main search queries to
  exclude pages with excludeFromSearch=true
- Generate migration 0090_smooth_king_bedlam.sql

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Feb 28, 2026

Warning

Rate limit exceeded

@2witstudios has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 24 minutes and 48 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b729e7ef-b2ff-4881-89ed-280754fce594

📥 Commits

Reviewing files that changed from the base of the PR and between 85c78b8 and 063cd89.

📒 Files selected for processing (11)
  • apps/web/src/app/api/admin/users/[userId]/data/__tests__/route.test.ts
  • apps/web/src/app/api/admin/users/[userId]/data/route.ts
  • apps/web/src/app/api/admin/users/[userId]/export/route.ts
  • apps/web/src/app/api/pages/[pageId]/tasks/__tests__/route.test.ts
  • apps/web/src/app/api/tasks/__tests__/route.test.ts
  • docs/security/search-plaintext-disclosure.md
  • packages/lib/package.json
  • packages/lib/src/compliance/__tests__/pii-scrubber.test.ts
  • packages/lib/src/compliance/pii-scrubber.ts
  • packages/lib/src/monitoring/ai-monitoring.ts
  • packages/lib/src/repositories/account-repository.ts
📝 Walkthrough

Walkthrough

Introduces admin DSAR export and deletion endpoints, adds per-page search exclusion functionality, implements PII scrubbing and anonymization utilities in shared library, removes prompt/completion from AI usage tracking, and adds security documentation for MCP token trust boundaries and plaintext search storage.

Changes

Cohort / File(s) Summary
Admin DSAR Export Endpoint
apps/web/src/app/api/admin/users/[userId]/export/route.ts, apps/web/src/app/api/admin/users/[userId]/export/__tests__/route.test.ts
New GET endpoint with admin auth that collects and returns all user data (profile, drives, pages, messages, files, activity, AI usage, tasks) augmented with exportedAt timestamp and exportedBy admin ID; includes 404 for missing user, 500 error handling, and comprehensive test coverage for auth, success, and not-found scenarios.
Admin DSAR Deletion Endpoint
apps/web/src/app/api/admin/users/[userId]/data/route.ts, apps/web/src/app/api/admin/users/[userId]/data/__tests__/route.test.ts
New DELETE endpoint with admin auth that anonymizes and deletes user data; prevents self-deletion, validates drive ownership (rejects if multi-member drives exist), deletes solo drives, anonymizes activity logs, removes AI usage logs, and deletes user record; includes comprehensive logging and edge-case handling with test coverage for all scenarios.
Account Deletion Refactoring
apps/web/src/app/api/account/route.ts
Migrated createAnonymizedActorEmail function from local implementation to import from shared compliance module, reducing code duplication.
Page Search Exclusion
packages/db/src/schema/core.ts, packages/db/drizzle/0090_smooth_king_bedlam.sql, packages/db/drizzle/meta/_journal.json
Added excludeFromSearch boolean column (default false, NOT NULL) to pages table with accompanying SQL migration and journal entry.
Search Filter Integration
apps/web/src/app/api/search/route.ts, apps/web/src/app/api/search/multi-drive/route.ts, packages/lib/src/services/drive-search-service.ts, apps/web/src/app/api/search/__tests__/search-exclusion.test.ts
Integrated excludeFromSearch filtering across all search endpoints (single-drive, multi-drive, drive-search service); added test suite validating field behavior (default value, column name, NOT NULL constraint).
Compliance Module
packages/lib/src/compliance/anonymize.ts, packages/lib/src/compliance/pii-scrubber.ts, packages/lib/src/compliance/__tests__/pii-scrubber.test.ts
New shared compliance utilities: createAnonymizedActorEmail for deterministic user anonymization via SHA-256 hash, and scrubPII for redacting emails, SSNs, credit cards, and phone numbers in logs; comprehensive test coverage for all PII patterns.
Package Exports
packages/lib/package.json
Exported new compliance module path and types mappings for ./compliance/anonymize and ./compliance/pii-scrubber.
AI Usage Logging Reduction
packages/lib/src/monitoring/ai-monitoring.ts, apps/web/src/app/api/ai/chat/route.ts
Removed prompt and completion fields from AIUsageData interface and eliminated token-level input/output logging in trackAIUsage, reducing exposed raw text in usage tracking.
Test Fixtures
apps/web/src/app/api/ai/chat/messages/[messageId]/__tests__/route.test.ts, apps/web/src/app/api/tasks/__tests__/route.test.ts
Updated mock factories to include excludeFromSearch field with default false, aligning test fixtures with schema changes.
Security Documentation
docs/security/mcp-trust-boundaries.md, docs/security/search-plaintext-disclosure.md
Added documentation detailing MCP token trust boundaries (storage, auth flow, scope enforcement, limitations) and plaintext search disclosure (storage rationale, access controls, per-page opt-out behavior, security considerations).

Sequence Diagrams

sequenceDiagram
    participant Admin as Admin
    participant API as GET /api/admin/users/[userId]/export
    participant Auth as Admin Auth
    participant DB as Database
    participant Logger as Logger

    Admin->>API: GET request with userId
    API->>Auth: Check admin credentials
    Auth-->>API: Auth verified
    API->>DB: collectAllUserData(userId)
    DB-->>API: User data (profile, drives, pages, etc.)
    alt User not found
        API-->>Admin: 404 {error: 'User not found'}
    else Success
        API->>Logger: Log export (admin ID, target user ID)
        API->>API: Augment data with exportedAt, exportedBy
        API-->>Admin: 200 {data, exportedAt, exportedBy}
    end
Loading
sequenceDiagram
    participant Admin as Admin
    participant API as DELETE /api/admin/users/[userId]/data
    participant Auth as Admin Auth
    participant DB as Database
    participant Logger as Logger
    participant Anon as Anonymize

    Admin->>API: DELETE request with userId, reason
    API->>Auth: Check admin credentials
    Auth-->>API: Auth verified
    API->>DB: Lookup user
    alt Self-deletion attempt
        API-->>Admin: 403 Forbidden
    else User not found
        API-->>Admin: 404 Not found
    else Multi-member drives owned
        API-->>Admin: 400 {drives requiring transfer}
    else Proceed
        API->>DB: Delete solo-member drives (parallel)
        API->>Logger: Log deletion entry (anonymized actor)
        API->>DB: Anonymize activity logs
        API->>DB: Delete AI usage logs (non-fatal errors)
        API->>DB: Delete user record
        API->>Logger: Log success
        API-->>Admin: 200 {deleted: true}
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

  • [Compliance] Search disclosure #552 — Implements "Search disclosure" documentation and per-page search exclusion (excludeFromSearch) field directly addressing the search plaintext storage disclosure requirements.
  • [Epic] Data Compliance #593 — Implements multiple tasks from the Data Compliance epic including DSAR export/delete endpoints, AI usage anonymization, MCP trust-boundary and plaintext-search security disclosures, and page search exclusion filters.

Possibly related PRs

Poem

🐰 A rabbit hops through pages and compliance,
Deleting traces with data appliance,
Search exclusions and trust boundaries clear,
PII scrubbed clean, no secrets here!
GDPR compliant, the admin shall roam,
With exports and deletions, now users own their home. 🏠✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 41.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main changes: DSAR admin endpoints, AI anonymization, MCP trust documentation, and search disclosure with opt-out feature.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch ppg/data-compliance

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

2witstudios and others added 3 commits February 28, 2026 08:13
- Remove unused AdminRouteContext type import from export and data routes
- Remove unused withAdminAuth import from export test
- Remove unused userPromptContent variable from AI chat route

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove invalid adminAction/reason fields from logUserActivity call
- Add excludeFromSearch to page mock fixtures in tasks and chat message tests
- Fix UserAccount mock shape (remove nonexistent name/password fields)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Extract duplicated createAnonymizedActorEmail to packages/lib/src/compliance/anonymize.ts
- Fix unsafe body type cast with runtime validation in admin data route
- Remove dead prompt/completion fields from AIUsageData interface
- Remove dead token estimation code that referenced prompt/completion
- Remove redundant prompt/completion undefined assignments from tracking
- Deduplicate regexSearchPages where-clause branches into single composition
- Parallelize solo drive deletion in admin DSAR endpoint
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (6)
docs/security/search-plaintext-disclosure.md (1)

39-39: Consider clarifying that the flag has zero impact on storage.

The phrase "still stored in plaintext (exclusion affects search queries, not storage)" is accurate, but the word "still" may imply the flag could have changed storage behavior. Consider rephrasing to make it clearer that the flag only affects query-time filtering:

-- The page content is **still stored in plaintext** (exclusion affects search queries, not storage).
++ The page content **remains stored in plaintext** — the flag only affects query-time filtering, not storage or retention.

This is a minor clarity improvement and entirely optional.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/security/search-plaintext-disclosure.md` at line 39, Reword the sentence
"still stored in plaintext (exclusion affects search queries, not storage)" to
remove "still" and explicitly state the flag only changes query-time behavior
and does not alter how or where page content is stored; locate the exact text in
the docs line containing "still stored in plaintext (exclusion affects search
queries, not storage)" and replace it with a clearer phrasing that mentions "no
impact on storage" and "only affects query-time filtering (search exclusion)".
packages/lib/src/monitoring/ai-monitoring.ts (1)

447-452: Minor: Remove empty line artifact.

Line 448 appears to be a leftover empty line from the removed token estimation logic.

🧹 Suggested cleanup
   try {
     // Calculate tokens if not provided
     let { inputTokens, outputTokens, totalTokens } = data;
-
     // Calculate total if not provided
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/lib/src/monitoring/ai-monitoring.ts` around lines 447 - 452, Remove
the leftover blank line between the destructuring assignment and the following
comment/if block: keep the line "let { inputTokens, outputTokens, totalTokens }
= data;" directly followed by the comment and the if-check that computes
totalTokens; this cleans up the empty line artifact around the
inputTokens/outputTokens/totalTokens code.
packages/lib/src/compliance/__tests__/pii-scrubber.test.ts (1)

4-52: Add regression coverage for phones and non-16-digit cards.

The new scrubber logic in packages/lib/src/compliance/pii-scrubber.ts isn't exercised for PHONE_PATTERN, and the only card case here is a 16-digit Visa-like number. This suite would still pass if phone redaction or an AmEx-style PAN path broke.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/lib/src/compliance/__tests__/pii-scrubber.test.ts` around lines 4 -
52, Add tests exercising phone redaction and non-16-digit PANs by extending the
scrubPII test suite: add at least one case calling scrubPII with phone formats
(e.g., "(555) 123-4567" and "555-123-4567") and assert the result contains
"[PHONE_REDACTED]" and no raw digits (covers PHONE_PATTERN), and add a case with
a 15-digit AmEx-style card (e.g., "378282246310005" or with spaces) and assert
it becomes "[CC_REDACTED]" (covers non-16-digit PAN paths in the scrubber);
reference scrubPII and PHONE_PATTERN so the tests directly validate those code
paths.
apps/web/src/app/api/tasks/__tests__/route.test.ts (1)

101-143: Keep createPageFixture overridable for excludeFromSearch.

The fixture now hardcodes excludeFromSearch: false, but the override type does not expose that field. That makes it awkward to create an excluded-page variant in this suite if task/search interactions ever need coverage.

♻️ Small diff
 const createPageFixture = (overrides: Partial<{
   id: string;
   driveId: string;
   title: string;
+  excludeFromSearch: boolean;
 }> = {}) => ({
   id: overrides.id ?? 'page_1',
   driveId: overrides.driveId ?? 'drive_1',
   title: overrides.title ?? 'Test Page',
@@
-  excludeFromSearch: false,
+  excludeFromSearch: overrides.excludeFromSearch ?? false,
 });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/web/src/app/api/tasks/__tests__/route.test.ts` around lines 101 - 143,
The fixture createPageFixture hardcodes excludeFromSearch: false but the
overrides type doesn't expose that field; update the Partial type used for
overrides to include excludeFromSearch?: boolean and change the object to use
overrides.excludeFromSearch ?? false so callers can override excludeFromSearch
when creating fixtures (refer to createPageFixture, the overrides Partial<...>
and the excludeFromSearch property).
apps/web/src/app/api/ai/chat/messages/[messageId]/__tests__/route.test.ts (1)

85-163: The page lookup fixture is turning into a second schema definition.

Adding excludeFromSearch here means every new pages column has to be hand-copied into both PageLookupResult and mockPageLookup. Consider deriving this from a shared test factory or an existing page lookup contract so schema changes do not require manual sync in every suite.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/web/src/app/api/ai/chat/messages/`[messageId]/__tests__/route.test.ts
around lines 85 - 163, The test defines a duplicate schema via PageLookupResult
and mockPageLookup which drifts from the canonical pages schema; refactor to
derive the fixture from the single source of truth by replacing the inline
PageLookupResult/mockPageLookup with a shared factory or contract importer
(e.g., import the real Page lookup type or a test factory and call
createPageLookup({ ...overrides })) so tests use the same type/shape as
production — update usages of PageLookupResult and mockPageLookup to use the
shared factory/type and remove the ad-hoc duplicate definitions.
apps/web/src/app/api/search/__tests__/search-exclusion.test.ts (1)

4-21: Add one runtime assertion alongside these schema checks.

These tests lock the column shape, but they will not fail if /api/search stops honoring excludeFromSearch. A small route-level test with one excluded page and one visible page would cover the user-facing behavior added in this PR.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/web/src/app/api/search/__tests__/search-exclusion.test.ts` around lines
4 - 21, Add a runtime assertion to the search-exclusion test that verifies the
/api/search route respects pages.excludeFromSearch: create two test pages (one
with pages.excludeFromSearch = true and one false), call the route handler
(e.g., issue a GET to '/api/search' or invoke the search handler used in tests),
and assert that the excluded page is not present in the response while the
visible page is; reuse existing test fixtures/setups and reference
pages.excludeFromSearch and the '/api/search' endpoint so the test fails if
route-level filtering stops honoring the schema flag.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/web/src/app/api/admin/users/`[userId]/data/route.ts:
- Around line 68-74: The audit currently logs raw PII: pass a masked target
email and a bounded reason code to logging instead of raw values. Before calling
logUserActivity and api.info, derive maskedEmail =
`${localPart.slice(0,3)}***@${domain}` from user.email and replace any free-form
reason text with a fixed enum/code (e.g., reasonCode) or a validated short
token; then call logUserActivity(adminUser.id, 'account_delete', { targetUserId:
userId, targetUserEmail: maskedEmail }, actorInfo) and ensure api.info writes
reasonCode (not the free-form reason). Apply the same masking/reason-code
replacement for the other occurrence around the api.info call referenced at the
second location.
- Around line 36-66: The current TOCTOU between accountRepository.getOwnedDrives
/ getDriveMemberCount and the subsequent deleteDrive calls can delete a drive
that gains members mid-check and also performs multiple independent irreversible
writes that may leave erasure half-complete; refactor by moving the entire drive
ownership-check + deletion into a single atomic repository operation or a
resumable deletion job (e.g., add a new method like
accountRepository.deleteOwnedDrivesAtomically(userId) or
accountRepository.startResumableDeletionJob(userId)) that (1) verifies member
counts and deletes drives inside a DB transaction or coordinated job so the
check-and-delete cannot be interleaved, (2) performs all related cleanup (drive
deletion, AI-log cleanup, user data erasure) as part of the same transactional
flow or job steps, and (3) surfaces any failure (fail-closed) so the route
returns an error instead of reporting success when any step fails.

In `@apps/web/src/app/api/admin/users/`[userId]/export/route.ts:
- Around line 30-34: The response returned by the DSAR export success path
doesn't set Cache-Control and may be cached; update the final Response.json call
in route.ts (the line returning Response.json({... exportedAt, exportedBy ...}))
to include a response init with headers that set "Cache-Control": "no-store" (or
use a Response wrapper with that header) so the exported user data is never
cached by browsers or intermediates.

In `@docs/security/search-plaintext-disclosure.md`:
- Around line 13-19: The disclosure omits two server-side plaintext access
patterns: update the document to include (4) File processing — note the
processor service updates pages.content directly when extracting text
(reference: apps/processor/src/db.ts and the UPDATE pages SET content = $1 call
in the processor DB logic) and (5) Activity digest generation — note the
pulse/generate route reads stored content snapshots (reference:
apps/web/src/app/api/pulse/generate/route.ts and the contentSnapshot usage that
builds activity diffs/summaries); add concise bullet points describing these
operations and their requirement for plaintext access so the disclosure is
complete.
- Line 57: Replace the ambiguous bullet titled "**Content redaction in logs**"
(the line "- **Content redaction in logs** — Activity logs may contain content
snapshots for audit trail purposes (limited to 1MB per entry).") with an
explicit statement that activity log snapshots retain raw, unredacted plaintext;
e.g. update the bullet to read "**Content in activity logs** — Activity logs
preserve raw content snapshots for audit and rollback purposes (limited to 1MB
per entry). Content snapshots are not redacted and may contain sensitive data."
to make clear these snapshots are not PII-scrubbed.

In `@packages/lib/src/compliance/pii-scrubber.ts`:
- Around line 13-22: CREDIT_CARD_PATTERN and scrubPII currently only match 4x4
16-digit cards; update to first find PAN candidates using a regex that matches
13–19 digits allowing spaces/dashes (e.g. groups of digits separated by optional
separators), then in scrubPII replace matches only after validating the
candidate with a Luhn check: implement a helper luhnValidate(numberString) and
in scrubPII iterate matches of the new PAN candidate regex, strip separators,
run luhnValidate, and replace only those that pass with '[CC_REDACTED]' while
leaving non-Luhn matches untouched; reference CREDIT_CARD_PATTERN (rename or
replace) and scrubPII for changes and add the new luhnValidate helper.

---

Nitpick comments:
In `@apps/web/src/app/api/ai/chat/messages/`[messageId]/__tests__/route.test.ts:
- Around line 85-163: The test defines a duplicate schema via PageLookupResult
and mockPageLookup which drifts from the canonical pages schema; refactor to
derive the fixture from the single source of truth by replacing the inline
PageLookupResult/mockPageLookup with a shared factory or contract importer
(e.g., import the real Page lookup type or a test factory and call
createPageLookup({ ...overrides })) so tests use the same type/shape as
production — update usages of PageLookupResult and mockPageLookup to use the
shared factory/type and remove the ad-hoc duplicate definitions.

In `@apps/web/src/app/api/search/__tests__/search-exclusion.test.ts`:
- Around line 4-21: Add a runtime assertion to the search-exclusion test that
verifies the /api/search route respects pages.excludeFromSearch: create two test
pages (one with pages.excludeFromSearch = true and one false), call the route
handler (e.g., issue a GET to '/api/search' or invoke the search handler used in
tests), and assert that the excluded page is not present in the response while
the visible page is; reuse existing test fixtures/setups and reference
pages.excludeFromSearch and the '/api/search' endpoint so the test fails if
route-level filtering stops honoring the schema flag.

In `@apps/web/src/app/api/tasks/__tests__/route.test.ts`:
- Around line 101-143: The fixture createPageFixture hardcodes
excludeFromSearch: false but the overrides type doesn't expose that field;
update the Partial type used for overrides to include excludeFromSearch?:
boolean and change the object to use overrides.excludeFromSearch ?? false so
callers can override excludeFromSearch when creating fixtures (refer to
createPageFixture, the overrides Partial<...> and the excludeFromSearch
property).

In `@docs/security/search-plaintext-disclosure.md`:
- Line 39: Reword the sentence "still stored in plaintext (exclusion affects
search queries, not storage)" to remove "still" and explicitly state the flag
only changes query-time behavior and does not alter how or where page content is
stored; locate the exact text in the docs line containing "still stored in
plaintext (exclusion affects search queries, not storage)" and replace it with a
clearer phrasing that mentions "no impact on storage" and "only affects
query-time filtering (search exclusion)".

In `@packages/lib/src/compliance/__tests__/pii-scrubber.test.ts`:
- Around line 4-52: Add tests exercising phone redaction and non-16-digit PANs
by extending the scrubPII test suite: add at least one case calling scrubPII
with phone formats (e.g., "(555) 123-4567" and "555-123-4567") and assert the
result contains "[PHONE_REDACTED]" and no raw digits (covers PHONE_PATTERN), and
add a case with a 15-digit AmEx-style card (e.g., "378282246310005" or with
spaces) and assert it becomes "[CC_REDACTED]" (covers non-16-digit PAN paths in
the scrubber); reference scrubPII and PHONE_PATTERN so the tests directly
validate those code paths.

In `@packages/lib/src/monitoring/ai-monitoring.ts`:
- Around line 447-452: Remove the leftover blank line between the destructuring
assignment and the following comment/if block: keep the line "let { inputTokens,
outputTokens, totalTokens } = data;" directly followed by the comment and the
if-check that computes totalTokens; this cleans up the empty line artifact
around the inputTokens/outputTokens/totalTokens code.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: aafd7660-b4ac-4690-8074-58d4493b2d12

📥 Commits

Reviewing files that changed from the base of the PR and between d6f7bef and 85c78b8.

📒 Files selected for processing (23)
  • apps/web/src/app/api/account/route.ts
  • apps/web/src/app/api/admin/users/[userId]/data/__tests__/route.test.ts
  • apps/web/src/app/api/admin/users/[userId]/data/route.ts
  • apps/web/src/app/api/admin/users/[userId]/export/__tests__/route.test.ts
  • apps/web/src/app/api/admin/users/[userId]/export/route.ts
  • apps/web/src/app/api/ai/chat/messages/[messageId]/__tests__/route.test.ts
  • apps/web/src/app/api/ai/chat/route.ts
  • apps/web/src/app/api/search/__tests__/search-exclusion.test.ts
  • apps/web/src/app/api/search/multi-drive/route.ts
  • apps/web/src/app/api/search/route.ts
  • apps/web/src/app/api/tasks/__tests__/route.test.ts
  • docs/security/mcp-trust-boundaries.md
  • docs/security/search-plaintext-disclosure.md
  • packages/db/drizzle/0090_smooth_king_bedlam.sql
  • packages/db/drizzle/meta/0090_snapshot.json
  • packages/db/drizzle/meta/_journal.json
  • packages/db/src/schema/core.ts
  • packages/lib/package.json
  • packages/lib/src/compliance/__tests__/pii-scrubber.test.ts
  • packages/lib/src/compliance/anonymize.ts
  • packages/lib/src/compliance/pii-scrubber.ts
  • packages/lib/src/monitoring/ai-monitoring.ts
  • packages/lib/src/services/drive-search-service.ts
💤 Files with no reviewable changes (1)
  • apps/web/src/app/api/ai/chat/route.ts

Comment thread apps/web/src/app/api/admin/users/[userId]/data/route.ts Outdated
Comment thread apps/web/src/app/api/admin/users/[userId]/data/route.ts
Comment thread apps/web/src/app/api/admin/users/[userId]/export/route.ts Outdated
Comment thread docs/security/search-plaintext-disclosure.md Outdated
Comment thread docs/security/search-plaintext-disclosure.md Outdated
Comment thread packages/lib/src/compliance/pii-scrubber.ts Outdated
- DSAR deletion: use atomic checkAndDeleteSoloDrives transaction to
  eliminate TOCTOU race between member count check and drive deletion
- DSAR deletion: mask target email in audit logs using established
  localPart.slice(0,3)***@Domain pattern to avoid PII retention
- DSAR deletion: propagate AI usage log cleanup errors (fail-closed)
- DSAR export: add Cache-Control: no-store header to prevent browser
  caching of exported user data
- PII scrubber: broaden credit card detection to 13-19 digit PANs with
  Luhn validation; add phone and AmEx test coverage
- Docs: add file processing and activity digest to plaintext disclosure;
  clarify that activity log snapshots are unredacted
- Clean up blank line artifact in ai-monitoring.ts
- Make tasks test fixture overridable for excludeFromSearch
@2witstudios
Copy link
Copy Markdown
Owner Author

Nitpick follow-ups (from CodeRabbit summary review)

All 6 actionable review comments have been addressed and resolved in commit c384abb. Of the 6 nitpick suggestions:

Addressed (4/6):

  1. ✅ Clarified "still stored in plaintext" wording in search disclosure doc
  2. ✅ Removed empty line artifact in ai-monitoring.ts
  3. ✅ Added phone number and AmEx regression tests for PII scrubber
  4. ✅ Made excludeFromSearch overridable in createPageFixture

Deferred (2/6) — out of scope for this PR:
5. Shared test factory for PageLookupResult — Valid suggestion. Creating a shared test factory across suites is a broader refactor that should be its own follow-up PR to avoid scope creep.
6. Runtime search exclusion assertion — Good idea. Adding a route-level integration test for /api/search respecting excludeFromSearch requires more substantial mock infrastructure and is better suited as a follow-up.

Both deferred items are tracked for future work.

Master added checkMCPPageScope to the tasks route (b592511) but the
test mock wasn't updated. This caused CI failures when merging.
@2witstudios 2witstudios merged commit 0ae1593 into master Mar 11, 2026
12 of 14 checks passed
@2witstudios 2witstudios deleted the ppg/data-compliance branch March 11, 2026 03:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant