feat: DSAR, AI anonymization, MCP and search disclosures (#545-552)#743
feat: DSAR, AI anonymization, MCP and search disclosures (#545-552)#7432witstudios merged 10 commits intomasterfrom
Conversation
Admin endpoints for data subject access requests: - GET /api/admin/users/[userId]/export — exports all user data as JSON - DELETE /api/admin/users/[userId]/data — anonymizes and deletes user data Both endpoints use withAdminAuth wrapper with CSRF protection, audit logging, and proper GDPR compliance patterns. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Stop logging raw prompt/completion content to aiUsageLogs - Remove prompt/completion from metadata JSONB in ai-monitoring - Add PII scrubber utility for defense-in-depth redaction of emails, SSNs, credit cards, and phone numbers - 8 unit tests for PII scrubber Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Documents the MCP token security model including: - Hash-only token storage and fail-closed scoping - Trust boundaries between MCP clients and PageSpace - Known scope enforcement gaps on hybrid routes - Recommendations for users creating MCP tokens Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add docs/security/search-plaintext-disclosure.md documenting the plaintext content storage tradeoff - Add excludeFromSearch boolean to pages schema (defaults to false) - Update multi-drive, regex, glob, and main search queries to exclude pages with excludeFromSearch=true - Generate migration 0090_smooth_king_bedlam.sql Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
|
Warning Rate limit exceeded
⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (11)
📝 WalkthroughWalkthroughIntroduces admin DSAR export and deletion endpoints, adds per-page search exclusion functionality, implements PII scrubbing and anonymization utilities in shared library, removes prompt/completion from AI usage tracking, and adds security documentation for MCP token trust boundaries and plaintext search storage. Changes
Sequence DiagramssequenceDiagram
participant Admin as Admin
participant API as GET /api/admin/users/[userId]/export
participant Auth as Admin Auth
participant DB as Database
participant Logger as Logger
Admin->>API: GET request with userId
API->>Auth: Check admin credentials
Auth-->>API: Auth verified
API->>DB: collectAllUserData(userId)
DB-->>API: User data (profile, drives, pages, etc.)
alt User not found
API-->>Admin: 404 {error: 'User not found'}
else Success
API->>Logger: Log export (admin ID, target user ID)
API->>API: Augment data with exportedAt, exportedBy
API-->>Admin: 200 {data, exportedAt, exportedBy}
end
sequenceDiagram
participant Admin as Admin
participant API as DELETE /api/admin/users/[userId]/data
participant Auth as Admin Auth
participant DB as Database
participant Logger as Logger
participant Anon as Anonymize
Admin->>API: DELETE request with userId, reason
API->>Auth: Check admin credentials
Auth-->>API: Auth verified
API->>DB: Lookup user
alt Self-deletion attempt
API-->>Admin: 403 Forbidden
else User not found
API-->>Admin: 404 Not found
else Multi-member drives owned
API-->>Admin: 400 {drives requiring transfer}
else Proceed
API->>DB: Delete solo-member drives (parallel)
API->>Logger: Log deletion entry (anonymized actor)
API->>DB: Anonymize activity logs
API->>DB: Delete AI usage logs (non-fatal errors)
API->>DB: Delete user record
API->>Logger: Log success
API-->>Admin: 200 {deleted: true}
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related issues
Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
- Remove unused AdminRouteContext type import from export and data routes - Remove unused withAdminAuth import from export test - Remove unused userPromptContent variable from AI chat route Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove invalid adminAction/reason fields from logUserActivity call - Add excludeFromSearch to page mock fixtures in tasks and chat message tests - Fix UserAccount mock shape (remove nonexistent name/password fields) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Extract duplicated createAnonymizedActorEmail to packages/lib/src/compliance/anonymize.ts - Fix unsafe body type cast with runtime validation in admin data route - Remove dead prompt/completion fields from AIUsageData interface - Remove dead token estimation code that referenced prompt/completion - Remove redundant prompt/completion undefined assignments from tracking - Deduplicate regexSearchPages where-clause branches into single composition - Parallelize solo drive deletion in admin DSAR endpoint
There was a problem hiding this comment.
Actionable comments posted: 6
🧹 Nitpick comments (6)
docs/security/search-plaintext-disclosure.md (1)
39-39: Consider clarifying that the flag has zero impact on storage.The phrase "still stored in plaintext (exclusion affects search queries, not storage)" is accurate, but the word "still" may imply the flag could have changed storage behavior. Consider rephrasing to make it clearer that the flag only affects query-time filtering:
-- The page content is **still stored in plaintext** (exclusion affects search queries, not storage). ++ The page content **remains stored in plaintext** — the flag only affects query-time filtering, not storage or retention.This is a minor clarity improvement and entirely optional.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/security/search-plaintext-disclosure.md` at line 39, Reword the sentence "still stored in plaintext (exclusion affects search queries, not storage)" to remove "still" and explicitly state the flag only changes query-time behavior and does not alter how or where page content is stored; locate the exact text in the docs line containing "still stored in plaintext (exclusion affects search queries, not storage)" and replace it with a clearer phrasing that mentions "no impact on storage" and "only affects query-time filtering (search exclusion)".packages/lib/src/monitoring/ai-monitoring.ts (1)
447-452: Minor: Remove empty line artifact.Line 448 appears to be a leftover empty line from the removed token estimation logic.
🧹 Suggested cleanup
try { // Calculate tokens if not provided let { inputTokens, outputTokens, totalTokens } = data; - // Calculate total if not provided🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/lib/src/monitoring/ai-monitoring.ts` around lines 447 - 452, Remove the leftover blank line between the destructuring assignment and the following comment/if block: keep the line "let { inputTokens, outputTokens, totalTokens } = data;" directly followed by the comment and the if-check that computes totalTokens; this cleans up the empty line artifact around the inputTokens/outputTokens/totalTokens code.packages/lib/src/compliance/__tests__/pii-scrubber.test.ts (1)
4-52: Add regression coverage for phones and non-16-digit cards.The new scrubber logic in
packages/lib/src/compliance/pii-scrubber.tsisn't exercised forPHONE_PATTERN, and the only card case here is a 16-digit Visa-like number. This suite would still pass if phone redaction or an AmEx-style PAN path broke.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/lib/src/compliance/__tests__/pii-scrubber.test.ts` around lines 4 - 52, Add tests exercising phone redaction and non-16-digit PANs by extending the scrubPII test suite: add at least one case calling scrubPII with phone formats (e.g., "(555) 123-4567" and "555-123-4567") and assert the result contains "[PHONE_REDACTED]" and no raw digits (covers PHONE_PATTERN), and add a case with a 15-digit AmEx-style card (e.g., "378282246310005" or with spaces) and assert it becomes "[CC_REDACTED]" (covers non-16-digit PAN paths in the scrubber); reference scrubPII and PHONE_PATTERN so the tests directly validate those code paths.apps/web/src/app/api/tasks/__tests__/route.test.ts (1)
101-143: KeepcreatePageFixtureoverridable forexcludeFromSearch.The fixture now hardcodes
excludeFromSearch: false, but the override type does not expose that field. That makes it awkward to create an excluded-page variant in this suite if task/search interactions ever need coverage.♻️ Small diff
const createPageFixture = (overrides: Partial<{ id: string; driveId: string; title: string; + excludeFromSearch: boolean; }> = {}) => ({ id: overrides.id ?? 'page_1', driveId: overrides.driveId ?? 'drive_1', title: overrides.title ?? 'Test Page', @@ - excludeFromSearch: false, + excludeFromSearch: overrides.excludeFromSearch ?? false, });🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/web/src/app/api/tasks/__tests__/route.test.ts` around lines 101 - 143, The fixture createPageFixture hardcodes excludeFromSearch: false but the overrides type doesn't expose that field; update the Partial type used for overrides to include excludeFromSearch?: boolean and change the object to use overrides.excludeFromSearch ?? false so callers can override excludeFromSearch when creating fixtures (refer to createPageFixture, the overrides Partial<...> and the excludeFromSearch property).apps/web/src/app/api/ai/chat/messages/[messageId]/__tests__/route.test.ts (1)
85-163: The page lookup fixture is turning into a second schema definition.Adding
excludeFromSearchhere means every newpagescolumn has to be hand-copied into bothPageLookupResultandmockPageLookup. Consider deriving this from a shared test factory or an existing page lookup contract so schema changes do not require manual sync in every suite.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/web/src/app/api/ai/chat/messages/`[messageId]/__tests__/route.test.ts around lines 85 - 163, The test defines a duplicate schema via PageLookupResult and mockPageLookup which drifts from the canonical pages schema; refactor to derive the fixture from the single source of truth by replacing the inline PageLookupResult/mockPageLookup with a shared factory or contract importer (e.g., import the real Page lookup type or a test factory and call createPageLookup({ ...overrides })) so tests use the same type/shape as production — update usages of PageLookupResult and mockPageLookup to use the shared factory/type and remove the ad-hoc duplicate definitions.apps/web/src/app/api/search/__tests__/search-exclusion.test.ts (1)
4-21: Add one runtime assertion alongside these schema checks.These tests lock the column shape, but they will not fail if
/api/searchstops honoringexcludeFromSearch. A small route-level test with one excluded page and one visible page would cover the user-facing behavior added in this PR.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/web/src/app/api/search/__tests__/search-exclusion.test.ts` around lines 4 - 21, Add a runtime assertion to the search-exclusion test that verifies the /api/search route respects pages.excludeFromSearch: create two test pages (one with pages.excludeFromSearch = true and one false), call the route handler (e.g., issue a GET to '/api/search' or invoke the search handler used in tests), and assert that the excluded page is not present in the response while the visible page is; reuse existing test fixtures/setups and reference pages.excludeFromSearch and the '/api/search' endpoint so the test fails if route-level filtering stops honoring the schema flag.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/web/src/app/api/admin/users/`[userId]/data/route.ts:
- Around line 68-74: The audit currently logs raw PII: pass a masked target
email and a bounded reason code to logging instead of raw values. Before calling
logUserActivity and api.info, derive maskedEmail =
`${localPart.slice(0,3)}***@${domain}` from user.email and replace any free-form
reason text with a fixed enum/code (e.g., reasonCode) or a validated short
token; then call logUserActivity(adminUser.id, 'account_delete', { targetUserId:
userId, targetUserEmail: maskedEmail }, actorInfo) and ensure api.info writes
reasonCode (not the free-form reason). Apply the same masking/reason-code
replacement for the other occurrence around the api.info call referenced at the
second location.
- Around line 36-66: The current TOCTOU between accountRepository.getOwnedDrives
/ getDriveMemberCount and the subsequent deleteDrive calls can delete a drive
that gains members mid-check and also performs multiple independent irreversible
writes that may leave erasure half-complete; refactor by moving the entire drive
ownership-check + deletion into a single atomic repository operation or a
resumable deletion job (e.g., add a new method like
accountRepository.deleteOwnedDrivesAtomically(userId) or
accountRepository.startResumableDeletionJob(userId)) that (1) verifies member
counts and deletes drives inside a DB transaction or coordinated job so the
check-and-delete cannot be interleaved, (2) performs all related cleanup (drive
deletion, AI-log cleanup, user data erasure) as part of the same transactional
flow or job steps, and (3) surfaces any failure (fail-closed) so the route
returns an error instead of reporting success when any step fails.
In `@apps/web/src/app/api/admin/users/`[userId]/export/route.ts:
- Around line 30-34: The response returned by the DSAR export success path
doesn't set Cache-Control and may be cached; update the final Response.json call
in route.ts (the line returning Response.json({... exportedAt, exportedBy ...}))
to include a response init with headers that set "Cache-Control": "no-store" (or
use a Response wrapper with that header) so the exported user data is never
cached by browsers or intermediates.
In `@docs/security/search-plaintext-disclosure.md`:
- Around line 13-19: The disclosure omits two server-side plaintext access
patterns: update the document to include (4) File processing — note the
processor service updates pages.content directly when extracting text
(reference: apps/processor/src/db.ts and the UPDATE pages SET content = $1 call
in the processor DB logic) and (5) Activity digest generation — note the
pulse/generate route reads stored content snapshots (reference:
apps/web/src/app/api/pulse/generate/route.ts and the contentSnapshot usage that
builds activity diffs/summaries); add concise bullet points describing these
operations and their requirement for plaintext access so the disclosure is
complete.
- Line 57: Replace the ambiguous bullet titled "**Content redaction in logs**"
(the line "- **Content redaction in logs** — Activity logs may contain content
snapshots for audit trail purposes (limited to 1MB per entry).") with an
explicit statement that activity log snapshots retain raw, unredacted plaintext;
e.g. update the bullet to read "**Content in activity logs** — Activity logs
preserve raw content snapshots for audit and rollback purposes (limited to 1MB
per entry). Content snapshots are not redacted and may contain sensitive data."
to make clear these snapshots are not PII-scrubbed.
In `@packages/lib/src/compliance/pii-scrubber.ts`:
- Around line 13-22: CREDIT_CARD_PATTERN and scrubPII currently only match 4x4
16-digit cards; update to first find PAN candidates using a regex that matches
13–19 digits allowing spaces/dashes (e.g. groups of digits separated by optional
separators), then in scrubPII replace matches only after validating the
candidate with a Luhn check: implement a helper luhnValidate(numberString) and
in scrubPII iterate matches of the new PAN candidate regex, strip separators,
run luhnValidate, and replace only those that pass with '[CC_REDACTED]' while
leaving non-Luhn matches untouched; reference CREDIT_CARD_PATTERN (rename or
replace) and scrubPII for changes and add the new luhnValidate helper.
---
Nitpick comments:
In `@apps/web/src/app/api/ai/chat/messages/`[messageId]/__tests__/route.test.ts:
- Around line 85-163: The test defines a duplicate schema via PageLookupResult
and mockPageLookup which drifts from the canonical pages schema; refactor to
derive the fixture from the single source of truth by replacing the inline
PageLookupResult/mockPageLookup with a shared factory or contract importer
(e.g., import the real Page lookup type or a test factory and call
createPageLookup({ ...overrides })) so tests use the same type/shape as
production — update usages of PageLookupResult and mockPageLookup to use the
shared factory/type and remove the ad-hoc duplicate definitions.
In `@apps/web/src/app/api/search/__tests__/search-exclusion.test.ts`:
- Around line 4-21: Add a runtime assertion to the search-exclusion test that
verifies the /api/search route respects pages.excludeFromSearch: create two test
pages (one with pages.excludeFromSearch = true and one false), call the route
handler (e.g., issue a GET to '/api/search' or invoke the search handler used in
tests), and assert that the excluded page is not present in the response while
the visible page is; reuse existing test fixtures/setups and reference
pages.excludeFromSearch and the '/api/search' endpoint so the test fails if
route-level filtering stops honoring the schema flag.
In `@apps/web/src/app/api/tasks/__tests__/route.test.ts`:
- Around line 101-143: The fixture createPageFixture hardcodes
excludeFromSearch: false but the overrides type doesn't expose that field;
update the Partial type used for overrides to include excludeFromSearch?:
boolean and change the object to use overrides.excludeFromSearch ?? false so
callers can override excludeFromSearch when creating fixtures (refer to
createPageFixture, the overrides Partial<...> and the excludeFromSearch
property).
In `@docs/security/search-plaintext-disclosure.md`:
- Line 39: Reword the sentence "still stored in plaintext (exclusion affects
search queries, not storage)" to remove "still" and explicitly state the flag
only changes query-time behavior and does not alter how or where page content is
stored; locate the exact text in the docs line containing "still stored in
plaintext (exclusion affects search queries, not storage)" and replace it with a
clearer phrasing that mentions "no impact on storage" and "only affects
query-time filtering (search exclusion)".
In `@packages/lib/src/compliance/__tests__/pii-scrubber.test.ts`:
- Around line 4-52: Add tests exercising phone redaction and non-16-digit PANs
by extending the scrubPII test suite: add at least one case calling scrubPII
with phone formats (e.g., "(555) 123-4567" and "555-123-4567") and assert the
result contains "[PHONE_REDACTED]" and no raw digits (covers PHONE_PATTERN), and
add a case with a 15-digit AmEx-style card (e.g., "378282246310005" or with
spaces) and assert it becomes "[CC_REDACTED]" (covers non-16-digit PAN paths in
the scrubber); reference scrubPII and PHONE_PATTERN so the tests directly
validate those code paths.
In `@packages/lib/src/monitoring/ai-monitoring.ts`:
- Around line 447-452: Remove the leftover blank line between the destructuring
assignment and the following comment/if block: keep the line "let { inputTokens,
outputTokens, totalTokens } = data;" directly followed by the comment and the
if-check that computes totalTokens; this cleans up the empty line artifact
around the inputTokens/outputTokens/totalTokens code.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: aafd7660-b4ac-4690-8074-58d4493b2d12
📒 Files selected for processing (23)
apps/web/src/app/api/account/route.tsapps/web/src/app/api/admin/users/[userId]/data/__tests__/route.test.tsapps/web/src/app/api/admin/users/[userId]/data/route.tsapps/web/src/app/api/admin/users/[userId]/export/__tests__/route.test.tsapps/web/src/app/api/admin/users/[userId]/export/route.tsapps/web/src/app/api/ai/chat/messages/[messageId]/__tests__/route.test.tsapps/web/src/app/api/ai/chat/route.tsapps/web/src/app/api/search/__tests__/search-exclusion.test.tsapps/web/src/app/api/search/multi-drive/route.tsapps/web/src/app/api/search/route.tsapps/web/src/app/api/tasks/__tests__/route.test.tsdocs/security/mcp-trust-boundaries.mddocs/security/search-plaintext-disclosure.mdpackages/db/drizzle/0090_smooth_king_bedlam.sqlpackages/db/drizzle/meta/0090_snapshot.jsonpackages/db/drizzle/meta/_journal.jsonpackages/db/src/schema/core.tspackages/lib/package.jsonpackages/lib/src/compliance/__tests__/pii-scrubber.test.tspackages/lib/src/compliance/anonymize.tspackages/lib/src/compliance/pii-scrubber.tspackages/lib/src/monitoring/ai-monitoring.tspackages/lib/src/services/drive-search-service.ts
💤 Files with no reviewable changes (1)
- apps/web/src/app/api/ai/chat/route.ts
- DSAR deletion: use atomic checkAndDeleteSoloDrives transaction to eliminate TOCTOU race between member count check and drive deletion - DSAR deletion: mask target email in audit logs using established localPart.slice(0,3)***@Domain pattern to avoid PII retention - DSAR deletion: propagate AI usage log cleanup errors (fail-closed) - DSAR export: add Cache-Control: no-store header to prevent browser caching of exported user data - PII scrubber: broaden credit card detection to 13-19 digit PANs with Luhn validation; add phone and AmEx test coverage - Docs: add file processing and activity digest to plaintext disclosure; clarify that activity log snapshots are unredacted - Clean up blank line artifact in ai-monitoring.ts - Make tasks test fixture overridable for excludeFromSearch
Nitpick follow-ups (from CodeRabbit summary review)All 6 actionable review comments have been addressed and resolved in commit c384abb. Of the 6 nitpick suggestions: Addressed (4/6):
Deferred (2/6) — out of scope for this PR: Both deferred items are tracked for future work. |
Master added checkMCPPageScope to the tasks route (b592511) but the test mock wasn't updated. This caused CI failures when merging.
Summary
GET /api/admin/users/[userId]/export) and delete/anonymize user data (DELETE /api/admin/users/[userId]/data) with withAdminAuth, CSRF protection, and audit loggingexcludeFromSearchboolean to pages schema with migration; updated all search queries (multi-drive, regex, glob, main) to respect the flagReview feedback addressed
checkAndDeleteSoloDrives()DB transactionmaskEmail()helper to avoid logging raw email in activity metadatano-storeto DSAR export response to prevent browser/proxy cachingTest plan
excludeFromSearchfield exists on pages schema (4 tests)How to validate
pnpm test:unit— all unit tests passnpx vitest run apps/web/src/app/api/admin/users/\[userId\]/data/__tests__/route.test.tsnpx vitest run packages/lib/src/compliance/__tests__/pii-scrubber.test.ts🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Documentation
Tests
Chores