fix: /api/health/db-smoke endpoint + hardened databaseProbe + error surfacing in UI #1410

Draft
Copilot wants to merge 5 commits into main from copilot/add-db-smoke-api-endpoint

Contributor

Copilot AI commented Mar 25, 2026

Description

The database probe was reporting database: down with latency_ms: 0 while silently swallowing all Prisma errors, which made it impossible to distinguish Zod validation failures from network timeouts. This PR adds a dedicated smoke-test endpoint, hardens the existing probe, and surfaces error details in both the admin dashboard and the performance UI.

Changes

A — New GET /api/health/db-smoke

  • Runs current_database(), version(), now(), and COUNT(*) on information_schema.tables via Promise.all
  • Returns { ok, db_name, pg_version, server_time, table_count, latency_ms, hyperdrive_host }
  • ok: false + status 503 on failure; 400 when HYPERDRIVE binding absent
  • Added to MONITORING_API_PATHS — unauthenticated, same tier as /api/health
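
The control flow described in section A can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the handler from the PR: SmokeDb, SmokeEnv, SmokeResult, and handleDbSmokeSketch are hypothetical names, the database client is reduced to a two-method stand-in for Prisma, and a plain { status, body } object stands in for the HTTP response. Credential redaction of the error string is left out here for brevity.

```typescript
// Hypothetical stand-ins: the real handler uses Prisma and the Hyperdrive binding.
type Row = Record<string, unknown>;

interface SmokeDb {
  queryOne(sql: string): Promise<Row>;
  disconnect(): Promise<void>;
}

interface SmokeEnv {
  HYPERDRIVE?: { connectionString: string; host: string };
}

interface SmokeResult {
  status: number;
  body: Row & { ok: boolean };
}

async function handleDbSmokeSketch(
  env: SmokeEnv,
  createDb: (connectionString: string) => SmokeDb,
): Promise<SmokeResult> {
  const binding = env.HYPERDRIVE;
  if (!binding) {
    // Missing binding is reported as 400, as listed in the PR description.
    return { status: 400, body: { ok: false, error: 'HYPERDRIVE binding is not configured' } };
  }
  const started = Date.now();
  let db: SmokeDb | undefined;
  try {
    // Client creation sits inside try so that construction errors are caught too.
    db = createDb(binding.connectionString);
    const [info, tables] = await Promise.all([
      db.queryOne('SELECT current_database() AS db_name, version() AS pg_version, now() AS server_time'),
      db.queryOne("SELECT COUNT(*) AS table_count FROM information_schema.tables WHERE table_schema = 'public'"),
    ]);
    return {
      status: 200,
      body: {
        ok: true,
        db_name: info.db_name,
        pg_version: info.pg_version,
        server_time: info.server_time,
        // COUNT(*) may arrive as a bigint, which JSON.stringify cannot serialize.
        table_count: Number(tables.table_count),
        latency_ms: Date.now() - started,
        hyperdrive_host: binding.host,
      },
    };
  } catch (err) {
    const error = err instanceof Error ? err.message : String(err);
    return { status: 503, body: { ok: false, error, hyperdrive_host: binding.host } };
  } finally {
    // Runs on both the success and the failure path.
    await db?.disconnect();
  }
}
```

Passing the client factory as a parameter is a choice made for this sketch so the three outcomes (200, 503, 400) can be exercised without a database.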

B — Hardened databaseProbe

  • DatabaseResult extended: error_code?, error_message? — catch block now surfaces both instead of swallowing them
  • error_message redacts postgres:// / postgresql:// credentials before surfacing: .replace(/postgre(?:s|sql):\/\/[^\s"',}]*/gi, '[redacted]')
  • 5 s timeout via Promise.race + setTimeout; timeoutId declared at the outer try scope so clearTimeout is called in both the success and catch paths — no dangling timers when $queryRaw rejects quickly
  • createPrismaClient moved inside try so Zod validation errors are caught and classified
  • $disconnect() in finally on both databaseProbe and handleDbSmoke
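
As an illustration of the timeout and redaction behavior listed in section B, the following sketch combines Promise.race with a timer that is cleared on both paths. The regular expression is the one quoted above; probeWithTimeout, ProbeResult, redactConnectionStrings, and the 'UNKNOWN' fallback code are hypothetical names chosen for this example, and the real databaseProbe may be structured differently.

```typescript
const PROBE_TIMEOUT_MS = 5_000;

// Replace anything that looks like a postgres:// or postgresql:// URL.
function redactConnectionStrings(message: string): string {
  return message.replace(/postgre(?:s|sql):\/\/[^\s"',}]*/gi, '[redacted]');
}

interface ProbeResult {
  status: 'healthy' | 'down';
  latency_ms: number;
  error_code?: string;
  error_message?: string;
}

async function probeWithTimeout(
  runQuery: () => Promise<unknown>,
  timeoutMs: number = PROBE_TIMEOUT_MS,
): Promise<ProbeResult> {
  const started = Date.now();
  // Declared outside the try block so both the success and catch paths can clear it.
  let timeoutId: ReturnType<typeof setTimeout> | undefined;
  try {
    const timeout = new Promise<never>((_resolve, reject) => {
      timeoutId = setTimeout(() => {
        const err = new Error(`database probe timed out after ${timeoutMs} ms`);
        reject(Object.assign(err, { code: 'PROBE_TIMEOUT' }));
      }, timeoutMs);
    });
    await Promise.race([runQuery(), timeout]);
    clearTimeout(timeoutId);
    return { status: 'healthy', latency_ms: Date.now() - started };
  } catch (err) {
    // Without this, a query that rejects quickly would leave the timer pending.
    clearTimeout(timeoutId);
    const code = (err as { code?: unknown } | null)?.code;
    const message = err instanceof Error ? err.message : String(err);
    return {
      status: 'down',
      latency_ms: Date.now() - started,
      error_code: typeof code === 'string' ? code : 'UNKNOWN',
      error_message: redactConnectionStrings(message),
    };
  }
}
```

A quick rejection from runQuery returns immediately and leaves no pending timer, which is the regression the review comment on timer cleanup was guarding against.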

C — Angular error surfacing

  • HealthServiceResult / HealthResponse.services.database extended with error_message? and db_name?
  • performance.component.ts: shows a db-error-row chip under the health card when database is down/degraded; uses @let db = h.services?.database to properly narrow the optional services type and eliminate Angular strict null check (TS2532) errors
  • dashboard.component.ts: dbDetail() returns "<db_name> — <error_message>" for non-healthy database entries
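
The dbDetail() formatting rule can be pictured with a small standalone function. This is a sketch under assumptions: the DatabaseHealth interface and the behavior when only one of the two fields is present are guesses made for illustration, and the component in the PR may handle those cases differently.

```typescript
// Hypothetical shape, reduced to the fields the PR description mentions.
interface DatabaseHealth {
  readonly status: 'healthy' | 'degraded' | 'down';
  readonly db_name?: string;
  readonly error_message?: string;
}

// The separator is the em dash (U+2014) used in the PR description's format string.
const DETAIL_SEPARATOR = ' \u2014 ';

function dbDetail(db: DatabaseHealth | undefined): string | undefined {
  // Healthy or missing entries get no detail line.
  if (!db || db.status === 'healthy') {
    return undefined;
  }
  // Assumption: join whichever of the two fields are present.
  const parts = [db.db_name, db.error_message].filter(
    (part): part is string => typeof part === 'string' && part.length > 0,
  );
  return parts.length > 0 ? parts.join(DETAIL_SEPARATOR) : undefined;
}
```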

D — Tests

  • Shared makeHealthyPrisma() / makeFailingPrisma() helpers (include $disconnect stub); makeFailingPrisma uses a properly expanded multi-line arrow function body to satisfy deno fmt
  • New: error_code/error_message present on failure; credential redaction
  • New timeout test: uses FakeTime from @std/testing/time with a never-resolving $queryRaw mock — fakeTime.tickAsync(5001) actually triggers the Promise.race timer branch, so regressions in timer creation/clearing are caught
  • New: handleDbSmoke — happy path (with expanded one-property-per-line inline type cast), 503 path, 400 (missing binding), redaction
  • Import order in health.test.ts corrected to satisfy deno fmt alphabetical ordering requirement
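
For readers unfamiliar with the shared test helpers, they might look something like the sketch below. Only the helper names and the presence of a $disconnect stub come from the PR description; the PrismaLike interface, the parameters, and the default rows are assumptions made for this example.

```typescript
// Hypothetical minimal surface of the Prisma client that the handlers touch.
interface PrismaLike {
  $queryRaw: (...args: unknown[]) => Promise<unknown>;
  $disconnect: () => Promise<void>;
}

// Returns a stub whose queries always succeed with the given rows.
function makeHealthyPrisma(rows: unknown[] = [{ db_name: 'test-db' }]): PrismaLike {
  return {
    $queryRaw: async () => {
      return rows;
    },
    $disconnect: async () => {},
  };
}

// Returns a stub whose queries always reject with the given error.
function makeFailingPrisma(err: Error): PrismaLike {
  return {
    // Multi-line body, matching the formatting the PR says deno fmt expects.
    $queryRaw: async () => {
      throw err;
    },
    $disconnect: async () => {},
  };
}
```

Including $disconnect in both stubs matters because the handlers call it in a finally block, so a stub without it would fail on every path.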

E — Docs

  • KB-002: ## Session Log — 2026-03-25 captures the full incident narrative
  • neon-troubleshooting.md: live session section — .hyperdrive.local meaning, latency_ms: 0 decision tree, wrangler tail as first debugging step
  • New KB-003-neon-hyperdrive-live-session-2026-03-25.md: full KB with diagnostic commands, Mermaid flowchart TD decision tree (replacing ASCII art), resolution checklist, post-deploy prevention script

Testing

  • Unit tests added/updated — all new probe behavior, redaction, real-timer timeout (via FakeTime), and smoke-test scenarios covered in health.test.ts
  • Manual testing performed — handler logic verified by code review; route wiring verified against existing /health and /health/latest patterns
  • CI passes

Zero Trust Architecture Checklist

Worker / Backend

  • Every handler verifies auth before executing business logic — handleDbSmoke and handleHealth are explicitly public/anonymous diagnostic endpoints; no auth bypass for protected routes
  • CORS origin allowlist enforced (not *) on write/authenticated endpoints — /api/health/db-smoke added to MONITORING_API_PATHS which inherits the existing public-endpoint CORS prefix match; no wildcard introduced
  • All secrets accessed via Worker Secret bindings (not [vars]) — no new secrets; uses existing env.HYPERDRIVE binding
  • All external inputs Zod-validated before use — createPrismaClient validates the connection string via PrismaClientConfigSchema (inside try now, so errors are caught)
  • All D1 queries use parameterized .prepare().bind() — N/A; this PR uses Prisma $queryRaw tagged template literals (parameterized by design)
  • Security events emitted to Analytics Engine on auth failures — N/A; diagnostic endpoints have no auth layer to fail

Frontend / Angular

  • Protected routes have functional CanActivateFn auth guards — N/A; only type extensions and template changes to existing public health display
  • Auth tokens managed via Clerk SDK (not localStorage) — N/A
  • HTTP interceptor attaches ****** (no manual token passing) — N/A
  • API responses validated with Zod schemas before consumption — N/A; health response shapes are TypeScript interfaces consumed by the template only
Original prompt

Summary

The live site at https://adblock-frontend.jayson-knight.workers.dev/ shows:

  • "Degraded performance — v0.75.0" and "Data may be stale" banners
  • /api/health returns database.status: "down" with latency_ms: 0
  • Cloudflare Hyperdrive admin page shows zero traffic (but Neon shows migration activity)
  • Health response shows hyperdrive_host: "11f7f957eaae03a9fe9365c78e6eb4ed.hyperdrive.local" — which is the correct local proxy

The root cause (already documented in KB-002) is confirmed and fixed, but additional hardening, an /api/health/db-smoke endpoint, improved error surfacing in the UI, and documentation are needed.


Changes Required

A — Add a dedicated /api/health/db-smoke smoke-test endpoint

Add a new handler handleDbSmoke in worker/handlers/health.ts that:

  1. Validates env.HYPERDRIVE is present (returns 400 if not).
  2. Creates a PrismaClient via _internals.createPrismaClient(env.HYPERDRIVE.connectionString).
  3. Runs SELECT current_database() AS db_name, version() AS pg_version, now() AS server_time via $queryRaw.
  4. Runs SELECT COUNT(*) AS table_count FROM information_schema.tables WHERE table_schema = 'public' to verify the schema is populated.
  5. Returns a JSON response:
    {
      "ok": true,
      "db_name": "adblock-compiler",
      "pg_version": "PostgreSQL 16.x ...",
      "server_time": "2026-03-25T21:59:15.917Z",
      "table_count": 17,
      "latency_ms": 42,
      "hyperdrive_host": "ep-winter-term-a8rxh2a9-pooler.eastus2.azure.neon.tech"
    }
  6. Returns { ok: false, error: "...", hyperdrive_host: "..." } with status 503 on any failure.
  7. The endpoint must be unauthenticated (diagnostic-only, no secrets in response).
  8. Wire it in worker/worker.ts (or wherever routes are registered) at GET /api/health/db-smoke.

B — Harden databaseProbe in worker/handlers/health.ts

The current catch block silently swallows all errors. Update it to:

  1. Capture error_code and error_message from the caught exception (safely, without logging secrets).
  2. Return them in the down response shape:
    type DatabaseResult = ServiceResult & {
      db_name?: string;
      hyperdrive_host?: string;
      error_code?: string;
      error_message?: string;
    };
  3. Strip any connection string fragments from error_message before surfacing (replace anything that looks like postgres://... with [redacted]).
  4. Add a 5000ms timeout (AbortSignal.timeout) to the database probe so a hung Hyperdrive connection doesn't block the health response indefinitely.
  5. Disconnect the Prisma client in a finally block after each probe (await prisma.$disconnect()).

C — Add error_message to Angular UI health display

In frontend/src/app/performance/performance.component.ts:

  1. Extend the HealthServiceResult interface to include readonly error_message?: string and readonly db_name?: string.
  2. When the database service is down or degraded, surface the error_message and db_name fields as a sub-caption under the database health card row (template change in the same file).

In frontend/src/app/admin/dashboard/dashboard.component.ts:

  1. Extend HealthResponse.services.database to include readonly error_message?: string and readonly db_name?: string.
  2. Surface db_name and error_message as the detail field in the HealthCheck object returned by mapHealthChecks for the database entry.

D — Unit tests

  1. In worker/handlers/health.test.ts, add tests for the updated databaseProbe:

    • When Prisma throws, the response includes error_code and error_message in services.database.
    • error_message does NOT contain postgres:// credentials.
    • A timeout scenario (mock that never resolves within 5s) results in status: "down" with a timeout error_code.
  2. Add a new test file worker/handlers/health-db-smoke.test.ts (or add to existing) for handleDbSmoke:

    • Happy path: mock Prisma returns expected rows → ok: true.
    • Sad path: mock Prisma throws → ok: false, status 503.
    • Missing HYPERDRIVE → status 400.

E — Documentation

  1. Update docs/troubleshooting/KB-002-hyperdrive-database-down.md to add a new section at the top titled ## Session Log — 2026-03-25 that captures this full troubleshooting session:
    • Symptoms: UI banners, latency_ms: 0, zero Hyperdrive traffic
    • Diagnosis steps: curl /api/health, wrangler hyperdrive get, checking Hyperdrive host in health response showing .hyperdrive.local (correct), wrangler tail recommendations
    • Root cause found: connection string Zod validation rejecting postgres:// (already fixed in PrismaClientConfigSchema)
    • Remaining issue: database still down at v0.76.0 even with schema fix — hyperdrive_host shows .hyperdrive.local which is correct (Hyperdrive IS the local proxy), so the issue is that the actual query never completes
    • Resolution path: new `/api...

This pull request was created from Copilot chat.


@jaypatrick added the bug, enhancement, and frontend labels Mar 25, 2026
@jaypatrick added this to the beta milestone Mar 25, 2026
…g in UI + KB-003 docs

Co-authored-by: jaypatrick <1800595+jaypatrick@users.noreply.github.com>
Agent-Logs-Url: https://github.com/jaypatrick/adblock-compiler/sessions/02242699-4206-4e4c-8002-10ed00ca7254
Copilot AI changed the title from "[WIP] Add dedicated /api/health/db-smoke endpoint" to "fix: /api/health/db-smoke endpoint + hardened databaseProbe + error surfacing in UI" Mar 26, 2026
Copilot AI requested a review from jaypatrick March 26, 2026 00:04
@jaypatrick jaypatrick requested a review from Copilot March 26, 2026 00:33
Contributor

Copilot AI left a comment

Pull request overview

Adds a dedicated database smoke-test endpoint and improves database health probing so Prisma failures are observable (including safe redaction), then surfaces those diagnostics in the Angular health UI and troubleshooting docs.

Changes:

  • Add new public GET /api/health/db-smoke endpoint and route wiring.
  • Harden DB probing in /api/health with timeout, $disconnect(), and surfaced error_code/error_message (with connection-string redaction).
  • Extend Angular health displays + add/expand troubleshooting KB docs and unit tests.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

File | Description
--- | ---
worker/hono-app.ts | Adds /api/health/db-smoke to public monitoring paths and wires route handler.
worker/handlers/health.ts | Implements handleDbSmoke; enhances databaseProbe with timeout, error surfacing, redaction, and disconnect.
worker/handlers/health.test.ts | Adds tests for error surfacing/redaction and the smoke-test endpoint.
frontend/src/app/performance/performance.component.ts | Displays DB name + error details when DB health is down/degraded.
frontend/src/app/admin/dashboard/dashboard.component.ts | Shows DB detail string combining db_name and error_message for degraded/down states.
docs/troubleshooting/neon-troubleshooting.md | Adds a detailed “live troubleshooting session” section and post-deploy smoke check guidance.
docs/troubleshooting/README.md | Promotes KB-003 to active and renumbers planned KBs.
docs/troubleshooting/KB-003-neon-hyperdrive-live-session-2026-03-25.md | New standalone KB capturing the live session and procedures.
docs/troubleshooting/KB-002-hyperdrive-database-down.md | Adds a “Session Log” and links to KB-003.
docs/SUMMARY.md | Adds KB-003 to the docs navigation.

@jaypatrick
Owner

@copilot apply changes based on the comments in this thread

…aid decision tree in KB-003

Co-authored-by: jaypatrick <1800595+jaypatrick@users.noreply.github.com>
Agent-Logs-Url: https://github.com/jaypatrick/adblock-compiler/sessions/573b8f36-14c1-4fb4-941a-c813d333cc16
Contributor Author

Copilot AI commented Mar 26, 2026

@copilot apply changes based on the comments in this thread

Applied all three review comments in commit 13fa61c:

  1. Timer cleanup (health.ts) — timeoutId is now declared at the outer try scope so clearTimeout(timeoutId) is called in both the success path and the catch block, ensuring no dangling timer when $queryRaw rejects before the 5 s deadline.

  2. Timeout test (health.test.ts) — replaced the pre-baked PROBE_TIMEOUT error stub with a FakeTime-based test that uses a never-resolving $queryRaw mock and calls fakeTime.tickAsync(5001) to actually trigger the Promise.race timeout branch.

  3. Mermaid decision tree (KB-003) — replaced the ASCII fenced block with a flowchart TD Mermaid diagram per the docs standard.

@jaypatrick
Owner

@copilot fix the CI issues

Contributor Author

Copilot AI commented Mar 26, 2026

@copilot fix the CI issues

Fixed in commit 0b789c6. Two CI failures:

  1. deno fmt failure (health.test.ts) — imports were not in alphabetical order; handleDbSmoke now comes before handleHealth/handleHealthLatest.

  2. Angular TS2532 failures (performance.component.ts) — 7 "Object is possibly undefined" errors because Angular's strict null checker doesn't narrow h.services to non-optional inside an @if (h.services?.database && ...) block. Replaced the pattern with @let db = h.services?.database; + @if (db && ...) so all inner accesses use the narrowed db local variable.

@jaypatrick
Owner

@copilot fix the CI issues

Fixed in commit 0b789c6. Two CI failures:

  1. deno fmt failure (health.test.ts) — imports were not in alphabetical order; handleDbSmoke now comes before handleHealth/handleHealthLatest.
  2. Angular TS2532 failures (performance.component.ts) — 7 "Object is possibly undefined" errors because Angular's strict null checker doesn't narrow h.services to non-optional inside an @if (h.services?.database && ...) block. Replaced the pattern with @let db = h.services?.database; + @if (db && ...) so all inner accesses use the narrowed db local variable.

@copilot CI is still broken. Don't stop working on this until it's fixed.

Contributor Author

Copilot AI commented Mar 26, 2026

@copilot fix the CI issues

Fixed in commit 0b789c6. Two CI failures:
...

Fixed in commit cd69e89. Two more deno fmt violations in health.test.ts that the previous fix missed:

  1. makeFailingPrisma: $queryRaw: async () => { throw err; } is now expanded to a multi-line function body (deno fmt requires non-empty single-line arrow function bodies to be expanded)
  2. The inline type cast in the handleDbSmoke happy-path test, { ok: boolean; db_name: string; … }, previously spread over two lines, is now expanded to one property per line
