fix: /api/health/db-smoke endpoint + hardened databaseProbe + error surfacing in UI #1410
…g in UI + KB-003 docs Co-authored-by: jaypatrick <1800595+jaypatrick@users.noreply.github.com> Agent-Logs-Url: https://github.com/jaypatrick/adblock-compiler/sessions/02242699-4206-4e4c-8002-10ed00ca7254
Pull request overview
Adds a dedicated database smoke-test endpoint and improves database health probing so Prisma failures are observable (including safe redaction), then surfaces those diagnostics in the Angular health UI and troubleshooting docs.
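A minimal sketch of the smoke test's shape follows. It assumes a hypothetical `SmokeQueries` interface in place of Prisma's `$queryRaw`; the names `SmokeQueries`, `runDbSmoke`, `SmokeOk`, and `SmokeFail` are illustrative, not the PR's code. It runs the identity and table-count checks concurrently with `Promise.all`, measures latency, and maps any failure to `ok: false` with a `503`. The `400` missing-binding branch and error redaction are left out for brevity.

```typescript
// Hypothetical query surface; the real handler issues Prisma $queryRaw calls instead.
interface SmokeQueries {
  identity(): Promise<{ db_name: string; pg_version: string; server_time: Date }>;
  tableCount(): Promise<number>;
}

interface SmokeOk {
  ok: true;
  db_name: string;
  pg_version: string;
  server_time: string;
  table_count: number;
  latency_ms: number;
  hyperdrive_host: string;
}

interface SmokeFail {
  ok: false;
  error: string;
  hyperdrive_host: string;
}

// Runs both checks concurrently and maps the outcome to an HTTP status plus JSON body.
async function runDbSmoke(
  q: SmokeQueries,
  hyperdriveHost: string,
): Promise<{ status: number; body: SmokeOk | SmokeFail }> {
  const started = Date.now();
  try {
    const [identity, tableCount] = await Promise.all([q.identity(), q.tableCount()]);
    return {
      status: 200,
      body: {
        ok: true,
        db_name: identity.db_name,
        pg_version: identity.pg_version,
        server_time: identity.server_time.toISOString(),
        table_count: tableCount,
        latency_ms: Date.now() - started,
        hyperdrive_host: hyperdriveHost,
      },
    };
  } catch (err) {
    // A failure in either query yields ok: false with a 503.
    return {
      status: 503,
      body: {
        ok: false,
        error: err instanceof Error ? err.message : String(err),
        hyperdrive_host: hyperdriveHost,
      },
    };
  }
}
```

In PostgreSQL, `COUNT(*)` returns `bigint`, which Prisma's `$queryRaw` typically hands back as a JavaScript `BigInt`. `JSON.stringify` throws on `BigInt`, so a real handler would likely need a `Number(...)` conversion before responding.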
Changes:
- Add new public `GET /api/health/db-smoke` endpoint and route wiring.
- Harden DB probing in `/api/health` with timeout, `$disconnect()`, and surfaced `error_code`/`error_message` (with connection-string redaction).
- Extend Angular health displays + add/expand troubleshooting KB docs and unit tests.
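The connection-string redaction can be sketched as a small helper. The regex is the one quoted in the PR description. `toDbErrorFields` is a hypothetical name, and reading a string `code` property from the error (falling back to the error's `name`) is an assumption about how `error_code` might be derived, not the PR's confirmed logic.

```typescript
// Pattern quoted in the PR: matches postgres:// or postgresql:// up to whitespace, a quote, a comma, or '}'.
const CONNECTION_STRING_RE = /postgre(?:s|sql):\/\/[^\s"',}]*/gi;

function redactConnectionStrings(message: string): string {
  return message.replace(CONNECTION_STRING_RE, "[redacted]");
}

interface DbErrorFields {
  error_code: string;
  error_message: string;
}

// Hypothetical helper: converts an unknown thrown value into safe, surfaceable fields.
function toDbErrorFields(err: unknown): DbErrorFields {
  if (err instanceof Error) {
    // Assumption: some errors carry a string `code`; otherwise fall back to the error class name.
    const code = (err as { code?: unknown }).code;
    return {
      error_code: typeof code === "string" ? code : err.name,
      error_message: redactConnectionStrings(err.message),
    };
  }
  return { error_code: "UNKNOWN", error_message: redactConnectionStrings(String(err)) };
}
```

Because the character class stops at `,`, quotes, and `}`, a connection string embedded in a JSON-like error message is cut off at the delimiter instead of swallowing the rest of the message.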
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| worker/hono-app.ts | Adds /api/health/db-smoke to public monitoring paths and wires route handler. |
| worker/handlers/health.ts | Implements handleDbSmoke; enhances databaseProbe with timeout, error surfacing, redaction, and disconnect. |
| worker/handlers/health.test.ts | Adds tests for error surfacing/redaction and the smoke-test endpoint. |
| frontend/src/app/performance/performance.component.ts | Displays DB name + error details when DB health is down/degraded. |
| frontend/src/app/admin/dashboard/dashboard.component.ts | Shows DB detail string combining db_name and error_message for degraded/down states. |
| docs/troubleshooting/neon-troubleshooting.md | Adds a detailed “live troubleshooting session” section and post-deploy smoke check guidance. |
| docs/troubleshooting/README.md | Promotes KB-003 to active and renumbers planned KBs. |
| docs/troubleshooting/KB-003-neon-hyperdrive-live-session-2026-03-25.md | New standalone KB capturing the live session and procedures. |
| docs/troubleshooting/KB-002-hyperdrive-database-down.md | Adds a “Session Log” and links to KB-003. |
| docs/SUMMARY.md | Adds KB-003 to the docs navigation. |
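The probe lifecycle listed for `worker/handlers/health.ts` (client created inside `try`, error fields surfaced in `catch`, disconnect in `finally`) can be sketched against a hypothetical `ProbeClient` interface standing in for `PrismaClient`. `probeDatabase`, `ping`, and `disconnect` are illustrative names, and the timeout is omitted to keep the lifecycle visible.

```typescript
// Hypothetical minimal client surface (the real code uses PrismaClient and $disconnect()).
interface ProbeClient {
  ping(): Promise<void>;
  disconnect(): Promise<void>;
}

type ProbeResult =
  | { status: "healthy"; latency_ms: number }
  | { status: "down"; latency_ms: number; error_code: string; error_message: string };

async function probeDatabase(createClient: () => ProbeClient): Promise<ProbeResult> {
  const started = Date.now();
  let client: ProbeClient | undefined;
  try {
    // Created inside try so validation errors thrown by the factory become a "down" result.
    client = createClient();
    await client.ping();
    return { status: "healthy", latency_ms: Date.now() - started };
  } catch (err) {
    const e = err instanceof Error ? err : new Error(String(err));
    return {
      status: "down",
      latency_ms: Date.now() - started,
      error_code: e.name,
      error_message: e.message.replace(/postgre(?:s|sql):\/\/[^\s"',}]*/gi, "[redacted]"),
    };
  } finally {
    // Runs on success and failure; a failing disconnect must not mask the probe result.
    await client?.disconnect().catch(() => {});
  }
}
```

Creating the client inside `try` is what lets a schema-validation failure show up as `status: "down"` with an error message rather than escaping the probe entirely.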
docs/troubleshooting/KB-003-neon-hyperdrive-live-session-2026-03-25.md
@copilot apply changes based on the comments in this thread
…aid decision tree in KB-003 Co-authored-by: jaypatrick <1800595+jaypatrick@users.noreply.github.com> Agent-Logs-Url: https://github.com/jaypatrick/adblock-compiler/sessions/573b8f36-14c1-4fb4-941a-c813d333cc16
Applied all three review comments in commit 13fa61c:
@copilot fix the CI issues
…rrors Co-authored-by: jaypatrick <1800595+jaypatrick@users.noreply.github.com> Agent-Logs-Url: https://github.com/jaypatrick/adblock-compiler/sessions/fe0245ab-eb2e-40ec-bb4b-b36a46de7ad9
Fixed in commit 0b789c6. Two CI failures:
@copilot CI is still broken. Don't stop working on this until it's fixed.
… fmt Co-authored-by: jaypatrick <1800595+jaypatrick@users.noreply.github.com> Agent-Logs-Url: https://github.com/jaypatrick/adblock-compiler/sessions/1375e97f-b33d-44d6-9317-d7e220f9c2fd
Fixed in commit cd69e89. Two more
Description

`/api/health` reported `database: down` with `latency_ms: 0`, but `databaseProbe` silently swallowed every Prisma error, making it impossible to distinguish Zod validation failures from network timeouts. This PR adds a dedicated smoke-test endpoint, hardens the existing probe, and surfaces error details in both the admin dashboard and performance UI.

Changes
A: New `GET /api/health/db-smoke`
- Runs `current_database()`, `version()`, `now()`, and `COUNT(*)` on `information_schema.tables` via `Promise.all`
- Returns `{ ok, db_name, pg_version, server_time, table_count, latency_ms, hyperdrive_host }`
- `ok: false` + status `503` on failure; `400` when the `HYPERDRIVE` binding is absent
- Added to `MONITORING_API_PATHS`: unauthenticated, same tier as `/api/health`

B: Hardened `databaseProbe`
- `DatabaseResult` extended with `error_code?` and `error_message?`; the catch block now surfaces both instead of swallowing them
- `error_message` redacts `postgres://` / `postgresql://` credentials before surfacing: `.replace(/postgre(?:s|sql):\/\/[^\s"',}]*/gi, '[redacted]')`
- Timeout via `Promise.race` + `setTimeout`; `timeoutId` is declared at the outer `try` scope so `clearTimeout` is called in both the success and `catch` paths, leaving no dangling timers when `$queryRaw` rejects quickly
- `createPrismaClient` moved inside `try` so Zod validation errors are caught and classified
- `$disconnect()` in `finally` on both `databaseProbe` and `handleDbSmoke`

C: Angular error surfacing
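The dashboard's `dbDetail()` formatting for non-healthy database entries can be sketched as a standalone function. The `DatabaseHealth` shape and its `status` values are assumptions (only `db_name` and `error_message` come from the PR), as is the choice to drop whichever field is missing rather than print an empty side of the separator.

```typescript
// Simplified, hypothetical shape of HealthResponse.services.database.
interface DatabaseHealth {
  readonly status: "healthy" | "degraded" | "down";
  readonly db_name?: string;
  readonly error_message?: string;
}

// Joins db_name and error_message with " — " for non-healthy entries, matching the PR's stated format.
function dbDetail(db: DatabaseHealth | undefined): string | undefined {
  if (!db || db.status === "healthy") return undefined;
  const parts = [db.db_name, db.error_message].filter((p): p is string => !!p);
  return parts.length > 0 ? parts.join(" — ") : undefined;
}
```

Accepting `undefined` mirrors the optional `h.services?.database` that the performance template narrows with `@let`.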
- `HealthServiceResult` / `HealthResponse.services.database` extended with `error_message?` and `db_name?`
- `performance.component.ts`: shows a `db-error-row` chip under the health card when the database is `down`/`degraded`; uses `@let db = h.services?.database` to properly narrow the optional `services` type and eliminate Angular strict null check (TS2532) errors
- `dashboard.component.ts`: `dbDetail()` returns `"<db_name> — <error_message>"` for non-healthy database entries

D: Tests
- `makeHealthyPrisma()` / `makeFailingPrisma()` helpers (include a `$disconnect` stub); `makeFailingPrisma` uses a properly expanded multi-line arrow function body to satisfy `deno fmt`
- `error_code`/`error_message` present on failure; credential redaction
- `FakeTime` from `@std/testing/time` with a never-resolving `$queryRaw` mock: `fakeTime.tickAsync(5001)` actually triggers the `Promise.race` timer branch, so regressions in timer creation/clearing are caught
- `handleDbSmoke`: happy path (with expanded one-property-per-line inline type cast), 503 path, 400 (missing binding), redaction
- `health.test.ts` corrected to satisfy the `deno fmt` alphabetical ordering requirement

E: Docs
- KB-002: `## Session Log — 2026-03-25` captures the full incident narrative
- `neon-troubleshooting.md`: live session section covering the `.hyperdrive.local` meaning, the `latency_ms: 0` decision tree, and `wrangler tail` as the first debugging step
- `KB-003-neon-hyperdrive-live-session-2026-03-25.md`: full KB with diagnostic commands, Mermaid `flowchart TD` decision tree (replacing ASCII art), resolution checklist, post-deploy prevention script

Testing
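Deno's `FakeTime` makes the timeout test deterministic by replacing the global timer functions, so `tickAsync(5001)` can fire the `Promise.race` branch without waiting five seconds. Where global faking isn't available, the same idea can be sketched by injecting the timer instead. `Timers`, `ManualClock`, and `withTimeout` are hypothetical names, not part of the PR or of `@std/testing`.

```typescript
// Hypothetical timer interface so tests can substitute a controllable clock.
interface Timers {
  set(fn: () => void, ms: number): number;
  clear(id: number): void;
}

// Manual clock standing in for FakeTime: timers fire only when tick() moves time past their deadline.
class ManualClock implements Timers {
  private now = 0;
  private nextId = 1;
  private readonly timers = new Map<number, { at: number; fn: () => void }>();

  set(fn: () => void, ms: number): number {
    const id = this.nextId++;
    this.timers.set(id, { at: this.now + ms, fn });
    return id;
  }

  clear(id: number): void {
    this.timers.delete(id);
  }

  get pending(): number {
    return this.timers.size;
  }

  tick(ms: number): void {
    this.now += ms;
    for (const [id, t] of Array.from(this.timers)) {
      if (t.at <= this.now) {
        this.timers.delete(id);
        t.fn();
      }
    }
  }
}

// Races work against a timer; the timer is cleared on every settle path so none dangle.
function withTimeout<T>(work: Promise<T>, ms: number, timers: Timers): Promise<T> {
  let timeoutId: number | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timeoutId = timers.set(() => reject(new Error(`timeout after ${ms}ms`)), ms);
  });
  return Promise.race([work, timeout]).finally(() => {
    if (timeoutId !== undefined) timers.clear(timeoutId);
  });
}
```

A never-resolving promise plays the role of the hung `$queryRaw` mock: advancing the clock past 5000 ms must reject, and `pending` returning to zero shows the timer did not leak.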
- …`FakeTime`), and smoke-test scenarios covered in `health.test.ts`
- …`/health` and `/health/latest` patterns

Zero Trust Architecture Checklist
Worker / Backend
- `handleDbSmoke` and `handleHealth` are explicitly public/anonymous diagnostic endpoints; no auth bypass for protected routes
- …`*`) on write/authenticated endpoints: `/api/health/db-smoke` added to `MONITORING_API_PATHS`, which inherits the existing public-endpoint CORS prefix match; no wildcard introduced
- …`[vars]`): no new secrets; uses the existing `env.HYPERDRIVE` binding
- `createPrismaClient` validates the connection string via `PrismaClientConfigSchema` (inside `try` now, so errors are caught)
- …`.prepare().bind()`: N/A; this PR uses Prisma `$queryRaw` tagged template literals (parameterized by design)

Frontend / Angular
- …`CanActivateFn` auth guards: N/A; only type extensions and template changes to the existing public health display
- …`localStorage`): N/A

Original prompt
Summary
The live site at https://adblock-frontend.jayson-knight.workers.dev/ shows:
- `/api/health` returns `database.status: "down"` with `latency_ms: 0`
- `hyperdrive_host: "11f7f957eaae03a9fe9365c78e6eb4ed.hyperdrive.local"`, which is the correct local proxy

The root cause (already documented in KB-002) is confirmed and fixed, but additional hardening, an `/api/health/db-smoke` endpoint, improved error surfacing in the UI, and documentation are needed.

Changes Required
A: Add a dedicated `/api/health/db-smoke` smoke-test endpoint

Add a new handler `handleDbSmoke` in `worker/handlers/health.ts` that:
- Checks that `env.HYPERDRIVE` is present (returns `400` if not).
- Creates a `PrismaClient` via `_internals.createPrismaClient(env.HYPERDRIVE.connectionString)`.
- Runs `SELECT current_database() AS db_name, version() AS pg_version, now() AS server_time` via `$queryRaw`.
- Runs `SELECT COUNT(*) AS table_count FROM information_schema.tables WHERE table_schema = 'public'` to verify the schema is populated.
- Returns `{ "ok": true, "db_name": "adblock-compiler", "pg_version": "PostgreSQL 16.x ...", "server_time": "2026-03-25T21:59:15.917Z", "table_count": 17, "latency_ms": 42, "hyperdrive_host": "ep-winter-term-a8rxh2a9-pooler.eastus2.azure.neon.tech" }` on success.
- Returns `{ ok: false, error: "...", hyperdrive_host: "..." }` with status `503` on any failure.

Register the route in `worker/worker.ts` (or wherever routes are registered) at `GET /api/health/db-smoke`.

B: Harden `databaseProbe` in `worker/handlers/health.ts`

The current catch block silently swallows all errors. Update it to:
- Extract `error_code` and `error_message` from the caught exception (safely, without logging secrets).
- Include both in the `down` response shape.
- Sanitize `error_message` before surfacing (replace anything that looks like `postgres://...` with `[redacted]`).
- Add a `5000ms` timeout (`AbortSignal.timeout`) to the database probe so a hung Hyperdrive connection doesn't block the health response indefinitely.
- Disconnect in a `finally` block after each probe (`await prisma.$disconnect()`).

C: Add `error_message` to Angular UI health display

In `frontend/src/app/performance/performance.component.ts`:
- Extend the `HealthServiceResult` interface to include `readonly error_message?: string` and `readonly db_name?: string`.
- When the database is `down` or `degraded`, surface the `error_message` and `db_name` fields as a sub-caption under the database health card row (template change in the same file).

In `frontend/src/app/admin/dashboard/dashboard.component.ts`:
- Extend `HealthResponse.services.database` to include `readonly error_message?: string` and `readonly db_name?: string`.
- Use `db_name` and `error_message` as the `detail` field in the `HealthCheck` object returned by `mapHealthChecks` for the database entry.

D: Unit tests
In `worker/handlers/health.test.ts`, add tests for the updated `databaseProbe`:
- A failing Prisma client produces `error_code` and `error_message` in `services.database`.
- `error_message` does NOT contain `postgres://` credentials.
- A hung query returns `status: "down"` with a timeout `error_code`.

Add a new test file `worker/handlers/health-db-smoke.test.ts` (or add to the existing one) for `handleDbSmoke`:
- A healthy database returns `ok: true`.
- A failing database returns `ok: false`, status 503.

E: Documentation
- Update `docs/troubleshooting/KB-002-hyperdrive-database-down.md` to add a new section at the top titled `## Session Log — 2026-03-25` that captures this full troubleshooting session:
  - `latency_ms: 0`, zero Hyperdrive traffic
  - `curl /api/health`, `wrangler hyperdrive get`, checking the Hyperdrive host in the health response (showing `.hyperdrive.local`, which is correct), `wrangler tail` recommendations
  - `postgres://` (already fixed in `PrismaClientConfigSchema`)
  - Still `down` at v0.76.0 even with the schema fix: `hyperdrive_host` shows `.hyperdrive.local`, which is correct (Hyperdrive IS the local proxy), so the issue is that the actual query never completes

This pull request was created from Copilot chat.