docs: KB-002 — Hyperdrive postgres:// scheme rejected by Zod, database: down with latency_ms: 0#1407
… incident Co-authored-by: jaypatrick <1800595+jaypatrick@users.noreply.github.com> Agent-Logs-Url: https://github.com/jaypatrick/adblock-compiler/sessions/7556f038-7cca-4397-9cdc-79a7ce0425e0
Pull request overview
Adds a new troubleshooting KB (KB-002) documenting the Hyperdrive postgres:// vs postgresql:// Zod validation incident and wires it into the docs navigation.
Changes:
- Add new article KB-002 describing the `database: down` + `latency_ms: 0` symptom and resolution.
- Update troubleshooting index/links to reference KB-002 as active.
- Add KB-002 to the docs summary navigation.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| docs/troubleshooting/README.md | Marks KB-002 as active and links to the new article. |
| docs/troubleshooting/KB-002-hyperdrive-database-down.md | New KB article documenting diagnosis and remediation steps for the Hyperdrive scheme-validation failure. |
| docs/troubleshooting/KB-001-api-not-available.md | Updates “Related KB Articles” to link to KB-002. |
| docs/SUMMARY.md | Adds KB-002 entry under Troubleshooting. |
| **Key tell:** `latency_ms: 0` on the `database` service. | ||
| A real network failure or timeout always returns a non-zero latency. An instant `0 ms` failure means the probe threw *before* any connection attempt — i.e., at the validation layer. |
The article states that a real network failure/timeout “always returns a non-zero latency” and that `latency_ms: 0` definitively means validation-layer failure. In the current health implementation, `latency_ms` is derived from `Date.now()` deltas (ms resolution), so values can be 0 for very fast failures beyond just Zod parsing. Consider softening this to “near-zero” and treating it as a strong hint rather than a guarantee, and optionally mention checking logs for the `ZodError` as the definitive signal.
| **Key tell:** `latency_ms: 0` on the `database` service. | |
| A real network failure or timeout always returns a non-zero latency. An instant `0 ms` failure means the probe threw *before* any connection attempt — i.e., at the validation layer. | |
| **Key tell:** `latency_ms: 0` (or near-zero) on the `database` service. | |
| Real network failures or timeouts typically show a non-zero latency. A `0 ms` (or near-zero) failure is a strong hint that the probe threw *before* any connection attempt — for example, at the validation/parse layer — but you should confirm this by checking logs for a `ZodError` as described below. |
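The point about millisecond resolution can be illustrated with a minimal sketch. It is not the repository's health handler: `timeProbe`, `ProbeResult`, and both simulated failures are hypothetical names introduced here, and the probe is kept synchronous for brevity. It shows that any failure that throws quickly, not only a Zod parse error, ends up with a latency at or near 0 when measured with `Date.now()` deltas.

```typescript
// Hypothetical result shape, loosely modelled on the health response discussed above.
type ProbeResult = { status: 'healthy' | 'down'; latency_ms: number; error?: string };

// Times a probe with Date.now() deltas, which only have millisecond resolution.
function timeProbe(probe: () => void): ProbeResult {
  const start = Date.now();
  try {
    probe();
    return { status: 'healthy', latency_ms: Date.now() - start };
  } catch (err) {
    const name = err instanceof Error ? err.name : String(err);
    return { status: 'down', latency_ms: Date.now() - start, error: name };
  }
}

// Simulated validation failure: throws before any I/O would happen.
const validationFailure = timeProbe(() => {
  const e = new Error('connection string has an unexpected scheme');
  e.name = 'ZodError'; // stand-in for a real ZodError, for illustration only
  throw e;
});

// Simulated fast failure that has nothing to do with validation.
const otherFastFailure = timeProbe(() => {
  throw new TypeError('binding is undefined');
});

console.log(validationFailure, otherFastFailure);
```

Both results report `down` with a latency at or near zero; only the error name (or the logs) tells them apart, which is why the suggested wording treats `latency_ms: 0` as a hint to be confirmed.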
| // After the fix — accepts both schemes | ||
| connectionString: z.string().url().regex(/^postgre(?:s|sql):\/\//), |
The “After the fix” snippet uses `.regex(/^postgre(?:s|sql):\/\//)` but the current `PrismaClientConfigSchema` implementation in `worker/lib/prisma-config.ts` uses `.refine((s) => s.startsWith('postgresql://') || s.startsWith('postgres://'), ...)`. To avoid drift/confusion, update the snippet (or clearly label it as pseudo-code) to match the repo’s actual implementation and error message.
| // After the fix — accepts both schemes | |
| connectionString: z.string().url().regex(/^postgre(?:s|sql):\/\//), | |
| // After the fix — accepts both schemes (postgres:// and postgresql://) | |
| connectionString: z.string().url().refine((s) => s.startsWith('postgresql://') || s.startsWith('postgres://')), |
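To make the effect of the scheme check concrete, here is a small sketch under stated assumptions. It does not use Zod or the real Hyperdrive binding: `openConnection`, `probe`, the two check functions, and the connection string are hypothetical stand-ins. It shows that with the strict check the connection is never opened, which matches the "zero activity" observation, while the relaxed check lets the probe proceed.

```typescript
// Counts how often the (hypothetical) connection step is reached.
let connectionAttempts = 0;
function openConnection(_connectionString: string): void {
  connectionAttempts += 1; // a real probe would talk to the database here
}

// Strict check (before the fix) and relaxed check (after the fix), as plain predicates.
const strictCheck = (s: string): boolean => s.startsWith('postgresql://');
const relaxedCheck = (s: string): boolean =>
  s.startsWith('postgresql://') || s.startsWith('postgres://');

function probe(connectionString: string, isValid: (s: string) => boolean) {
  const start = Date.now();
  try {
    if (!isValid(connectionString)) {
      throw new Error('connection string rejected by validation');
    }
    openConnection(connectionString);
    return { status: 'healthy', latency_ms: Date.now() - start };
  } catch (_err) {
    return { status: 'down', latency_ms: Date.now() - start };
  }
}

// Placeholder value using the short scheme; not a real connection string.
const shortSchemeString = 'postgres://user:password@example.invalid:5432/db';

const beforeFix = probe(shortSchemeString, strictCheck);
const attemptsWithStrictCheck = connectionAttempts;
const afterFix = probe(shortSchemeString, relaxedCheck);

console.log({ beforeFix, attemptsWithStrictCheck, afterFix, connectionAttempts });
```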
| connectionString: z.string().url().startsWith('postgresql://'), | ||
|
|
||
| // After | ||
| connectionString: z.string().url().regex(/^postgre(?:s|sql):\/\//), |
This section repeats the “Before/After” code snippet for `PrismaClientConfigSchema`, but the “After” example uses `.regex(/^postgre(?:s|sql):\/\//)` which doesn’t match the current implementation (it uses `.refine(...)` with `startsWith('postgres://') || startsWith('postgresql://')`). Please align the snippet here as well so the Resolution steps are accurate for operators following the KB.
| connectionString: z.string().url().regex(/^postgre(?:s|sql):\/\//), | |
| connectionString: z.string().url().refine( | |
| (value) => value.startsWith('postgres://') || value.startsWith('postgresql://'), | |
| { message: 'connectionString must start with "postgres://" or "postgresql://"' }, | |
| ), |
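For readers wondering whether the two forms behave differently, the following sketch compares the regex from the article with the prefix predicate from the suggestion on a handful of sample strings. It deliberately avoids importing Zod so it stays self-contained; `byRegex`, `byPrefix`, and the sample values are illustrative names, not code from the repository.

```typescript
// The pattern shown in the KB article.
const byRegex = (s: string): boolean => /^postgre(?:s|sql):\/\//.test(s);

// The predicate passed to .refine(...) in the suggested change.
const byPrefix = (s: string): boolean =>
  s.startsWith('postgres://') || s.startsWith('postgresql://');

// Placeholder inputs covering accepted and rejected schemes.
const samples: string[] = [
  'postgres://user:password@example.invalid:5432/db',
  'postgresql://user:password@example.invalid:5432/db',
  'mysql://user:password@example.invalid:3306/db',
  'postgre://example.invalid/db',
  'POSTGRES://example.invalid/db',
  '',
];

// Collect any input on which the two checks disagree.
const disagreements = samples.filter((s) => byRegex(s) !== byPrefix(s));

console.log(samples.map((s) => ({ s, accepted: byPrefix(s) })), disagreements);
```

On these samples the two checks agree, so the reviewer's request is about keeping the documentation in step with the code and its error message rather than about a behavioural difference.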
| "database": { | ||
| "status": "healthy", | ||
| "latency_ms": 42, | ||
| "db_name": "neondb", |
The example enhanced health response shows `db_name: "neondb"`, but the worker’s health probe currently treats `db_name !== 'adblock-compiler'` as a degraded/wrong-database condition. Using `neondb` in the “healthy” example can mislead responders. Suggest updating the example to the expected production DB name for this repo (or making the example explicitly generic).
| "db_name": "neondb", | |
| "db_name": "adblock-compiler", |
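The `db_name` assertion described in this comment can be sketched as a small pure function. This is an assumption-laden illustration, not the code in `worker/handlers/health.ts`: `classifyDatabase`, `DbStatus`, and the `null` convention for a failed query are hypothetical, and only the expected name `adblock-compiler` comes from the review comment.

```typescript
type DbStatus = 'healthy' | 'degraded' | 'down';

// Expected database name, as quoted in the review comment above.
const EXPECTED_DB_NAME = 'adblock-compiler';

// dbName is the value a query such as SELECT current_database() would return,
// or null when the query could not be run at all (hypothetical convention).
function classifyDatabase(dbName: string | null): DbStatus {
  if (dbName === null) return 'down';
  return dbName === EXPECTED_DB_NAME ? 'healthy' : 'degraded';
}

console.log(
  classifyDatabase('adblock-compiler'),
  classifyDatabase('neondb'),
  classifyDatabase(null),
);
```

Under this sketch a response carrying `neondb` would not be classified as healthy, which is the mismatch the comment warns about.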
| # KB-002: Hyperdrive Binding Connected but `database` Service Reports `down` | ||
|
|
||
| > **Status:** ✅ Active | ||
| > **Affected version:** v0.75.0 | ||
| > **Resolved in:** PR fixing `PrismaClientConfigSchema` to accept `postgres://` + enhanced `/api/health` probe | ||
| > **Date:** 2026-03-25 |
PR description says KB-002 should “follow the exact same structure as KB-001”. KB-001 starts with a consistent metadata block (Series/Component/Service URL/Date Created/Status), but KB-002 uses a different header format and omits several of those fields. If consistent KB structure is a goal, consider matching KB-001’s header metadata format here (or update the PR description if the divergence is intentional).
Documents the production incident where `database.status: "down"` + `latency_ms: 0` was caused by `PrismaClientConfigSchema` rejecting Hyperdrive's `postgres://` scheme (Hyperdrive never emits `postgresql://`), meaning Zod threw before any network call was made and Hyperdrive showed zero activity.

Description
Adds KB-002 to the troubleshooting series covering the Hyperdrive `postgres://` vs `postgresql://` schema-validation failure. The `latency_ms: 0` pattern is the key diagnostic signal that distinguishes a validation-layer failure from a real network failure.

Changes
- `docs/troubleshooting/KB-002-hyperdrive-database-down.md`: New article covering:
  - `database: down`, `latency_ms: 0`, Hyperdrive dashboard shows zero queries
  - `latency_ms: 0` → scheme mismatch → Zod rejection
  - `PrismaClientConfigSchema` to accept both schemes via `/^postgre(?:s|sql):\/\//`
  - `SELECT current_database()` surfaces `db_name` + `hyperdrive_host` to catch wrong-database condition
  - `db_name` in health response as a continuous assertion
  - `worker/lib/prisma-config.ts`, `worker/handlers/health.ts`
- `docs/troubleshooting/README.md`: KB-002 linked and marked ✅ Active (replaces the old Clerk JWT KB-002 placeholder)
- `docs/SUMMARY.md`: KB-002 added to Troubleshooting section
- `docs/troubleshooting/KB-001-api-not-available.md`: Related Articles updated to link KB-002

Testing
Zero Trust Architecture Checklist
This PR does not touch `worker/` or `frontend/`. ZTA checklist is not required.

Original prompt
Task: Document the Hyperdrive / `postgres://` database down incident in the troubleshooting docs

This conversation diagnosed and fixed a production issue where `GET /api/health` returned `database: down` even though Hyperdrive was correctly configured and Neon migrations were running fine.

Root cause (for context)
`worker/lib/prisma-config.ts` validated the Hyperdrive connection string with `.startsWith('postgresql://')`. However, the Cloudflare Hyperdrive binding's `.connectionString` property always returns the `postgres://` short alias (the `"scheme"` field in Hyperdrive config is `"postgres"`). This caused `PrismaClientConfigSchema.parse(...)` to throw a `ZodError` instantly (before any network call), which the health probe caught and turned into `{ status: 'down', latency_ms: 0 }`. Hyperdrive therefore showed zero activity because the connection was never opened.

The fix (applied in a separate PR) was:
- `PrismaClientConfigSchema` to accept both `postgres://` and `postgresql://`.
- `handleHealth` to run `SELECT current_database()` instead of `SELECT 1`, and surface `db_name` + `hyperdrive_host` in the JSON response so the wrong-database condition is also caught.

Changes required
1. Create `docs/troubleshooting/KB-002-hyperdrive-database-down.md`

Follow the exact same structure as `KB-001-api-not-available.md`. The article must cover:

- `GET /api/health` returns `database.status: "down"` with `latency_ms: 0`; Hyperdrive dashboard shows zero activity
- `curl -s https://<your-worker>.workers.dev/api/health | jq .` and `npx wrangler hyperdrive get <id>`
- `postgres://` (Hyperdrive's actual scheme); the `latency_ms: 0` is the key tell that the failure is at validation, not at the network
- `PrismaClientConfigSchema` to accept both schemes; also explains the enhanced health check (`SELECT current_database()`) introduced as part of the fix
- `db_name` field in the health response lets you confirm the correct database is connected
- `worker/lib/prisma-config.ts`, `worker/handlers/health.ts`

Here is the full article text to use verbatim: