fix(prisma): add retry for Aurora Serverless v2 connection errors#121
fix(prisma): add retry for Aurora Serverless v2 connection errors#121
Conversation
7900894 to
db668e9
Compare
…, #105) Why: Aurora Serverless v2 with auto-pause (0 ACU) drops connections on idle_session_timeout and takes ~15s to resume. Without retry, both runtime queries and CDK deployment migrations fail on transient errors. Also, DATABASE_URL (including password) was logged to CloudWatch. What: - Remove console.log(DATABASE_URL) that leaked credentials to CloudWatch - Add Prisma client extension with retry on transient connection errors (P2024, P1001, P1017, idle-session timeout, ECONNRESET) - Add exponential backoff retry to migration-runner for prisma db push - Optimize connection params: connection_limit=1, connect_timeout=30
The default pool_timeout (10s) is insufficient for Aurora Serverless v2 auto-pause resume (~15s). Also, PrismaClientInitializationError for pool timeout has errorCode=undefined, so message-based detection is needed.
d94e77e to
908ab82
Compare
|
AWS仕様について教えて下さい。 ECONNRESET問題は以下のシチェーションで起きていると想像しています。
ここで 本題ですが、lambdaの立ち上がったインスタンスってどの程度の期間動き続けるものでしょうか..?? |
Co-authored-by: Kazuho Cryer-Shinozuka <malaysia.cryer@gmail.com>
|
@badmintoncryer ご指摘ありがとうございます!
https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-serverless-v2-auto-pause.html Lambda 実行環境の生存期間は公式には非決定的で、数時間残る可能性もあるため、その間Aurora が 0 ACU に落ちなくなります。このスターターキットは |
Lambda handles one request per instance with connection_limit=1, so pool contention never occurs. Removing pool_timeout as suggested in review.
🤖 I have created a release *beep* *boop* --- ## [2.1.0](v2.0.0...v2.1.0) (2026-03-22) ### Features * add /update-snapshot comment trigger to update_snapshot workflow ([764a4fa](764a4fa)) * add CloudWatch LogGroup with retention policy to Lambda functions ([#117](#117)) ([53877bb](53877bb)), closes [#103](#103) * **database:** enable Data API and connection logging ([#123](#123)) ([e32dc7a](e32dc7a)) * increase webapp Lambda memory from 512MB to 1024MB ([#116](#116)) ([03c5a00](03c5a00)), closes [#101](#101) ### Bug Fixes * add lambda:InvokeFunction permission for CloudFront OAC ([#83](#83)) ([3cc66bf](3cc66bf)) * **auth:** improve auth error handling and fix Link CORS issue ([#120](#120)) ([84be605](84be605)) * disable Cognito self sign-up by default ([#115](#115)) ([9396e6f](9396e6f)), closes [#106](#106) * prevent CloudFront cache poisoning for Next.js RSC responses ([#119](#119)) ([70cddda](70cddda)) * **prisma:** add retry for Aurora Serverless v2 connection errors ([#121](#121)) ([7c05dfb](7c05dfb)) * support Amazon Linux 2023 for NAT instance ([#81](#81)) ([0c41aa8](0c41aa8)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Issue
close #104
close #105
Problem
The starter kit has three issues with Prisma + Aurora Serverless v2 (auto-pause enabled with
minCapacity: 0):Credential leak:
console.log(process.env.DATABASE_URL)inprisma.tsoutputs the full connection string including password to CloudWatch Logs.No runtime retry: Aurora drops idle connections after
idle_session_timeout(60s) and takes ~15s to resume from auto-pause (docs). Without retry, queries fail with transient errors (P1017, ECONNRESET) and do not recover.No migration retry:
migration-runner.tsrunsprisma db pushwithout retry. Duringcdk deploy, Aurora may still be resuming, causing P1001 ("Can't reach database server") and failing the entire deployment.Solution
console.log(DATABASE_URL)to fix the credential leak.Prisma.defineExtensionwith$allModels.$allOperations) that retries transient connection errors with exponential backoff. Retryable errors: P2024, P1001, P1017, idle-session timeout, ECONNRESET. Non-retryable errors (auth failures, schema errors) are thrown immediately.migration-runner.tsforprisma db pushwith exponential backoff (base 3s, max 5 attempts, ~100s worst case within Lambda 5min timeout). Only P1001 / connection refused are retried.connection_limit=1(Lambda handles one request per instance),connect_timeout=30(accommodates auto-pause resume time).Changes
webapp/src/lib/prisma.ts— Removeconsole.log, remove verboselogoption, add retry extension via$extendswebapp/src/jobs/migration-runner.ts— ExtractrunPrismaDbPushwith retry loop, structured loggingcdk/lib/constructs/database.ts— Change connection options to?connection_limit=1&connect_timeout=30Verification
console.log(process.env.DATABASE_URL)is removedcdk deploysucceeds even when Aurora is resuming from 0 ACUtsc --noEmitpassesprettier --checkpasses