fix(prisma): add retry for Aurora Serverless v2 connection errors by konokenj · Pull Request #121 · aws-samples/serverless-full-stack-webapp-starter-kit

konokenj · 2026-03-20T02:39:14Z

Issue

close #104
close #105

Problem

The starter kit has three issues with Prisma + Aurora Serverless v2 (auto-pause enabled with minCapacity: 0):

Credential leak: console.log(process.env.DATABASE_URL) in prisma.ts outputs the full connection string including password to CloudWatch Logs.
No runtime retry: Aurora drops idle connections after idle_session_timeout (60s) and takes ~15s to resume from auto-pause (docs). Without retry, queries fail with transient errors (P1017, ECONNRESET) and do not recover.
No migration retry: migration-runner.ts runs prisma db push without retry. During cdk deploy, Aurora may still be resuming, causing P1001 ("Can't reach database server") and failing the entire deployment.

Solution

Remove console.log(DATABASE_URL) to fix the credential leak.
Add a Prisma client extension (Prisma.defineExtension with $allModels.$allOperations) that retries transient connection errors with exponential backoff. Retryable errors: P2024, P1001, P1017, idle-session timeout, ECONNRESET. Non-retryable errors (auth failures, schema errors) are thrown immediately.
Add retry to migration-runner.ts for prisma db push with exponential backoff (base 3s, max 5 attempts, ~100s worst case within Lambda 5min timeout). Only P1001 / connection refused are retried.
Optimize connection parameters: connection_limit=1 (Lambda handles one request per instance), connect_timeout=30 (accommodates auto-pause resume time).
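
The runtime retry described above can be sketched roughly as follows. This is a minimal sketch, not the code from this PR: `withRetry`, `isRetryable`, and `RetryOptions` are hypothetical names, the doubling factor is an assumption, and the message fragments are assumed match strings. Only the list of retryable error codes comes from the description.

```typescript
// Hypothetical helper names; only the retryable error codes come from the PR text.
type RetryOptions = {
  maxAttempts: number; // total attempts, including the first one
  baseDelayMs: number; // wait before the first retry; doubled on each further retry (assumption)
  sleep?: (ms: number) => Promise<void>; // injectable so tests do not really wait
};

// Codes treated as transient connection errors.
const RETRYABLE_CODES = ["P2024", "P1001", "P1017", "ECONNRESET"];
// Assumed message fragments for errors that carry no usable code.
const RETRYABLE_MESSAGES = ["idle-session timeout", "ECONNRESET"];

export function isRetryable(error: unknown): boolean {
  if (typeof error !== "object" || error === null) return false;
  const e = error as { code?: unknown; errorCode?: unknown; message?: unknown };
  // Prisma request errors expose `code`; initialization errors expose `errorCode`.
  const code =
    typeof e.code === "string" ? e.code : typeof e.errorCode === "string" ? e.errorCode : "";
  if (RETRYABLE_CODES.indexOf(code) !== -1) return true;
  const message = typeof e.message === "string" ? e.message : "";
  return RETRYABLE_MESSAGES.some((fragment) => message.indexOf(fragment) !== -1);
}

export async function withRetry<T>(
  operation: () => Promise<T>,
  options: RetryOptions,
): Promise<T> {
  const sleep =
    options.sleep ??
    ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      // Non-retryable errors and the final failed attempt are rethrown unchanged.
      if (attempt >= options.maxAttempts || !isRetryable(error)) throw error;
      await sleep(options.baseDelayMs * Math.pow(2, attempt - 1));
    }
  }
}
```

Inside a Prisma client extension, the wrapped operation would presumably be `() => query(args)` in the `$allModels.$allOperations` hook, so every model operation goes through the same retry path.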

Changes

webapp/src/lib/prisma.ts — Remove console.log, remove verbose log option, add retry extension via $extends
webapp/src/jobs/migration-runner.ts — Extract runPrismaDbPush with retry loop, structured logging
cdk/lib/constructs/database.ts — Change connection options to ?connection_limit=1&connect_timeout=30
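
As an illustration of the connection option change, the query string can be appended to a base URL like this. It is a sketch only: `withConnectionOptions` is a hypothetical helper and the base URL is a placeholder, not the construct code in `database.ts`.

```typescript
// Hypothetical helper: appends Prisma connection options to a PostgreSQL URL.
type ConnectionOptions = { [key: string]: string | number };

export function withConnectionOptions(baseUrl: string, options: ConnectionOptions): string {
  const query = Object.keys(options)
    .map((key) => `${encodeURIComponent(key)}=${encodeURIComponent(String(options[key]))}`)
    .join("&");
  if (query === "") return baseUrl;
  // Use "&" when the URL already carries a query string.
  const separator = baseUrl.indexOf("?") === -1 ? "?" : "&";
  return baseUrl + separator + query;
}

// Placeholder credentials and host; never log the resulting string.
const exampleUrl = withConnectionOptions("postgresql://USER:PASSWORD@example-host:5432/app", {
  connection_limit: 1, // one request per Lambda instance
  connect_timeout: 30, // leaves room for the ~15s auto-pause resume
});
```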

Verification

console.log(process.env.DATABASE_URL) is removed
After Aurora auto-pause resume, the first request recovers via retry
Non-retryable errors (e.g. auth failure) are thrown immediately without retry
cdk deploy succeeds even when Aurora is resuming from 0 ACU
tsc --noEmit passes
prettier --check passes
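
The migration retry budget from the Solution (base 3s, max 5 attempts, Lambda 5min timeout) can be sanity-checked with a short calculation. The doubling factor and the function names here are assumptions; the PR text does not state the multiplier or how the ~100s figure was derived.

```typescript
// Hypothetical helpers for checking a backoff budget against a timeout.
export function backoffDelaysSeconds(baseSeconds: number, waits: number, factor = 2): number[] {
  const delays: number[] = [];
  for (let i = 0; i < waits; i++) {
    delays.push(baseSeconds * Math.pow(factor, i));
  }
  return delays;
}

export function totalSeconds(delays: number[]): number {
  return delays.reduce((sum, value) => sum + value, 0);
}

const LAMBDA_TIMEOUT_SECONDS = 5 * 60;

// Five attempts imply four waits between them: 3 + 6 + 12 + 24 = 45 seconds.
const fourWaits = totalSeconds(backoffDelaysSeconds(3, 4));
// Counting a fifth wait as well gives 3 + 6 + 12 + 24 + 48 = 93 seconds.
const fiveWaits = totalSeconds(backoffDelaysSeconds(3, 5));

const fitsInTimeout = fourWaits < LAMBDA_TIMEOUT_SECONDS && fiveWaits < LAMBDA_TIMEOUT_SECONDS;
```

Under either reading the waiting time stays well inside the 300-second Lambda timeout, which leaves room for the time spent in the `prisma db push` attempts themselves.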

…, #105)

Why: Aurora Serverless v2 with auto-pause (0 ACU) drops connections on idle_session_timeout and takes ~15s to resume. Without retry, both runtime queries and CDK deployment migrations fail on transient errors. Also, DATABASE_URL (including password) was logged to CloudWatch.

What:
- Remove console.log(DATABASE_URL) that leaked credentials to CloudWatch
- Add Prisma client extension with retry on transient connection errors (P2024, P1001, P1017, idle-session timeout, ECONNRESET)
- Add exponential backoff retry to migration-runner for prisma db push
- Optimize connection params: connection_limit=1, connect_timeout=30

The default pool_timeout (10s) is insufficient for Aurora Serverless v2 auto-pause resume (~15s). Also, PrismaClientInitializationError for pool timeout has errorCode=undefined, so message-based detection is needed.
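
Message-based detection for an error whose `errorCode` is undefined might look like the sketch below. The matched fragment is an assumption about the wording of the pool timeout message and should be checked against the Prisma version in use; `isPoolTimeout` and `ErrorLike` are hypothetical names.

```typescript
// Minimal shape of the error fields this check relies on.
type ErrorLike = { errorCode?: string; message?: string };

// Assumed message fragment for a connection pool timeout.
const POOL_TIMEOUT_FRAGMENT = "Timed out fetching a new connection from the connection pool";

export function isPoolTimeout(error: ErrorLike): boolean {
  // Prefer the structured code when it is present.
  if (error.errorCode === "P2024") return true;
  // Fall back to the message when the code is missing.
  const message = error.message ?? "";
  return message.indexOf(POOL_TIMEOUT_FRAGMENT) !== -1;
}
```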

badmintoncryer

Just one comment!

cdk/lib/constructs/database.ts

badmintoncryer · 2026-03-21T15:44:02Z

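I have a question about how AWS behaves here.

My guess is that the ECONNRESET problem occurs in the following situation:

Lambda handles a request and Prisma establishes a DB connection
After the request completes, the Lambda instance stays warm (the connection remains in the pool)
After 60 seconds of idle time, PostgreSQL forcibly closes the connection
On the next request, Prisma tries to use the already-closed connection in the pool
ECONNRESET

If we set idle_session_timeout=0 here, the same connection pool could be reused for as long as the Lambda instance is running, and the various retry logic might become unnecessary.
The downside is that the connection stays open for as long as the Lambda instance keeps running, so Aurora does not drop to 0 ACU.

Now the main question: roughly how long does a started Lambda instance keep running?
If that is around ~10 minutes, the billing concern mostly goes away, so this approach might become realistic.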

Co-authored-by: Kazuho Cryer-Shinozuka <malaysia.cryer@gmail.com>

konokenj · 2026-03-22T02:57:52Z

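@badmintoncryer Thank you for pointing this out!
You are right that this is a countermeasure for ECONNRESET, and setting idle_session_timeout=0 would resolve it at the root. However, the Lambda execution environment keeps and reuses DB connections across invocations.
https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtime-environment.html

With idle_session_timeout=0, that connection would remain as a user-initiated connection from Aurora's point of view, so auto-pause should not trigger.

https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-serverless-v2-auto-pause.html

The lifetime of a Lambda execution environment is officially non-deterministic and it may stay around for several hours, during which Aurora would not drop to 0 ACU. This starter kit prioritizes cost minimization with minCapacity=0, so I intend to keep the current approach of idle_session_timeout=60s plus retry.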

Lambda handles one request per instance with connection_limit=1, so pool contention never occurs. Removing pool_timeout as suggested in review.

🤖 I have created a release *beep* *boop*

---

## [2.1.0](v2.0.0...v2.1.0) (2026-03-22)

### Features

* add /update-snapshot comment trigger to update_snapshot workflow ([764a4fa](764a4fa))
* add CloudWatch LogGroup with retention policy to Lambda functions ([#117](#117)) ([53877bb](53877bb)), closes [#103](#103)
* **database:** enable Data API and connection logging ([#123](#123)) ([e32dc7a](e32dc7a))
* increase webapp Lambda memory from 512MB to 1024MB ([#116](#116)) ([03c5a00](03c5a00)), closes [#101](#101)

### Bug Fixes

* add lambda:InvokeFunction permission for CloudFront OAC ([#83](#83)) ([3cc66bf](3cc66bf))
* **auth:** improve auth error handling and fix Link CORS issue ([#120](#120)) ([84be605](84be605))
* disable Cognito self sign-up by default ([#115](#115)) ([9396e6f](9396e6f)), closes [#106](#106)
* prevent CloudFront cache poisoning for Next.js RSC responses ([#119](#119)) ([70cddda](70cddda))
* **prisma:** add retry for Aurora Serverless v2 connection errors ([#121](#121)) ([7c05dfb](7c05dfb))
* support Amazon Linux 2023 for NAT instance ([#81](#81)) ([0c41aa8](0c41aa8))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

konokenj mentioned this pull request Mar 20, 2026

feat(database): enable Data API and add connection logging for Aurora Serverless v2 #122

Closed

konokenj force-pushed the feature/prisma-retry branch from 7900894 to db668e9 Compare March 20, 2026 03:37

konokenj added 4 commits March 20, 2026 13:16

test: update CDK snapshots for connection option changes

f18481f

fix(prisma): add success log after retry recovery

c4b9d35

fix(prisma): add pool_timeout=30 and retry on connection pool timeout

908ab82

The default pool_timeout (10s) is insufficient for Aurora Serverless v2 auto-pause resume (~15s). Also, PrismaClientInitializationError for pool timeout has errorCode=undefined, so message-based detection is needed.

konokenj force-pushed the feature/prisma-retry branch from d94e77e to 908ab82 Compare March 20, 2026 04:17

konokenj requested a review from tmokmss March 20, 2026 04:19

tmokmss approved these changes Mar 21, 2026

badmintoncryer reviewed Mar 21, 2026

cdk/lib/constructs/database.ts

Update cdk/lib/constructs/database.ts

f085b39

Co-authored-by: Kazuho Cryer-Shinozuka <malaysia.cryer@gmail.com>

fix: remove unnecessary pool_timeout parameter

46e64ff

Lambda handles one request per instance with connection_limit=1, so pool contention never occurs. Removing pool_timeout as suggested in review.

konokenj merged commit 7c05dfb into main Mar 22, 2026
5 checks passed

konokenj deleted the feature/prisma-retry branch March 22, 2026 03:04

github-actions bot mentioned this pull request Mar 22, 2026

chore(main): release 2.1.0 #99

Merged

konokenj merged 6 commits into main from feature/prisma-retry

3 participants
