7 changes: 7 additions & 0 deletions .changeset/fix-batch-duplicate-idempotency.md
@@ -0,0 +1,7 @@
---
"@trigger.dev/webapp": patch
---

Fix batchTriggerAndWait running forever when duplicate idempotencyKeys are provided in the same batch

When using batchTriggerAndWait with duplicate idempotencyKeys in the same batch, the batch would never complete because the completedCount and expectedCount would be mismatched. This fix ensures that cached runs (duplicate idempotencyKeys) are properly tracked in the batch, with their completedCount incremented immediately if the cached run is already in a final status.
93 changes: 79 additions & 14 deletions apps/webapp/app/v3/services/batchTriggerV3.server.ts
@@ -124,11 +124,11 @@ export class BatchTriggerV3Service extends BaseService {

const existingBatch = options.idempotencyKey
? await this._prisma.batchTaskRun.findFirst({
where: {
runtimeEnvironmentId: environment.id,
idempotencyKey: options.idempotencyKey,
},
})
where: {
runtimeEnvironmentId: environment.id,
idempotencyKey: options.idempotencyKey,
},
})
: undefined;

if (existingBatch) {
@@ -167,16 +167,16 @@ export class BatchTriggerV3Service extends BaseService {

const dependentAttempt = body?.dependentAttempt
? await this._prisma.taskRunAttempt.findFirst({
where: { friendlyId: body.dependentAttempt },
include: {
taskRun: {
select: {
id: true,
status: true,
},
where: { friendlyId: body.dependentAttempt },
include: {
taskRun: {
select: {
id: true,
status: true,
},
},
})
},
})
: undefined;

if (
@@ -890,7 +890,72 @@ export class BatchTriggerV3Service extends BaseService {
}
}

return false;
// FIX for Issue #2965: When a run is cached (duplicate idempotencyKey),
// we need to ALWAYS create a BatchTaskRunItem to properly track it.
// This handles cases where cached run may originate from another batch.
// Use unique constraint (batchTaskRunId, taskRunId) to prevent duplicates.
const isAlreadyComplete = isFinalRunStatus(result.run.status);

logger.debug(
"[BatchTriggerV2][processBatchTaskRunItem] Cached run detected, creating batch item",
{
batchId: batch.friendlyId,
runId: task.runId,
cachedRunId: result.run.id,
cachedRunStatus: result.run.status,
isAlreadyComplete,
currentIndex,
}
);

// Always create BatchTaskRunItem for cached runs
// This ensures proper tracking even for cross-batch scenarios
try {
await this._prisma.batchTaskRunItem.create({
data: {
batchTaskRunId: batch.id,
taskRunId: result.run.id,
// Use batchTaskRunItemStatusForRunStatus() for all cases
// This correctly maps both successful (COMPLETED) and failed (FAILED) statuses
status: batchTaskRunItemStatusForRunStatus(result.run.status),

🔴 Cached runs with failed status are not counted for batch completion

When a cached run has a failed status (e.g., COMPLETED_WITH_ERRORS, CRASHED, SYSTEM_FAILURE), the batch will never complete.

Root Cause

The fix creates a BatchTaskRunItem with status based on batchTaskRunItemStatusForRunStatus(result.run.status) (line 920). For failed run statuses, this returns FAILED (see taskRun.server.ts:119-126):

case TaskRunStatus.COMPLETED_WITH_ERRORS:
case TaskRunStatus.SYSTEM_FAILURE:
case TaskRunStatus.CRASHED:
  return BatchTaskRunItemStatus.FAILED;

However, tryCompleteBatchV3 only counts items with status: "COMPLETED" (lines 1034-1035):

const completedCount = await tx.batchTaskRunItem.count({
  where: { batchTaskRunId: batchId, status: "COMPLETED" },
});

Actual vs Expected

Actual: For a cached run with COMPLETED_WITH_ERRORS status:

  1. BatchTaskRunItem is created with status FAILED
  2. isAlreadyComplete is true (line 897) since it's a final status
  3. But tryCompleteBatchV3 only counts COMPLETED items, missing this item
  4. The batch never completes because the count of COMPLETED items never reaches expectedCount

Expected: The batch should complete when all items have finished, regardless of whether the cached runs succeeded or failed.
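
The mismatch can be reproduced with a small in-memory model. This is only a sketch: `BatchItem`, `countCompleted`, and `isBatchDone` are hypothetical stand-ins rather than the real Prisma models or service methods, and it assumes the batch is considered done once the COMPLETED count reaches expectedCount.

```typescript
// Simplified, hypothetical model of the batch bookkeeping; not the real Prisma schema.
type ItemStatus = "PENDING" | "COMPLETED" | "FAILED";

interface BatchItem {
  taskRunId: string;
  status: ItemStatus;
}

// Mirrors the count quoted above from tryCompleteBatchV3: only COMPLETED items are counted.
function countCompleted(items: BatchItem[]): number {
  return items.filter((item) => item.status === "COMPLETED").length;
}

// Assumption: the batch is done once the COMPLETED count reaches expectedCount.
function isBatchDone(items: BatchItem[], expectedCount: number): boolean {
  return countCompleted(items) >= expectedCount;
}

// One fresh run that succeeded, plus one cached run that had already failed.
const itemsWithFailedCachedRun: BatchItem[] = [
  { taskRunId: "run_1", status: "COMPLETED" },
  { taskRunId: "run_cached", status: "FAILED" },
];

// The same batch if the cached item is recorded as COMPLETED instead.
const itemsWithCompletedCachedRun: BatchItem[] = [
  { taskRunId: "run_1", status: "COMPLETED" },
  { taskRunId: "run_cached", status: "COMPLETED" },
];

const stuck = !isBatchDone(itemsWithFailedCachedRun, 2);
const finished = isBatchDone(itemsWithCompletedCachedRun, 2);
```

With the FAILED item, the COMPLETED count stays at 1 against an expectedCount of 2, so `stuck` is true; recording the cached item as COMPLETED lets the same batch finish.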

Impact

This partially defeats the purpose of the fix: batchTriggerAndWait will still run forever if duplicate idempotency keys reference runs that have already failed.

Recommendation: For cached runs that are already complete (regardless of success or failure), create the BatchTaskRunItem with status COMPLETED instead of using batchTaskRunItemStatusForRunStatus(). This aligns with how completeBatchTaskRunItemV3 works: it always sets the status to COMPLETED when a run finishes (line 1088).
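
One possible shape for that mapping is sketched below. The names `itemStatusForCachedRun` and `isFinalStatusSketch` are hypothetical, the status unions are local stand-ins for the real enums (only the three failure statuses are quoted in this review; the others are assumed), and the PENDING value for in-progress runs is likewise an assumption.

```typescript
// Local stand-ins for the real enums. Only COMPLETED_WITH_ERRORS, SYSTEM_FAILURE,
// and CRASHED are quoted in this review; the remaining values are assumptions.
type RunStatus =
  | "PENDING"
  | "EXECUTING"
  | "COMPLETED_SUCCESSFULLY"
  | "COMPLETED_WITH_ERRORS"
  | "SYSTEM_FAILURE"
  | "CRASHED";

// Assumption: in-progress cached runs are tracked as PENDING until they finish.
type CachedItemStatus = "PENDING" | "COMPLETED";

const FINAL_STATUSES: ReadonlySet<RunStatus> = new Set<RunStatus>([
  "COMPLETED_SUCCESSFULLY",
  "COMPLETED_WITH_ERRORS",
  "SYSTEM_FAILURE",
  "CRASHED",
]);

// Hypothetical stand-in for isFinalRunStatus().
function isFinalStatusSketch(status: RunStatus): boolean {
  return FINAL_STATUSES.has(status);
}

// Any finished cached run is recorded as COMPLETED, whether it succeeded or
// failed, so that a count of COMPLETED items includes it.
function itemStatusForCachedRun(status: RunStatus): CachedItemStatus {
  return isFinalStatusSketch(status) ? "COMPLETED" : "PENDING";
}

const failedCached = itemStatusForCachedRun("COMPLETED_WITH_ERRORS");
const crashedCached = itemStatusForCachedRun("CRASHED");
const runningCached = itemStatusForCachedRun("EXECUTING");
```

Under these assumptions, both failed and crashed cached runs map to COMPLETED, while a still-executing cached run stays PENDING until its own completion path updates it.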

},
});

// Only increment completedCount if the cached run is already finished
// For in-progress runs, completedCount will be incremented when the run completes
if (isAlreadyComplete) {
await this._prisma.batchTaskRun.update({
where: { id: batch.id },
data: {
completedCount: {
increment: 1,
},
},
});
}

// Return true so expectedCount is incremented
return true;
} catch (error) {
if (isUniqueConstraintError(error, ["batchTaskRunId", "taskRunId"])) {
// BatchTaskRunItem already exists for this batch and cached run
// This can happen if the same idempotencyKey is used multiple times in the same batch
logger.debug(
"[BatchTriggerV2][processBatchTaskRunItem] BatchTaskRunItem already exists for cached run",
{
batchId: batch.friendlyId,
runId: task.runId,
cachedRunId: result.run.id,
currentIndex,
}
);

// Don't increment expectedCount since this item is already tracked
return false;
}

throw error;
}
}

async #enqueueBatchTaskRun(options: BatchProcessingOptions) {