CIVIMM-480: Add webhook stuck recovery with attempt tracking#16
CIVIMM-480: Add webhook stuck recovery with attempt tracking#16
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a critical issue where payment webhooks could become permanently stuck in a 'processing' state if a worker crashed during their execution. The changes introduce a comprehensive recovery system that automatically detects these stuck webhooks, tracks their retry attempts, and either resets them for reprocessing or marks them as permanently failed after a defined number of retries. This significantly improves the reliability and resilience of webhook processing. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a robust mechanism for recovering stuck webhooks, complete with attempt tracking and a distinction between retriable and permanent errors. The refactoring of the PaymentWebhook BAO to separate data access concerns is a great improvement for maintainability and follows the Single Responsibility Principle. The new unit tests are comprehensive and cover the new logic and edge cases effectively.
I have one suggestion regarding a potential performance issue where the stuck webhook recovery logic may be executed multiple times unnecessarily during a single run. Overall, this is a solid enhancement to the webhook processing reliability.
Civi/Paymentprocessingcore/Service/WebhookQueueRunnerService.php
Outdated
Show resolved
Hide resolved
2b0c00a to
34cd504
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a robust mechanism for handling stuck webhooks, which is a significant improvement for reliability. The introduction of attempt tracking and a permanent error state prevents infinite loops and provides better visibility into failing webhooks. The code is well-structured, with a clear separation of concerns between the BAO layer for data access and the service layer for business logic. The addition of comprehensive unit tests ensures the new functionality is well-covered. I have a few minor suggestions to further improve code quality and adherence to repository guidelines.
| return \Civi\Api4\PaymentWebhook::get(FALSE) | ||
| ->addSelect('id', 'processor_type') | ||
| ->addWhere('processor_type', '=', $processorType) | ||
| ->addWhere('status', '=', 'new') | ||
| ->addWhere('attempts', '>', 0) | ||
| ->addWhere('next_retry_at', 'IS NULL') | ||
| ->addWhere('processing_started_at', 'IS NOT NULL') | ||
| ->setLimit($limit) | ||
| ->execute() | ||
| ->getArrayCopy(); | ||
| } | ||
|
|
||
| if (empty($webhookIds)) { | ||
| /** |
There was a problem hiding this comment.
Similar to the getStuckWebhooks method, the repository guidelines prefer a manual loop with explicit type checks over fluent methods like getArrayCopy() or iterator_to_array() for processing CiviCRM Api4 results. This approach ensures compliance with strict static analysis (like PHPStan level 9) for typed arrays and maintains consistency with established codebase patterns. Please refactor this to use a foreach loop to allow for explicit type handling.
$webhooks = [];
foreach (\Civi\Api4\PaymentWebhook::get(FALSE)
->addSelect('id', 'processor_type')
->addWhere('processor_type', '=', $processorType)
->addWhere('status', '=', 'new')
->addWhere('attempts', '>', 0)
->addWhere('next_retry_at', 'IS NULL')
->addWhere('processing_started_at', 'IS NOT NULL')
->setLimit($limit)
->execute() as $webhook) {
$webhooks[] = $webhook;
}
return $webhooks;References
- When processing CiviCRM Api4 results, prefer a manual loop with explicit type checks over fluent methods like
getArrayCopy()orindexBy(). This approach ensures compliance with strict static analysis (like PHPStan level 9) for typed arrays and maintains consistency with established codebase patterns.
Civi/Paymentprocessingcore/Service/WebhookQueueRunnerService.php
Outdated
Show resolved
Hide resolved
34cd504 to
18ab40e
Compare
18ab40e to
113bafd
Compare
Overview
When a worker crashes (PHP fatal, OOM, server death) during webhook processing, webhooks get stuck in 'processing' status forever with no rollback. This adds stuck recovery with attempt tracking so webhooks are automatically retried (up to 3 attempts) or permanently failed.
Before
After
Technical Details
resetStuckWebhooks()split into data-access-onlygetStuckWebhooks()+ batch update methods (batchResetStuckToNew,batchMarkStuckAsPermanentError)recoverStuckWebhooks()inWebhookQueueRunnerServicehandles decision logic (retry vs permanent error) and re-queuingAND status = 'processing'guard to prevent race conditions