Prevent multiple execution of aborted jobs#528
Conversation
samdark
left a comment
There was a problem hiding this comment.
That sounds like a valid case. Would you please add a line for CHANGELOG? Thanks.
|
@samdark Thanks for the feedback. I still need to investigate why the tests are failing. After that, I'll adapt the CHANGELOG. |
src/Queue.php
Outdated
| if ($job instanceof RetryableJobInterface && !$job->canRetry($attempt - 1, $error)) { | ||
| return true; | ||
| } else { | ||
| // Non RetryableJobs can have a maximum of one attempt |
There was a problem hiding this comment.
This assumption is incorrect. RetryableJobInterface means that job can decide whether it should be retried. In other case it is controlled by attempts property at queue level - job can still have multiple retries even if it doesn't implement RetryableJobInterface.
There was a problem hiding this comment.
@rob006 Thanks for pointing that out. I've adjusted the condition.
However, I still need to check the tests.
|
I was able to fix the tests. Unfortunately, PHP 5.4 and PHP 7 fail for other reasons. I don't have time to investigate this further at the moment. I'm not entirely happy with my fix/PR, as these failed jobs are closed without any error message. Unfortunately, no specific error is available either. At least these jobs are not executed multiple times. |
src/Queue.php
Outdated
| list($job, $error) = $this->unserializeMessage($message); | ||
|
|
||
| // Handle aborted jobs without thrown error | ||
| if ($attempt > 1) { |
There was a problem hiding this comment.
I think that all this can be combined into one condition
There was a problem hiding this comment.
@s1lver Shall I summarize that? I'd be happy to. I personally prefer the breakdown for quick readability.
There was a problem hiding this comment.
What about this solution? Will return true if any of the conditions evaluate to true
return
($job instanceof RetryableJobInterface && !$job->canRetry($attempt - 1, $error))
|| (!($job instanceof RetryableJobInterface) && $attempt > $this->attempts)Co-authored-by: Alexander Makarov <sam@rmcreative.ru>
Co-authored-by: Alexander Makarov <sam@rmcreative.ru>
|
Thank you! |
|
После апдейта на 2.3.8 у меня все слетело к чертям, что работало месяцами! Вставить этот "фикс" в handleMessage() и назвать то, что было до этого "багом" - ну... человек просто не разобрался, как это работает. Line 229 in 95631f5 Под капотом - Symphony/Process - и конечно он убивает процесс по таймауту. Но потом джобы перезапускаются и доделывают свою работу. На это и было рассчитано. Все это было штатными вещами до этого релиза. Мало того. То, что писал Журавлев - веб-морду для мониторинга - это тоже начало работать с багами, поскольку там джобы теперь висят как отмеченные к перезапуску, но с учетом этого "патча", этого не будет случаться. И наконец, вишенка на торте: Надеюсь, что все это учтется в след. релизах. |
Can you fix it? |
|
I genuinely cannot believe this change was merged. Invoking the canRetry method with a null $error argument value is a major issue and can break existing implementations of canRetry. The definition of RetryableJobInterface clearly states: In our jobs, we rely on the error instance to determine whether a job should be retried. Calling canRetry with a null error value and ignoring the job's failure has already led to significant data loss. Example of an existing implementation that depends on the error: This change alters the contract of the interface in a backward-incompatible way and undermines the intended retry logic. |
|
You're right, I need to see how I can solve the initial problem in a different way. |
We have the situation that apparently in some environments, jobs canceled by
max_execution_timeare not correctly removed from the queue. As a result, they are executed again and again. Even if noRetryableJobInterfaceis implemented orcanRetry()always returnsfalse.Unfortunately, I can't reproduce it myself, but some users can in connection with CPanel environments. Perhaps the worker is there not running in CLI but in CGI mode.
In our project, we have now successfully solved the problem using the
ExecEvent. But maybe it makes sense to add this fix (even if I am not completely happy with it) to the upstream queue project.Related issue: humhub/calendar#547