Skip to content

fix: prevent worker starvation after max_execution_time restarts#2207

Closed
dunglas wants to merge 1 commit intomainfrom
fix/2205
Closed

fix: prevent worker starvation after max_execution_time restarts#2207
dunglas wants to merge 1 commit intomainfrom
fix/2205

Conversation

@dunglas
Copy link
Member

@dunglas dunglas commented Feb 19, 2026

Fixes #2205. At least I hope so.

Disable the execution timer in frankenphp_worker_request_shutdown() to prevent stale timers from firing between requests. Also, fix evaluation order in frankenphp_handle_request() to check for shutdown before calling frankenphp_worker_request_startup().

@AlliBalliBaba
Copy link
Contributor

Not sure this will fix it. On fatal error frankenphp_request_shutdown() is never called and we long jump directly to after

php_execute_script(&file_handle);

@AlliBalliBaba
Copy link
Contributor

AlliBalliBaba commented Feb 19, 2026

I could imagine that php_request_startup() might fail for some reason, but not sure how to get it to fail. Basically some kind of startup error here (which is caught and not logged)
https://github.com/php/php-src/blob/c8912639008a89a195a258fc8cb924b5763c6766/main/main.c#L1950

Might be necessary to completely restart the thread if php_request_startup fails

Comment on lines 670 to 678
if (frankenphp_worker_request_startup() == FAILURE
/* Shutting down */
|| !result.r0) {
if (!result.r0 || frankenphp_worker_request_startup() == FAILURE) {
RETURN_FALSE;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is probably fine, seems like this order is possible due to #1814. Should keep in mind that it will affect stuff happening on worker shutdown

while(frankenphp_handle_request(...)){
    # handle x requests
}

# will not reset globals for shutdown logic happening here (probably fine?)

Comment on lines +388 to +393
#ifdef ZEND_MAX_EXECUTION_TIMERS
/* Disable any execution timer set during the request callback to prevent
* stale timers from firing between requests */
zend_unset_timeout();
#endif

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this does anything. The timeout can only happen between 2 calls to frankenphp_handle_request(), where it should be save to timeout.

Disable the execution timer in frankenphp_worker_request_shutdown() to
prevent stale timers from firing between requests. Also fix evaluation
order in frankenphp_handle_request() to check for shutdown before calling
frankenphp_worker_request_startup().

Fixes #2205

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@dunglas
Copy link
Member Author

dunglas commented Feb 20, 2026

Thanks for the review @AlliBalliBaba! Let me address each point:

On zend_unset_timeout() in frankenphp_worker_request_shutdown()

I don't think this does anything. The timeout can only happen between 2 calls to frankenphp_handle_request(), where it should be save to timeout.

The timer set by set_time_limit() during the callback keeps ticking after the callback returns. Between RETURN_TRUE from one frankenphp_handle_request() and zend_unset_timeout() at the top of the next call, the PHP VM executes opcodes for the while loop (JMPNZ, INIT_FCALL, DO_ICALL). These opcodes check EG(vm_interrupt).

If the timer fires in this gap, the interrupt check triggers a fatal error — even though the request completed successfully. Under high load with requests near the time limit (e.g., 29.5s out of a 30s set_time_limit), this window is hit frequently, causing cascading worker crashes (the starvation described in #2205).

Adding zend_unset_timeout() at the start of frankenphp_worker_request_shutdown() closes this gap by disabling the timer immediately when the request finishes.

On frankenphp_request_shutdown() not being called on fatal error

On fatal error frankenphp_request_shutdown() is never called and we long jump directly to after L1317

frankenphp_request_shutdown() IS called — it's at line 1338, after zend_end_try(). When zend_bailout() fires inside php_execute_script(), the longjmp lands at zend_catch (line 1326), then execution continues past zend_end_try() to line 1338. So request shutdown does happen after fatal errors.

On the evaluation order change

You were right to flag this — the ASAN double-free is caused by this change. PHP's sapi_deactivate_module() frees content_type_dup without nulling the pointer. With the old order, frankenphp_worker_request_startup()sapi_activate() overwrites the dangling pointer before the script shuts down. With the new order, skipping startup leaves the stale pointer, and php_request_shutdown() double-frees it.

I've reverted this change. The PR now only contains the zend_unset_timeout() addition.

@AlliBalliBaba
Copy link
Contributor

If the timer fires in this gap, the interrupt check triggers a fatal error — even though the request completed successfully. Under high load with requests near the time limit (e.g., 29.5s out of a 30s set_time_limit), this window is hit frequently, causing cascading worker crashes (the starvation described in #2205).

This is not true. The timeout potentially happens in a time window of a few micro-seconds, extremely unlikely if the timeout is 30s. And it would only affect 1 thread, not all threads. Also it wouldn't matter if it happens in the gap, as it just leads to a fatal error / worker restart when reaching the while loop.

frankenphp_request_shutdown() IS called — it's at line 1338, after zend_end_try(). When zend_bailout() fires inside php_execute_script(), the longjmp lands at zend_catch (line 1326), then execution continues past zend_end_try() to line 1338. So request shutdown does happen after fatal errors.

Sry meant to say frankenphp_worker_request_shutdown() is never called, which is what's modified in this PR.

@dunglas
Copy link
Member Author

dunglas commented Feb 20, 2026

Hum ok, you look right. Let's close!

@dunglas dunglas closed this Feb 20, 2026
@AlliBalliBaba
Copy link
Contributor

#2205 might also be connected to the php.ini reset logic, just not sure how to reproduce it.
Might also just be from requests piling up on a locked database, in wich case max_wait_time will resolve it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Permanent worker starvation after maximum execution time worker restarts

2 participants

Comments