Fix DevTools command completion race#533
Fix DevTools command completion race#533OLSander wants to merge 1 commit intochromiumembedded:masterfrom
Conversation
|
@magreenblatt can you please review this PR? Also mentioning @S1artie We have been running into CDP-related timeouts after updating JCEF. This happens rarely, but for some reason the timeouts are much easier to reproduce on a VM with Windows Server 2022. We flagged commit 7ac5c1f as the most likely cause, but finding conclusive proof took some time. Part of the problem was that we were close to a release, so we were forced to postpone the investigation and temporarily downgrade JCEF. Another complication was that the AI agents we involved all concluded that commit 7ac5c1f could not possibly introduce a race. After instrumenting JCEF with logging we are now convinced that the earlier AI analysis was flawed; the issue was indeed caused by the leak fix. We verified that the changes in this PR prevent the timeouts. Flawed AI analysis (provided by a coworker, not sure which model they used): |


Commit 7ac5c1f fixed a
queuedCommands_memory leak by addingremoveQueuedCommand(messageId)at the end ofonDevToolsMethodResult. Unfortunately this introduced a race condition that causes DevTools calls to hang indefinitely (or, if the caller addsorTimeout, to throw aTimeoutException).The race
executeDevToolsMethodchains two async steps:messageIdback via a callbackCompletableFuture<String>inqueuedCommands_for thatmessageId(viathenCompose)The leak fix assumed that by the time the observer's
onDevToolsMethodResultfires, step 2 has already completed and the queued future is bound to the caller. That assumption holds in the common case, but it is not guaranteed. The two callbacks — the message-id callback and the observer result callback — cross the JNI boundary independently and can be scheduled in either order.When the result arrives before step 2 runs:
getQueuedCommand(messageId)→ creates a newCompletableFutureF1, completes it, removes it from the mapgetQueuedCommand(messageId)→ map entry is gone, creates a newCompletableFutureF2This was confirmed with an instrumentation build that added lifecycle logging across the Java/native boundary. The observed log sequence for a reproducing
Network.setCacheDisabledcall matched the scenario above exactly, including a warning "DevTools result arrived before consumer bound queued future."Fix
Remove the observer-side cleanup. Instead, clean up from the consumer side — specifically from the
whenCompleteattached inexecuteDevToolsMethod, using the two-argumentMap.remove(key, value)overload to avoid accidentally deleting a newer mapping for a reused message id:This ensures cleanup can only happen after the consumer has bound to the specific future instance, eliminating the race while still preventing the original leak.