Fix DAP initialization deadlock by making launch request non-blocking#10
Fix DAP initialization deadlock by making launch request non-blocking#10jjskuld wants to merge 1 commit intoakiselev:masterfrom
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a critical deadlock issue during the Debug Adapter Protocol (DAP) initialization phase, particularly affecting lldb-dap on macOS. By refactoring the launch request mechanism to be universally non-blocking, the system now correctly handles debug adapters that defer their launch response, aligning with DAP specifications and improving robustness across various debugging environments. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request addresses a deadlock issue during DAP initialization with certain debug adapters like lldb-dap. The fix involves universally adopting a non-blocking launch request, which aligns better with the DAP specification and removes adapter-specific logic. The change is well-reasoned and effectively resolves the deadlock. However, my review highlights a potential regression in error handling for adapters that do respond immediately to the launch request. The new approach may cause launch failures from these adapters to be silently ignored, which could complicate debugging setup issues. I've added a detailed comment explaining this concern.
| client.launch_no_wait(launch_args).await?; | ||
| tracing::debug!("DAP launch request sent (no-wait mode)"); |
There was a problem hiding this comment.
By switching to launch_no_wait for all adapters, this change correctly fixes the deadlock with adapters like lldb-dap and debugpy. However, this introduces a potential regression for adapters that do respond to the launch request immediately.
Previously, client.launch() was used for those adapters, which would await the response and propagate any errors. With the switch to launch_no_wait, the response is handled by a background task without a registered waiter. This means if an immediate launch response indicates a failure, it will be treated as an 'unknown' response. The error from the adapter will be lost, and the session will proceed as if the launch was successful, potentially leading to confusing failures later.
To mitigate this, consider enhancing DapClient::process_message to more robustly handle unexpected error responses. For instance, if an un-awaited response indicates failure (success: false), it could be logged as an error with its message, rather than just a warning about an unknown sequence number. This would make launch failures from any adapter more visible.
aff4285 to
a82fe75
Compare
There was a problem hiding this comment.
Code Review
This pull request fixes a deadlock during DAP initialization by making the launch request non-blocking for all debug adapters, not just debugpy. This change aligns the implementation with the DAP specification and resolves issues with strict adapters like lldb-dap. While the fix is correct for its intended purpose, I've identified a high-severity issue where launch failures for adapters that respond immediately could be missed. My review includes a detailed comment explaining the issue and suggesting a more robust implementation to handle responses asynchronously without reintroducing the deadlock.
| // The DAP specification allows adapters to defer the launch response | ||
| // until after configurationDone is received. To prevent deadlocks with | ||
| // strict adapters (like debugpy and lldb-dap), we use a non-blocking launch. | ||
| client.launch_no_wait(launch_args).await?; | ||
| tracing::debug!("DAP launch request sent (no-wait mode)"); |
There was a problem hiding this comment.
While this change correctly fixes the deadlock with lldb-dap and other strict adapters by using a non-blocking launch universally, it introduces a potential issue for adapters that respond to the launch request immediately.
The current implementation of client.launch_no_wait (in src/dap/client.rs) uses send_request, which does not register a response handler in the pending map. When an adapter that responds immediately (like gdb) sends its launch response, the background reader task finds no corresponding handler and logs a warning: Received response for unknown request seq {}.
This means that if the launch fails for such an adapter, the error response will be ignored, and the session will likely time out later when waiting for the initialized event, masking the true root cause of the failure. This contradicts the PR description's claim that responses are 'handled cleanly'.
To fix this, launch_no_wait should be modified to handle responses asynchronously. A possible approach:
- In
launch_no_wait, register aoneshotchannel in thependingmap for thelaunchrequest's sequence number before sending it. - Spawn a new, non-blocking
tokiotask that awaits the response on theoneshotreceiver. - Inside this task, check if the response was successful. If not, log the error using
tracing::error!. This ensures launch failures are captured without re-introducing the deadlock.
This change would be in src/dap/client.rs and would make the overall solution more robust.
a82fe75 to
7111ff7
Compare
The Issue:
Currently, when trying to start a debugging session using Apple's lldb-dap (bundled with Xcode command line tools on macOS), the debugger-cli daemon hangs immediately and the client times out with: Error: Operation timed out after 0 seconds or timed out waiting for socket.
The Root Cause:
The issue stems from a deadlock during the DAP initialization handshake in src/daemon/session.rs.
The code contained a workaround specifically for debugpy (Python) because it does not respond to the launch request until after it receives the configurationDone request. However, Apple's lldb-dap exhibits the exact same behavior.
According to the official Microsoft DAP specification (https://microsoft.github.io/debug-adapter-protocol/overview#initialization):
By awaiting the launch request synchronously for non-Python adapters, debugger-cli was deadlocking: the CLI was waiting for the launch response, while lldb-dap was waiting for configurationDone.
The Fix:
This prevents strict adapters from deadlocking the initialization phase.
oneshot::channel for the sequence number and spawns a non-blocking tokio::spawn task to await the response. If the adapter immediately rejects the launch (e.g., missing
executable or bad arguments), the background task catches success: false and properly logs the error using tracing::error!.
Testing: