Fix stale RxDone race condition in sx12xx driver #2
Open
ringlej wants to merge 1 commit into gridpoint-4.3.0 from
abelino
reviewed
Mar 26, 2026
When lora_recv_async() is called with cb=NULL to cancel ongoing reception, modem_release() is called but async_rx_cb is left pointing to the previous callback. If a stale DIO1 work item fires after the radio has been reconfigured for TX, sx12xx_ev_rx_done() checks async_rx_cb (still set), calls Radio.Rx(0) during an active TX, and fires the stale callback. This corrupts the radio state and leaks packet buffers allocated by the callback.

Clear async_rx_cb before calling modem_release() so that any stale DIO1 work item that fires after cancellation takes the synchronous path in sx12xx_ev_rx_done(), which properly checks modem_usage and bails out.

Signed-off-by: Jon Ringle <jringle@gridpoint.com>
0bb1103 to
a422e9e
Summary
- Clear async_rx_cb when cancelling async reception in sx12xx_lora_recv_async(), preventing stale DIO1 work items from dispatching the callback after an RX→TX transition
- Add LOG_WRN detection for stale RxDone events to validate the fix in the field

Root Cause
sx12xx_lora_recv_async(dev, NULL) calls modem_release() but does not clear dev_data.async_rx_cb. When a stale DIO1 work item fires after the radio has been reconfigured for TX, sx12xx_ev_rx_done() sees the stale callback, calls Radio.Rx(0) during an active TX, and fires the callback, leaking packet buffers permanently.
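To make the dispatch decision concrete, here is a minimal host-side model of the cancel path and the RxDone handler. It is a sketch, not the driver code: radio_model, start_async_rx(), cancel_async_rx(), ev_rx_done(), and stale_event_leaks() are hypothetical names, modem_release() is reduced to clearing a flag, and the synchronous path's modem_usage check is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical host-side model. Names echo the driver, but this is not driver code. */
typedef void (*rx_cb_t)(void *user_data);

struct radio_model {
    rx_cb_t async_rx_cb;  /* stands in for dev_data.async_rx_cb */
    void *user_data;
    bool modem_busy;      /* simplified stand-in for the modem_usage state */
    int rx_restarts;      /* counts modelled Radio.Rx(0) calls */
    int sync_path_hits;   /* counts events routed to the synchronous path */
};

/* Example callback: counts buffers "allocated" that nobody will free. */
static void count_alloc_cb(void *user_data)
{
    int *allocated = user_data;
    (*allocated)++;
}

static void start_async_rx(struct radio_model *m, rx_cb_t cb, void *user_data)
{
    m->modem_busy = true;
    m->async_rx_cb = cb;
    m->user_data = user_data;
}

/* Cancel path: clear_cb selects the fixed (true) or original (false) behaviour. */
static void cancel_async_rx(struct radio_model *m, bool clear_cb)
{
    if (clear_cb) {
        m->async_rx_cb = NULL;  /* the fix: drop the callback first */
    }
    m->modem_busy = false;      /* models modem_release() */
}

/* Models only the dispatch decision made by the RxDone handler. */
static void ev_rx_done(struct radio_model *m)
{
    if (m->async_rx_cb != NULL) {
        m->rx_restarts++;              /* models Radio.Rx(0) */
        m->async_rx_cb(m->user_data);  /* callback fires, even if stale */
        return;
    }
    m->sync_path_hits++;  /* synchronous path; its checks are not modelled */
}

/* Start RX, cancel, then let a stale RxDone work item run.
 * Returns how many buffers the callback allocated after cancellation. */
static int stale_event_leaks(bool clear_cb)
{
    int allocated = 0;
    struct radio_model m = {0};

    start_async_rx(&m, count_alloc_cb, &allocated);
    cancel_async_rx(&m, clear_cb);
    assert(!m.modem_busy);
    ev_rx_done(&m);  /* stale work item runs after cancellation */
    return allocated;
}
```

Under these assumptions, stale_event_leaks(false) reports one buffer allocated by the stale callback, while stale_event_leaks(true) reports none because the handler falls through to the synchronous path.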
Race Condition Sequence
sequenceDiagram
    participant HW as Radio HW
    participant ISR as DIO1 ISR
    participant WQ as sysworkq
    participant TX as TX thread
    participant RX as RX thread
    Note over HW: Radio in async RX<br/>on network channel
    HW->>ISR: Packet received (RxDone)<br/>IRQ_RX_DONE latched in HW register
    ISR->>ISR: irq_disable()
    ISR->>WQ: k_work_submit(dio1_work)<br/>queued, not yet run
    Note over TX: TX thread needs to send<br/>join request on join channel
    TX->>TX: take radio lock
    TX->>HW: lora_recv_async(NULL)<br/>modem_release() → Radio.Sleep()
    Note over HW: IRQ_RX_DONE NOT cleared<br/>async_rx_cb NOT cleared
    TX->>HW: lora_config(TX, join_freq)
    TX->>HW: Radio.Send(data, len)
    Note over HW: TX active on join channel
    TX-->>TX: k_poll() - YIELDS waiting for TX_DONE
    Note over WQ: sysworkq scheduled<br/>(higher priority)
    WQ->>WQ: RadioOnDioIrq()<br/>IrqFired = true
    WQ->>HW: RadioIrqProcess()<br/>SX126xGetIrqStatus()
    HW-->>WQ: IRQ_RX_DONE (stale!)
    WQ->>HW: SX126xClearIrqStatus()
    WQ->>WQ: sx12xx_ev_rx_done()
    Note over WQ: async_rx_cb still set!
    WQ->>HW: Radio.Rx(0)
    Note over HW: RX started during active TX<br/>RADIO STATE CORRUPTED
    WQ->>WQ: async_rx_cb() fires
    WQ->>WQ: packet_buf_alloc() → buf[0]
    WQ->>RX: k_fifo_put() → buf[0] in FIFO
    WQ->>HW: irq_enable()
    Note over HW: Radio now receiving.<br/>Picks up noise/packets.
    loop Every ~1 second
        HW->>ISR: DIO1 interrupt (new packet/noise)
        ISR->>WQ: k_work_submit()
        WQ->>WQ: sx12xx_ev_rx_done() → async_rx_cb()
        WQ->>WQ: packet_buf_alloc() → buf[N] leaked
    end
    TX-->>TX: k_poll returns (timeout or TX_DONE)
    TX->>TX: release radio lock
    Note over TX,RX: 4 buffers permanently leaked<br/>FIFO delta = 3<br/>Gateway OOM - no recovery

Fix
Clear async_rx_cb before modem_release():

Evidence from logs
Radio.Rx(0) picking up successive packets/noise

[sc-220449]
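For the stale RxDone detection mentioned in the Summary, one possible shape is a small guard that counts and reports RxDone events arriving while no receive is armed. This is a hedged sketch under assumptions, not the code in this PR: rx_monitor, rx_monitor_set_armed(), and rx_done_accept() are hypothetical names, and fprintf(stderr, ...) stands in for Zephyr's LOG_WRN so the example builds on a host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical bookkeeping for stale RxDone detection. */
struct rx_monitor {
    bool rx_armed;            /* true while a receive has been requested */
    unsigned int stale_count; /* RxDone events seen while not armed */
};

/* Record whether a receive is currently expected. */
static void rx_monitor_set_armed(struct rx_monitor *mon, bool armed)
{
    assert(mon != NULL);
    mon->rx_armed = armed;
}

/* Returns true if the RxDone event should be processed, false if it is
 * stale. A stale event is counted and reported; the fprintf call is a
 * host-side stand-in for LOG_WRN. */
static bool rx_done_accept(struct rx_monitor *mon)
{
    assert(mon != NULL);
    if (mon->rx_armed) {
        return true;
    }
    mon->stale_count++;
    fprintf(stderr, "WRN: stale RxDone dropped (total %u)\n", mon->stale_count);
    return false;
}
```

A counter alongside the warning makes it possible to check in the field whether stale events still occur after the fix, without relying on log retention alone.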