[ISSUE #9728] Fix correctMinOffset handling of incomplete consume queue records#10211

Open
daguimu wants to merge 1 commit into apache:develop from daguimu:fix/correctMinOffset-partial-record-9728

daguimu commented Mar 25, 2026

Problem

In ConsumeQueue.correctMinOffset(), the size check at line 616 only guards against completely empty buffers (result.getSize() == 0). When a consume queue mapped file contains a partially written record (1-19 bytes, less than CQ_STORE_UNIT_SIZE = 20), the subsequent buffer.getLong() call throws BufferUnderflowException.

This can happen when:

  1. The last mapped file in a consume queue has a partially written record due to async flush
  2. CommitLog messages are lost during restart (async-flush + sync-master)
  3. The broker does not truncate the incomplete record during recovery
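The failure mode described above can be reproduced in isolation with a plain `java.nio.ByteBuffer`. The sketch below is not broker code: the class name `PartialRecordDemo` and the helper `readUnderflows` are hypothetical, and the 8 + 4 + 8 byte field layout of a consume queue unit is an assumption used only for illustration.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;

public class PartialRecordDemo {
    // Assumed layout of one consume queue unit: 8-byte commit log offset,
    // 4-byte message size, 8-byte tag hash code (20 bytes in total).
    static final int CQ_STORE_UNIT_SIZE = 20;

    /** Returns true if reading one full unit from a buffer of the given size underflows. */
    static boolean readUnderflows(int availableBytes) {
        ByteBuffer buffer = ByteBuffer.allocate(availableBytes);
        try {
            buffer.getLong(); // commit log offset
            buffer.getInt();  // message size
            buffer.getLong(); // tag hash code
            return false;
        } catch (BufferUnderflowException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Sizes 0 through 19 cannot hold a full unit; only 20 can.
        for (int size : new int[] {0, 5, 12, 19, 20}) {
            System.out.println(size + " bytes -> underflow: " + readUnderflows(size));
        }
    }
}
```

Every size below 20 underflows at some read, which is why a partially written trailing record is unsafe to parse.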

Root Cause

The check result.getSize() == 0 does not cover the case where result.getSize() is between 1 and 19 (less than one complete CQ unit of 20 bytes). A partial record cannot be meaningfully processed.

Note: PR #10109 previously added a guard for the last mapped file check (lines 564-570), which resolved the main startup crash. This PR addresses the remaining edge case in the binary search section where a partially written record could still cause a BufferUnderflowException.

Fix

Changed the size check from:

if (result.getSize() == 0) {

to:

if (result.getSize() < ConsumeQueue.CQ_STORE_UNIT_SIZE) {

This properly handles both empty buffers and incomplete records.
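To make the difference between the two conditions concrete, here is a minimal standalone comparison. The class `SizeGuardDemo` and the methods `oldGuardSkips` and `newGuardSkips` are hypothetical names for this illustration. They mirror only the two boolean expressions shown above, not the surrounding logic in `correctMinOffset()`.

```java
public class SizeGuardDemo {
    // Size of one complete consume queue unit, as stated in this PR.
    static final int CQ_STORE_UNIT_SIZE = 20;

    // Condition before the change: only an empty buffer is rejected.
    static boolean oldGuardSkips(int size) {
        return size == 0;
    }

    // Condition after the change: anything shorter than one full unit is rejected.
    static boolean newGuardSkips(int size) {
        return size < CQ_STORE_UNIT_SIZE;
    }

    public static void main(String[] args) {
        for (int size : new int[] {0, 1, 19, 20, 40}) {
            System.out.println("size=" + size
                + " oldGuardSkips=" + oldGuardSkips(size)
                + " newGuardSkips=" + newGuardSkips(size));
        }
    }
}
```

The two conditions agree for empty buffers and for buffers of 20 bytes or more. They differ only for sizes 1 through 19, which is the case this PR targets.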

Tests Added

No new tests were added. The existing ConsumeQueueTest suite (13 tests covering correctMinOffset with various offset scenarios) passes with this change, which is a one-line tightening of a defensive check.

Impact

  • Prevents BufferUnderflowException on partially written consume queue records
  • No behavioral change for normal consume queues (all valid records are >= 20 bytes)
  • Minimal, single-line change with no risk of side effects

Fixes #9728

…me queue records

Change the size check from `result.getSize() == 0` to
`result.getSize() < CQ_STORE_UNIT_SIZE` to properly handle partially
written consume queue records. When the buffer contains fewer bytes
than a complete CQ unit (20 bytes), the subsequent getLong() call
would throw BufferUnderflowException.

Fixes apache#9728

Development

Successfully merging this pull request may close these issues.

[Bug] consumequeue correctMinOffset throws IllegalArgumentException causing startup failure
