From 4d5b9f05651e343f889665d94af9410c4f40485b Mon Sep 17 00:00:00 2001 From: Lari Hotari Date: Fri, 13 Mar 2026 11:35:24 +0200 Subject: [PATCH 1/5] [FIX] Fix race conditions in Auditor bookie change detection MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit ## Problem Investigating CI flakiness in `BookieAutoRecoveryTest` revealed three related race conditions in `Auditor` that could cause a bookie departure to go undetected until the next periodic audit run (default: 86400 s), causing tests that wait up to 60 s for underreplicated-ledger marking to time out. ### Race 1 – watcher-provided bookie set was discarded `watchBookieChanges()` passed `bookies -> submitAuditTask()` as the listener, silently ignoring the current bookie set delivered by the ZK watcher. Inside `submitAuditTask()` a fresh `getAvailableBookies()` ZK call was made from an executor lambda that could run well after the watcher fired (e.g. while `checkAllLedgers` was occupying the single-threaded executor). In that window the two views of the cluster could diverge, causing `bookiesToBeAudited` to end up empty and the departure to be silently skipped. Fix: capture `pendingWritableBookies` / `pendingReadOnlyBookies` atomically in the watcher callback and read them in `runAuditTask()`, falling back to `getAvailableBookies()` only when the watchers have not yet fired (e.g. direct test calls via `submitAuditTask()`). ### Race 2 – duplicate audit tasks could be queued per change type Rapid successive watcher callbacks (or a slow executor) could queue many identical audit tasks, each making its own ZK calls, wasting resources and risking interleaved state updates to `knownBookies` / `bookiesToBeAudited`. Fix: separate `AtomicBoolean` queued-flags (`writableAuditTaskQueued`, `readOnlyAuditTaskQueued`) ensure at most one task is *queued* per change type at a time. The flag is cleared at the *start* of the task (not at submission), so watcher callbacks that arrive while a task is already running queue exactly one follow-up task rather than being silently dropped. ### Race 3 – knownBookies initialised after watcher registration `start()` called `watchBookieChanges()` before assigning `knownBookies`. A watcher callback firing immediately (the ZK registration client fires listeners synchronously with the current bookie set when one is already cached) could race with the subsequent `knownBookies = admin.getAllBookies()` assignment. Although the existing `synchronized` on `start()` and `submitAuditTask()` prevented the *lambda* from running before `knownBookies` was set, the field itself was not `volatile`, leaving a visibility gap for the executor thread. Fix: swap the initialisation order so `knownBookies` is populated before `watchBookieChanges()` is called, and declare the field `volatile` to guarantee cross-thread visibility of the reference assignment. ## Changes - `Auditor.knownBookies`: add `volatile` - New fields `pendingWritableBookies`, `pendingReadOnlyBookies` (`AtomicReference>`): hold the latest bookie set from each watcher type - New fields `writableAuditTaskQueued`, `readOnlyAuditTaskQueued` (`AtomicBoolean`): deduplication flags per change type - `watchBookieChanges()`: populate the pending sets and call new `submitAuditTaskForBookieChange(boolean)` instead of `submitAuditTask()` - `submitAuditTaskForBookieChange(boolean)`: queues a task only when no task of that type is already queued; clears the flag at task start - `submitAuditTask()`: unchanged public/test API; now delegates to the extracted `runAuditTask()` - `runAuditTask()`: extracted audit logic; uses watcher-captured bookie sets when available, falls back to `getAvailableBookies()` - `start()`: initialise `knownBookies` before `watchBookieChanges()` Co-Authored-By: Claude Sonnet 4.6 --- .../bookkeeper/replication/Auditor.java | 215 ++++++++++++------ 1 file changed, 143 insertions(+), 72 deletions(-) diff --git a/bookkeeper-server/src/main/java/org/apache/bookkeeper/replication/Auditor.java b/bookkeeper-server/src/main/java/org/apache/bookkeeper/replication/Auditor.java index 9c6be197550..1766288bedc 100644 --- a/bookkeeper-server/src/main/java/org/apache/bookkeeper/replication/Auditor.java +++ b/bookkeeper-server/src/main/java/org/apache/bookkeeper/replication/Auditor.java @@ -35,6 +35,7 @@ import java.util.concurrent.ThreadFactory; import java.util.concurrent.TimeUnit; import java.util.concurrent.atomic.AtomicBoolean; +import java.util.concurrent.atomic.AtomicReference; import java.util.function.BiConsumer; import java.util.stream.Collectors; import org.apache.bookkeeper.client.BKException; @@ -76,11 +77,21 @@ public class Auditor implements AutoCloseable { private LedgerManager ledgerManager; private LedgerUnderreplicationManager ledgerUnderreplicationManager; private final ScheduledExecutorService executor; - private List knownBookies = new ArrayList(); + // volatile so the executor thread always sees the reference set by start() + private volatile List knownBookies = new ArrayList(); private final String bookieIdentifier; protected volatile Future auditTask; private final Set bookiesToBeAudited = Sets.newHashSet(); private volatile int lostBookieRecoveryDelayBeforeChange; + // Latest bookie sets received from watcher callbacks (null until the first watcher fires). + // Written by ZK watcher callback threads, read by the executor thread. + private final AtomicReference> pendingWritableBookies = new AtomicReference<>(); + private final AtomicReference> pendingReadOnlyBookies = new AtomicReference<>(); + // Guards against queuing more than one pending audit task per change type. + // Cleared at the start of the task so that changes arriving while the task is + // running cause a new task to be queued rather than being silently dropped. + private final AtomicBoolean writableAuditTaskQueued = new AtomicBoolean(false); + private final AtomicBoolean readOnlyAuditTaskQueued = new AtomicBoolean(false); protected AuditorBookieCheckTask auditorBookieCheckTask; protected AuditorTask auditorCheckAllLedgersTask; protected AuditorTask auditorPlacementPolicyCheckTask; @@ -237,6 +248,33 @@ private void submitShutdownTask() { } } + /** + * Submit an audit task triggered by a bookie change notification from the watcher. + * At most one task per change type (writable/readonly) is queued at a time: + * the queued-flag is cleared when the task starts executing, so any watcher + * callbacks that arrive while the task is running will queue exactly one more task. + */ + private synchronized void submitAuditTaskForBookieChange(boolean writableChange) { + if (executor.isShutdown()) { + return; + } + AtomicBoolean queued = writableChange ? writableAuditTaskQueued : readOnlyAuditTaskQueued; + if (queued.compareAndSet(false, true)) { + executor.submit(() -> { + // Clear the flag before running so that watcher callbacks arriving + // during execution can queue a follow-up task instead of being dropped. + queued.set(false); + runAuditTask(); + }); + } + // else: a task is already queued; the latest pendingWritableBookies / + // pendingReadOnlyBookies will be picked up when that task starts. + } + + /** + * Submit an audit task unconditionally. Used by tests and by the + * LostBookieRecoveryDelay-changed event handler. + */ @VisibleForTesting synchronized Future submitAuditTask() { if (executor.isShutdown()) { @@ -244,78 +282,102 @@ synchronized Future submitAuditTask() { f.setException(new BKAuditException("Auditor shutting down")); return f; } - return executor.submit(() -> { - try { - waitIfLedgerReplicationDisabled(); - int lostBookieRecoveryDelay = Auditor.this.ledgerUnderreplicationManager - .getLostBookieRecoveryDelay(); - List availableBookies = getAvailableBookies(); - - // casting to String, as knownBookies and availableBookies - // contains only String values - // find new bookies(if any) and update the known bookie list - Collection newBookies = CollectionUtils.subtract( - availableBookies, knownBookies); - knownBookies.addAll(newBookies); - if (!bookiesToBeAudited.isEmpty() && knownBookies.containsAll(bookiesToBeAudited)) { - // the bookie, which went down earlier and had an audit scheduled for, - // has come up. So let us stop tracking it and cancel the audit. Since - // we allow delaying of audit when there is only one failed bookie, - // bookiesToBeAudited should just have 1 element and hence containsAll - // check should be ok - if (auditTask != null && auditTask.cancel(false)) { - auditTask = null; - auditorStats.getNumDelayedBookieAuditsCancelled().inc(); - } - bookiesToBeAudited.clear(); - } + return executor.submit(this::runAuditTask); + } + + /** + * Core audit logic: determine which bookies have disappeared and trigger + * re-replication if needed. Runs on the single-threaded {@link #executor}. + * + *

Uses the bookie sets most recently received from the ZK watcher callbacks + * ({@link #pendingWritableBookies} / {@link #pendingReadOnlyBookies}) when + * both are available, avoiding a redundant ZK round-trip and the race where a + * fresh {@code getAvailableBookies()} call could see a different state than the + * event that triggered this task. + */ + private void runAuditTask() { + try { + waitIfLedgerReplicationDisabled(); + int lostBookieRecoveryDelay = Auditor.this.ledgerUnderreplicationManager + .getLostBookieRecoveryDelay(); + + // Use the bookie sets captured synchronously by the watcher callbacks. + // Fall back to a fresh ZK read when the watchers haven't fired yet + // (e.g. direct test invocations via submitAuditTask()). + Set writableSnapshot = pendingWritableBookies.get(); + Set readOnlySnapshot = pendingReadOnlyBookies.get(); + List availableBookies; + if (writableSnapshot != null && readOnlySnapshot != null) { + availableBookies = new ArrayList<>(writableSnapshot); + availableBookies.addAll(readOnlySnapshot); + } else { + availableBookies = getAvailableBookies(); + } - // find lost bookies(if any) - bookiesToBeAudited.addAll(CollectionUtils.subtract(knownBookies, availableBookies)); - if (bookiesToBeAudited.size() == 0) { - return; + // casting to String, as knownBookies and availableBookies + // contains only String values + // find new bookies(if any) and update the known bookie list + Collection newBookies = CollectionUtils.subtract( + availableBookies, knownBookies); + knownBookies.addAll(newBookies); + if (!bookiesToBeAudited.isEmpty() && knownBookies.containsAll(bookiesToBeAudited)) { + // the bookie, which went down earlier and had an audit scheduled for, + // has come up. So let us stop tracking it and cancel the audit. Since + // we allow delaying of audit when there is only one failed bookie, + // bookiesToBeAudited should just have 1 element and hence containsAll + // check should be ok + if (auditTask != null && auditTask.cancel(false)) { + auditTask = null; + auditorStats.getNumDelayedBookieAuditsCancelled().inc(); } + bookiesToBeAudited.clear(); + } - knownBookies.removeAll(bookiesToBeAudited); - if (lostBookieRecoveryDelay == 0) { - auditorBookieCheckTask.startAudit(false); - bookiesToBeAudited.clear(); - return; + // find lost bookies(if any) + bookiesToBeAudited.addAll(CollectionUtils.subtract(knownBookies, availableBookies)); + if (bookiesToBeAudited.size() == 0) { + return; + } + + knownBookies.removeAll(bookiesToBeAudited); + if (lostBookieRecoveryDelay == 0) { + auditorBookieCheckTask.startAudit(false); + bookiesToBeAudited.clear(); + return; + } + if (bookiesToBeAudited.size() > 1) { + // if more than one bookie is down, start the audit immediately; + LOG.info("Multiple bookie failure; not delaying bookie audit. " + + "Bookies lost now: {}; All lost bookies: {}", + CollectionUtils.subtract(knownBookies, availableBookies), + bookiesToBeAudited); + if (auditTask != null && auditTask.cancel(false)) { + auditTask = null; + auditorStats.getNumDelayedBookieAuditsCancelled().inc(); } - if (bookiesToBeAudited.size() > 1) { - // if more than one bookie is down, start the audit immediately; - LOG.info("Multiple bookie failure; not delaying bookie audit. " - + "Bookies lost now: {}; All lost bookies: {}", - CollectionUtils.subtract(knownBookies, availableBookies), - bookiesToBeAudited); - if (auditTask != null && auditTask.cancel(false)) { - auditTask = null; - auditorStats.getNumDelayedBookieAuditsCancelled().inc(); - } + auditorBookieCheckTask.startAudit(false); + bookiesToBeAudited.clear(); + return; + } + if (auditTask == null) { + // if there is no scheduled audit, schedule one + auditTask = executor.schedule(() -> { auditorBookieCheckTask.startAudit(false); + auditTask = null; bookiesToBeAudited.clear(); - return; - } - if (auditTask == null) { - // if there is no scheduled audit, schedule one - auditTask = executor.schedule(() -> { - auditorBookieCheckTask.startAudit(false); - auditTask = null; - bookiesToBeAudited.clear(); - }, lostBookieRecoveryDelay, TimeUnit.SECONDS); - auditorStats.getNumBookieAuditsDelayed().inc(); - LOG.info("Delaying bookie audit by {} secs for {}", lostBookieRecoveryDelay, - bookiesToBeAudited); - } - } catch (BKException bke) { - LOG.error("Exception getting bookie list", bke); - } catch (InterruptedException ie) { - Thread.currentThread().interrupt(); - LOG.error("Interrupted while watching available bookies ", ie); - } catch (UnavailableException ue) { - LOG.error("Exception while watching available bookies", ue); + }, lostBookieRecoveryDelay, TimeUnit.SECONDS); + auditorStats.getNumBookieAuditsDelayed().inc(); + LOG.info("Delaying bookie audit by {} secs for {}", lostBookieRecoveryDelay, + bookiesToBeAudited); } - }); + } catch (BKException bke) { + LOG.error("Exception getting bookie list", bke); + } catch (InterruptedException ie) { + Thread.currentThread().interrupt(); + LOG.error("Interrupted while watching available bookies ", ie); + } catch (UnavailableException ue) { + LOG.error("Exception while watching available bookies", ue); + } } synchronized Future submitLostBookieRecoveryDelayChangedEvent() { @@ -386,13 +448,12 @@ public void start() { } try { - watchBookieChanges(); - // Start with all available bookies - // to handle situations where the auditor - // is started after some bookies have already failed + // Initialize knownBookies before registering watchers so that any + // watcher callback that fires immediately sees a consistent baseline. knownBookies = admin.getAllBookies().stream() .map(BookieId::toString) .collect(Collectors.toList()); + watchBookieChanges(); this.ledgerUnderreplicationManager .notifyLostBookieRecoveryDelayChanged(new LostBookieRecoveryDelayChangedCb()); } catch (BKException bke) { @@ -598,8 +659,18 @@ protected List getAvailableBookies() throws BKException { } private void watchBookieChanges() throws BKException { - admin.watchWritableBookiesChanged(bookies -> submitAuditTask()); - admin.watchReadOnlyBookiesChanged(bookies -> submitAuditTask()); + admin.watchWritableBookiesChanged(bookies -> { + pendingWritableBookies.set(bookies.getValue().stream() + .map(BookieId::toString) + .collect(Collectors.toSet())); + submitAuditTaskForBookieChange(true); + }); + admin.watchReadOnlyBookiesChanged(bookies -> { + pendingReadOnlyBookies.set(bookies.getValue().stream() + .map(BookieId::toString) + .collect(Collectors.toSet())); + submitAuditTaskForBookieChange(false); + }); } /** From 590cc7e96fe9da7b2d2231def5c84196135b0d84 Mon Sep 17 00:00:00 2001 From: Lari Hotari Date: Fri, 13 Mar 2026 11:50:02 +0200 Subject: [PATCH 2/5] [FIX] Replace AtomicBoolean with AtomicInteger for audit task deduplication MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The previous AtomicBoolean approach (flag cleared at task *start*) had a subtle race: between the flag being cleared and `runAuditTask()` actually reading `pendingWritableBookies`, a watcher callback could see `flag=false` and submit a second concurrent task. Because the flag gives no information about a task that is already *running*, the invariant "at most one queued + one in-progress" was not cleanly enforced. Replace with an `AtomicInteger` counter (max 2) that is: - incremented synchronously at submission time (inside the existing `synchronized` block that also protects the `isShutdown()` check) - decremented in a `finally` block when the task *finishes* This accurately counts all live tasks (running or queued) for the full duration of their lifetime. When the counter reaches 2 (one in-progress + one queued), further watcher callbacks are dropped — the queued task will read the latest `pendingWritableBookies` / `pendingReadOnlyBookies` when it eventually executes, so no additional task provides any benefit. Co-Authored-By: Claude Sonnet 4.6 --- .../bookkeeper/replication/Auditor.java | 47 ++++++++++++------- 1 file changed, 29 insertions(+), 18 deletions(-) diff --git a/bookkeeper-server/src/main/java/org/apache/bookkeeper/replication/Auditor.java b/bookkeeper-server/src/main/java/org/apache/bookkeeper/replication/Auditor.java index 1766288bedc..32de1d04322 100644 --- a/bookkeeper-server/src/main/java/org/apache/bookkeeper/replication/Auditor.java +++ b/bookkeeper-server/src/main/java/org/apache/bookkeeper/replication/Auditor.java @@ -35,6 +35,7 @@ import java.util.concurrent.ThreadFactory; import java.util.concurrent.TimeUnit; import java.util.concurrent.atomic.AtomicBoolean; +import java.util.concurrent.atomic.AtomicInteger; import java.util.concurrent.atomic.AtomicReference; import java.util.function.BiConsumer; import java.util.stream.Collectors; @@ -87,11 +88,13 @@ public class Auditor implements AutoCloseable { // Written by ZK watcher callback threads, read by the executor thread. private final AtomicReference> pendingWritableBookies = new AtomicReference<>(); private final AtomicReference> pendingReadOnlyBookies = new AtomicReference<>(); - // Guards against queuing more than one pending audit task per change type. - // Cleared at the start of the task so that changes arriving while the task is - // running cause a new task to be queued rather than being silently dropped. - private final AtomicBoolean writableAuditTaskQueued = new AtomicBoolean(false); - private final AtomicBoolean readOnlyAuditTaskQueued = new AtomicBoolean(false); + // Counts tasks (running + queued) per change type. Incremented when a task is + // submitted, decremented when it finishes. Capped at 2 (one in-progress + one + // queued): a queued task always reads the latest pending bookie set when it + // starts, so there is no value in queuing more than one additional task. + private static final int MAX_AUDIT_TASKS_PER_TYPE = 2; + private final AtomicInteger writableAuditTaskCount = new AtomicInteger(0); + private final AtomicInteger readOnlyAuditTaskCount = new AtomicInteger(0); protected AuditorBookieCheckTask auditorBookieCheckTask; protected AuditorTask auditorCheckAllLedgersTask; protected AuditorTask auditorPlacementPolicyCheckTask; @@ -250,25 +253,33 @@ private void submitShutdownTask() { /** * Submit an audit task triggered by a bookie change notification from the watcher. - * At most one task per change type (writable/readonly) is queued at a time: - * the queued-flag is cleared when the task starts executing, so any watcher - * callbacks that arrive while the task is running will queue exactly one more task. + * + *

At most {@value #MAX_AUDIT_TASKS_PER_TYPE} tasks (one in-progress + one + * queued) are allowed per change type at any time. The counter is incremented on + * submission and decremented when the task finishes, so the limit is enforced + * across the full task lifetime. A queued task always reads the latest pending + * bookie set when it starts, so queuing more than one additional task would only + * duplicate work without improving correctness. */ private synchronized void submitAuditTaskForBookieChange(boolean writableChange) { if (executor.isShutdown()) { return; } - AtomicBoolean queued = writableChange ? writableAuditTaskQueued : readOnlyAuditTaskQueued; - if (queued.compareAndSet(false, true)) { - executor.submit(() -> { - // Clear the flag before running so that watcher callbacks arriving - // during execution can queue a follow-up task instead of being dropped. - queued.set(false); - runAuditTask(); - }); + AtomicInteger count = writableChange ? writableAuditTaskCount : readOnlyAuditTaskCount; + if (count.get() >= MAX_AUDIT_TASKS_PER_TYPE) { + // One task is already running and one is queued. The queued task will + // pick up the latest pendingWritableBookies / pendingReadOnlyBookies + // when it executes, so no additional task is needed. + return; } - // else: a task is already queued; the latest pendingWritableBookies / - // pendingReadOnlyBookies will be picked up when that task starts. + count.incrementAndGet(); + executor.submit(() -> { + try { + runAuditTask(); + } finally { + count.decrementAndGet(); + } + }); } /** From 162b707de7c11befa2ec178ea27cfa71bee0877a Mon Sep 17 00:00:00 2001 From: Lari Hotari Date: Fri, 13 Mar 2026 11:56:31 +0200 Subject: [PATCH 3/5] [FIX] Fetch only the missing bookie type when one watcher snapshot is absent The previous fallback in runAuditTask() called getAvailableBookies() (both writable + readonly ZK reads) whenever *either* pending snapshot was null. This was wrong: writable and readonly watcher callbacks are independent and fire separately, so one snapshot can be present while the other is not yet populated (e.g. right after startup when only one type of change has occurred). Replace the single else-branch with three cases: - Both snapshots present: combine them (no ZK call, as before). - Only writable snapshot present: use it + call admin.getReadOnlyBookies(). - Only readonly snapshot present: call admin.getAvailableBookies() + use it. - Neither present (direct test calls via submitAuditTask()): call the full getAvailableBookies() as before. This ensures we always use the watcher-captured data for the half that is available and only make a targeted ZK round-trip for the missing half. Co-Authored-By: Claude Sonnet 4.6 --- .../bookkeeper/replication/Auditor.java | 19 +++++++++++++++++-- 1 file changed, 17 insertions(+), 2 deletions(-) diff --git a/bookkeeper-server/src/main/java/org/apache/bookkeeper/replication/Auditor.java b/bookkeeper-server/src/main/java/org/apache/bookkeeper/replication/Auditor.java index 32de1d04322..5d76aaedf66 100644 --- a/bookkeeper-server/src/main/java/org/apache/bookkeeper/replication/Auditor.java +++ b/bookkeeper-server/src/main/java/org/apache/bookkeeper/replication/Auditor.java @@ -313,14 +313,29 @@ private void runAuditTask() { .getLostBookieRecoveryDelay(); // Use the bookie sets captured synchronously by the watcher callbacks. - // Fall back to a fresh ZK read when the watchers haven't fired yet - // (e.g. direct test invocations via submitAuditTask()). + // If only one type has fired so far, fetch only the other type from the + // admin API to avoid a redundant ZK round-trip for the known half. + // Fall back to a full getAvailableBookies() only when neither watcher + // has fired yet (e.g. direct test invocations via submitAuditTask()). Set writableSnapshot = pendingWritableBookies.get(); Set readOnlySnapshot = pendingReadOnlyBookies.get(); List availableBookies; if (writableSnapshot != null && readOnlySnapshot != null) { availableBookies = new ArrayList<>(writableSnapshot); availableBookies.addAll(readOnlySnapshot); + } else if (writableSnapshot != null) { + // Readonly watcher hasn't fired yet; fetch only readonly bookies. + availableBookies = new ArrayList<>(writableSnapshot); + for (BookieId id : admin.getReadOnlyBookies()) { + availableBookies.add(id.toString()); + } + } else if (readOnlySnapshot != null) { + // Writable watcher hasn't fired yet; fetch only writable bookies. + availableBookies = new ArrayList<>(); + for (BookieId id : admin.getAvailableBookies()) { + availableBookies.add(id.toString()); + } + availableBookies.addAll(readOnlySnapshot); } else { availableBookies = getAvailableBookies(); } From 79baa3fc7048d6237a0eb7128105891906f7a1dd Mon Sep 17 00:00:00 2001 From: Lari Hotari Date: Fri, 13 Mar 2026 13:31:13 +0200 Subject: [PATCH 4/5] Fix issue running tests in docker with multiple loopback interfaces and IPs --- .../java/org/apache/bookkeeper/net/DNS.java | 21 +++++++++++++++++-- .../bookkeeper/conf/TestBKConfiguration.java | 21 ++++++++++++++++++- 2 files changed, 39 insertions(+), 3 deletions(-) diff --git a/bookkeeper-server/src/main/java/org/apache/bookkeeper/net/DNS.java b/bookkeeper-server/src/main/java/org/apache/bookkeeper/net/DNS.java index 68ff6621265..2c714ef88f7 100644 --- a/bookkeeper-server/src/main/java/org/apache/bookkeeper/net/DNS.java +++ b/bookkeeper-server/src/main/java/org/apache/bookkeeper/net/DNS.java @@ -202,7 +202,7 @@ public static String[] getIPs(String strInterface, /** - * Returns the first available IP address associated with the provided + * Returns the first available IP address with a resolvable hostname associated with the provided * network interface or the local host IP if "default" is given. * * @param strInterface The name of the network interface or subinterface to query @@ -214,7 +214,24 @@ public static String[] getIPs(String strInterface, public static String getDefaultIP(String strInterface) throws UnknownHostException { String[] ips = getIPs(strInterface); - return ips[0]; + UnknownHostException lastException = null; + for (String ip : ips) { + String resolved = null; + try { + resolved = InetAddress.getByName(ip).getHostName(); + } catch (UnknownHostException e) { + // skip ip addresses that cannot be resolved to a hostname + lastException = e; + continue; + } + if (resolved.equals(ip)) { + lastException = new UnknownHostException( + "IP address " + ip + " for " + strInterface + " interface cannot be resolved to a hostname"); + continue; + } + return ip; + } + throw lastException; } /** diff --git a/bookkeeper-server/src/test/java/org/apache/bookkeeper/conf/TestBKConfiguration.java b/bookkeeper-server/src/test/java/org/apache/bookkeeper/conf/TestBKConfiguration.java index 4b1d64fc94b..b823649140e 100644 --- a/bookkeeper-server/src/test/java/org/apache/bookkeeper/conf/TestBKConfiguration.java +++ b/bookkeeper-server/src/test/java/org/apache/bookkeeper/conf/TestBKConfiguration.java @@ -23,10 +23,14 @@ import java.net.NetworkInterface; import java.net.SocketException; +import java.net.UnknownHostException; import java.util.Collections; import java.util.Enumeration; +import java.util.LinkedHashSet; +import java.util.Set; import org.apache.bookkeeper.bookie.storage.ldb.DbLedgerStorage; import org.apache.bookkeeper.common.allocator.PoolingPolicy; +import org.apache.bookkeeper.net.DNS; import org.apache.bookkeeper.util.PortManager; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -69,12 +73,27 @@ public static ServerConfiguration newServerConfiguration() { private static String getLoopbackInterfaceName() { try { + Set loopbackInterfaces = new LinkedHashSet<>(); Enumeration nifs = NetworkInterface.getNetworkInterfaces(); for (NetworkInterface nif : Collections.list(nifs)) { if (nif.isLoopback()) { - return nif.getName(); + String nifName = nif.getName(); + try { + DNS.getDefaultIP(nifName); + } catch (UnknownHostException e) { + // skip interfaces that don't have a resolvable hostname + continue; + } + loopbackInterfaces.add(nifName); } } + // prefer lo if available to avoid issues on Linux + if (loopbackInterfaces.contains("lo")) { + return "lo"; + } + if (!loopbackInterfaces.isEmpty()) { + return loopbackInterfaces.iterator().next(); + } } catch (SocketException se) { LOG.warn("Exception while figuring out loopback interface. Will use null.", se); return null; From 4833e97ed31ed1744e51f778c56477a995992216 Mon Sep 17 00:00:00 2001 From: Lari Hotari Date: Fri, 13 Mar 2026 15:39:50 +0200 Subject: [PATCH 5/5] [FIX] submitAuditTask() must run full audit to detect bookies not in knownBookies submitAuditTask() was submitting runAuditTask(), which compares pendingWritableBookies against knownBookies to detect lost bookies. knownBookies is seeded from admin.getAllBookies() at auditor startup, so any bookie that registers after the auditor started and dies before being processed by runAuditTask() is never added to knownBookies and remains invisible to the subtraction-based loss detection. This caused testEmptyLedgerLosesQuorumEventually to fail when run after testNoSuchLedgerExists: bookies 33105 and 36375 registered after auditor 42899 started, two watcher tasks were queued (one per registration), and by the time they ran (after checkAllLedgers completed 112ms later) bookie 36375 had already died. pendingWritableBookies correctly reflected {42899,33105} but since 36375 was never in knownBookies the loss went undetected. The explicit submitAuditTask().get() call in the test suffered the same gap. Fix: submitAuditTask() now submits auditorBookieCheckTask.startAudit(false) instead of runAuditTask(). The full auditBookies() scan uses ledger metadata to find all bookies that own ledgers and checks their availability, detecting 36375 via the ledger it appears in regardless of knownBookies state. The runAuditTask() / knownBookies mechanism is still correct and sufficient for its intended purpose: detecting losses for bookies that were already known. The register-and-die edge case is an inherent limitation of that lightweight approach and is handled by the periodic auditorBookieCheckTask and by explicit submitAuditTask() calls. Co-Authored-By: Claude Sonnet 4.6 --- .../java/org/apache/bookkeeper/replication/Auditor.java | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/bookkeeper-server/src/main/java/org/apache/bookkeeper/replication/Auditor.java b/bookkeeper-server/src/main/java/org/apache/bookkeeper/replication/Auditor.java index 5d76aaedf66..2be82159e7e 100644 --- a/bookkeeper-server/src/main/java/org/apache/bookkeeper/replication/Auditor.java +++ b/bookkeeper-server/src/main/java/org/apache/bookkeeper/replication/Auditor.java @@ -283,8 +283,13 @@ private synchronized void submitAuditTaskForBookieChange(boolean writableChange) } /** - * Submit an audit task unconditionally. Used by tests and by the + * Submit a full bookie-check audit task unconditionally. Used by tests and by the * LostBookieRecoveryDelay-changed event handler. + * + *

Runs the full {@code auditBookies()} scan (which uses ledger metadata) rather + * than the lightweight {@link #runAuditTask()}, so that bookies which registered + * and died since the auditor started are correctly detected even if they were never + * added to {@code knownBookies}. */ @VisibleForTesting synchronized Future submitAuditTask() { @@ -293,7 +298,7 @@ synchronized Future submitAuditTask() { f.setException(new BKAuditException("Auditor shutting down")); return f; } - return executor.submit(this::runAuditTask); + return executor.submit(() -> auditorBookieCheckTask.startAudit(false)); } /**