Skip to content

[Detail Bug] Race condition in ProviderCircuitState causes NPE on first rate-limit #39

@detail-app

Description

@detail-app

Summary

  • Context: ProviderCircuitState manages the in-memory circuit-breaker state for API providers to prevent overloading them after receiving rate-limit signals.
  • Bug: A race condition in the state update logic allows isAvailable() to read an uninitialized (null) nextRetryTime while the circuit is marked as open, leading to a NullPointerException.
  • Actual vs. expected: Concurrent threads can encounter a crash (NPE) when a provider first hits a rate limit, instead of correctly identifying that the provider is unavailable.
  • Impact: The application fails to process requests and may crash during the transition to a rate-limited state, especially under high concurrency when a provider is first throttled.

Code with bug

private volatile Instant nextRetryTime; // <-- BUG 🔴 [Uninitialized; remains null until first recordRateLimit call]

boolean isAvailable() {
    if (circuitOpen && Instant.now().isBefore(nextRetryTime)) { // <-- BUG 🔴 [NPE if circuitOpen is true but nextRetryTime is still null]
        return false;
    }
    // ...
}

void recordRateLimit(long retryAfterSeconds) {
    circuitOpen = true; // <-- BUG 🔴 [Visibility race: other threads see circuitOpen=true before nextRetryTime is set]
    if (retryAfterSeconds > 0) {
        nextRetryTime = Instant.now().plusSeconds(retryAfterSeconds);
        return;
    }
    // ...
}

Evidence

  • Reproduction Test Results: A multi-threaded test case concurrently calling recordRateLimit() and isAvailable() on a fresh ProviderCircuitState instance consistently triggers the following exception:
    java.lang.NullPointerException: Cannot invoke "java.time.Instant.isBefore(java.time.Instant)" because "this.nextRetryTime" is null
        at com.williamcallahan.javachat.service.ProviderCircuitState.isAvailable(ProviderCircuitState.java:33)
    
  • Logical Trace:
    1. The ProviderCircuitState is instantiated with nextRetryTime as null.
    2. Thread A receives a rate-limit error and calls recordRateLimit(10).
    3. Thread A executes circuitOpen = true. Since circuitOpen is volatile, this change is immediately visible to other CPUs.
    4. Thread B concurrently calls isAvailable().
    5. Thread B reads circuitOpen == true and proceeds to evaluate Instant.now().isBefore(nextRetryTime).
    6. Because Thread A has not yet reached the line assigning a value to nextRetryTime, Thread B reads null.
    7. Instant.isBefore(null) throws a NullPointerException.

Why has this bug gone undetected?

This bug is extremely transient and only manifests the very first time a specific provider is rate-limited during the application's lifecycle. Once nextRetryTime is assigned any non-null value, it is never reset to null, meaning subsequent rate-limit events will not trigger the NPE. Furthermore, the race window is limited to the tiny gap between two volatile writes, making it difficult to hit without significant concurrent load at the exact moment of the first provider failure.

Recommended fix

Initialize nextRetryTime to a safe default and update the state in an order that ensures consistency for readers.

    ProviderCircuitState(int maxBackoffMultiplier) {
        this.nextRetryTime = Instant.EPOCH; // <-- FIX 🟢 [Initialize to prevent null reads]
        // ...
    }

    void recordRateLimit(long retryAfterSeconds) {
        Instant retryTime;
        if (retryAfterSeconds > 0) {
            retryTime = Instant.now().plusSeconds(retryAfterSeconds);
        } else {
            backoffMultiplier = Math.min(backoffMultiplier * 2, maxBackoffMultiplier);
            retryTime = Instant.now().plusSeconds(backoffMultiplier);
        }
        this.nextRetryTime = retryTime; 
        this.circuitOpen = true; // <-- FIX 🟢 [Set circuitOpen LAST to ensure nextRetryTime is visible]
    }

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingdetail

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions