[fluss-server] Fix replicasOnOffline deadlock in addFetcherForReplicas #3011

@platinumhamburg

Description

Search before asking

  • I searched in the issues and found nothing similar.

Fluss version

0.9.0 (latest release)

Please describe the bug 🐞

When a tablet server receives a NotifyLeaderAndIsr request for a bucket with no elected leader (leaderId == null), the original code throws a STORAGE_EXCEPTION. The coordinator responds by marking that replica as offline in replicasOnOffline. Because isReplicaOnline() excludes replicas in the offline set, subsequent elections can never select these replicas as leader, which leaves the bucket in a permanent no-leader state.
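
The failure loop can be illustrated with a small, simplified model. This is only a sketch and not Fluss's actual coordinator code: the class `OfflineElectionSketch`, the methods `markOffline` and `electLeader`, and the single-bucket offline set keyed by server id are all assumptions made for illustration. Only the names `replicasOnOffline` and `isReplicaOnline` come from the description above.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Hypothetical single-bucket model of the coordinator's view; not Fluss's real classes. */
final class OfflineElectionSketch {
    // Server ids whose replica of this bucket has been marked offline.
    private final Set<Integer> replicasOnOffline = new HashSet<>();

    /** What the coordinator does after a tablet server reports an error for its replica. */
    void markOffline(int serverId) {
        replicasOnOffline.add(serverId);
    }

    /** A replica counts as online only if it is absent from the offline set. */
    boolean isReplicaOnline(int serverId) {
        return !replicasOnOffline.contains(serverId);
    }

    /** Assumed election rule: pick the first online replica in the ISR, if any. */
    Optional<Integer> electLeader(List<Integer> isr) {
        return isr.stream().filter(this::isReplicaOnline).findFirst();
    }
}
```

In this model, once every replica of a leaderless bucket has answered the notification with an error and been marked offline, `electLeader` returns an empty result on every later attempt, because nothing in the loop ever removes a replica from the offline set.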

Solution

Replace the error branch with a guard condition (leaderId != null && leaderId >= 0). When no valid leader exists, skip fetcher setup silently and let the next LeaderAndIsr notification recover naturally.
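
A minimal sketch of the proposed guard, under stated assumptions: the class `FetcherGuardSketch`, the helper `hasValidLeader`, the simplified `addFetcherForReplica` signature, and the string-based fetcher list are hypothetical stand-ins, not the real Fluss method. Only the condition `leaderId != null && leaderId >= 0` is taken from the description above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the tablet server's fetcher setup; not the real Fluss API. */
final class FetcherGuardSketch {
    // Records which buckets have a fetcher and which leader they follow.
    final List<String> fetchers = new ArrayList<>();

    /** The guard from the proposal: a leader is valid only if present and non-negative. */
    static boolean hasValidLeader(Integer leaderId) {
        return leaderId != null && leaderId >= 0;
    }

    /** Returns true if a fetcher was added, false if setup was skipped. */
    boolean addFetcherForReplica(String bucket, Integer leaderId) {
        if (!hasValidLeader(leaderId)) {
            // No elected leader yet: skip without raising an error, so the
            // coordinator has no reason to mark this replica offline.
            return false;
        }
        fetchers.add(bucket + "->" + leaderId);
        return true;
    }
}
```

With this shape, a notification carrying a null or negative leader id leaves the fetcher list unchanged and reports no error, and a later notification with a valid leader id for the same bucket sets up the fetcher normally.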

Are you willing to submit a PR?

  • I'm willing to submit a PR!
