[Bug] FE Observer node fails to join cluster in cloud mode with Docker host network #61536

@zhangdong1015

Search before asking

  • I had searched in the issues and found no similar issues.

Version

Doris 4.0.3, 4.0.4-slim

What's Wrong?

In Doris cloud mode, when an FE cluster is deployed with Docker host networking and each FE node uses a
different http_port, Observer nodes fail to join the cluster.

Error log:
WARN [Env.getFeNodeTypeAndNameFromHelpers():1520] failed to get fe node type from helper node: HostInfo{host='127.0.0.1', port=9010}.
java.net.ConnectException: Connection refused
WARN [Env.getClusterIdAndRole():1342] current node HostInfo{host='127.0.0.1', port=9011} is not added to the group. please add it first.

Root cause:

In the Env.getFeNodeTypeAndNameFromHelpers() method:

    String url = "http://" + NetUtils.getHostPortInAccessibleFormat(
            helperNode.getHost(),  // "127.0.0.1" (correct)
            Config.http_port       // uses the current node's http_port, NOT the helper's!
    ) + "/role?host=...";

The code uses Config.http_port (the current node's port) instead of the helper node's http_port. This
assumes that all FE nodes share the same http_port, which does not hold when:

  • Using Docker host network mode (containers share the host's network stack, so each FE must listen on a
    different port)
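
The difference can be illustrated with a minimal sketch. This is not the actual Doris code: the class HelperUrlSketch and the method buildRoleUrl are hypothetical names, the query string that the real method appends is left out, and the ports are the ones from the reproduction setup in this report.

```java
/** Hypothetical illustration only; not part of the Doris code base. */
final class HelperUrlSketch {

    /** Builds the base URL of the /role endpoint for the given host and HTTP port. */
    static String buildRoleUrl(String helperHost, int httpPort) {
        return "http://" + helperHost + ":" + httpPort + "/role";
    }

    public static void main(String[] args) {
        String helperHost = "127.0.0.1";
        int localHttpPort = 8031;   // FE-2's own http_port (what Config.http_port holds on FE-2)
        int helperHttpPort = 8030;  // FE-1's http_port (what the request should target)

        // Current behavior: the helper's host is paired with the local node's port.
        System.out.println("current:  " + buildRoleUrl(helperHost, localHttpPort));
        // Expected behavior: the helper's host is paired with the helper's own port.
        System.out.println("expected: " + buildRoleUrl(helperHost, helperHttpPort));
    }
}
```

The two URLs differ only in the port, which is why the problem stays hidden whenever every FE uses the same http_port.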

What You Expected?

  1. FE-2 should successfully connect to FE-1's HTTP endpoint and join the cluster as an Observer node.
  2. The code should use the helper node's http_port when constructing the HTTP URL, not the current node's
    Config.http_port.

Current behavior (wrong):
FE-2 tries: http://127.0.0.1:8031/role (FE-2's own http_port, where no service is listening)
Should be:  http://127.0.0.1:8030/role (FE-1's http_port, correct)
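
To confirm which of the two ports actually accepts connections on the host, a small probe along these lines can help. It is only a diagnostic sketch: PortProbe and isListening are hypothetical names, and the ports are assumed to match the configuration in this report.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical diagnostic helper; not part of Doris. */
final class PortProbe {

    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers "Connection refused" as well as connect timeouts.
            return false;
        }
    }

    public static void main(String[] args) {
        // Ports assumed from the reproduction setup: FE-1 on 8030, FE-2 on 8031.
        System.out.println("8030 listening: " + isListening("127.0.0.1", 8030, 1000));
        System.out.println("8031 listening: " + isListening("127.0.0.1", 8031, 1000));
    }
}
```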

Suggested fix:

  • Store each FE node's http_port in Meta Service when the node is registered
  • Or allow the helper node's http_port to be specified in the --helper parameter (e.g., --helper
    host:http_port:edit_log_port)
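
A rough sketch of how the second option could be parsed is shown below. Everything in it is an assumption made for illustration: the HelperSpec class, the fallback to a default HTTP port for the existing two-part host:edit_log_port form, and the restriction to hostnames and IPv4 addresses (an IPv6 literal would need bracket handling that is not shown).

```java
/** Hypothetical parser for the proposed --helper format; not part of Doris. */
final class HelperSpec {
    final String host;
    final int httpPort;
    final int editLogPort;

    private HelperSpec(String host, int httpPort, int editLogPort) {
        this.host = host;
        this.httpPort = httpPort;
        this.editLogPort = editLogPort;
    }

    /**
     * Accepts "host:edit_log_port" (existing form, HTTP port falls back to
     * defaultHttpPort) or "host:http_port:edit_log_port" (proposed form).
     */
    static HelperSpec parse(String spec, int defaultHttpPort) {
        // limit -1 keeps trailing empty fields so "host:9010:" is rejected, not truncated
        String[] parts = spec.trim().split(":", -1);
        if (parts.length < 2 || parts.length > 3 || parts[0].isEmpty()) {
            throw new IllegalArgumentException("invalid helper spec: " + spec);
        }
        if (parts.length == 2) {
            return new HelperSpec(parts[0], defaultHttpPort, parsePort(parts[1]));
        }
        return new HelperSpec(parts[0], parsePort(parts[1]), parsePort(parts[2]));
    }

    private static int parsePort(String text) {
        int port = Integer.parseInt(text); // NumberFormatException is an IllegalArgumentException
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return port;
    }

    public static void main(String[] args) {
        HelperSpec helper = HelperSpec.parse("127.0.0.1:8030:9010", 8031);
        System.out.println(helper.host + " http=" + helper.httpPort + " edit_log=" + helper.editLogPort);
    }
}
```

Keeping the two-part form valid would let existing deployments, where all FE nodes share one http_port, start without any change to their --helper arguments.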

How to Reproduce?

  1. Deploy Doris cloud mode with Meta Service + FoundationDB
  2. Configure FE-1 (Master) with http_port=8030, edit_log_port=9010
  3. Configure FE-2 (Observer) with http_port=8031, edit_log_port=9011
  4. Start FE-1 successfully
  5. Start FE-2 with --helper 127.0.0.1:9010
  6. FE-2 fails to join the cluster with a "Connection refused" error

Configuration example:

┌──────┬──────────┬───────────────┬───────────┐
│ Node │ Role │ edit_log_port │ http_port │
├──────┼──────────┼───────────────┼───────────┤
│ FE-1 │ Master │ 9010 │ 8030 │
├──────┼──────────┼───────────────┼───────────┤
│ FE-2 │ Observer │ 9011 │ 8031 │
└──────┴──────────┴───────────────┴───────────┘

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
