Document bpfKubeProxyHealthzPort conflict when kube-proxy cannot be disabled #2575
tomastigera wants to merge 3 commits into tigera:main
MKE's Docker Swarm overlay networking creates VXLAN devices on UDP port 4789, which conflicts with Calico's VXLAN in eBPF flow mode (BTF/kernel v5.8+). The flow-mode device acts as a catch-all and the kernel rejects it with EADDRINUSE. The fix is to change the VXLAN port (e.g. to 4790) before enabling eBPF.

Changes across Calico OSS (next + 3.31) and Enterprise (next + 3.23-1):
- enabling-ebpf.mdx: prerequisite section to change VXLAN port on MKE
- install.mdx: caution admonition in the MKE tab
- troubleshoot-ebpf.mdx: diagnosis and fix for VXLAN device DOWN

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove unsupported {#heading-id} syntax from troubleshoot headings
(Docusaurus auto-generates the slug from the heading text)
- Clarify BTF wording: "when BTF is available (typically v5.8+)" instead
of "with BTF support (v5.8+)" to avoid conflating BTF with kernel version
- Add "(run on the node via SSH)" note and quote $ns in the Docker netns
diagnosis loop to clarify execution context
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When kube-proxy cannot be disabled (e.g. AKS), the BPF kube-proxy replacement's health check server conflicts with Kubernetes kube-proxy on port 10256.

- For released versions (calico 3.30/3.31, CE 3.23-1): change the port to an unused value (e.g. 10258).
- For unreleased versions (calico/next, CE/next): set the port to 0 to disable the health check server entirely (requires projectcalico/calico#12033).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
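As a rough sketch of the workaround described in this commit message, the merge patch can be built for whichever port is chosen and then applied with `kubectl patch`. The helper name `healthz_patch` is hypothetical, 10258 is only the example port from the commit message, and the `--type merge -p` form is borrowed from the VXLAN patch command used elsewhere in this PR. The `kubectl` call is left as a comment because it needs a live cluster with a `default` FelixConfiguration.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Calico): prints a JSON merge patch that
# sets bpfKubeProxyHealthzPort in the FelixConfiguration spec.
healthz_patch() {
  local port="$1"
  printf '{"spec":{"bpfKubeProxyHealthzPort":%d}}\n' "$port"
}

# Assumed usage against a cluster where the default FelixConfiguration exists:
#   kubectl patch felixconfiguration default --type merge -p "$(healthz_patch 10258)"
healthz_patch 10258
```

Per the commit message, passing `0` instead of an unused port is only meaningful on versions that include projectcalico/calico#12033.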
✅ Deploy Preview for calico-docs-preview-next ready!
✅ Deploy Preview succeeded! Built without sensitive environment variables
Pull request overview
Documents operational workarounds for port conflicts encountered when running Calico in eBPF mode—specifically the bpfKubeProxyHealthzPort conflict with Kubernetes kube-proxy (10256), and an additional MKE-specific VXLAN port (4789) conflict scenario.
Changes:
- Add guidance to avoid `bpfKubeProxyHealthzPort` (10256) conflicts when `kube-proxy` cannot be disabled (released docs: move Felix healthz to an unused port; next docs: set to `0`).
- Add MKE-specific documentation for VXLAN port 4789 conflicts in eBPF “flow mode”, including a workaround to change `vxlanPort`.
- Duplicate the above guidance across OSS/Enterprise and relevant versioned/unversioned doc sets.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| calico_versioned_docs/version-3.31/operations/ebpf/troubleshoot-ebpf.mdx | Adds MKE VXLAN 4789 conflict troubleshooting section. |
| calico_versioned_docs/version-3.31/operations/ebpf/install.mdx | Adds MKE VXLAN port conflict caution + vxlanPort patch instruction. |
| calico_versioned_docs/version-3.31/operations/ebpf/enabling-ebpf.mdx | Adds kube-proxy healthz port conflict workaround + MKE VXLAN guidance. |
| calico_versioned_docs/version-3.30/operations/ebpf/enabling-ebpf.mdx | Adds kube-proxy healthz port conflict workaround (use unused port). |
| calico/operations/ebpf/troubleshoot-ebpf.mdx | Adds MKE VXLAN 4789 conflict troubleshooting section. |
| calico/operations/ebpf/install.mdx | Adds MKE VXLAN port conflict caution + vxlanPort patch instruction. |
| calico/operations/ebpf/enabling-ebpf.mdx | Adds kube-proxy healthz port conflict workaround (set to 0) + MKE VXLAN guidance. |
| calico-enterprise_versioned_docs/version-3.23-1/operations/ebpf/troubleshoot-ebpf.mdx | Adds MKE VXLAN 4789 conflict troubleshooting section. |
| calico-enterprise_versioned_docs/version-3.23-1/operations/ebpf/install.mdx | Adds MKE VXLAN port conflict caution + vxlanPort patch instruction. |
| calico-enterprise_versioned_docs/version-3.23-1/operations/ebpf/enabling-ebpf.mdx | Adds kube-proxy healthz port conflict workaround + MKE VXLAN guidance. |
| calico-enterprise/operations/ebpf/troubleshoot-ebpf.mdx | Adds MKE VXLAN 4789 conflict troubleshooting section. |
| calico-enterprise/operations/ebpf/install.mdx | Adds MKE VXLAN port conflict caution + vxlanPort patch instruction. |
| calico-enterprise/operations/ebpf/enabling-ebpf.mdx | Adds kube-proxy healthz port conflict workaround (set to 0) + MKE VXLAN guidance. |
> :::caution VXLAN port conflict
>
> MKE's Docker Swarm overlay networking uses UDP port 4789 for its own VXLAN devices. In eBPF mode, when BTF is available on the node (typically kernel v5.8+),
> $[prodname] creates the `vxlan.calico` device in flow mode, which conflicts with Docker Swarm's use of the same port.
> You must change the VXLAN port before enabling eBPF:
>
> ```bash
> kubectl patch felixconfiguration default --type merge -p '{"spec":{"vxlanPort":4790}}'
> ```

The VXLAN port patch command appears in the "Create a suitable cluster" section, before the guide installs the Tigera Operator / Calico CRDs. At this point, the `FelixConfiguration` CRD (and the `default` resource) may not exist yet, so `kubectl patch felixconfiguration default ...` will fail. Consider moving this step to after Calico is installed (or instruct users to add/apply a `FelixConfiguration` manifest alongside `custom-resources.yaml`) but still before switching to eBPF mode.
> :::caution VXLAN port conflict
>
> MKE's Docker Swarm overlay networking uses UDP port 4789 for its own VXLAN devices. In eBPF mode, when BTF is available on the node (typically kernel v5.8+),
> $[prodname] creates the `vxlan.calico` device in flow mode, which conflicts with Docker Swarm's use of the same port.
> You must change the VXLAN port before enabling eBPF:
>
> ```bash
> kubectl patch felixconfiguration default --type merge -p '{"spec":{"vxlanPort":4790}}'
> ```

The VXLAN port patch command is placed before the guide installs the Tigera Operator / Calico CRDs. At this stage the `FelixConfiguration` CRD (and the `default` FelixConfiguration) may not exist yet, so this `kubectl patch felixconfiguration ...` step can fail. Please move it to a point after Calico CRDs/resources are present (but still before enabling eBPF), or change the instructions to apply a `FelixConfiguration` manifest as part of the install.
> You must change the VXLAN port before enabling eBPF:
>
> ```bash
> kubectl patch felixconfiguration default --type merge -p '{"spec":{"vxlanPort":4790}}'

This VXLAN port patch is introduced before the guide installs the Tigera Operator / Calico CRDs. If the `FelixConfiguration` CRD or the `default` resource is not present yet, `kubectl patch felixconfiguration ...` will fail. Please move this instruction to a point after Calico CRDs/resources are installed (but still before enabling eBPF), or have users apply a `FelixConfiguration` manifest as part of the install.
Suggested change:

> You must change the VXLAN port before enabling eBPF:
> ```bash
> kubectl patch felixconfiguration default --type merge -p '{"spec":{"vxlanPort":4790}}'

Replace with:

> You must configure $[prodname] to use a different VXLAN port after installing the Tigera Operator and Calico CRDs, but before enabling eBPF. For example:
> ```bash
> cat <<EOF | kubectl apply -f -
> apiVersion: crd.projectcalico.org/v1
> kind: FelixConfiguration
> metadata:
>   name: default
> spec:
>   vxlanPort: 4790
> EOF

> You should also set `bpfKubeProxyHealthzPort` to `0` to disable the health check server in $[prodname]'s BPF kube-proxy replacement, which by default binds to port 10256 and would conflict with the Kubernetes `kube-proxy` already running on the node. The Kubernetes `kube-proxy` can serve the health check equally well, so there is no degradation.
>
> ```
> kubectl patch felixconfiguration default --patch='{"spec": {"bpfKubeProxyHealthzPort": 0}}'
> ```

This claims that setting `bpfKubeProxyHealthzPort` to `0` disables Felix's embedded health check server. However, the generated Felix configuration reference (`calico/_includes/components/FelixConfig/config-params.json`) documents this field only as an integer port (default 10256) and does not mention `0` having special meaning. Please either (a) add a version/feature availability note and link to the release that implements `0` = disabled, or (b) keep the guidance consistent with released docs by choosing an unused port instead of `0`.
> You should also set `bpfKubeProxyHealthzPort` to `0` to disable the health check server in $[prodname]'s BPF kube-proxy replacement, which by default binds to port 10256 and would conflict with the Kubernetes `kube-proxy` already running on the node. The Kubernetes `kube-proxy` can serve the health check equally well, so there is no degradation.
>
> ```bash
> kubectl patch felixconfiguration default --patch='{"spec": {"bpfKubeProxyHealthzPort": 0}}'

This states that `bpfKubeProxyHealthzPort: 0` disables the embedded health check server, but the generated Felix config reference (`calico-enterprise/_includes/components/FelixConfig/config-params.json`) currently documents this parameter only as an integer port and does not mention `0` as a disable value. Please add a version/feature gate note (and link) for when `0` is supported, or change the guidance to selecting an unused port (as in the versioned docs).
Suggested change:

> You should also set `bpfKubeProxyHealthzPort` to `0` to disable the health check server in $[prodname]'s BPF kube-proxy replacement, which by default binds to port 10256 and would conflict with the Kubernetes `kube-proxy` already running on the node. The Kubernetes `kube-proxy` can serve the health check equally well, so there is no degradation.
> ```bash
> kubectl patch felixconfiguration default --patch='{"spec": {"bpfKubeProxyHealthzPort": 0}}'

Replace with:

> You should also set `bpfKubeProxyHealthzPort` to an unused port to avoid conflicts with the health check server in $[prodname]'s BPF kube-proxy replacement, which by default binds to port 10256 and would conflict with the Kubernetes `kube-proxy` already running on the node. The Kubernetes `kube-proxy` can serve the health check equally well, so there is no degradation; the new port value is only to avoid the conflict. For example:
> ```bash
> kubectl patch felixconfiguration default --patch='{"spec": {"bpfKubeProxyHealthzPort": 11256}}'

> If both `kube-proxy` and `BPFKubeProxyIptablesCleanupEnabled` is enabled then `kube-proxy` will write its iptables rules and Felix will try to clean them up resulting in iptables flapping between the two.
>
> You should also change `bpfKubeProxyHealthzPort` to an unused port to avoid conflicting with the Kubernetes `kube-proxy`'s default health check port (10256). The Kubernetes `kube-proxy` can serve the health check equally well, so there is no degradation. Changing the health check port of the Kubernetes `kube-proxy` is typically not possible on managed platforms such as AKS. Choose a port that is not already in use on your nodes (for example, 10258; note that 10257 may be used by containerd).

The note "10257 may be used by containerd" appears incorrect (10257 is typically the kube-controller-manager secure port). Please adjust/remove this to avoid steering users toward a port that may actually be in use on many clusters, and prefer guidance to verify node port usage before choosing an alternative.
Suggested change:

> You should also change `bpfKubeProxyHealthzPort` to an unused port to avoid conflicting with the Kubernetes `kube-proxy`'s default health check port (10256). The Kubernetes `kube-proxy` can serve the health check equally well, so there is no degradation. Changing the health check port of the Kubernetes `kube-proxy` is typically not possible on managed platforms such as AKS. Choose a port that is not already in use on your nodes (for example, 10258; note that 10257 may be used by containerd).

Replace with:

> You should also change `bpfKubeProxyHealthzPort` to an unused port to avoid conflicting with the Kubernetes `kube-proxy`'s default health check port (10256). The Kubernetes `kube-proxy` can serve the health check equally well, so there is no degradation. Changing the health check port of the Kubernetes `kube-proxy` is typically not possible on managed platforms such as AKS. Choose a port that is not already in use on your nodes (for example, 10258), and verify that it is free on all nodes before applying this change.

> :::caution VXLAN port conflict
>
> MKE's Docker Swarm overlay networking uses UDP port 4789 for its own VXLAN devices. In eBPF mode, when BTF is available on the node (typically kernel v5.8+),
> $[prodname] creates the `vxlan.calico` device in flow mode, which conflicts with Docker Swarm's use of the same port.
> You must change the VXLAN port before enabling eBPF:
>
> ```bash
> kubectl patch felixconfiguration default --type merge -p '{"spec":{"vxlanPort":4790}}'
> ```

This VXLAN port patch is in the pre-install portion of the guide, before Tigera Operator / Calico CRDs are installed. `kubectl patch felixconfiguration default ...` can fail if the `FelixConfiguration` CRD or the `default` resource doesn't exist yet. Consider moving this to after Calico installation (or instruct users to apply a `FelixConfiguration` manifest during install) but before enabling eBPF.
> If both `kube-proxy` and `BPFKubeProxyIptablesCleanupEnabled` is enabled then `kube-proxy` will write its iptables rules and Felix will try to clean them up resulting in iptables flapping between the two.
>
> You should also change `bpfKubeProxyHealthzPort` to an unused port to avoid conflicting with the Kubernetes `kube-proxy`'s default health check port (10256). The Kubernetes `kube-proxy` can serve the health check equally well, so there is no degradation. Changing the health check port of the Kubernetes `kube-proxy` is typically not possible on managed platforms such as AKS. Choose a port that is not already in use on your nodes (for example, 10258; note that 10257 may be used by containerd).

The note "10257 may be used by containerd" is likely inaccurate and could mislead users picking a port. Port 10257 is commonly used by the Kubernetes controller-manager (secure port), not containerd. Suggest removing the containerd reference and either avoid naming specific ports beyond 10256, or mention the typical Kubernetes component reservations (10257/10259) and instruct users to verify with ss/netstat on their nodes.
Suggested change:

> You should also change `bpfKubeProxyHealthzPort` to an unused port to avoid conflicting with the Kubernetes `kube-proxy`'s default health check port (10256). The Kubernetes `kube-proxy` can serve the health check equally well, so there is no degradation. Changing the health check port of the Kubernetes `kube-proxy` is typically not possible on managed platforms such as AKS. Choose a port that is not already in use on your nodes (for example, 10258; note that 10257 may be used by containerd).

Replace with:

> You should also change `bpfKubeProxyHealthzPort` to an unused port to avoid conflicting with the Kubernetes `kube-proxy`'s default health check port (10256). The Kubernetes `kube-proxy` can serve the health check equally well, so there is no degradation. Changing the health check port of the Kubernetes `kube-proxy` is typically not possible on managed platforms such as AKS. Choose a port that is not already in use on your nodes (for example, 10258); note that other Kubernetes control plane components commonly use ports such as 10257 and 10259, so verify an unused port on your nodes with tools like `ss` or `netstat` before selecting one.

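The "verify an unused port" step in this suggestion could look roughly like the following on a node. This is a minimal sketch, not something taken from the PR: the helper name `port_is_listening` is hypothetical, and it assumes iproute2's `ss` is installed and that `ss -H -ltn` prints the local address and port in the fourth column.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads `ss -H -ltn` output on stdin and succeeds (exit 0)
# if the given TCP port appears as a local listening port, else exits 1.
port_is_listening() {
  local port="$1"
  # Assumption: column 4 is "Local Address:Port"; the port is the text after
  # the last colon, which also covers IPv6 forms such as [::]:10256.
  awk -v p="$port" '
    { n = split($4, a, ":"); if (a[n] == p) found = 1 }
    END { exit found ? 0 : 1 }
  '
}

# Assumed usage on each node before choosing a port:
#   ss -H -ltn | port_is_listening 10258 && echo "10258 in use" || echo "10258 free"
```

Reading the listener list from stdin keeps the check usable with saved `ss` output collected from several nodes, since the port must be free on all of them.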
> If both `kube-proxy` and `BPFKubeProxyIptablesCleanupEnabled` is enabled then `kube-proxy` will write its iptables rules and Felix will try to clean them up resulting in iptables flapping between the two.
>
> You should also change `bpfKubeProxyHealthzPort` to an unused port to avoid conflicting with the Kubernetes `kube-proxy`'s default health check port (10256). The Kubernetes `kube-proxy` can serve the health check equally well, so there is no degradation. Changing the health check port of the Kubernetes `kube-proxy` is typically not possible on managed platforms such as AKS. Choose a port that is not already in use on your nodes (for example, 10258; note that 10257 may be used by containerd).

The note "10257 may be used by containerd" is likely wrong (10257 is commonly reserved by kube-controller-manager). Please update/remove this example to avoid recommending a potentially in-use port; consider advising users to check node listeners and/or note the typical Kubernetes reserved ports (10257/10259).
Suggested change:

> You should also change `bpfKubeProxyHealthzPort` to an unused port to avoid conflicting with the Kubernetes `kube-proxy`'s default health check port (10256). The Kubernetes `kube-proxy` can serve the health check equally well, so there is no degradation. Changing the health check port of the Kubernetes `kube-proxy` is typically not possible on managed platforms such as AKS. Choose a port that is not already in use on your nodes (for example, 10258; note that 10257 may be used by containerd).

Replace with:

> You should also change `bpfKubeProxyHealthzPort` to an unused port to avoid conflicting with the Kubernetes `kube-proxy`'s default health check port (10256). The Kubernetes `kube-proxy` can serve the health check equally well, so there is no degradation. Changing the health check port of the Kubernetes `kube-proxy` is typically not possible on managed platforms such as AKS. Choose a port that is not already in use on your nodes (for example, 10258), avoid typical Kubernetes reserved ports such as 10257 and 10259, and verify port availability by checking which ports are currently in use on your nodes.

> ## MKE: VXLAN device DOWN
>
> On MKE clusters, after enabling eBPF mode, the `vxlan.calico` device may stay DOWN with Felix logs showing:
>
> ```
> Failed to set tunnel device up error=address already in use
> ```
>
> This happens because MKE's Docker Swarm overlay networking creates VXLAN devices on UDP port 4789 inside Docker network namespaces (`/run/docker/netns/`). In eBPF mode, when BTF is available on the node (typically kernel v5.8+), Felix creates `vxlan.calico` in flow mode (`external`), which acts as a catch-all on its UDP port. The kernel rejects this because Docker Swarm's VXLAN device already holds port 4789.

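One way to confirm which UDP ports the Docker Swarm VXLAN devices hold is sketched below. It is an illustration only and not the diagnosis loop from the PR itself, which is not shown on this page. The helper name `vxlan_dstports` is hypothetical, and the sketch assumes that `ip -d link show type vxlan` reports the port as `dstport N` and that the namespace files live under `/run/docker/netns/` as the text above states.

```bash
#!/usr/bin/env bash
# Hypothetical helper: extracts the distinct VXLAN UDP destination ports from
# `ip -d link show type vxlan` output supplied on stdin.
vxlan_dstports() {
  grep -oE 'dstport [0-9]+' | awk '{ print $2 }' | sort -un
}

# Assumed usage, run as root on the node (for example via SSH):
#   for ns in /run/docker/netns/*; do
#     echo "== $ns"
#     nsenter --net="$ns" ip -d link show type vxlan | vxlan_dstports
#   done
```

If 4789 shows up for any namespace, that would be consistent with the `address already in use` error described above.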
PR description/title focus on documenting `bpfKubeProxyHealthzPort`/`kube-proxy` port 10256 conflicts, but this PR also adds a new MKE-specific troubleshooting/install section about a VXLAN port 4789 conflict. Please update the PR description to reflect this additional scope (or split into a separate PR) so reviewers can validate both topics appropriately.
> You should also set `bpfKubeProxyHealthzPort` to `0` to disable the health check server in $[prodname]'s BPF kube-proxy replacement, which by default binds to port 10256 and would conflict with the Kubernetes `kube-proxy` already running on the node. The Kubernetes `kube-proxy` can serve the health check equally well, so there is no degradation.
>
> ```bash
> kubectl patch felixconfiguration default --patch='{"spec": {"bpfKubeProxyHealthzPort": 0}}'
> ```
>
> ### MKE: Change the VXLAN port before enabling eBPF

I think this PR includes changes from the other PR #2574. Can you separate them?

Summary
- Released versions: change `bpfKubeProxyHealthzPort` to an unused port (e.g. 10258).
- Unreleased versions: set `bpfKubeProxyHealthzPort` to `0` to disable the health check server entirely (requires "Allow disabling BPF kube-proxy health check by setting port to 0", projectcalico/calico#12033).

Question: Does Calico Cloud expose the `bpfKubeProxyHealthzPort` FelixConfiguration field? If so, the same guidance should be added to the Calico Cloud docs. If not, there is nothing users can do about the port conflict in Cloud today.

Test plan
🤖 Generated with Claude Code