Document bpfKubeProxyHealthzPort conflict when kube-proxy cannot be disabled #2575

Open
tomastigera wants to merge 3 commits into tigera:main from tomastigera:tomas-bpf-kp-healthz-port-conflict

Conversation

@tomastigera
Contributor

Summary

  • Document how to avoid port 10256 conflict between Calico's BPF kube-proxy replacement health check server and the Kubernetes kube-proxy, when kube-proxy cannot be disabled (e.g. AKS).
  • For released versions (calico 3.30/3.31, CE 3.23-1): change bpfKubeProxyHealthzPort to an unused port (e.g. 10258).
  • For unreleased versions (calico/next, CE/next): set bpfKubeProxyHealthzPort to 0 to disable the health check server entirely (requires projectcalico/calico#12033, "Allow disabling BPF kube-proxy health check by setting port to 0").
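
A minimal sketch of the two workarounds, assuming a POSIX shell. `healthz_patch` is a hypothetical helper, not something shipped with Calico or included in this PR. It only prints the merge patch, so the JSON can be inspected before it is passed to `kubectl patch`.

```bash
# Hypothetical helper: print the FelixConfiguration merge patch for a given
# bpfKubeProxyHealthzPort value. It does not contact the cluster.
healthz_patch() {
  port="$1"
  case "$port" in
    # Reject empty input, non-digits, leading zeros, and more than 5 digits.
    ''|*[!0-9]*|0?*|??????*)
      echo "port must be an integer between 0 and 65535" >&2
      return 1
      ;;
  esac
  if [ "$port" -gt 65535 ]; then
    echo "port must be an integer between 0 and 65535" >&2
    return 1
  fi
  printf '{"spec":{"bpfKubeProxyHealthzPort":%s}}\n' "$port"
}

# Released versions: move the health check server to an unused port.
#   kubectl patch felixconfiguration default --type merge -p "$(healthz_patch 10258)"
# Unreleased versions that include projectcalico/calico#12033: 0 disables it.
#   kubectl patch felixconfiguration default --type merge -p "$(healthz_patch 0)"
```

The port 10258 is only the example value from the summary. Whether it is actually free depends on the nodes in question.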

Question: Does Calico Cloud expose the bpfKubeProxyHealthzPort FelixConfiguration field? If so, the same guidance should be added to the Calico Cloud docs. If not, there is nothing users can do about the port conflict in Cloud today.

Test plan

  • Verify docs render correctly on all affected versions

🤖 Generated with Claude Code

tomastigera and others added 3 commits March 6, 2026 12:44
MKE's Docker Swarm overlay networking creates VXLAN devices on UDP port
4789, which conflicts with Calico's VXLAN in eBPF flow mode (BTF/kernel
v5.8+). The flow-mode device acts as a catch-all and the kernel rejects
it with EADDRINUSE. The fix is to change the VXLAN port (e.g. to 4790)
before enabling eBPF.

Changes across Calico OSS (next + 3.31) and Enterprise (next + 3.23-1):
- enabling-ebpf.mdx: prerequisite section to change VXLAN port on MKE
- install.mdx: caution admonition in the MKE tab
- troubleshoot-ebpf.mdx: diagnosis and fix for VXLAN device DOWN

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Remove unsupported {#heading-id} syntax from troubleshoot headings
  (Docusaurus auto-generates the slug from the heading text)
- Clarify BTF wording: "when BTF is available (typically v5.8+)" instead
  of "with BTF support (v5.8+)" to avoid conflating BTF with kernel version
- Add "(run on the node via SSH)" note and quote $ns in the Docker netns
  diagnosis loop to clarify execution context

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

When kube-proxy cannot be disabled (e.g. AKS), the BPF kube-proxy
replacement's health check server conflicts with Kubernetes kube-proxy
on port 10256.

For released versions (calico 3.30/3.31, CE 3.23-1): change the port
to an unused value (e.g. 10258).

For unreleased versions (calico/next, CE/next): set the port to 0 to
disable the health check server entirely (requires projectcalico/calico#12033).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 8, 2026 00:25
@tomastigera tomastigera requested a review from a team as a code owner March 8, 2026 00:25

netlify bot commented Mar 8, 2026

Deploy Preview for calico-docs-preview-next ready!

Name Link
🔨 Latest commit 53bf550
🔍 Latest deploy log https://app.netlify.com/projects/calico-docs-preview-next/deploys/69acc1eae1a9870008840b19
😎 Deploy Preview https://deploy-preview-2575--calico-docs-preview-next.netlify.app


netlify bot commented Mar 8, 2026

Deploy Preview succeeded!

Built without sensitive environment variables

Name Link
🔨 Latest commit 53bf550
🔍 Latest deploy log https://app.netlify.com/projects/tigera/deploys/69acc1eb5fd6ab000889666f
😎 Deploy Preview https://deploy-preview-2575--tigera.netlify.app
Lighthouse
1 paths audited
Performance: 68 (no change from production)
Accessibility: 98 (no change from production)
Best Practices: 92 (no change from production)
SEO: 100 (no change from production)
PWA: -

Contributor

Copilot AI left a comment

Pull request overview

Documents operational workarounds for port conflicts encountered when running Calico in eBPF mode—specifically the bpfKubeProxyHealthzPort conflict with Kubernetes kube-proxy (10256), and an additional MKE-specific VXLAN port (4789) conflict scenario.

Changes:

  • Add guidance to avoid bpfKubeProxyHealthzPort (10256) conflicts when kube-proxy cannot be disabled (released docs: move Felix healthz to an unused port; next docs: set to 0).
  • Add MKE-specific documentation for VXLAN port 4789 conflicts in eBPF “flow mode”, including a workaround to change vxlanPort.
  • Duplicate the above guidance across OSS/Enterprise and relevant versioned/unversioned doc sets.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| calico_versioned_docs/version-3.31/operations/ebpf/troubleshoot-ebpf.mdx | Adds MKE VXLAN 4789 conflict troubleshooting section. |
| calico_versioned_docs/version-3.31/operations/ebpf/install.mdx | Adds MKE VXLAN port conflict caution + vxlanPort patch instruction. |
| calico_versioned_docs/version-3.31/operations/ebpf/enabling-ebpf.mdx | Adds kube-proxy healthz port conflict workaround + MKE VXLAN guidance. |
| calico_versioned_docs/version-3.30/operations/ebpf/enabling-ebpf.mdx | Adds kube-proxy healthz port conflict workaround (use unused port). |
| calico/operations/ebpf/troubleshoot-ebpf.mdx | Adds MKE VXLAN 4789 conflict troubleshooting section. |
| calico/operations/ebpf/install.mdx | Adds MKE VXLAN port conflict caution + vxlanPort patch instruction. |
| calico/operations/ebpf/enabling-ebpf.mdx | Adds kube-proxy healthz port conflict workaround (set to 0) + MKE VXLAN guidance. |
| calico-enterprise_versioned_docs/version-3.23-1/operations/ebpf/troubleshoot-ebpf.mdx | Adds MKE VXLAN 4789 conflict troubleshooting section. |
| calico-enterprise_versioned_docs/version-3.23-1/operations/ebpf/install.mdx | Adds MKE VXLAN port conflict caution + vxlanPort patch instruction. |
| calico-enterprise_versioned_docs/version-3.23-1/operations/ebpf/enabling-ebpf.mdx | Adds kube-proxy healthz port conflict workaround + MKE VXLAN guidance. |
| calico-enterprise/operations/ebpf/troubleshoot-ebpf.mdx | Adds MKE VXLAN 4789 conflict troubleshooting section. |
| calico-enterprise/operations/ebpf/install.mdx | Adds MKE VXLAN port conflict caution + vxlanPort patch instruction. |
| calico-enterprise/operations/ebpf/enabling-ebpf.mdx | Adds kube-proxy healthz port conflict workaround (set to 0) + MKE VXLAN guidance. |

Comment on lines +206 to +214
:::caution VXLAN port conflict

MKE's Docker Swarm overlay networking uses UDP port 4789 for its own VXLAN devices. In eBPF mode, when BTF is available on the node (typically kernel v5.8+),
$[prodname] creates the `vxlan.calico` device in flow mode, which conflicts with Docker Swarm's use of the same port.
You must change the VXLAN port before enabling eBPF:

```bash
kubectl patch felixconfiguration default --type merge -p '{"spec":{"vxlanPort":4790}}'
```
Copilot AI Mar 8, 2026

The VXLAN port patch command appears in the "Create a suitable cluster" section, before the guide installs the Tigera Operator / Calico CRDs. At this point, the FelixConfiguration CRD (and the default resource) may not exist yet, so kubectl patch felixconfiguration default ... will fail. Consider moving this step to after Calico is installed (or instruct users to add/apply a FelixConfiguration manifest alongside custom-resources.yaml) but still before switching to eBPF mode.
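
One way to guard against that ordering problem, as a sketch only: confirm that the `default` FelixConfiguration is readable before patching it. `patch_vxlan_port` and the `KUBECTL` override are hypothetical names introduced here for illustration; the `kubectl get` and `kubectl patch` invocations mirror the command quoted above.

```bash
# Hypothetical guard: patch vxlanPort only once the default FelixConfiguration
# exists. KUBECTL may be pointed at a stub to try the logic without a cluster.
patch_vxlan_port() {
  port="$1"
  kubectl_cmd="${KUBECTL:-kubectl}"
  if ! "$kubectl_cmd" get felixconfiguration default >/dev/null 2>&1; then
    echo "FelixConfiguration 'default' not found; install Calico before patching" >&2
    return 1
  fi
  "$kubectl_cmd" patch felixconfiguration default --type merge \
    -p "{\"spec\":{\"vxlanPort\":${port}}}"
}
```

Running `patch_vxlan_port 4790` after the operator has created the resource, and before switching to eBPF mode, would then apply the same change as the documented command.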

Comment on lines +206 to +214
:::caution VXLAN port conflict

MKE's Docker Swarm overlay networking uses UDP port 4789 for its own VXLAN devices. In eBPF mode, when BTF is available on the node (typically kernel v5.8+),
$[prodname] creates the `vxlan.calico` device in flow mode, which conflicts with Docker Swarm's use of the same port.
You must change the VXLAN port before enabling eBPF:

```bash
kubectl patch felixconfiguration default --type merge -p '{"spec":{"vxlanPort":4790}}'
```
Copilot AI Mar 8, 2026

The VXLAN port patch command is placed before the guide installs the Tigera Operator / Calico CRDs. At this stage the FelixConfiguration CRD (and default FelixConfiguration) may not exist yet, so this kubectl patch felixconfiguration ... step can fail. Please move it to a point after Calico CRDs/resources are present (but still before enabling eBPF), or change the instructions to apply a FelixConfiguration manifest as part of the install.

Comment on lines +182 to +185
You must change the VXLAN port before enabling eBPF:

```bash
kubectl patch felixconfiguration default --type merge -p '{"spec":{"vxlanPort":4790}}'
```

Copilot AI Mar 8, 2026

This VXLAN port patch is introduced before the guide installs the Tigera Operator / Calico CRDs. If the FelixConfiguration CRD or the default resource is not present yet, kubectl patch felixconfiguration ... will fail. Please move this instruction to a point after Calico CRDs/resources are installed (but still before enabling eBPF), or have users apply a FelixConfiguration manifest as part of the install.

Suggested change
You must change the VXLAN port before enabling eBPF:
```bash
kubectl patch felixconfiguration default --type merge -p '{"spec":{"vxlanPort":4790}}'
```
You must configure $[prodname] to use a different VXLAN port after installing the Tigera Operator and Calico CRDs, but before enabling eBPF. For example:
```bash
cat <<EOF | kubectl apply -f -
apiVersion: crd.projectcalico.org/v1
kind: FelixConfiguration
metadata:
  name: default
spec:
  vxlanPort: 4790
EOF
```

Comment on lines +329 to +333
You should also set `bpfKubeProxyHealthzPort` to `0` to disable the health check server in $[prodname]'s BPF kube-proxy replacement, which by default binds to port 10256 and would conflict with the Kubernetes `kube-proxy` already running on the node. The Kubernetes `kube-proxy` can serve the health check equally well, so there is no degradation.

```
kubectl patch felixconfiguration default --patch='{"spec": {"bpfKubeProxyHealthzPort": 0}}'
```
Copilot AI Mar 8, 2026

This claims that setting bpfKubeProxyHealthzPort to 0 disables Felix's embedded health check server. However, the generated Felix configuration reference (calico/_includes/components/FelixConfig/config-params.json) documents this field only as an integer port (default 10256) and does not mention 0 having special meaning. Please either (a) add a version/feature availability note and link to the release that implements 0=disabled, or (b) keep the guidance consistent with released docs by choosing an unused port instead of 0.

Comment on lines +188 to +191
You should also set `bpfKubeProxyHealthzPort` to `0` to disable the health check server in $[prodname]'s BPF kube-proxy replacement, which by default binds to port 10256 and would conflict with the Kubernetes `kube-proxy` already running on the node. The Kubernetes `kube-proxy` can serve the health check equally well, so there is no degradation.

```bash
kubectl patch felixconfiguration default --patch='{"spec": {"bpfKubeProxyHealthzPort": 0}}'
```

Copilot AI Mar 8, 2026

This states that bpfKubeProxyHealthzPort: 0 disables the embedded health check server, but the generated Felix config reference (calico-enterprise/_includes/components/FelixConfig/config-params.json) currently documents this parameter only as an integer port and does not mention 0 as a disable value. Please add a version/feature gate note (and link) for when 0 is supported, or change the guidance to selecting an unused port (as in the versioned docs).

Suggested change
You should also set `bpfKubeProxyHealthzPort` to `0` to disable the health check server in $[prodname]'s BPF kube-proxy replacement, which by default binds to port 10256 and would conflict with the Kubernetes `kube-proxy` already running on the node. The Kubernetes `kube-proxy` can serve the health check equally well, so there is no degradation.
```bash
kubectl patch felixconfiguration default --patch='{"spec": {"bpfKubeProxyHealthzPort": 0}}'
```
You should also set `bpfKubeProxyHealthzPort` to an unused port to avoid conflicts with the health check server in $[prodname]'s BPF kube-proxy replacement, which by default binds to port 10256 and would conflict with the Kubernetes `kube-proxy` already running on the node. The Kubernetes `kube-proxy` can serve the health check equally well, so there is no degradation; the new port value is only to avoid the conflict. For example:
```bash
kubectl patch felixconfiguration default --patch='{"spec": {"bpfKubeProxyHealthzPort": 11256}}'
```

Collaborator

@tomastigera Is this valid?


If both `kube-proxy` and `BPFKubeProxyIptablesCleanupEnabled` is enabled then `kube-proxy` will write its iptables rules and Felix will try to clean them up resulting in iptables flapping between the two.

You should also change `bpfKubeProxyHealthzPort` to an unused port to avoid conflicting with the Kubernetes `kube-proxy`'s default health check port (10256). The Kubernetes `kube-proxy` can serve the health check equally well, so there is no degradation. Changing the health check port of the Kubernetes `kube-proxy` is typically not possible on managed platforms such as AKS. Choose a port that is not already in use on your nodes (for example, 10258; note that 10257 may be used by containerd).
Copilot AI Mar 8, 2026

The note "10257 may be used by containerd" appears incorrect (10257 is typically the kube-controller-manager secure port). Please adjust/remove this to avoid steering users toward a port that may actually be in use on many clusters, and prefer guidance to verify node port usage before choosing an alternative.

Suggested change
You should also change `bpfKubeProxyHealthzPort` to an unused port to avoid conflicting with the Kubernetes `kube-proxy`'s default health check port (10256). The Kubernetes `kube-proxy` can serve the health check equally well, so there is no degradation. Changing the health check port of the Kubernetes `kube-proxy` is typically not possible on managed platforms such as AKS. Choose a port that is not already in use on your nodes (for example, 10258; note that 10257 may be used by containerd).
You should also change `bpfKubeProxyHealthzPort` to an unused port to avoid conflicting with the Kubernetes `kube-proxy`'s default health check port (10256). The Kubernetes `kube-proxy` can serve the health check equally well, so there is no degradation. Changing the health check port of the Kubernetes `kube-proxy` is typically not possible on managed platforms such as AKS. Choose a port that is not already in use on your nodes (for example, 10258), and verify that it is free on all nodes before applying this change.

Comment on lines +178 to +186
:::caution VXLAN port conflict

MKE's Docker Swarm overlay networking uses UDP port 4789 for its own VXLAN devices. In eBPF mode, when BTF is available on the node (typically kernel v5.8+),
$[prodname] creates the `vxlan.calico` device in flow mode, which conflicts with Docker Swarm's use of the same port.
You must change the VXLAN port before enabling eBPF:

```bash
kubectl patch felixconfiguration default --type merge -p '{"spec":{"vxlanPort":4790}}'
```
Copilot AI Mar 8, 2026

This VXLAN port patch is in the pre-install portion of the guide, before Tigera Operator / Calico CRDs are installed. kubectl patch felixconfiguration default ... can fail if the FelixConfiguration CRD or the default resource doesn't exist yet. Consider moving this to after Calico installation (or instruct users to apply a FelixConfiguration manifest during install) but before enabling eBPF.


If both `kube-proxy` and `BPFKubeProxyIptablesCleanupEnabled` is enabled then `kube-proxy` will write its iptables rules and Felix will try to clean them up resulting in iptables flapping between the two.

You should also change `bpfKubeProxyHealthzPort` to an unused port to avoid conflicting with the Kubernetes `kube-proxy`'s default health check port (10256). The Kubernetes `kube-proxy` can serve the health check equally well, so there is no degradation. Changing the health check port of the Kubernetes `kube-proxy` is typically not possible on managed platforms such as AKS. Choose a port that is not already in use on your nodes (for example, 10258; note that 10257 may be used by containerd).
Copilot AI Mar 8, 2026

The note "10257 may be used by containerd" is likely inaccurate and could mislead users picking a port. Port 10257 is commonly used by the Kubernetes controller-manager (secure port), not containerd. Suggest removing the containerd reference and either avoid naming specific ports beyond 10256, or mention the typical Kubernetes component reservations (10257/10259) and instruct users to verify with ss/netstat on their nodes.

Suggested change
You should also change `bpfKubeProxyHealthzPort` to an unused port to avoid conflicting with the Kubernetes `kube-proxy`'s default health check port (10256). The Kubernetes `kube-proxy` can serve the health check equally well, so there is no degradation. Changing the health check port of the Kubernetes `kube-proxy` is typically not possible on managed platforms such as AKS. Choose a port that is not already in use on your nodes (for example, 10258; note that 10257 may be used by containerd).
You should also change `bpfKubeProxyHealthzPort` to an unused port to avoid conflicting with the Kubernetes `kube-proxy`'s default health check port (10256). The Kubernetes `kube-proxy` can serve the health check equally well, so there is no degradation. Changing the health check port of the Kubernetes `kube-proxy` is typically not possible on managed platforms such as AKS. Choose a port that is not already in use on your nodes (for example, 10258); note that other Kubernetes control plane components commonly use ports such as 10257 and 10259, so verify an unused port on your nodes with tools like `ss` or `netstat` before selecting one.
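
A sketch of that check, assuming `ss` from iproute2 is available on the node. `port_listed` is a hypothetical helper that reads `ss -Htln` output on stdin, so it can also be tried against output captured from a node.

```bash
# Hypothetical helper: succeed (exit 0) if the given TCP port appears as a
# listening local address in `ss -Htln` output read from stdin.
port_listed() {
  awk -v p="$1" '
    BEGIN { re = ":" p "$" }
    # Local Address:Port is field 4 with -t alone, field 5 if a Netid column is present.
    $4 ~ re || $5 ~ re { found = 1 }
    END { exit !found }
  '
}

# Example usage on a node (10258 is only the example port from the docs):
#   if ss -Htln | port_listed 10258; then echo "10258 is in use"; else echo "10258 looks free"; fi
```

The check only reflects the node it runs on, so it would need to be repeated on every node before a port is chosen.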


If both `kube-proxy` and `BPFKubeProxyIptablesCleanupEnabled` is enabled then `kube-proxy` will write its iptables rules and Felix will try to clean them up resulting in iptables flapping between the two.

You should also change `bpfKubeProxyHealthzPort` to an unused port to avoid conflicting with the Kubernetes `kube-proxy`'s default health check port (10256). The Kubernetes `kube-proxy` can serve the health check equally well, so there is no degradation. Changing the health check port of the Kubernetes `kube-proxy` is typically not possible on managed platforms such as AKS. Choose a port that is not already in use on your nodes (for example, 10258; note that 10257 may be used by containerd).
Copilot AI Mar 8, 2026

The note "10257 may be used by containerd" is likely wrong (10257 is commonly reserved by kube-controller-manager). Please update/remove this example to avoid recommending a potentially in-use port; consider advising users to check node listeners and/or note the typical Kubernetes reserved ports (10257/10259).

Suggested change
You should also change `bpfKubeProxyHealthzPort` to an unused port to avoid conflicting with the Kubernetes `kube-proxy`'s default health check port (10256). The Kubernetes `kube-proxy` can serve the health check equally well, so there is no degradation. Changing the health check port of the Kubernetes `kube-proxy` is typically not possible on managed platforms such as AKS. Choose a port that is not already in use on your nodes (for example, 10258; note that 10257 may be used by containerd).
You should also change `bpfKubeProxyHealthzPort` to an unused port to avoid conflicting with the Kubernetes `kube-proxy`'s default health check port (10256). The Kubernetes `kube-proxy` can serve the health check equally well, so there is no degradation. Changing the health check port of the Kubernetes `kube-proxy` is typically not possible on managed platforms such as AKS. Choose a port that is not already in use on your nodes (for example, 10258), avoid typical Kubernetes reserved ports such as 10257 and 10259, and verify port availability by checking which ports are currently in use on your nodes.

Comment on lines +399 to +407
## MKE: VXLAN device DOWN

On MKE clusters, after enabling eBPF mode, the `vxlan.calico` device may stay DOWN with Felix logs showing:

```
Failed to set tunnel device up error=address already in use
```

This happens because MKE's Docker Swarm overlay networking creates VXLAN devices on UDP port 4789 inside Docker network namespaces (`/run/docker/netns/`). In eBPF mode, when BTF is available on the node (typically kernel v5.8+), Felix creates `vxlan.calico` in flow mode (`external`), which acts as a catch-all on its UDP port. The kernel rejects this because Docker Swarm's VXLAN device already holds port 4789.
Copilot AI Mar 8, 2026

PR description/title focus on documenting bpfKubeProxyHealthzPort/kube-proxy port 10256 conflicts, but this PR also adds a new MKE-specific troubleshooting/install section about a VXLAN port 4789 conflict. Please update the PR description to reflect this additional scope (or split into a separate PR) so reviewers can validate both topics appropriately.

Comment on lines +188 to +191
You should also set `bpfKubeProxyHealthzPort` to `0` to disable the health check server in $[prodname]'s BPF kube-proxy replacement, which by default binds to port 10256 and would conflict with the Kubernetes `kube-proxy` already running on the node. The Kubernetes `kube-proxy` can serve the health check equally well, so there is no degradation.

```bash
kubectl patch felixconfiguration default --patch='{"spec": {"bpfKubeProxyHealthzPort": 0}}'
```
Collaborator

@tomastigera Is this valid?

kubectl patch felixconfiguration default --patch='{"spec": {"bpfKubeProxyHealthzPort": 0}}'
```

### MKE: Change the VXLAN port before enabling eBPF
Collaborator

I think this PR includes changes from the other PR, #2574. Can you separate them?
