diff --git a/calico-enterprise/operations/ebpf/enabling-ebpf.mdx b/calico-enterprise/operations/ebpf/enabling-ebpf.mdx index 418509bc85..ae6a4eb430 100644 --- a/calico-enterprise/operations/ebpf/enabling-ebpf.mdx +++ b/calico-enterprise/operations/ebpf/enabling-ebpf.mdx @@ -185,6 +185,38 @@ kubectl patch felixconfiguration default --patch='{"spec": {"bpfKubeProxyIptable If both `kube-proxy` and `BPFKubeProxyIptablesCleanupEnabled` is enabled then `kube-proxy` will write its iptables rules and Felix will try to clean them up resulting in iptables flapping between the two. +You should also set `bpfKubeProxyHealthzPort` to `0` to disable the health check server in $[prodname]'s BPF kube-proxy replacement, which by default binds to port 10256 and would conflict with the Kubernetes `kube-proxy` already running on the node. The Kubernetes `kube-proxy` can serve the health check equally well, so there is no degradation. + +```bash +kubectl patch felixconfiguration default --patch='{"spec": {"bpfKubeProxyHealthzPort": 0}}' +``` + +### MKE: Change the VXLAN port before enabling eBPF + +:::caution + +MKE uses Docker Swarm overlay networking, which creates VXLAN devices on UDP port 4789 inside Docker network namespaces. +When $[prodname] switches to eBPF mode on kernels where BTF is available (typically v5.8+), Felix creates the `vxlan.calico` device in +flow mode, which acts as a catch-all on its UDP port. The kernel rejects this when another VXLAN device (Docker Swarm's) +already holds the same port, causing `vxlan.calico` to stay DOWN with `address already in use` errors. 
+ +**You must change the VXLAN port before enabling eBPF on MKE clusters:** + +```bash +kubectl patch felixconfiguration default --type merge -p '{"spec":{"vxlanPort":4790}}' +``` + +Wait for all calico-node pods to recreate the VXLAN device on the new port, then verify on each node: + +```bash +kubectl exec -n calico-system <calico-node-pod> -- ip -d link show vxlan.calico +``` + +Confirm the device shows `dstport 4790` (or your chosen port) and is UP before proceeding. +Ensure that the chosen UDP port is allowed by your underlying network between all nodes. + +::: + ### Enable eBPF mode To enable eBPF mode, change the `spec.calicoNetwork.linuxDataplane` parameter in the operator's `Installation` diff --git a/calico-enterprise/operations/ebpf/install.mdx b/calico-enterprise/operations/ebpf/install.mdx index 96984ed807..26b763d9fa 100644 --- a/calico-enterprise/operations/ebpf/install.mdx +++ b/calico-enterprise/operations/ebpf/install.mdx @@ -175,6 +175,21 @@ can do that by setting `--kube-proxy-mode=disabled` and `--kube-default-drop-mas More details can be found in [the MKE documentation](https://docs.mirantis.com/mke/current/install/predeployment/configure-networking/cluster-service-networking-options.html) +:::caution VXLAN port conflict + +MKE's Docker Swarm overlay networking uses UDP port 4789 for its own VXLAN devices. In eBPF mode, when BTF is available on the node (typically kernel v5.8+), +$[prodname] creates the `vxlan.calico` device in flow mode, which conflicts with Docker Swarm's use of the same port. +You must change the VXLAN port before enabling eBPF: + +```bash +kubectl patch felixconfiguration default --type merge -p '{"spec":{"vxlanPort":4790}}' +``` + +Ensure the chosen UDP port is allowed by your underlying network between all nodes. +See [Troubleshoot eBPF mode](troubleshoot-ebpf.mdx#mke-vxlan-device-down) for more details.
+ +::: + diff --git a/calico-enterprise/operations/ebpf/troubleshoot-ebpf.mdx b/calico-enterprise/operations/ebpf/troubleshoot-ebpf.mdx index b8850f1028..c68010958c 100644 --- a/calico-enterprise/operations/ebpf/troubleshoot-ebpf.mdx +++ b/calico-enterprise/operations/ebpf/troubleshoot-ebpf.mdx @@ -396,6 +396,55 @@ e2e` command. The command resets the profiling data after dumping it. ``` +## MKE: VXLAN device DOWN + +On MKE clusters, after enabling eBPF mode, the `vxlan.calico` device may stay DOWN with Felix logs showing: + +``` +Failed to set tunnel device up error=address already in use +``` + +This happens because MKE's Docker Swarm overlay networking creates VXLAN devices on UDP port 4789 inside Docker network namespaces (`/run/docker/netns/`). In eBPF mode, when BTF is available on the node (typically kernel v5.8+), Felix creates `vxlan.calico` in flow mode (`external`), which acts as a catch-all on its UDP port. The kernel rejects this because Docker Swarm's VXLAN device already holds port 4789. + +In iptables mode this conflict doesn't occur because Calico's VXLAN device binds with a specific VNI (4096), which can coexist with Docker Swarm's VXLAN on the same port. + +**Diagnosis:** + +```bash +# Confirm the VXLAN device is DOWN +kubectl exec -n calico-system <calico-node-pod> -- ip -d link show vxlan.calico + +# Check what holds port 4789 (shows a kernel-owned socket with no process name) +kubectl exec -n calico-system <calico-node-pod> -- ss -ulnp sport = :4789 + +# Find Docker Swarm VXLAN devices in Docker network namespaces (run on the node via SSH) +for ns in /run/docker/netns/*; do + echo "=== $(basename "$ns") ===" + nsenter --net="$ns" ip -d link show type vxlan 2>/dev/null +done +``` + +**Fix:** Change Calico's VXLAN port to avoid the conflict: + +```bash +kubectl patch felixconfiguration default --type merge -p '{"spec":{"vxlanPort":4790}}' +``` + +Felix will recreate `vxlan.calico` on the new port within seconds.
Verify on each node: + +```bash +kubectl exec -n calico-system <calico-node-pod> -- ip -d link show vxlan.calico +# Should show: vxlan external ... dstport 4790 ... state UP +``` + +Ensure the chosen UDP port is allowed by your underlying network between all nodes. + +:::tip + +For MKE clusters, set the VXLAN port **before** enabling eBPF mode to avoid any disruption. + +::: + ## Debug high CPU usage If you notice `$[noderunning]` using high CPU: diff --git a/calico-enterprise_versioned_docs/version-3.23-1/operations/ebpf/enabling-ebpf.mdx b/calico-enterprise_versioned_docs/version-3.23-1/operations/ebpf/enabling-ebpf.mdx index b409a20f0f..ff30791583 100644 --- a/calico-enterprise_versioned_docs/version-3.23-1/operations/ebpf/enabling-ebpf.mdx +++ b/calico-enterprise_versioned_docs/version-3.23-1/operations/ebpf/enabling-ebpf.mdx @@ -185,6 +185,38 @@ kubectl patch felixconfiguration default --patch='{"spec": {"bpfKubeProxyIptable If both `kube-proxy` and `BPFKubeProxyIptablesCleanupEnabled` is enabled then `kube-proxy` will write its iptables rules and Felix will try to clean them up resulting in iptables flapping between the two. +You should also change `bpfKubeProxyHealthzPort` to an unused port to avoid conflicting with the Kubernetes `kube-proxy`'s default health check port (10256). The Kubernetes `kube-proxy` can serve the health check equally well, so there is no degradation. Changing the health check port of the Kubernetes `kube-proxy` is typically not possible on managed platforms such as AKS. Choose a port that is not already in use on your nodes (for example, 10258; note that 10257 may be used by containerd). + +```bash +kubectl patch felixconfiguration default --patch='{"spec": {"bpfKubeProxyHealthzPort": 10258}}' +``` + +### MKE: Change the VXLAN port before enabling eBPF + +:::caution + +MKE uses Docker Swarm overlay networking, which creates VXLAN devices on UDP port 4789 inside Docker network namespaces.
+When $[prodname] switches to eBPF mode on kernels where BTF is available (typically v5.8+), Felix creates the `vxlan.calico` device in +flow mode, which acts as a catch-all on its UDP port. The kernel rejects this when another VXLAN device (Docker Swarm's) +already holds the same port, causing `vxlan.calico` to stay DOWN with `address already in use` errors. + +**You must change the VXLAN port before enabling eBPF on MKE clusters:** + +```bash +kubectl patch felixconfiguration default --type merge -p '{"spec":{"vxlanPort":4790}}' +``` + +Wait for all calico-node pods to recreate the VXLAN device on the new port, then verify on each node: + +```bash +kubectl exec -n calico-system <calico-node-pod> -- ip -d link show vxlan.calico +``` + +Confirm the device shows `dstport 4790` (or your chosen port) and is UP before proceeding. +Ensure that the chosen UDP port is allowed by your underlying network between all nodes. + +::: + ### Enable eBPF mode To enable eBPF mode, change the `spec.calicoNetwork.linuxDataplane` parameter in the operator's `Installation` diff --git a/calico-enterprise_versioned_docs/version-3.23-1/operations/ebpf/install.mdx b/calico-enterprise_versioned_docs/version-3.23-1/operations/ebpf/install.mdx index 631dff33d5..3fb6d9ae20 100644 --- a/calico-enterprise_versioned_docs/version-3.23-1/operations/ebpf/install.mdx +++ b/calico-enterprise_versioned_docs/version-3.23-1/operations/ebpf/install.mdx @@ -175,6 +175,21 @@ can do that by setting `--kube-proxy-mode=disabled` and `--kube-default-drop-mas More details can be found in [the MKE documentation](https://docs.mirantis.com/mke/current/install/predeployment/configure-networking/cluster-service-networking-options.html) +:::caution VXLAN port conflict + +MKE's Docker Swarm overlay networking uses UDP port 4789 for its own VXLAN devices.
In eBPF mode, when BTF is available on the node (typically kernel v5.8+), +$[prodname] creates the `vxlan.calico` device in flow mode, which conflicts with Docker Swarm's use of the same port. +You must change the VXLAN port before enabling eBPF: + +```bash +kubectl patch felixconfiguration default --type merge -p '{"spec":{"vxlanPort":4790}}' +``` + +Ensure the chosen UDP port is allowed by your underlying network between all nodes. +See [Troubleshoot eBPF mode](troubleshoot-ebpf.mdx#mke-vxlan-device-down) for more details. + +::: + diff --git a/calico-enterprise_versioned_docs/version-3.23-1/operations/ebpf/troubleshoot-ebpf.mdx b/calico-enterprise_versioned_docs/version-3.23-1/operations/ebpf/troubleshoot-ebpf.mdx index b8850f1028..c68010958c 100644 --- a/calico-enterprise_versioned_docs/version-3.23-1/operations/ebpf/troubleshoot-ebpf.mdx +++ b/calico-enterprise_versioned_docs/version-3.23-1/operations/ebpf/troubleshoot-ebpf.mdx @@ -396,6 +396,55 @@ e2e` command. The command resets the profiling data after dumping it. ``` +## MKE: VXLAN device DOWN + +On MKE clusters, after enabling eBPF mode, the `vxlan.calico` device may stay DOWN with Felix logs showing: + +``` +Failed to set tunnel device up error=address already in use +``` + +This happens because MKE's Docker Swarm overlay networking creates VXLAN devices on UDP port 4789 inside Docker network namespaces (`/run/docker/netns/`). In eBPF mode, when BTF is available on the node (typically kernel v5.8+), Felix creates `vxlan.calico` in flow mode (`external`), which acts as a catch-all on its UDP port. The kernel rejects this because Docker Swarm's VXLAN device already holds port 4789. + +In iptables mode this conflict doesn't occur because Calico's VXLAN device binds with a specific VNI (4096), which can coexist with Docker Swarm's VXLAN on the same port. 
+ +**Diagnosis:** + +```bash +# Confirm the VXLAN device is DOWN +kubectl exec -n calico-system <calico-node-pod> -- ip -d link show vxlan.calico + +# Check what holds port 4789 (shows a kernel-owned socket with no process name) +kubectl exec -n calico-system <calico-node-pod> -- ss -ulnp sport = :4789 + +# Find Docker Swarm VXLAN devices in Docker network namespaces (run on the node via SSH) +for ns in /run/docker/netns/*; do + echo "=== $(basename "$ns") ===" + nsenter --net="$ns" ip -d link show type vxlan 2>/dev/null +done +``` + +**Fix:** Change Calico's VXLAN port to avoid the conflict: + +```bash +kubectl patch felixconfiguration default --type merge -p '{"spec":{"vxlanPort":4790}}' +``` + +Felix will recreate `vxlan.calico` on the new port within seconds. Verify on each node: + +```bash +kubectl exec -n calico-system <calico-node-pod> -- ip -d link show vxlan.calico +# Should show: vxlan external ... dstport 4790 ... state UP +``` + +Ensure the chosen UDP port is allowed by your underlying network between all nodes. + +:::tip + +For MKE clusters, set the VXLAN port **before** enabling eBPF mode to avoid any disruption. + +::: + ## Debug high CPU usage If you notice `$[noderunning]` using high CPU: diff --git a/calico/operations/ebpf/enabling-ebpf.mdx b/calico/operations/ebpf/enabling-ebpf.mdx index 406e5d4461..184d50f6c8 100644 --- a/calico/operations/ebpf/enabling-ebpf.mdx +++ b/calico/operations/ebpf/enabling-ebpf.mdx @@ -326,6 +326,38 @@ kubectl patch felixconfiguration default --patch='{"spec": {"bpfKubeProxyIptable If both `kube-proxy` and `BPFKubeProxyIptablesCleanupEnabled` is enabled then `kube-proxy` will write its iptables rules and Felix will try to clean them up resulting in iptables flapping between the two. +You should also set `bpfKubeProxyHealthzPort` to `0` to disable the health check server in $[prodname]'s BPF kube-proxy replacement, which by default binds to port 10256 and would conflict with the Kubernetes `kube-proxy` already running on the node.
The Kubernetes `kube-proxy` can serve the health check equally well, so there is no degradation. + +``` +kubectl patch felixconfiguration default --patch='{"spec": {"bpfKubeProxyHealthzPort": 0}}' +``` + +### MKE: Change the VXLAN port before enabling eBPF + +:::caution + +MKE uses Docker Swarm overlay networking, which creates VXLAN devices on UDP port 4789 inside Docker network namespaces. +When $[prodname] switches to eBPF mode on kernels where BTF is available (typically v5.8+), Felix creates the `vxlan.calico` device in +flow mode, which acts as a catch-all on its UDP port. The kernel rejects this when another VXLAN device (Docker Swarm's) +already holds the same port, causing `vxlan.calico` to stay DOWN with `address already in use` errors. + +**You must change the VXLAN port before enabling eBPF on MKE clusters:** + +```bash +kubectl patch felixconfiguration default --type merge -p '{"spec":{"vxlanPort":4790}}' +``` + +Wait for all calico-node pods to recreate the VXLAN device on the new port, then verify on each node: + +```bash +kubectl exec -n calico-system <calico-node-pod> -- ip -d link show vxlan.calico +``` + +Confirm the device shows `dstport 4790` (or your chosen port) and is UP before proceeding. +Ensure that the chosen UDP port is allowed by your underlying network between all nodes.
+ +::: + ### Enable eBPF mode **The next step depends on whether you installed $[prodname] using the operator, or a manifest:** diff --git a/calico/operations/ebpf/install.mdx b/calico/operations/ebpf/install.mdx index 6ac8643320..815cd431fa 100644 --- a/calico/operations/ebpf/install.mdx +++ b/calico/operations/ebpf/install.mdx @@ -203,6 +203,21 @@ can do that by setting `--kube-proxy-mode=disabled` and `--kube-default-drop-mas More details can be found in [the MKE documentation](https://docs.mirantis.com/mke/current/install/predeployment/configure-networking/cluster-service-networking-options.html) +:::caution VXLAN port conflict + +MKE's Docker Swarm overlay networking uses UDP port 4789 for its own VXLAN devices. In eBPF mode, when BTF is available on the node (typically kernel v5.8+), +$[prodname] creates the `vxlan.calico` device in flow mode, which conflicts with Docker Swarm's use of the same port. +You must change the VXLAN port before enabling eBPF: + +```bash +kubectl patch felixconfiguration default --type merge -p '{"spec":{"vxlanPort":4790}}' +``` + +Ensure the chosen UDP port is allowed by your underlying network between all nodes. +See [Troubleshoot eBPF mode](troubleshoot-ebpf.mdx#mke-vxlan-device-down) for more details. + +::: + diff --git a/calico/operations/ebpf/troubleshoot-ebpf.mdx b/calico/operations/ebpf/troubleshoot-ebpf.mdx index b1be84f35b..aab1018ac5 100644 --- a/calico/operations/ebpf/troubleshoot-ebpf.mdx +++ b/calico/operations/ebpf/troubleshoot-ebpf.mdx @@ -396,6 +396,55 @@ e2e` command. The command resets the profiling data after dumping it. ``` +## MKE: VXLAN device DOWN + +On MKE clusters, after enabling eBPF mode, the `vxlan.calico` device may stay DOWN with Felix logs showing: + +``` +Failed to set tunnel device up error=address already in use +``` + +This happens because MKE's Docker Swarm overlay networking creates VXLAN devices on UDP port 4789 inside Docker network namespaces (`/run/docker/netns/`). 
In eBPF mode, when BTF is available on the node (typically kernel v5.8+), Felix creates `vxlan.calico` in flow mode (`external`), which acts as a catch-all on its UDP port. The kernel rejects this because Docker Swarm's VXLAN device already holds port 4789. + +In iptables mode this conflict doesn't occur because Calico's VXLAN device binds with a specific VNI (4096), which can coexist with Docker Swarm's VXLAN on the same port. + +**Diagnosis:** + +```bash +# Confirm the VXLAN device is DOWN +kubectl exec -n calico-system <calico-node-pod> -- ip -d link show vxlan.calico + +# Check what holds port 4789 (shows a kernel-owned socket with no process name) +kubectl exec -n calico-system <calico-node-pod> -- ss -ulnp sport = :4789 + +# Find Docker Swarm VXLAN devices in Docker network namespaces (run on the node via SSH) +for ns in /run/docker/netns/*; do + echo "=== $(basename "$ns") ===" + nsenter --net="$ns" ip -d link show type vxlan 2>/dev/null +done +``` + +**Fix:** Change Calico's VXLAN port to avoid the conflict: + +```bash +kubectl patch felixconfiguration default --type merge -p '{"spec":{"vxlanPort":4790}}' +``` + +Felix will recreate `vxlan.calico` on the new port within seconds. Verify on each node: + +```bash +kubectl exec -n calico-system <calico-node-pod> -- ip -d link show vxlan.calico +# Should show: vxlan external ... dstport 4790 ... state UP +``` + +Ensure the chosen UDP port is allowed by your underlying network between all nodes. + +:::tip + +For MKE clusters, set the VXLAN port **before** enabling eBPF mode to avoid any disruption.
+ +::: + ## Debug high CPU usage If you notice `$[noderunning]` using high CPU: diff --git a/calico_versioned_docs/version-3.30/operations/ebpf/enabling-ebpf.mdx b/calico_versioned_docs/version-3.30/operations/ebpf/enabling-ebpf.mdx index 7cd3187332..c0209e4653 100644 --- a/calico_versioned_docs/version-3.30/operations/ebpf/enabling-ebpf.mdx +++ b/calico_versioned_docs/version-3.30/operations/ebpf/enabling-ebpf.mdx @@ -318,6 +318,12 @@ kubectl patch felixconfiguration default --patch='{"spec": {"bpfKubeProxyIptable If both `kube-proxy` and `BPFKubeProxyIptablesCleanupEnabled` is enabled then `kube-proxy` will write its iptables rules and Felix will try to clean them up resulting in iptables flapping between the two. +You should also change `bpfKubeProxyHealthzPort` to an unused port to avoid conflicting with the Kubernetes `kube-proxy`'s default health check port (10256). The Kubernetes `kube-proxy` can serve the health check equally well, so there is no degradation. Changing the health check port of the Kubernetes `kube-proxy` is typically not possible on managed platforms such as AKS. Choose a port that is not already in use on your nodes (for example, 10258; note that 10257 may be used by containerd). 
+ +``` +kubectl patch felixconfiguration default --patch='{"spec": {"bpfKubeProxyHealthzPort": 10258}}' +``` + ### Enable eBPF mode **The next step depends on whether you installed $[prodname] using the operator, or a manifest:** diff --git a/calico_versioned_docs/version-3.31/operations/ebpf/enabling-ebpf.mdx b/calico_versioned_docs/version-3.31/operations/ebpf/enabling-ebpf.mdx index c92eed2ef0..e8fed61f92 100644 --- a/calico_versioned_docs/version-3.31/operations/ebpf/enabling-ebpf.mdx +++ b/calico_versioned_docs/version-3.31/operations/ebpf/enabling-ebpf.mdx @@ -326,6 +326,38 @@ kubectl patch felixconfiguration default --patch='{"spec": {"bpfKubeProxyIptable If both `kube-proxy` and `BPFKubeProxyIptablesCleanupEnabled` is enabled then `kube-proxy` will write its iptables rules and Felix will try to clean them up resulting in iptables flapping between the two. +You should also change `bpfKubeProxyHealthzPort` to an unused port to avoid conflicting with the Kubernetes `kube-proxy`'s default health check port (10256). The Kubernetes `kube-proxy` can serve the health check equally well, so there is no degradation. Changing the health check port of the Kubernetes `kube-proxy` is typically not possible on managed platforms such as AKS. Choose a port that is not already in use on your nodes (for example, 10258; note that 10257 may be used by containerd). + +``` +kubectl patch felixconfiguration default --patch='{"spec": {"bpfKubeProxyHealthzPort": 10258}}' +``` + +### MKE: Change the VXLAN port before enabling eBPF + +:::caution + +MKE uses Docker Swarm overlay networking, which creates VXLAN devices on UDP port 4789 inside Docker network namespaces. +When $[prodname] switches to eBPF mode on kernels where BTF is available (typically v5.8+), Felix creates the `vxlan.calico` device in +flow mode, which acts as a catch-all on its UDP port. 
The kernel rejects this when another VXLAN device (Docker Swarm's) +already holds the same port, causing `vxlan.calico` to stay DOWN with `address already in use` errors. + +**You must change the VXLAN port before enabling eBPF on MKE clusters:** + +```bash +kubectl patch felixconfiguration default --type merge -p '{"spec":{"vxlanPort":4790}}' +``` + +Wait for all calico-node pods to recreate the VXLAN device on the new port, then verify on each node: + +```bash +kubectl exec -n calico-system <calico-node-pod> -- ip -d link show vxlan.calico +``` + +Confirm the device shows `dstport 4790` (or your chosen port) and is UP before proceeding. +Ensure that the chosen UDP port is allowed by your underlying network between all nodes. + +::: + ### Enable eBPF mode **The next step depends on whether you installed $[prodname] using the operator, or a manifest:** diff --git a/calico_versioned_docs/version-3.31/operations/ebpf/install.mdx b/calico_versioned_docs/version-3.31/operations/ebpf/install.mdx index 6210bf4d4c..2199f0f7ab 100644 --- a/calico_versioned_docs/version-3.31/operations/ebpf/install.mdx +++ b/calico_versioned_docs/version-3.31/operations/ebpf/install.mdx @@ -203,6 +203,21 @@ can do that by setting `--kube-proxy-mode=disabled` and `--kube-default-drop-mas More details can be found in [the MKE documentation](https://docs.mirantis.com/mke/current/install/predeployment/configure-networking/cluster-service-networking-options.html) +:::caution VXLAN port conflict + +MKE's Docker Swarm overlay networking uses UDP port 4789 for its own VXLAN devices. In eBPF mode, when BTF is available on the node (typically kernel v5.8+), +$[prodname] creates the `vxlan.calico` device in flow mode, which conflicts with Docker Swarm's use of the same port. +You must change the VXLAN port before enabling eBPF: + +```bash +kubectl patch felixconfiguration default --type merge -p '{"spec":{"vxlanPort":4790}}' +``` + +Ensure the chosen UDP port is allowed by your underlying network between all nodes.
+See [Troubleshoot eBPF mode](troubleshoot-ebpf.mdx#mke-vxlan-device-down) for more details. + +::: + diff --git a/calico_versioned_docs/version-3.31/operations/ebpf/troubleshoot-ebpf.mdx b/calico_versioned_docs/version-3.31/operations/ebpf/troubleshoot-ebpf.mdx index b1be84f35b..aab1018ac5 100644 --- a/calico_versioned_docs/version-3.31/operations/ebpf/troubleshoot-ebpf.mdx +++ b/calico_versioned_docs/version-3.31/operations/ebpf/troubleshoot-ebpf.mdx @@ -396,6 +396,55 @@ e2e` command. The command resets the profiling data after dumping it. ``` +## MKE: VXLAN device DOWN + +On MKE clusters, after enabling eBPF mode, the `vxlan.calico` device may stay DOWN with Felix logs showing: + +``` +Failed to set tunnel device up error=address already in use +``` + +This happens because MKE's Docker Swarm overlay networking creates VXLAN devices on UDP port 4789 inside Docker network namespaces (`/run/docker/netns/`). In eBPF mode, when BTF is available on the node (typically kernel v5.8+), Felix creates `vxlan.calico` in flow mode (`external`), which acts as a catch-all on its UDP port. The kernel rejects this because Docker Swarm's VXLAN device already holds port 4789. + +In iptables mode this conflict doesn't occur because Calico's VXLAN device binds with a specific VNI (4096), which can coexist with Docker Swarm's VXLAN on the same port. 
+ +**Diagnosis:** + +```bash +# Confirm the VXLAN device is DOWN +kubectl exec -n calico-system <calico-node-pod> -- ip -d link show vxlan.calico + +# Check what holds port 4789 (shows a kernel-owned socket with no process name) +kubectl exec -n calico-system <calico-node-pod> -- ss -ulnp sport = :4789 + +# Find Docker Swarm VXLAN devices in Docker network namespaces (run on the node via SSH) +for ns in /run/docker/netns/*; do + echo "=== $(basename "$ns") ===" + nsenter --net="$ns" ip -d link show type vxlan 2>/dev/null +done +``` + +**Fix:** Change Calico's VXLAN port to avoid the conflict: + +```bash +kubectl patch felixconfiguration default --type merge -p '{"spec":{"vxlanPort":4790}}' +``` + +Felix will recreate `vxlan.calico` on the new port within seconds. Verify on each node: + +```bash +kubectl exec -n calico-system <calico-node-pod> -- ip -d link show vxlan.calico +# Should show: vxlan external ... dstport 4790 ... state UP +``` + +Ensure the chosen UDP port is allowed by your underlying network between all nodes. + +:::tip + +For MKE clusters, set the VXLAN port **before** enabling eBPF mode to avoid any disruption. + +::: + ## Debug high CPU usage If you notice `$[noderunning]` using high CPU: