26 changes: 26 additions & 0 deletions calico-enterprise/operations/ebpf/enabling-ebpf.mdx
Expand Up @@ -185,6 +185,32 @@ kubectl patch felixconfiguration default --patch='{"spec": {"bpfKubeProxyIptable

If both `kube-proxy` and `BPFKubeProxyIptablesCleanupEnabled` are enabled, `kube-proxy` writes its iptables rules and Felix tries to clean them up, which causes the iptables rules to flap between the two.

### MKE: Change the VXLAN port before enabling eBPF

:::caution

MKE uses Docker Swarm overlay networking, which creates VXLAN devices on UDP port 4789 inside Docker network namespaces.
When $[prodname] switches to eBPF mode on kernels where BTF is available (typically kernel v5.8+), Felix creates the `vxlan.calico` device in
flow mode, which acts as a catch-all on its UDP port. The kernel rejects this when another VXLAN device (Docker Swarm's)
already holds the same port, causing `vxlan.calico` to stay DOWN with `address already in use` errors.

**You must change the VXLAN port before enabling eBPF on MKE clusters:**

```bash
kubectl patch felixconfiguration default --type merge -p '{"spec":{"vxlanPort":4790}}'
```

Wait for all calico-node pods to recreate the VXLAN device on the new port, then verify on each node:

```bash
kubectl exec -n calico-system <calico-node-pod> -- ip -d link show vxlan.calico
```

Confirm the device shows `dstport 4790` (or your chosen port) and is UP before proceeding.
Ensure that the chosen UDP port is allowed by your underlying network between all nodes.

:::
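
The verification step can be scripted. The following is a minimal sketch, not part of $[prodname]: `vxlan_ready` is a hypothetical helper that reads `ip -d link show vxlan.calico` output on standard input and succeeds only if the output contains the expected `dstport` and the `UP` flag appears in the interface's flag list. Treating the `UP` flag as the readiness signal is an assumption of this sketch.

```bash
# Hypothetical helper (not part of Calico): reads `ip -d link show vxlan.calico`
# output on stdin. Returns 0 only if the device uses the expected port and has
# the UP flag set in its <...> flag list.
vxlan_ready() {
  local port="$1" out
  out="$(cat)"
  # Match "dstport <port>" as a whole token so 4790 does not match 47900.
  printf '%s\n' "$out" | grep -Eq "dstport ${port}( |\$)" || return 1
  # Assumption: the UP flag inside <...> is the readiness signal.
  printf '%s\n' "$out" | grep -Eq '<([A-Z_-]+,)*UP(,[A-Z_-]+)*>'
}

# Usage:
#   kubectl exec -n calico-system <calico-node-pod> -- ip -d link show vxlan.calico \
#     | vxlan_ready 4790 && echo "vxlan.calico ready"
```

Running the usage line once per calico-node pod gives a pass or fail result for each node before you enable eBPF mode.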

### Enable eBPF mode

To enable eBPF mode, change the `spec.calicoNetwork.linuxDataplane` parameter in the operator's `Installation`
15 changes: 15 additions & 0 deletions calico-enterprise/operations/ebpf/install.mdx
Expand Up @@ -175,6 +175,21 @@ can do that by setting `--kube-proxy-mode=disabled` and `--kube-default-drop-mas

More details can be found in [the MKE documentation](https://docs.mirantis.com/mke/current/install/predeployment/configure-networking/cluster-service-networking-options.html)

:::caution VXLAN port conflict

MKE's Docker Swarm overlay networking uses UDP port 4789 for its own VXLAN devices. In eBPF mode, when BTF is available on the node (typically kernel v5.8+),
$[prodname] creates the `vxlan.calico` device in flow mode, which conflicts with Docker Swarm's use of the same port.
You must change the VXLAN port before enabling eBPF:

```bash
kubectl patch felixconfiguration default --type merge -p '{"spec":{"vxlanPort":4790}}'
```

Ensure the chosen UDP port is allowed by your underlying network between all nodes.
See [Troubleshoot eBPF mode](troubleshoot-ebpf.mdx#mke-vxlan-device-down) for more details.

:::
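
If the port is chosen by automation rather than typed by hand, it can help to validate it before patching. The following is a minimal sketch under stated assumptions: `vxlan_patch_json` is a hypothetical helper, and the rule that rejects 4789 simply encodes the Docker Swarm conflict described above.

```bash
# Hypothetical helper: prints the FelixConfiguration merge patch for a VXLAN port.
# Fails if the argument is not a plain decimal UDP port (1-65535) or is 4789,
# the port already used by Docker Swarm overlay networking.
vxlan_patch_json() {
  local port="$1"
  case "$port" in
    # Reject empty input, leading zeros, non-digits, and more than 5 digits.
    ''|0*|*[!0-9]*|??????*) echo "invalid port: $port" >&2; return 1 ;;
  esac
  if [ "$port" -gt 65535 ]; then
    echo "port out of range: $port" >&2; return 1
  fi
  if [ "$port" -eq 4789 ]; then
    echo "port 4789 conflicts with Docker Swarm overlay networking" >&2; return 1
  fi
  printf '{"spec":{"vxlanPort":%s}}\n' "$port"
}

# Usage:
#   kubectl patch felixconfiguration default --type merge -p "$(vxlan_patch_json 4790)"
```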

</TabItem>

</Tabs>
49 changes: 49 additions & 0 deletions calico-enterprise/operations/ebpf/troubleshoot-ebpf.mdx
Expand Up @@ -396,6 +396,55 @@ e2e` command. The command resets the profiling data after dumping it.

```

## MKE: VXLAN device DOWN

On MKE clusters, after enabling eBPF mode, the `vxlan.calico` device may stay DOWN with Felix logs showing:

```
Failed to set tunnel device up error=address already in use
```

This happens because MKE's Docker Swarm overlay networking creates VXLAN devices on UDP port 4789 inside Docker network namespaces (`/run/docker/netns/`). In eBPF mode, when BTF is available on the node (typically kernel v5.8+), Felix creates `vxlan.calico` in flow mode (`external`), which acts as a catch-all on its UDP port. The kernel rejects this because Docker Swarm's VXLAN device already holds port 4789.

In iptables mode this conflict doesn't occur because Calico's VXLAN device binds with a specific VNI (4096), which can coexist with Docker Swarm's VXLAN on the same port.

**Diagnosis:**

```bash
# Confirm the VXLAN device is DOWN
kubectl exec -n calico-system <calico-node-pod> -- ip -d link show vxlan.calico

# Check what holds port 4789 (shows a kernel-owned socket with no process name)
kubectl exec -n calico-system <calico-node-pod> -- ss -ulnp sport = :4789

# Find Docker Swarm VXLAN devices in Docker network namespaces (run on the node via SSH)
for ns in /run/docker/netns/*; do
  echo "=== $(basename "$ns") ==="
  nsenter --net="$ns" ip -d link show type vxlan 2>/dev/null
done
```
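
To summarize which UDP ports the VXLAN devices found by the namespace loop are using, the `ip -d link show type vxlan` output can be reduced to a list of distinct `dstport` values. The following is a minimal sketch: `vxlan_dstports` is a hypothetical helper, and it assumes each VXLAN device's details include a `dstport <number>` token.

```bash
# Hypothetical helper: reads `ip -d link show type vxlan` output on stdin and
# prints each distinct dstport value once, in ascending numeric order.
vxlan_dstports() {
  grep -Eo 'dstport [0-9]+' | awk '{print $2}' | sort -un
}

# Usage (run on the node via SSH, as root):
#   for ns in /run/docker/netns/*; do
#     nsenter --net="$ns" ip -d link show type vxlan 2>/dev/null
#   done | vxlan_dstports
```

If the list includes the port configured for `vxlan.calico`, the conflict described in this section applies on that node.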

**Fix:** Change Calico's VXLAN port to avoid the conflict:

```bash
kubectl patch felixconfiguration default --type merge -p '{"spec":{"vxlanPort":4790}}'
```

Felix will recreate `vxlan.calico` on the new port within seconds. Verify on each node:

```bash
kubectl exec -n calico-system <calico-node-pod> -- ip -d link show vxlan.calico
# Should show: vxlan external ... dstport 4790 ... state UP
```

Ensure the chosen UDP port is allowed by your underlying network between all nodes.

:::tip

For MKE clusters, set the VXLAN port **before** enabling eBPF mode to avoid any disruption.

:::

## Debug high CPU usage

If you notice `$[noderunning]` using high CPU:
26 changes: 26 additions & 0 deletions calico/operations/ebpf/enabling-ebpf.mdx
Expand Up @@ -326,6 +326,32 @@ kubectl patch felixconfiguration default --patch='{"spec": {"bpfKubeProxyIptable

If both `kube-proxy` and `BPFKubeProxyIptablesCleanupEnabled` are enabled, `kube-proxy` writes its iptables rules and Felix tries to clean them up, which causes the iptables rules to flap between the two.

### MKE: Change the VXLAN port before enabling eBPF

:::caution

MKE uses Docker Swarm overlay networking, which creates VXLAN devices on UDP port 4789 inside Docker network namespaces.
When $[prodname] switches to eBPF mode on kernels where BTF is available (typically kernel v5.8+), Felix creates the `vxlan.calico` device in
flow mode, which acts as a catch-all on its UDP port. The kernel rejects this when another VXLAN device (Docker Swarm's)
already holds the same port, causing `vxlan.calico` to stay DOWN with `address already in use` errors.

**You must change the VXLAN port before enabling eBPF on MKE clusters:**

```bash
kubectl patch felixconfiguration default --type merge -p '{"spec":{"vxlanPort":4790}}'
```

Wait for all calico-node pods to recreate the VXLAN device on the new port, then verify on each node:

```bash
kubectl exec -n calico-system <calico-node-pod> -- ip -d link show vxlan.calico
```

Confirm the device shows `dstport 4790` (or your chosen port) and is UP before proceeding.
Ensure that the chosen UDP port is allowed by your underlying network between all nodes.

:::

### Enable eBPF mode

**The next step depends on whether you installed $[prodname] using the operator, or a manifest:**
15 changes: 15 additions & 0 deletions calico/operations/ebpf/install.mdx
Expand Up @@ -203,6 +203,21 @@ can do that by setting `--kube-proxy-mode=disabled` and `--kube-default-drop-mas

More details can be found in [the MKE documentation](https://docs.mirantis.com/mke/current/install/predeployment/configure-networking/cluster-service-networking-options.html)

:::caution VXLAN port conflict

MKE's Docker Swarm overlay networking uses UDP port 4789 for its own VXLAN devices. In eBPF mode, when BTF is available on the node (typically kernel v5.8+),
$[prodname] creates the `vxlan.calico` device in flow mode, which conflicts with Docker Swarm's use of the same port.
You must change the VXLAN port before enabling eBPF:

```bash
kubectl patch felixconfiguration default --type merge -p '{"spec":{"vxlanPort":4790}}'
```

Ensure the chosen UDP port is allowed by your underlying network between all nodes.
See [Troubleshoot eBPF mode](troubleshoot-ebpf.mdx#mke-vxlan-device-down) for more details.

:::

</TabItem>

</Tabs>
49 changes: 49 additions & 0 deletions calico/operations/ebpf/troubleshoot-ebpf.mdx
Expand Up @@ -396,6 +396,55 @@ e2e` command. The command resets the profiling data after dumping it.

```

## MKE: VXLAN device DOWN

On MKE clusters, after enabling eBPF mode, the `vxlan.calico` device may stay DOWN with Felix logs showing:

```
Failed to set tunnel device up error=address already in use
```

This happens because MKE's Docker Swarm overlay networking creates VXLAN devices on UDP port 4789 inside Docker network namespaces (`/run/docker/netns/`). In eBPF mode, when BTF is available on the node (typically kernel v5.8+), Felix creates `vxlan.calico` in flow mode (`external`), which acts as a catch-all on its UDP port. The kernel rejects this because Docker Swarm's VXLAN device already holds port 4789.

In iptables mode this conflict doesn't occur because Calico's VXLAN device binds with a specific VNI (4096), which can coexist with Docker Swarm's VXLAN on the same port.

**Diagnosis:**

```bash
# Confirm the VXLAN device is DOWN
kubectl exec -n calico-system <calico-node-pod> -- ip -d link show vxlan.calico

# Check what holds port 4789 (shows a kernel-owned socket with no process name)
kubectl exec -n calico-system <calico-node-pod> -- ss -ulnp sport = :4789

# Find Docker Swarm VXLAN devices in Docker network namespaces (run on the node via SSH)
for ns in /run/docker/netns/*; do
  echo "=== $(basename "$ns") ==="
  nsenter --net="$ns" ip -d link show type vxlan 2>/dev/null
done
```

**Fix:** Change Calico's VXLAN port to avoid the conflict:

```bash
kubectl patch felixconfiguration default --type merge -p '{"spec":{"vxlanPort":4790}}'
```

Felix will recreate `vxlan.calico` on the new port within seconds. Verify on each node:

```bash
kubectl exec -n calico-system <calico-node-pod> -- ip -d link show vxlan.calico
# Should show: vxlan external ... dstport 4790 ... state UP
```

Ensure the chosen UDP port is allowed by your underlying network between all nodes.

:::tip

For MKE clusters, set the VXLAN port **before** enabling eBPF mode to avoid any disruption.

:::

## Debug high CPU usage

If you notice `$[noderunning]` using high CPU:
15 changes: 15 additions & 0 deletions calico_versioned_docs/version-3.31/operations/ebpf/install.mdx
Expand Up @@ -203,6 +203,21 @@ can do that by setting `--kube-proxy-mode=disabled` and `--kube-default-drop-mas

More details can be found in [the MKE documentation](https://docs.mirantis.com/mke/current/install/predeployment/configure-networking/cluster-service-networking-options.html)

:::caution VXLAN port conflict

MKE's Docker Swarm overlay networking uses UDP port 4789 for its own VXLAN devices. In eBPF mode, when BTF is available on the node (typically kernel v5.8+),
$[prodname] creates the `vxlan.calico` device in flow mode, which conflicts with Docker Swarm's use of the same port.
You must change the VXLAN port before enabling eBPF:

```bash
kubectl patch felixconfiguration default --type merge -p '{"spec":{"vxlanPort":4790}}'
```

Ensure the chosen UDP port is allowed by your underlying network between all nodes.
See [Troubleshoot eBPF mode](troubleshoot-ebpf.mdx#mke-vxlan-device-down) for more details.

:::

</TabItem>

</Tabs>