Please fill out all the sections below for bug issues, otherwise it'll be closed as it won't be actionable for us to address.
Describe the bug
When operating in environments with high pod churn rates—where pods are frequently created and terminated—and multiple services are configured to reference a single deployment, we have observed a critical issue with Host Network Service (HNS) endpoint lifecycle management. Specifically, HNS endpoints are not being properly cleaned up and remain in a stale state even after their associated pods have been successfully terminated.
The problem becomes particularly severe when Kubernetes' IP address management (IPAM) reassigns an IP address that still has a stale HNS endpoint attached to it. When a new pod is scheduled and receives this recycled IP address, the presence of the stale endpoint configuration causes a conflict in the network stack. This results in complete network connectivity failure for the newly created pod, manifesting as DNS
resolution timeouts and inability to establish network connections. The pod appears healthy from a Kubernetes perspective but is effectively isolated from the network, unable to communicate with other services or resolve domain names.
To Reproduce
Steps to reproduce the behavior:
- Deploy a Kubernetes cluster with a limited number of available pod IPs, containing both Linux and Windows nodes.
- Deploy Linux pods that consume almost all of the available IPs.
- Deploy a Service of type LoadBalancer that refers to the Linux pods. This triggers kube-proxy to create remote HNS endpoints on the Windows nodes, each carrying a Linux pod IP.
- Scale the Linux pods down to 1.
- Repeat the scale-up/scale-down cycle multiple times.
- Deploy a Windows deployment with multiple replicas, with multiple Services referencing the deployment.
- Scale the Windows pods down to 1.
- Repeat the same scale-up/scale-down cycle multiple times.
- After a few days, stale remote HNS endpoints remain on the Windows nodes even though the pods have been terminated.
- If one of those IPs is assigned to another Windows pod in a later cycle, network connectivity on that pod fails.
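The failure mode in the steps above can be illustrated with a toy model in Python. This is not the actual HNS or kube-proxy code; the IP pool, endpoint IDs, and the "skipped cleanup" flag are all illustrative, standing in for whatever causes the endpoint delete to be lost under churn:

```python
# Toy model of the bug: a remote endpoint that is not removed on pod
# deletion collides with a pod that later receives the recycled IP.

ip_pool = ["10.0.0.1", "10.0.0.2"]      # small pool forces fast IP reuse

free_ips = list(ip_pool)
remote_endpoints = {}                    # IP -> endpoint ID (per Windows node)
conflicts = []                           # IPs that came up behind a stale endpoint

def create_pod():
    ip = free_ips.pop(0)
    # A recycled IP that still has a stale remote endpoint breaks the new pod.
    if ip in remote_endpoints:
        conflicts.append(ip)
    return ip

def add_remote_endpoint(ip, endpoint_id):
    remote_endpoints[ip] = endpoint_id

def delete_pod(ip, cleanup_succeeds=True):
    # The bug: under high churn, the endpoint delete is sometimes skipped.
    if cleanup_succeeds:
        remote_endpoints.pop(ip, None)
    free_ips.append(ip)

# Cycle 1: pod comes up, kube-proxy adds a remote endpoint for its IP,
# then the pod is deleted but the endpoint leaks.
ip = create_pod()
add_remote_endpoint(ip, "ep-1")
delete_pod(ip, cleanup_succeeds=False)

# Cycle 2: the pool drains and the leaked IP is recycled -> conflict.
ip2 = create_pod()
ip3 = create_pod()
print(conflicts)                         # ['10.0.0.1']
```

The conflict only surfaces once the pool is small enough (or churn is high enough) for the leaked IP to be handed out again, which matches why this takes days to appear.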
Expected behavior
There should not be any remote HNS endpoints left on the Windows nodes once the associated pods are terminated.
Configuration:
- Edition: WS 2022 and WS 2019
- Base Image being used: servercore/iis:windowsservercore
- Container engine: containerd
- HNS Version: Major: 13, Minor: 3
- CNI: aws-vpc-bridge (L2 bridge networking mode)
- Cloud Platform: Amazon EKS
Additional context
- My dev Kubernetes cluster has 2 Linux nodes and 2 Windows nodes. After a few days I noticed a few stale remote HNS endpoints on the Windows nodes:
Node: ip-192-168-7-47.us-west-1.compute.internal
PS C:\Windows\system32> Get-HnsEndpoint | Format-Table Id, Name, IPAddress, IsRemoteEndpoint
ID Name IPAddress IsRemoteEndpoint
-- ---- --------- ----------------
b10f186a-ee1d-40a1-bb8a-f532cab4d131 Ethernet 192.168.7.110 True
ba5b0702-d05a-4fec-869f-4352d33f1891 Ethernet 172.0.32.0 True
ddec4b0f-0cf6-4276-bd42-5edd075fd179 Ethernet 192.168.7.84 True
fb92e3ac-3e12-461d-b73f-3ed5270ac42d Ethernet 192.168.7.51 True
e5c241d5-6123-4a8b-b7b1-01fd442ead92 Ethernet 192.168.7.52 True
248b9d31-ae56-4543-b9d0-ef1fc42f2be4 Ethernet 192.168.7.41 True
5abe1c00-18ae-4219-a527-e7d06e4522d6 Ethernet 192.168.7.39 True
042595c0-0031-4344-bbf7-6dfe3bc95b9f Ethernet 192.168.7.36 True
Node: ip-192-168-7-56.us-west-1.compute.internal
PS C:\Windows\system32> Get-HnsEndpoint | Format-Table Id, Name, IPAddress, IsRemoteEndpoint
ID Name IPAddress IsRemoteEndpoint
-- ---- --------- ----------------
ea6140b7-eee0-42c1-9ba2-5d7a4b0fefb1 Ethernet 172.0.32.0 True
b6453839-0d26-4509-931a-e9224a5135e7 Ethernet 192.168.7.49 True
1bf35c74-0c1d-4d26-97ce-5580c509f946 Ethernet 192.168.7.110 True
2b9cc7b7-bd9c-4940-b4da-70c200452d82 Ethernet 192.168.7.84 True
96b49a8c-8c29-41a0-b5ba-512b23189816 Ethernet 192.168.7.39 True
67b7082d-b7e7-4593-a766-8b3a646c8fca Ethernet 192.168.7.52 True
9c003bd3-0e56-494c-b1f6-6ae82a267013 Ethernet 192.168.7.50 True
691d9ecc-b20b-4b8e-99eb-9cf6ea0ee22b Ethernet 192.168.7.41 True
ba497e17-aba8-4146-a5a0-b55fe5338bef Ethernet 192.168.7.36 True
- I launched all my nodes within the subnet CIDR range 192.168.7.32/27. Below are my running pods, with no Windows pods deployed:
❯ kubectl get pods -A -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
amazon-guardduty aws-guardduty-agent-2cjnx 1/1 Running 9 (4h33m ago) 8d 192.168.7.60 ip-192-168-7-60.us-west-1.compute.internal <none> <none>
amazon-guardduty aws-guardduty-agent-w7x9r 1/1 Running 9 (4h33m ago) 8d 192.168.7.53 ip-192-168-7-53.us-west-1.compute.internal <none> <none>
kube-system aws-node-sjcx6 2/2 Running 16 (4h33m ago) 8d 192.168.7.60 ip-192-168-7-60.us-west-1.compute.internal <none> <none>
kube-system aws-node-zfws4 2/2 Running 16 (4h33m ago) 8d 192.168.7.53 ip-192-168-7-53.us-west-1.compute.internal <none> <none>
kube-system coredns-6b7fdfbc95-4v4n9 1/1 Running 9 (4h33m ago) 8d 192.168.7.41 ip-192-168-7-60.us-west-1.compute.internal <none> <none>
kube-system coredns-6b7fdfbc95-zfhv2 1/1 Running 9 (4h33m ago) 8d 192.168.7.52 ip-192-168-7-60.us-west-1.compute.internal <none> <none>
kube-system kube-proxy-jgkcg 1/1 Running 9 (4h33m ago) 8d 192.168.7.53 ip-192-168-7-53.us-west-1.compute.internal <none> <none>
kube-system kube-proxy-n64cg 1/1 Running 9 (4h33m ago) 8d 192.168.7.60 ip-192-168-7-60.us-west-1.compute.internal <none> <none>
kube-system metrics-server-8559b8c95f-b7dmf 1/1 Running 9 (4h33m ago) 8d 192.168.7.36 ip-192-168-7-60.us-west-1.compute.internal <none> <none>
kube-system metrics-server-8559b8c95f-j94d9 1/1 Running 9 (4h33m ago) 8d 192.168.7.39 ip-192-168-7-60.us-west-1.compute.internal <none> <none>
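The cross-check between the endpoint tables and this pod list can be scripted. A minimal Python sketch, using the IPs hard-coded from the output above (a real check would query `Get-HnsEndpoint` and the API server instead; `172.0.32.0` is left out since it is not a pod IP in this cluster):

```python
# Endpoint IPs seen on both Windows nodes (from Get-HnsEndpoint output),
# minus the non-pod address 172.0.32.0.
endpoint_ips = {
    # Node ip-192-168-7-47
    "192.168.7.110", "192.168.7.84", "192.168.7.51", "192.168.7.52",
    "192.168.7.41", "192.168.7.39", "192.168.7.36",
    # Node ip-192-168-7-56 (overlapping IPs deduplicated by the set)
    "192.168.7.49", "192.168.7.50",
}

# IPs of currently running pods (from kubectl get pods -A -o wide).
pod_ips = {
    "192.168.7.60", "192.168.7.53",   # guardduty / aws-node (host-networked)
    "192.168.7.41", "192.168.7.52",   # coredns
    "192.168.7.36", "192.168.7.39",   # metrics-server
}

# Any endpoint IP with no running pod behind it is a candidate stale endpoint.
stale = sorted(endpoint_ips - pod_ips)
print(stale)
# ['192.168.7.110', '192.168.7.49', '192.168.7.50', '192.168.7.51', '192.168.7.84']
```

This matches the manual analysis below: the coredns and metrics-server IPs are backed by running pods, while the rest have nothing behind them.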
- Among the running pods, the metrics-server and coredns pods have remote endpoints, which are valid.
- The other endpoints, with IPs such as 192.168.7.49 and 192.168.7.51, should have been deleted, but they remain stale on the Windows nodes.
- I scaled the Windows deployment and verified the pod IPs. As shown below, one of the Windows pods was assigned the IP 192.168.7.49, and if I exec into it and run nslookup, it fails.
❯ kubectl get pods -A -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
amazon-guardduty aws-guardduty-agent-2cjnx 1/1 Running 9 (4h42m ago) 8d 192.168.7.60 ip-192-168-7-60.us-west-1.compute.internal <none> <none>
amazon-guardduty aws-guardduty-agent-w7x9r 1/1 Running 9 (4h42m ago) 8d 192.168.7.53 ip-192-168-7-53.us-west-1.compute.internal <none> <none>
default linux-deployment-8676b68d6f-g8gkf 1/1 Running 7 (4h42m ago) 6d22h 192.168.7.38 ip-192-168-7-60.us-west-1.compute.internal <none> <none>
default test-app-5d8ff7fc67-46gv6 1/1 Running 0 45m 192.168.7.61 ip-192-168-7-56.us-west-1.compute.internal <none> <none>
default test-app-5d8ff7fc67-4knr2 1/1 Running 0 45m 192.168.7.43 ip-192-168-7-56.us-west-1.compute.internal <none> <none>
default test-app-5d8ff7fc67-6g24f 1/1 Running 0 45m 192.168.7.62 ip-192-168-7-47.us-west-1.compute.internal <none> <none>
default test-app-5d8ff7fc67-b2ddh 1/1 Running 0 45m 192.168.7.48 ip-192-168-7-47.us-west-1.compute.internal <none> <none>
default test-app-5d8ff7fc67-bh6xt 1/1 Running 0 45m 192.168.7.42 ip-192-168-7-56.us-west-1.compute.internal <none> <none>
default test-app-5d8ff7fc67-c77js 1/1 Running 0 45m 192.168.7.46 ip-192-168-7-47.us-west-1.compute.internal <none> <none>
default test-app-5d8ff7fc67-lh24g 1/1 Running 0 45m 192.168.7.58 ip-192-168-7-47.us-west-1.compute.internal <none> <none>
default test-app-5d8ff7fc67-s444z 1/1 Running 0 45m 192.168.7.49 ip-192-168-7-56.us-west-1.compute.internal <none> <none>
default test-app-5d8ff7fc67-tbzqn 1/1 Running 0 45m 192.168.7.45 ip-192-168-7-56.us-west-1.compute.internal <none> <none>
default test-app-5d8ff7fc67-tgfv6 1/1 Running 0 45m 192.168.7.59 ip-192-168-7-47.us-west-1.compute.internal <none> <none>
kube-system aws-node-sjcx6 2/2 Running 16 (4h42m ago) 8d 192.168.7.60 ip-192-168-7-60.us-west-1.compute.internal <none> <none>
kube-system aws-node-zfws4 2/2 Running 16 (4h42m ago) 8d 192.168.7.53 ip-192-168-7-53.us-west-1.compute.internal <none> <none>
kube-system coredns-6b7fdfbc95-4v4n9 1/1 Running 9 (4h42m ago) 8d 192.168.7.41 ip-192-168-7-60.us-west-1.compute.internal <none> <none>
kube-system coredns-6b7fdfbc95-zfhv2 1/1 Running 9 (4h42m ago) 8d 192.168.7.52 ip-192-168-7-60.us-west-1.compute.internal <none> <none>
kube-system kube-proxy-jgkcg 1/1 Running 9 (4h42m ago) 8d 192.168.7.53 ip-192-168-7-53.us-west-1.compute.internal <none> <none>
kube-system kube-proxy-n64cg 1/1 Running 9 (4h42m ago) 8d 192.168.7.60 ip-192-168-7-60.us-west-1.compute.internal <none> <none>
kube-system metrics-server-8559b8c95f-b7dmf 1/1 Running 9 (4h42m ago) 8d 192.168.7.36 ip-192-168-7-60.us-west-1.compute.internal <none> <none>
kube-system metrics-server-8559b8c95f-j94d9 1/1 Running 9 (4h42m ago) 8d 192.168.7.39 ip-192-168-7-60.us-west-1.compute.internal <none> <none>
❯ kubectl exec -it test-app-5d8ff7fc67-s444z -- powershell
Windows PowerShell
Copyright (C) Microsoft Corporation. All rights reserved.
Install the latest PowerShell for new features and improvements! https://aka.ms/PSWindows
PS C:\> nslookup google.com
DNS request timed out.
timeout was 2 seconds.
Server: UnKnown
Address: 10.100.0.10
DNS request timed out.
timeout was 2 seconds.
DNS request timed out.
timeout was 2 seconds.
DNS request timed out.
timeout was 2 seconds.
DNS request timed out.
timeout was 2 seconds.
*** Request to UnKnown timed-out
PS C:\> exit
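What the expected behavior amounts to is a reconcile pass: any remote endpoint whose IP no longer backs a Service endpoint should be removed. A hypothetical sketch of that logic in Python (the IDs and IPs are copied from the tables above for illustration; on a real node the deletion primitive would presumably be `Remove-HnsEndpoint`, applied with appropriate safeguards):

```python
# Hypothetical reconcile pass, NOT the actual kube-proxy/HNS code.

def reconcile(actual_endpoints, desired_ips):
    """Return IDs of remote endpoints whose IP backs no current Service endpoint."""
    return [eid for eid, ip in actual_endpoints.items() if ip not in desired_ips]

# Remote endpoints present on node ip-192-168-7-56 (subset of the table above).
actual = {
    "b6453839-0d26-4509-931a-e9224a5135e7": "192.168.7.49",  # stale
    "2b9cc7b7-bd9c-4940-b4da-70c200452d82": "192.168.7.84",  # stale
    "96b49a8c-8c29-41a0-b5ba-512b23189816": "192.168.7.39",  # metrics-server, valid
}

# Pod IPs currently referenced by Service EndpointSlices.
desired = {"192.168.7.39", "192.168.7.41", "192.168.7.52", "192.168.7.36"}

to_delete = reconcile(actual, desired)
print(to_delete)   # the two stale endpoint IDs
```

Whether this cleanup belongs in kube-proxy or HNS itself is exactly what we'd like clarified; today neither appears to run such a pass reliably under churn.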
- Attached are the HNS trace logs, collected using the collection script. Please let us know if you need more details or anything else.
9a98.zip - logs on Node ip-192-168-7-56.us-west-1.compute.internal
bf83.zip - logs on Node ip-192-168-7-47.us-west-1.compute.internal