From 71610c196fda68b0b79ae76165360bf880f4bb45 Mon Sep 17 00:00:00 2001 From: MeoK <12069138+chinameok@users.noreply.github.com> Date: Wed, 11 Feb 2026 16:40:00 +0800 Subject: [PATCH 1/2] docs: clarify node removal strategies for IP recovery --- docs/en/node.mdx | 172 ++++++++++++++++++++++++++--------------------- 1 file changed, 97 insertions(+), 75 deletions(-) diff --git a/docs/en/node.mdx b/docs/en/node.mdx index 6d60ae2..43898b4 100644 --- a/docs/en/node.mdx +++ b/docs/en/node.mdx @@ -384,123 +384,145 @@ When scaling up, new nodes are created immediately without affecting existing no #### Removing Worker Nodes -Decrease the number of worker nodes to reduce cluster capacity or remove underutilized resources. +Decrease the number of worker nodes to reduce cluster capacity or remove underutilized resources. The Cluster API supports two removal strategies: -**Use Case**: Scale down cluster to reduce costs or adjust to reduced workload +1. **Random removal**: Reduce replicas; the platform randomly selects and deletes machines +2. **Targeted removal**: Mark specific machines for deletion, then reduce replicas (recommended for IP recovery) + +:::info +**IP Recovery Scenario** +When you need to recycle specific machine IPs (e.g., for reassignment or IP pool management), use the targeted removal method. The deletion annotation ensures the platform deletes the marked machines, not random ones. +::: + +##### Random Removal + +**Use Case**: Scale down a cluster where any node can be removed (no specific IP requirements) **Procedure**: -1. **Identify Nodes to Remove** +1. **Identify Current Machine Status** - View the current machines in the MachineDeployment: + View the current machines in the MachineDeployment: - ```bash - kubectl get machines -n cpaas-system -l cluster.x-k8s.io/deployment-name=<machinedeployment-name> - ``` + ```bash + kubectl get machines -n cpaas-system -l cluster.x-k8s.io/deployment-name=<machinedeployment-name> + ``` 2. 
**Scale Down the MachineDeployment** - Update the `replicas` field to reduce the node count: + Update the `replicas` field to reduce the node count: - ```bash - kubectl patch machinedeployment <machinedeployment-name> -n cpaas-system \ - --type='json' -p='[{"op": "replace", "path": "/spec/replicas", "value": <new-replica-count>}]' - ``` + ```bash + kubectl patch machinedeployment <machinedeployment-name> -n cpaas-system \ + --type='json' -p='[{"op": "replace", "path": "/spec/replicas", "value": <new-replica-count>}]' + ``` - **Example**: Scale from 5 to 3 nodes + **Example**: Scale from 5 to 3 nodes - ```bash - kubectl patch machinedeployment worker-pool-1 -n cpaas-system \ - --type='json' -p='[{"op": "replace", "path": "/spec/replicas", "value": 3}]' - ``` + ```bash + kubectl patch machinedeployment worker-pool-1 -n cpaas-system \ - --type='json' -p='[{"op": "replace", "path": "/spec/replicas", "value": 3}]' + --type='json' -p='[{"op": "replace", "path": "/spec/replicas", "value": 3}]' + ``` + + The Cluster API controller will randomly select and delete machines to match the desired replica count. 3. **Monitor the Removal Progress** - Watch the machine deletion process: + Watch the machine deletion process: - ```bash - kubectl get machines -n cpaas-system -w - ``` + ```bash + kubectl get machines -n cpaas-system -w + ``` - The Cluster API controller will: - - Drain the selected nodes (evict pods if possible) - - Delete the underlying VMs from the DCS platform - - Remove the machine resources + The Cluster API controller will: + - Drain the selected nodes (evict pods if possible) + - Delete the underlying VMs from the DCS platform + - Remove the machine resources 4. **Verify Nodes Removed** - Switch to the target cluster context: + Switch to the target cluster context: - ```bash - kubectl config use-context <target-cluster-context> - kubectl get nodes - ``` + ```bash + kubectl config use-context <target-cluster-context> + kubectl get nodes + ``` - The removed nodes should no longer appear in the list. + The removed nodes should no longer appear in the list. -:::warning -**Data Loss Warning** -Scaling down removes nodes and their associated disks. 
Ensure: -- Workloads can tolerate node loss through proper replication -- No critical data is stored only on the nodes being removed -- Applications are designed for horizontal scaling -::: +##### Targeted Removal -#### Deleting Specific Nodes +**Use Case**: Remove specific machines (e.g., for IP recovery or replacing unhealthy nodes) -Remove a specific unhealthy or problematic node while maintaining the replica count. +**Procedure**: -**Use Case**: Replace a single unhealthy node without scaling the entire deployment +1. **Identify Machines to Remove** -**Procedure**: + View the current machines: -1. **Identify the Unhealthy Machine** + ```bash + kubectl get machines -n cpaas-system -l cluster.x-k8s.io/deployment-name=<machinedeployment-name> + ``` - Find the machine corresponding to the unhealthy node: + Note the `<machine-name>` of the machines you want to remove. - ```bash - # List all machines - kubectl get machines -n cpaas-system +2. **Annotate Machines for Deletion** - # Check machine status - kubectl get machine <machine-name> -n cpaas-system -o yaml - ``` + Mark the specific machines for deletion: -2. **Annotate the Machine for Deletion** + ```bash + kubectl patch machine <machine-name> -n cpaas-system \ + --type='merge' -p='{"metadata": {"annotations": {"cluster.x-k8s.io/delete-machine": "true"}}}' + ``` - Mark the machine for deletion: + Repeat for each machine you want to remove. - ```bash - kubectl patch machine <machine-name> -n cpaas-system \ - --type='merge' -p='{"metadata": {"annotations": {"cluster.x-k8s.io/delete-machine": "true"}}}' - ``` + **Example**: Remove two specific machines + ```bash + kubectl patch machine worker-pool-1-abc123 -n cpaas-system \ + --type='merge' -p='{"metadata": {"annotations": {"cluster.x-k8s.io/delete-machine": "true"}}}' + kubectl patch machine worker-pool-1-def456 -n cpaas-system \ + --type='merge' -p='{"metadata": {"annotations": {"cluster.x-k8s.io/delete-machine": "true"}}}' + ``` -3. 
**Wait for Machine Replacement** - The Cluster API controller will: - - Delete the annotated machine - - Create a new machine to maintain the desired replica count - - The new machine will automatically join the cluster +3. **Scale Down the MachineDeployment** -4. **Monitor the Replacement** + After annotating the machines, reduce the replica count: - ```bash - # Watch machine status - kubectl get machines -n cpaas-system -w - ``` + ```bash + kubectl patch machinedeployment <machinedeployment-name> -n cpaas-system \ + --type='json' -p='[{"op": "replace", "path": "/spec/replicas", "value": <new-replica-count>}]' + ``` -5. **Verify Node Replacement** + The platform will delete the **annotated** machines, not randomly selected ones. - ```bash - kubectl config use-context <target-cluster-context> - kubectl get nodes - ``` +4. **Monitor the Removal Progress** - The new node should appear and transition to `Ready` status. + Watch the machine deletion process: -:::info -**Automatic Replacement** -If you only annotate a machine for deletion without changing the replica count, the MachineDeployment automatically creates a replacement machine to maintain the desired state. + ```bash + kubectl get machines -n cpaas-system -w + ``` + +5. **Verify Nodes Removed** + + Switch to the target cluster context: + + ```bash + kubectl config use-context <target-cluster-context> + kubectl get nodes + ``` + + The removed nodes should no longer appear in the list. + +:::warning +**Data Loss Warning** +Scaling down removes nodes and their associated disks. 
Ensure: +- Workloads can tolerate node loss through proper replication +- No critical data is stored only on the nodes being removed +- Applications are designed for horizontal scaling +::: ### Upgrading Machine Infrastructure From f9f3698c6fdaf924ca1d5c8411963a3c0ae4559b Mon Sep 17 00:00:00 2001 From: MeoK <12069138+chinameok@users.noreply.github.com> Date: Sat, 14 Feb 2026 09:26:08 +0800 Subject: [PATCH 2/2] docs: improve node scaling and removal documentation - Add IP pool expansion step before scaling up worker nodes - Move Data Loss Warning to apply to both removal strategies - Add replica count guidance for targeted machine removal - Update info callout for template rollout behavior - Move version compatibility warning to proper section Co-Authored-By: Claude Sonnet 4.5 --- docs/en/node.mdx | 175 ++++++++++++++++++++++++++++++++++++++++------- 1 file changed, 151 insertions(+), 24 deletions(-) diff --git a/docs/en/node.mdx b/docs/en/node.mdx index 43898b4..e939a33 100644 --- a/docs/en/node.mdx +++ b/docs/en/node.mdx @@ -318,22 +318,140 @@ Increase the number of worker nodes to handle increased workload or add new capa kubectl get machines -n cpaas-system -l cluster.x-k8s.io/deployment-name=<machinedeployment-name> ``` -2. **Verify IP Pool Capacity** +2. **Extend IP Pool** - Before scaling, ensure the IP pool has enough available entries: + Before scaling up, add new IP configurations to the pool for the additional nodes. + + :::info + **IP Pool Expansion** + The IP pool must contain at least as many entries as the desired replica count. Add new IP entries for each additional worker node you plan to deploy. + ::: + + **Add IP entries to the pool:** + + First, export the current pool configuration to preserve existing entries: ```bash - kubectl get dcsiphostnamepool <pool-name> -n cpaas-system -o yaml + kubectl get dcsiphostnamepool <pool-name> -n cpaas-system -o yaml ``` - Check that the pool contains at least as many entries as the desired replica count. 
+ Then use the following command to add new IP configurations. The `pool` array must include **all existing entries** plus the new entries: + + ```bash + kubectl patch dcsiphostnamepool <pool-name> -n cpaas-system \ + --type='merge' -p=' + { + "spec": { + "pool": [ + { + "ip": "<existing-ip-1>", + "mask": "<mask>", + "gateway": "<gateway>", + "dns": "<dns>", + "hostname": "<existing-hostname-1>", + "machineName": "<existing-machine-name-1>" + }, + { + "ip": "<existing-ip-2>", + "mask": "<mask>", + "gateway": "<gateway>", + "dns": "<dns>", + "hostname": "<existing-hostname-2>", + "machineName": "<existing-machine-name-2>" + }, + { + "ip": "<new-ip-1>", + "mask": "<mask>", + "gateway": "<gateway>", + "dns": "<dns>", + "hostname": "<new-hostname-1>", + "machineName": "<new-machine-name-1>" + }, + { + "ip": "<new-ip-2>", + "mask": "<mask>", + "gateway": "<gateway>", + "dns": "<dns>", + "hostname": "<new-hostname-2>", + "machineName": "<new-machine-name-2>" + } + ] + } + }' + ``` :::warning - **IP Pool Requirement** - If the IP pool has insufficient entries, add more IP entries to the pool before scaling. Refer to the [IP Pool Configuration](#step-1-configure-ip-hostname-pool) section for guidance on adding entries. + **Important Notes** + - The `pool` array must include **all existing entries** plus the new entries you want to add + - Copy the existing entries from the exported YAML to avoid data loss + - Ensure each new entry has unique `ip`, `hostname`, and `machineName` values + - Network parameters (`mask`, `gateway`, `dns`) typically match existing entries ::: -3. 
**Scale Up the MachineDeployment** + **Example**: Adding 2 new nodes to an existing pool of 3 nodes + + ```bash + # Current pool has 3 entries (10.0.1.11, 10.0.1.12, 10.0.1.13) + # Adding 2 more entries for nodes 4 and 5 + kubectl patch dcsiphostnamepool worker-pool-1-ippool -n cpaas-system \ + --type='merge' -p=' + { + "spec": { + "pool": [ + { + "ip": "10.0.1.11", + "mask": "255.255.255.0", + "gateway": "10.0.1.1", + "dns": "10.0.0.2", + "hostname": "worker-node-1", + "machineName": "worker-vm-1" + }, + { + "ip": "10.0.1.12", + "mask": "255.255.255.0", + "gateway": "10.0.1.1", + "dns": "10.0.0.2", + "hostname": "worker-node-2", + "machineName": "worker-vm-2" + }, + { + "ip": "10.0.1.13", + "mask": "255.255.255.0", + "gateway": "10.0.1.1", + "dns": "10.0.0.2", + "hostname": "worker-node-3", + "machineName": "worker-vm-3" + }, + { + "ip": "10.0.1.14", + "mask": "255.255.255.0", + "gateway": "10.0.1.1", + "dns": "10.0.0.2", + "hostname": "worker-node-4", + "machineName": "worker-vm-4" + }, + { + "ip": "10.0.1.15", + "mask": "255.255.255.0", + "gateway": "10.0.1.1", + "dns": "10.0.0.2", + "hostname": "worker-node-5", + "machineName": "worker-vm-5" + } + ] + } + }' + ``` + +3. **Verify IP Pool Capacity** + + After extending the IP pool, verify it has sufficient entries for the desired replica count: + + ```bash + kubectl get dcsiphostnamepool <pool-name> -n cpaas-system -o yaml + ``` + + Check that the pool contains at least as many entries as the desired replica count. + +4. **Scale Up the MachineDeployment** Update the `replicas` field to the desired number of nodes: @@ -349,7 +467,7 @@ Increase the number of worker nodes to handle increased workload or add new capa --type='json' -p='[{"op": "replace", "path": "/spec/replicas", "value": 5}]' -4. **Monitor the Scaling Progress** +5. 
**Monitor the Scaling Progress** Watch the machine creation process: @@ -363,7 +481,7 @@ Increase the number of worker nodes to handle increased workload or add new capa The Cluster API controller will automatically create new machines based on the MachineDeployment template. -5. **Verify Nodes Joined the Cluster** +6. **Verify Nodes Joined the Cluster** Switch to the target cluster context and verify the new nodes: @@ -394,6 +512,14 @@ Decrease the number of worker nodes to reduce cluster capacity or remove underut When you need to recycle specific machine IPs (e.g., for reassignment or IP pool management), use the targeted removal method. The deletion annotation ensures the platform deletes the marked machines, not random ones. ::: +:::warning +**Data Loss Warning** +Scaling down removes nodes and their associated disks. Ensure: +- Workloads can tolerate node loss through proper replication +- No critical data is stored only on the nodes being removed +- Applications are designed for horizontal scaling +::: + ##### Random Removal **Use Case**: Scale down a cluster where any node can be removed (no specific IP requirements) @@ -491,11 +617,20 @@ When you need to recycle specific machine IPs (e.g., for reassignment or IP pool After annotating the machines, reduce the replica count: + :::info + **Replica Count Must Match Annotated Machines** + Reduce replicas by exactly the number of annotated machines. + - If you reduce by fewer, not all annotated machines will be removed + - If you reduce by more, additional machines will be randomly selected for deletion + ::: + ```bash kubectl patch machinedeployment <machinedeployment-name> -n cpaas-system \ --type='json' -p='[{"op": "replace", "path": "/spec/replicas", "value": <new-replica-count>}]' ``` + **Example**: If you annotated 2 machines, reduce replicas by exactly 2 (e.g., from 5 to 3) + The platform will delete the **annotated** machines, not randomly selected ones. 4. 
**Monitor the Removal Progress** Watch the machine deletion process: ```bash kubectl get machines -n cpaas-system -w ``` 5. **Verify Nodes Removed** Switch to the target cluster context: ```bash kubectl config use-context <target-cluster-context> kubectl get nodes ``` The removed nodes should no longer appear in the list. -:::warning -**Data Loss Warning** -Scaling down removes nodes and their associated disks. Ensure: -- Workloads can tolerate node loss through proper replication -- No critical data is stored only on the nodes being removed -- Applications are designed for horizontal scaling -::: - ### Upgrading Machine Infrastructure To upgrade worker machine specifications (CPU, memory, disk, VM template), follow these steps: @@ -571,7 +698,7 @@ Bootstrap templates (KubeadmConfigTemplate) are used by MachineDeployment and Ma - Update `spec.template.spec.bootstrap.configRef.name` to reference the new template - Apply the changes to trigger a rolling update -:::warning +:::info **Template Rollout Behavior** Existing machines continue using the old bootstrap configuration. Only newly created machines (during scaling or rolling updates) will use the updated template. ::: @@ -580,6 +707,11 @@ Existing machines continue using the old bootstrap configuration. Only newly cre Kubernetes version upgrades require coordinated updates to both the MachineDeployment and the underlying VM template to ensure compatibility. +:::warning +**Version Compatibility** +Ensure the VM template's Kubernetes version matches the version specified in the MachineDeployment. Mismatched versions will cause node join failures. +::: + **Upgrade Process:** 1. **Update Machine Template** @@ -595,9 +727,4 @@ Kubernetes version upgrades require coordinated updates to both the MachineDeplo 3. 
**Monitor Upgrade** - The system will perform a rolling upgrade of worker nodes - Verify that new nodes join the cluster with the correct Kubernetes version - - Monitor cluster health throughout the upgrade process - -:::warning -**Version Compatibility** -Ensure the VM template's Kubernetes version matches the version specified in the MachineDeployment. Mismatched versions will cause node join failures. -::: + - Monitor cluster health throughout the upgrade process \ No newline at end of file