docs(autoscaling): add cluster-autoscaler guides for Hetzner and Azure #419
Add comprehensive documentation for deploying cluster-autoscaler on Cozystack with Hetzner Cloud and Azure providers, covering Talos image creation, infrastructure setup, and troubleshooting. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
The kubelet cloud-provider: external flag is required for Azure cloud-controller-manager to assign ProviderID to nodes. Without it, cluster-autoscaler cannot match Kubernetes nodes to VMSS instances. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
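A quick way to see whether the cloud-controller-manager has assigned a ProviderID is to list `spec.providerID` for every node. The sketch below is only an illustration: the helper name `nodes_missing_provider_id` is hypothetical, and it assumes the two-column `kubectl get nodes -o custom-columns=...` output, where kubectl prints `<none>` for an unset field.

```shell
# Print the names of nodes whose spec.providerID is empty.
# Reads "NAME PROVIDER_ID" rows on stdin and skips the header row.
# Hypothetical helper; not part of the guides in this PR.
nodes_missing_provider_id() {
  awk 'NR > 1 && ($2 == "<none>" || $2 == "") { print $1 }'
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get nodes \
#     -o custom-columns=NAME:.metadata.name,PROVIDER_ID:.spec.providerID \
#     | nodes_missing_provider_id
```

Any node name printed here would be one that cluster-autoscaler cannot match to a VMSS instance.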
Summary of Changes
This pull request expands the Cozystack documentation with guides for setting up and managing Kubernetes Cluster Autoscaler on Hetzner Cloud and Azure. The guides let users configure automatic node scaling for their Cozystack management clusters in both cloud environments.
Code Review
This pull request adds comprehensive documentation for setting up cluster autoscaling on Hetzner Cloud and Azure. The guides are detailed and cover infrastructure setup, image creation, configuration, and troubleshooting. I've found a few areas for improvement, primarily around security practices in the Azure guide, such as avoiding plaintext credentials in shell history and Kubernetes manifests. I've also noted a potential typo in a Kubernetes version in the Hetzner guide. My suggestions aim to make the documentation more secure and accurate for users.
azureClientID: "<APP_ID>"
azureClientSecret: "<PASSWORD>"
azureTenantID: "<TENANT_ID>"
azureSubscriptionID: "<SUBSCRIPTION_ID>"
Storing credentials like azureClientSecret directly in the YAML is a critical security risk, as they will be committed to version control in plaintext. You should create a Kubernetes secret to hold these credentials and reference it from the Package definition. This follows the best practice already demonstrated in the Hetzner autoscaler guide.
First, add a new step to create a secret with your Azure credentials:
# This secret should be created in the same namespace as the package
kubectl create secret generic azure-credentials \
--from-literal=AZURE_CLIENT_ID="<APP_ID>" \
--from-literal=AZURE_CLIENT_SECRET="<PASSWORD>" \
--from-literal=AZURE_TENANT_ID="<TENANT_ID>" \
--from-literal=AZURE_SUBSCRIPTION_ID="<SUBSCRIPTION_ID>"

Then, update the Package resource to use these secrets via environment variables. The cluster-autoscaler for Azure can read credentials from environment variables.
Suggested change:
- azureClientID: "<APP_ID>"
- azureClientSecret: "<PASSWORD>"
- azureTenantID: "<TENANT_ID>"
- azureSubscriptionID: "<SUBSCRIPTION_ID>"
+ extraEnvSecrets:
+   AZURE_CLIENT_ID:
+     name: azure-credentials
+     key: AZURE_CLIENT_ID
+   AZURE_CLIENT_SECRET:
+     name: azure-credentials
+     key: AZURE_CLIENT_SECRET
+   AZURE_TENANT_ID:
+     name: azure-credentials
+     key: AZURE_TENANT_ID
+   AZURE_SUBSCRIPTION_ID:
+     name: azure-credentials
+     key: AZURE_SUBSCRIPTION_ID
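One possible way to create the `azure-credentials` secret without typing literal values on the command line is to render the manifest from environment variables and pipe it to `kubectl apply`. This is a minimal sketch, not the method from the guides: the helper name `render_azure_secret` is hypothetical, and it assumes the four variables are already set and contain no double quotes or other characters that would need YAML escaping.

```shell
# Render a Secret manifest from environment variables.
# Hypothetical helper; the secret name and keys mirror the review suggestion.
render_azure_secret() {
  # Abort with a message if any required variable is unset or empty.
  : "${AZURE_CLIENT_ID:?AZURE_CLIENT_ID is not set}"
  : "${AZURE_CLIENT_SECRET:?AZURE_CLIENT_SECRET is not set}"
  : "${AZURE_TENANT_ID:?AZURE_TENANT_ID is not set}"
  : "${AZURE_SUBSCRIPTION_ID:?AZURE_SUBSCRIPTION_ID is not set}"
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: azure-credentials
type: Opaque
stringData:
  AZURE_CLIENT_ID: "${AZURE_CLIENT_ID}"
  AZURE_CLIENT_SECRET: "${AZURE_CLIENT_SECRET}"
  AZURE_TENANT_ID: "${AZURE_TENANT_ID}"
  AZURE_SUBSCRIPTION_ID: "${AZURE_SUBSCRIPTION_ID}"
EOF
}

# Usage (requires kubectl access; <namespace> is the Package namespace):
#   render_azure_secret | kubectl apply -n <namespace> -f -
```

The manifest only travels through a pipe, so nothing containing the secret is written to disk or to version control.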
az login --service-principal \
  --username "<APP_ID>" \
  --password "<PASSWORD>" \
  --tenant "<TENANT_ID>"
Using the --password flag with a plaintext password in the az login command is a security risk, as the password can be stored in your shell's history file. A more secure method is to use environment variables for credentials.
The Azure CLI will automatically pick up the AZURE_CLIENT_SECRET environment variable for the password.
Suggested change:
- az login --service-principal \
-   --username "<APP_ID>" \
-   --password "<PASSWORD>" \
-   --tenant "<TENANT_ID>"
+ export AZURE_CLIENT_ID="<APP_ID>"
+ export AZURE_CLIENT_SECRET="<PASSWORD>"
+ export AZURE_TENANT_ID="<TENANT_ID>"
+ az login --service-principal \
+   --username "$AZURE_CLIENT_ID" \
+   --tenant "$AZURE_TENANT_ID"
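A more conservative variant keeps the literal secret out of the typed command but still passes it to `az login` explicitly through `--password`, so it does not depend on the CLI picking the variable up on its own. This is a sketch under stated assumptions: the wrapper name `az_sp_login` is hypothetical, and the three variables are assumed to have been populated outside the interactive shell history (for example by a secrets manager or a sourced file with restricted permissions).

```shell
# Log in as a service principal using credentials taken from the environment.
# Hypothetical wrapper; it only forwards standard `az login` flags.
az_sp_login() {
  # Abort with a message if any required variable is unset or empty.
  : "${AZURE_CLIENT_ID:?AZURE_CLIENT_ID is not set}"
  : "${AZURE_CLIENT_SECRET:?AZURE_CLIENT_SECRET is not set}"
  : "${AZURE_TENANT_ID:?AZURE_TENANT_ID is not set}"
  az login --service-principal \
    --username "$AZURE_CLIENT_ID" \
    --password "$AZURE_CLIENT_SECRET" \
    --tenant "$AZURE_TENANT_ID"
}

# Usage (requires the Azure CLI):
#   az_sp_login
```

The expanded secret is still visible in the process arguments while `az` runs, so this only addresses the shell-history concern raised in the comment.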
kilo.squat.ai/location: hetzner-cloud
topology.kubernetes.io/zone: hetzner-cloud
kubelet:
  image: ghcr.io/siderolabs/kubelet:v1.33.1
The specified kubelet image version v1.33.1 appears to be incorrect, as this is not a valid Kubernetes version. This is likely a typo and could cause confusion or errors for users. Please replace it with a placeholder and clarify that the user should select a version that matches their Kubernetes cluster version.
Suggested change:
- image: ghcr.io/siderolabs/kubelet:v1.33.1
+ image: ghcr.io/siderolabs/kubelet:<k8s-version>
…live Kilo reads kilo.squat.ai/location from node annotations, not labels. Using nodeLabels for this value does not work. Add kilo.squat.ai/persistent-keepalive annotation which is required for WireGuard NAT traversal on cloud nodes (especially Azure nodes behind NAT). Without it, Kilo's NAT endpoint discovery is disabled and tunnels will not stabilize. Replace force-endpoint approach in Azure docs with the simpler persistent-keepalive mechanism that enables automatic NAT traversal. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
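For an already-registered node, the same two Kilo annotations can also be applied by hand with `kubectl annotate`. The sketch below is illustrative only: the helper name `annotate_kilo_node` and the default keepalive of 20 seconds are assumptions, not values taken from the guides, which may set the annotations through the node's machine configuration instead.

```shell
# Set the Kilo location and persistent-keepalive annotations on one node.
# Hypothetical helper; the 20-second default keepalive is an assumed value.
#   $1 = node name, $2 = Kilo location, $3 = keepalive in seconds (optional)
annotate_kilo_node() {
  kubectl annotate node "$1" --overwrite \
    "kilo.squat.ai/location=$2" \
    "kilo.squat.ai/persistent-keepalive=${3:-20}"
}

# Usage (requires kubectl access):
#   annotate_kilo_node worker-1 hetzner-cloud
```

Because these are annotations rather than labels, they will not appear in `kubectl get nodes --show-labels`; `kubectl describe node` lists them.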
Add documentation for Azure UDR (User Defined Route) table required for Kilo non-leader node connectivity and VMSS IP forwarding setup. Without these, reply traffic from non-leader nodes to remote subnets is dropped by Azure SDN because it routes by destination IP, not Linux next-hop. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Summary
operations/cluster/autoscaling/
Test plan
hugo serve