Maglev node-maintenance how-to, Node resource updates, and metrics updates #2577
aaaaaaaalex wants to merge 15 commits into tigera:main from
* Documents node maintenance metric for EE, CE
* Links maglev metrics from maglev how-to
* Documents Node resources new "maintenance" field for EE, CE
* Adds how-to for node-maintenance for EE, CE
✅ Deploy Preview for calico-docs-preview-next ready!
❌ Deploy Preview for tigera failed: built without sensitive environment variables.
Pull request overview
Adds documentation for Maglev-related Prometheus metrics and introduces docs for marking load balancer nodes as “under maintenance” (plus related Node resource schema updates) across Calico OSS / Enterprise / Cloud.
Changes:
- Document new Maglev conntrack metric (and maintenance metric for Enterprise/Cloud) in Felix Prometheus metrics pages.
- Add “mark load balancer node for maintenance” how-to pages for Enterprise and Cloud.
- Update Node resource reference docs (Enterprise/Cloud) to include the new `spec.loadBalancer.maintenance` field.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| calico/reference/felix/prometheus.mdx | Adds Maglev metrics section to OSS Felix Prometheus metrics reference. |
| calico/networking/configuring/add-maglev-load-balancing.mdx | Adds a link to the Maglev Prometheus metrics section from the Maglev how-to. |
| calico-enterprise/reference/resources/node.mdx | Documents new Node loadBalancer field and maintenance sub-field. |
| calico-enterprise/reference/component-resources/node/felix/prometheus.mdx | Adds Maglev metrics + node maintenance metric to Enterprise Felix Prometheus metrics reference. |
| calico-enterprise/networking/configuring/mark-lb-node-for-maintenance.mdx | New Enterprise how-to for marking LB nodes under maintenance. |
| calico-enterprise/networking/configuring/add-maglev-load-balancing.mdx | Adds link to Enterprise Felix Maglev metrics section. |
| calico-cloud/reference/resources/node.mdx | Documents new Node loadBalancer field and maintenance sub-field for Cloud. |
| calico-cloud/reference/component-resources/node/felix/prometheus.mdx | Adds Maglev metrics + node maintenance metric to Cloud Felix Prometheus metrics reference. |
| calico-cloud/networking/configuring/mark-lb-node-for-maintenance.mdx | New Cloud how-to for marking LB nodes under maintenance. |
| calico-cloud/networking/configuring/add-maglev-load-balancing.mdx | Adds link to Cloud Felix Maglev metrics section. |
| - Upstream routers might still send traffic to marked nodes, if routing is not configured to exclude
| the marked nodes. In this case, the node under maintenance will load balance traffic to a backend
| on another node.
The wrapped lines in this list item aren’t indented enough to be part of the same bullet, which can render oddly in Markdown. Indent the continuation lines under the “Upstream routers…” bullet so they stay within that list item.
| ## Additional resources
| * [Enable the eBPF data plane](../../operations/ebpf/enabling-ebpf.mdx)
| * [FelixConfiguration](../../reference/resources/felixconfig.mdx)
| * [Maglev Load Balancing](add-maglev-load-balancing.mdx)
This is a new how-to page, but it isn’t referenced in the Calico Enterprise sidebar navigation (see sidebars-calico-enterprise.js under networking/configuring). Add it there (and/or link it from an existing page) so readers can discover it.
| ## Additional resources
| * [Prometheus Metrics for Maglev](../../reference/felix/prometheux.mdx#maglev-metrics)
The link target filename is misspelled (prometheux.mdx), which will produce a broken link. Update it to point at the existing prometheus.mdx page (and keep the #maglev-metrics anchor).
| * [Prometheus Metrics for Maglev](../../reference/felix/prometheux.mdx#maglev-metrics)
| * [Prometheus Metrics for Maglev](../../reference/felix/prometheus.mdx#maglev-metrics)
| | vxlanTunnelMACAddrV6 | MAC address of the IPv6 VXLAN tunnel. This is system configured and should not be updated manually. | | string |
| | orchRefs | Correlates this node to a node in another orchestrator. | | list of [OrchRefs](#orchref) |
| | wireguard | WireGuard configuration for this node. This is applicable only if WireGuard is enabled in [Felix Configuration](felixconfig.mdx). | | [WireGuard](#wireguard) |
| | loadBalancer | Loadbalancer configuration for this node. This is applicable only if the eBPF dataplane is enabled [Felix Configuration](felixconfig.mdx). | | [LoadBalancer](#loadbalancer) |
This sentence is missing a word: “enabled in [Felix Configuration]…”. Also consider using “Load balancer” (two words) for consistency with the rest of the docs.
| | Field | Description | Accepted Values | Schema | Default |
| | ----------- | -------------------------------------------------------------------------------------------------------------------- | ---------------------------------------- | ------ | ------- |
| | maintenance | Indicates if the node is under maintenance, for the purposes of external traffic loadbalancing. eBPF dataplane only. | Optional: `ExcludeLocalBackends`, `None` | string |
Typo/wording: “loadbalancing” should be “load balancing” (two words).
| ### Maglev Metrics
|
| Felix exports statistics for monitoring Maglev loadbalancing.
Typo/wording: “loadbalancing” should be “load balancing” (two words).
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…aaaalex/tigera-docs into maglev-node-maintenance-and-metrics
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
fasaxc left a comment
I wonder if we should mention that Maglev helps with the upstream router problem (even though it doesn't solve it completely; you still want to take the node out of your upstream routing proactively).
ctauchen left a comment
One big question about whether we need a new page, otherwise all reference content is good to go.
I wonder whether the new information is a bit thin to merit a standalone page. It's tightly coupled to the existing Maglev content. Can we add this as a section there?
At the extreme end, we could cover this with another section under Additional notes:
### Pause traffic to a load balancer node during maintenance
To drain a node of new LB traffic, annotate it with `X`. Remove the annotation to re-add it. Note: if this would leave a service with zero backends, the annotation is ignored for that service.
That's a very short version, and we can add necessary bits.
This annotation also works outside of Maglev, which is what stopped me from adding it there. I also considered adding this as a section to other pages related to services, but couldn't find one that felt like a good fit.
…P", in both the maglev how-to, and the LB Maintenance how-to. Recommend Maglev in the LB Maintenance How-to
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 6 comments.
| ## Exclude backends on a node from load balancer backend selection
|
| To exclude a node's backends from load balancer backend selection, annotate the node with the
| following annotation:
|
| ```yaml
| lb.projectcalico.org/maintenance: "exclude-local-backends"
| ```
|
| To add the backends back into the selection pool, either remove the annotation, or set the value to "none".
This procedure says to “annotate the node” but doesn’t specify whether that means a Kubernetes Node object (via kubectl annotate node ...) or the Calico Node custom resource. Since the Node resource reference also introduces spec.loadBalancer.maintenance, it would help to clarify which object is being modified here, and ensure the documented values (exclude-local-backends / none) are consistent with the API values shown in the Node resource reference.
I now clarify which node resource to annotate
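For illustration, the add and remove steps in the procedure above can be expressed as JSON merge patch bodies. This is a minimal sketch, assuming the annotation is set on the Kubernetes Node object (the thread does not confirm which object is meant). `maintenance_patch` is a hypothetical helper; only the annotation key and the `exclude-local-backends` value come from the snippet under review.

```python
import json

# Annotation key and value as quoted in the how-to snippet under review.
ANNOTATION_KEY = "lb.projectcalico.org/maintenance"
EXCLUDE_VALUE = "exclude-local-backends"


def maintenance_patch(under_maintenance: bool) -> str:
    """Build a JSON merge patch body that sets or removes the annotation.

    Hypothetical helper for illustration only. In a JSON merge patch
    (RFC 7386), a null value deletes the key, which corresponds to
    "remove the annotation" in the how-to.
    """
    value = EXCLUDE_VALUE if under_maintenance else None
    return json.dumps({"metadata": {"annotations": {ANNOTATION_KEY: value}}})


if __name__ == "__main__":
    print(maintenance_patch(True))   # body that marks the node
    print(maintenance_patch(False))  # body that clears the mark
```

A body like this could be passed to `kubectl patch node <node-name> --type=merge -p '<body>'`, again assuming the Kubernetes Node object is the intended target.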
| ### Maglev Metrics
|
| Felix exports statistics for monitoring Maglev load balancing.
|
| #### `felix_bpf_conntrack_maglev_entries_total{destination="local|remote",ip_family="4|6"}`
|
| Total number of Maglev entries in conntrack table broken down by IP version, and, whether the destination backend
| is remote (we're acting as a frontend) or local (we're the backend node).
|
| #### `felix_bpf_lb_maintenance`
|
| Can be 1 or 0 (true or false). Indicates whether Felix's node has been marked for maintenance.
| No newline at end of file
The “Maglev Metrics” section now includes felix_bpf_lb_maintenance, which appears to be a general load balancer maintenance metric (not Maglev-specific). Consider moving this metric to its own “Load balancer maintenance” (or similar) section, or renaming the section to cover both Maglev and maintenance metrics so readers don’t miss it.
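To make the two metrics in the snippet concrete, here is a rough sketch of how a consumer might read them from a Prometheus text-format scrape. The sample values are made up for illustration, the parser is deliberately simplified (no timestamps, no escaped label values), and the function names are hypothetical. Only the metric and label names come from the snippet under review.

```python
import re

# Made-up sample scrape; the numbers are illustrative, not real output.
SAMPLE = """\
# TYPE felix_bpf_lb_maintenance gauge
felix_bpf_conntrack_maglev_entries_total{destination="local",ip_family="4"} 12
felix_bpf_conntrack_maglev_entries_total{destination="remote",ip_family="4"} 30
felix_bpf_lb_maintenance 1
"""

# name, optional {labels}, then a single value at end of line.
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples; skips comments and
    any line this simplified pattern does not recognise."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def under_maintenance(samples):
    """True if felix_bpf_lb_maintenance is reported as 1."""
    return any(
        name == "felix_bpf_lb_maintenance" and value == 1.0
        for name, _, value in samples
    )


def maglev_entries(samples, destination):
    """Sum Maglev conntrack entries for one destination label value."""
    return sum(
        value
        for name, labels, value in samples
        if name == "felix_bpf_conntrack_maglev_entries_total"
        and labels.get("destination") == destination
    )
```

Because `felix_bpf_lb_maintenance` carries no Maglev-specific labels, it can be read independently of the conntrack metric, which is consistent with the suggestion to document it in its own section.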
| | Field | Description | Accepted Values | Schema | Default |
| | ----------- | --------------------------------------------------------------------------------------------------------------------- | ---------------------------------------- | ------ | ------- |
| | maintenance | Indicates if the node is under maintenance, for the purposes of external traffic load balancing. eBPF dataplane only. | Optional: `ExcludeLocalBackends`, `None` | string |
The accepted values documented for spec.loadBalancer.maintenance (ExcludeLocalBackends, None) don’t match the values shown in the node annotation example in the new maintenance how-to (exclude-local-backends, none). Please align these to the actual API/annotation values, or explicitly document how the Node loadBalancer.maintenance field maps to the Kubernetes node annotation values to avoid user confusion.
| | Field | Description | Accepted Values | Schema | Default |
| | ----------- | --------------------------------------------------------------------------------------------------------------------- | ---------------------------------------- | ------ | ------- |
| | maintenance | Indicates if the node is under maintenance, for the purposes of external traffic load balancing. eBPF dataplane only. | Optional: `ExcludeLocalBackends`, `None` | string |
| | Field | Description | Accepted Values | Schema | Default |
| | ----------- | --------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------ | ------- |
| | maintenance | Indicates if the node is under maintenance, for the purposes of external traffic load balancing. eBPF dataplane only. | Optional: `ExcludeLocalBackends` (Kubernetes node annotation value: `exclude-local-backends`), `None` (Kubernetes node annotation value: `none`) | string |
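The value pairing proposed in the suggestion above can be sketched as a small lookup table. This is an illustration only: the pairs are taken from the suggested table text, while the helper names are hypothetical and do not describe how Calico performs the translation.

```python
# Pairs as written in the suggested table row: API value -> annotation value.
API_TO_ANNOTATION = {
    "ExcludeLocalBackends": "exclude-local-backends",
    "None": "none",
}
# Reverse view, built from the same pairs.
ANNOTATION_TO_API = {v: k for k, v in API_TO_ANNOTATION.items()}


def to_api_value(annotation_value: str) -> str:
    """Translate an annotation value to the Node resource value (hypothetical helper)."""
    try:
        return ANNOTATION_TO_API[annotation_value]
    except KeyError:
        raise ValueError(f"unknown maintenance annotation value: {annotation_value!r}")


def to_annotation_value(api_value: str) -> str:
    """Translate a Node resource value to the annotation value (hypothetical helper)."""
    try:
        return API_TO_ANNOTATION[api_value]
    except KeyError:
        raise ValueError(f"unknown maintenance API value: {api_value!r}")
```

Keeping both directions derived from one table avoids the two spellings drifting apart, which is the inconsistency the review comment flags.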
| | Field | Description | Accepted Values | Schema | Default |
| | ----------- | -------------------------------------------------------------------------------------------------------------------- | ---------------------------------------- | ------ | ------- |
| | maintenance | Indicates if the node is under maintenance, for the purposes of external traffic load balancing. eBPF dataplane only. | Optional: `ExcludeLocalBackends`, `None` | string |
The accepted values documented for spec.loadBalancer.maintenance (ExcludeLocalBackends, None) don’t match the values shown in the node annotation example in the new maintenance how-to (exclude-local-backends, none). Please align these to the actual API/annotation values, or explicitly document how the Node loadBalancer.maintenance field maps to the Kubernetes node annotation values to avoid user confusion.
| | Field | Description | Accepted Values | Schema | Default |
| | ----------- | -------------------------------------------------------------------------------------------------------------------- | ---------------------------------------- | ------ | ------- |
| | maintenance | Indicates if the node is under maintenance, for the purposes of external traffic load balancing. eBPF dataplane only. | Optional: `ExcludeLocalBackends`, `None` | string |
| | Field | Description | Accepted Values | Schema | Default |
| | ----------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------- | ------ | ------- |
| | maintenance | Indicates if the node is under maintenance, for the purposes of external traffic load balancing. eBPF dataplane only. Maps to Kubernetes node annotation values `exclude-local-backends` and `none`. | Optional: `ExcludeLocalBackends` (`exclude-local-backends` annotation), `None` (`none` annotation) | string |
Product Version(s):
Node maintenance (how-to, and resource update): EE, CE
Node maintenance metric: EE, CE
New Maglev conntrack metric: OSS, EE, CE
Links to Maglev metrics page from maglev how-to: OSS, EE, CE
Issue:
Link to docs preview:
SME review:
DOCS review:
Additional information:
Merge checklist: