Maglev node-maintenance how-to, Node resource updates, and metrics updates #2577

Open
aaaaaaaalex wants to merge 15 commits into tigera:main from aaaaaaaalex:maglev-node-maintenance-and-metrics

Conversation

Contributor

@aaaaaaaalex aaaaaaaalex commented Mar 10, 2026

  • Documents node maintenance metric for EE, CE
  • Documents Maglev conntrack metrics for EE, CE, OSS
  • Links Maglev metrics from Maglev how-to
  • Documents the Node resource's new "maintenance" field for EE, CE
  • Adds how-to for node maintenance for EE, CE

Product Version(s):
Node maintenance (how-to, and resource update): EE, CE
Node maintenance metric: EE, CE
New Maglev conntrack metric: OSS, EE, CE
Links to Maglev metrics page from maglev how-to: OSS, EE, CE

Issue:
<!--- Add a link to the Jira ticket or GitHub issue, if applicable. --->

Link to docs preview:

SME review:

  • An SME has approved this change.

DOCS review:

  • A member of the docs team has approved this change.

Additional information:

Merge checklist:

  • Deploy preview inspected wherever changes were made
  • Build completed successfully
  • Tests have passed

* Documents node maintenance metric for EE,CE
* Links maglev metrics from maglev how-to
* Documents Node resources new "maintenance" field for EE,CE
* Adds how-to for node-maintenance for EE,CE
@aaaaaaaalex aaaaaaaalex requested a review from a team as a code owner March 10, 2026 10:00
Copilot AI review requested due to automatic review settings March 10, 2026 10:00

netlify bot commented Mar 10, 2026

Deploy Preview for calico-docs-preview-next ready!

Name Link
🔨 Latest commit 03967fe
🔍 Latest deploy log https://app.netlify.com/projects/calico-docs-preview-next/deploys/69b2984260b4e10008ec1f51
😎 Deploy Preview https://deploy-preview-2577--calico-docs-preview-next.netlify.app

netlify bot commented Mar 10, 2026

Deploy Preview for tigera failed.

Built without sensitive environment variables

Name Link
🔨 Latest commit 03967fe
🔍 Latest deploy log https://app.netlify.com/projects/tigera/deploys/69b298427d3b950008ed77c4

Contributor

Copilot AI left a comment

Pull request overview

Adds documentation for Maglev-related Prometheus metrics and introduces docs for marking load balancer nodes as “under maintenance” (plus related Node resource schema updates) across Calico OSS / Enterprise / Cloud.

Changes:

  • Document new Maglev conntrack metric (and maintenance metric for Enterprise/Cloud) in Felix Prometheus metrics pages.
  • Add “mark load balancer node for maintenance” how-to pages for Enterprise and Cloud.
  • Update Node resource reference docs (Enterprise/Cloud) to include new spec.loadBalancer.maintenance field.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 12 comments.

| File | Description |
| ---- | ----------- |
| calico/reference/felix/prometheus.mdx | Adds Maglev metrics section to OSS Felix Prometheus metrics reference. |
| calico/networking/configuring/add-maglev-load-balancing.mdx | Adds a link to the Maglev Prometheus metrics section from the Maglev how-to. |
| calico-enterprise/reference/resources/node.mdx | Documents new Node loadBalancer field and maintenance sub-field. |
| calico-enterprise/reference/component-resources/node/felix/prometheus.mdx | Adds Maglev metrics + node maintenance metric to Enterprise Felix Prometheus metrics reference. |
| calico-enterprise/networking/configuring/mark-lb-node-for-maintenance.mdx | New Enterprise how-to for marking LB nodes under maintenance. |
| calico-enterprise/networking/configuring/add-maglev-load-balancing.mdx | Adds link to Enterprise Felix Maglev metrics section. |
| calico-cloud/reference/resources/node.mdx | Documents new Node loadBalancer field and maintenance sub-field for Cloud. |
| calico-cloud/reference/component-resources/node/felix/prometheus.mdx | Adds Maglev metrics + node maintenance metric to Cloud Felix Prometheus metrics reference. |
| calico-cloud/networking/configuring/mark-lb-node-for-maintenance.mdx | New Cloud how-to for marking LB nodes under maintenance. |
| calico-cloud/networking/configuring/add-maglev-load-balancing.mdx | Adds link to Cloud Felix Maglev metrics section. |

Comment on lines +20 to +22
- Upstream routers might still send traffic to marked nodes, if routing is not configured to exclude
the marked nodes. In this case, the node under maintenance will load balance traffic to a backend
on another node.

Copilot AI Mar 10, 2026

The wrapped lines in this list item aren’t indented enough to be part of the same bullet, which can render oddly in Markdown. Indent the continuation lines under the “Upstream routers…” bullet so they stay within that list item.

Comment on lines +51 to +54
## Additional resources
* [Enable the eBPF data plane](../../operations/ebpf/enabling-ebpf.mdx)
* [FelixConfiguration](../../reference/resources/felixconfig.mdx)
* [Maglev Load Balancing](add-maglev-load-balancing.mdx)

Copilot AI Mar 10, 2026

This is a new how-to page, but it isn’t referenced in the Calico Enterprise sidebar navigation (see sidebars-calico-enterprise.js under networking/configuring). Add it there (and/or link it from an existing page) so readers can discover it.


## Additional resources
* [Prometheus Metrics for Maglev](../../reference/felix/prometheux.mdx#maglev-metrics)

Copilot AI Mar 10, 2026

The link target filename is misspelled (prometheux.mdx), which will produce a broken link. Update it to point at the existing prometheus.mdx page (and keep the #maglev-metrics anchor).

Suggested change
* [Prometheus Metrics for Maglev](../../reference/felix/prometheux.mdx#maglev-metrics)
* [Prometheus Metrics for Maglev](../../reference/felix/prometheus.mdx#maglev-metrics)

| vxlanTunnelMACAddrV6 | MAC address of the IPv6 VXLAN tunnel. This is system configured and should not be updated manually. | | string |
| orchRefs | Correlates this node to a node in another orchestrator. | | list of [OrchRefs](#orchref) |
| wireguard | WireGuard configuration for this node. This is applicable only if WireGuard is enabled in [Felix Configuration](felixconfig.mdx). | | [WireGuard](#wireguard) |
| loadBalancer | Loadbalancer configuration for this node. This is applicable only if the eBPF dataplane is enabled [Felix Configuration](felixconfig.mdx). | | [LoadBalancer](#loadbalancer) |

Copilot AI Mar 10, 2026

This sentence is missing a word: “enabled in [Felix Configuration]…”. Also consider using “Load balancer” (two words) for consistency with the rest of the docs.

| Field | Description | Accepted Values | Schema | Default |
| ----------- | -------------------------------------------------------------------------------------------------------------------- | ---------------------------------------- | ------ | ------- |
| maintenance | Indicates if the node is under maintenance, for the purposes of external traffic loadbalancing. eBPF dataplane only. | Optional: `ExcludeLocalBackends`, `None` | string |

Copilot AI Mar 10, 2026

Typo/wording: “loadbalancing” should be “load balancing” (two words).

### Maglev Metrics

Felix exports statistics for monitoring Maglev loadbalancing.

Copilot AI Mar 10, 2026

Typo/wording: “loadbalancing” should be “load balancing” (two words).

Member

@fasaxc fasaxc left a comment

I wonder if we should mention that Maglev helps with the upstream router problem (even though it doesn't solve it completely; you still want to take the node out of your upstream routing proactively).

Collaborator

@ctauchen ctauchen left a comment

One big question about whether we need a new page; otherwise, all the reference content is good to go.

Collaborator

I wonder whether the new information is a bit thin to merit a standalone page. It's tightly coupled to the existing Maglev content. Can we add this as a section there?

At the extreme end, we could cover this with another section under Additional notes:

### Pause traffic to a load balancer node during maintenance

To drain a node of new LB traffic, annotate it with `X`. Remove the annotation to re-add it. Note: if this would leave a service with zero backends, the annotation is ignored for that service.

That's a very short version, and we can add necessary bits.

Contributor Author

This annotation also works outside of Maglev, which is what stopped me from adding it there. I also considered adding this as a section to other pages related to services, but couldn't find one that felt like a good fit.

…P", in both the maglev how-to, and the LB Maintenance how-to. Recommend Maglev in the LB Maintenance How-to

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 6 comments.


Comment on lines +31 to +41
## Exclude backends on a node from load balancer backend selection

To exclude a node's backends from load balancer backend selection, annotate the node with the
following annotation:

```yaml
lb.projectcalico.org/maintenance: "exclude-local-backends"
```

To add the backends back into the selection pool, either remove the annotation, or set the value to "none".

Copilot AI Mar 12, 2026

This procedure says to “annotate the node” but doesn’t specify whether that means a Kubernetes Node object (via kubectl annotate node ...) or the Calico Node custom resource. Since the Node resource reference also introduces spec.loadBalancer.maintenance, it would help to clarify which object is being modified here, and ensure the documented values (exclude-local-backends / none) are consistent with the API values shown in the Node resource reference.

Contributor Author

Addressed
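
For readers following this thread, here is a minimal sketch of where the annotation from the quoted how-to would sit, assuming it is applied to the Kubernetes Node object (the final wording of the how-to is not quoted in this conversation). The node name `worker-1` is a placeholder and does not come from the PR.

```yaml
# Illustrative fragment of a Kubernetes Node manifest.
# Assumption: the annotation is set on the Kubernetes Node,
# not on the Calico Node resource.
apiVersion: v1
kind: Node
metadata:
  name: worker-1  # placeholder node name
  annotations:
    # Key and value as quoted from the how-to under review.
    lb.projectcalico.org/maintenance: "exclude-local-backends"
```

Per the quoted text, removing the annotation or setting its value to "none" returns the node's backends to the selection pool.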

Contributor Author

I now clarify which node resource to annotate

Comment on lines +358 to +369
### Maglev Metrics

Felix exports statistics for monitoring Maglev load balancing.

#### `felix_bpf_conntrack_maglev_entries_total{destination="local|remote",ip_family="4|6"}`

Total number of Maglev entries in conntrack table broken down by IP version, and, whether the destination backend
is remote (we're acting as a frontend) or local (we're the backend node).

#### `felix_bpf_lb_maintenance`

Can be 1 or 0 (true or false). Indicates whether Felix's node has been marked for maintenance.

Copilot AI Mar 12, 2026

The “Maglev Metrics” section now includes felix_bpf_lb_maintenance, which appears to be a general load balancer maintenance metric (not Maglev-specific). Consider moving this metric to its own “Load balancer maintenance” (or similar) section, or renaming the section to cover both Maglev and maintenance metrics so readers don’t miss it.
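
As one illustration of how the maintenance metric might be consumed, here is a sketch of a Prometheus alerting rule that fires when a node stays marked for maintenance for a long time. The group name, alert name, six-hour window, and severity label are assumptions chosen for the example; only the metric name comes from the quoted docs.

```yaml
groups:
  - name: calico-lb-maintenance            # hypothetical group name
    rules:
      - alert: NodeLBMaintenanceProlonged  # hypothetical alert name
        # The quoted docs say the metric is 1 while the node is marked
        # for maintenance and 0 otherwise.
        expr: felix_bpf_lb_maintenance == 1
        for: 6h                            # assumed threshold
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.instance }} has been marked for load balancer maintenance for over 6 hours"
```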

Comment on lines +82 to +84
| Field | Description | Accepted Values | Schema | Default |
| ----------- | --------------------------------------------------------------------------------------------------------------------- | ---------------------------------------- | ------ | ------- |
| maintenance | Indicates if the node is under maintenance, for the purposes of external traffic load balancing. eBPF dataplane only. | Optional: `ExcludeLocalBackends`, `None` | string |

Copilot AI Mar 12, 2026

The accepted values documented for spec.loadBalancer.maintenance (ExcludeLocalBackends, None) don’t match the values shown in the node annotation example in the new maintenance how-to (exclude-local-backends, none). Please align these to the actual API/annotation values, or explicitly document how the Node loadBalancer.maintenance field maps to the Kubernetes node annotation values to avoid user confusion.

Suggested change
| Field | Description | Accepted Values | Schema | Default |
| ----------- | --------------------------------------------------------------------------------------------------------------------- | ---------------------------------------- | ------ | ------- |
| maintenance | Indicates if the node is under maintenance, for the purposes of external traffic load balancing. eBPF dataplane only. | Optional: `ExcludeLocalBackends`, `None` | string |
| Field | Description | Accepted Values | Schema | Default |
| ----------- | --------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------ | ------- |
| maintenance | Indicates if the node is under maintenance, for the purposes of external traffic load balancing. eBPF dataplane only. | Optional: `ExcludeLocalBackends` (Kubernetes node annotation value: `exclude-local-backends`), `None` (Kubernetes node annotation value: `none`) | string |
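
To make the mismatch concrete, here is a sketch of a Calico Node resource with the field set, using the field path from the PR overview (`spec.loadBalancer.maintenance`) and the value from the reference table. The node name is a placeholder, and this conversation does not state whether users set the field directly or Calico derives it from the Kubernetes node annotation.

```yaml
apiVersion: projectcalico.org/v3
kind: Node
metadata:
  name: worker-1  # placeholder node name
spec:
  loadBalancer:
    # PascalCase value from the reference table; the annotation in the
    # how-to uses the kebab-case form "exclude-local-backends".
    maintenance: ExcludeLocalBackends
```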

Comment on lines +82 to +84
| Field | Description | Accepted Values | Schema | Default |
| ----------- | -------------------------------------------------------------------------------------------------------------------- | ---------------------------------------- | ------ | ------- |
| maintenance | Indicates if the node is under maintenance, for the purposes of external traffic load balancing. eBPF dataplane only. | Optional: `ExcludeLocalBackends`, `None` | string |

Copilot AI Mar 12, 2026

The accepted values documented for spec.loadBalancer.maintenance (ExcludeLocalBackends, None) don’t match the values shown in the node annotation example in the new maintenance how-to (exclude-local-backends, none). Please align these to the actual API/annotation values, or explicitly document how the Node loadBalancer.maintenance field maps to the Kubernetes node annotation values to avoid user confusion.

Suggested change
| Field | Description | Accepted Values | Schema | Default |
| ----------- | -------------------------------------------------------------------------------------------------------------------- | ---------------------------------------- | ------ | ------- |
| maintenance | Indicates if the node is under maintenance, for the purposes of external traffic load balancing. eBPF dataplane only. | Optional: `ExcludeLocalBackends`, `None` | string |
| Field | Description | Accepted Values | Schema | Default |
| ----------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------- | ------ | ------- |
| maintenance | Indicates if the node is under maintenance, for the purposes of external traffic load balancing. eBPF dataplane only. Maps to Kubernetes node annotation values `exclude-local-backends` and `none`. | Optional: `ExcludeLocalBackends` (`exclude-local-backends` annotation), `None` (`none` annotation) | string |


4 participants