Skip to content

cluster routing mode#4511

Merged
mazdakn merged 6 commits intotigera:masterfrom
mazdakn:felix-routing
Mar 13, 2026
Merged

cluster routing mode#4511
mazdakn merged 6 commits intotigera:masterfrom
mazdakn:felix-routing

Conversation

@mazdakn
Copy link
Member

@mazdakn mazdakn commented Mar 10, 2026

Description

Currently, configuring cluster routing mode is exposed to users through programClusterRoutes in both FelixConfig and BGPConfig.

This PR introduces a new config option to the installation resource, named clusterRoutingMode to provide better UX for users to set cluster routing mode. Supported values are BIRD (default) and Felix.

When set to :

  • BIRD, Calico route engine, i.e. BIRD, is responsible for programming IPIP and no-encap routes. This effectively updates programClusterRoutes in FelixConfig to Disabled and in BGPConfig to Enabled
  • When set to Felix, Felix will program those routes instead. As such programClusterRoutes is updated in FelixConfig to Disabled and in BGPConfig to Enabled

This PR assumes BGPConfiguration resources is always available and only performs patch operation. As such The following PR is need which created the default BGPConfiguration resource on startup:
projectcalico/calico#12045

Operator now needs to create bgpconfiguration resources which is handed in this PR: projectcalico/calico#12080

Release Note

Config option to control whether BIRD or Felix manages intra-cluster routing. 

For PR author

  • Tests for change.
  • If changing pkg/apis/, run make gen-files
  • If changing versions, run make gen-versions

For PR reviewers

A note for code reviewers - all pull requests must have the following:

  • Milestone set according to targeted release.
  • Appropriate labels:
    • kind/bug if this is a bugfix.
    • kind/enhancement if this is a a new feature.
    • enterprise if this PR applies to Calico Enterprise only.

@mazdakn mazdakn requested a review from a team as a code owner March 10, 2026 16:49
@marvin-tigera marvin-tigera added this to the v1.42.0 milestone Mar 10, 2026
@mazdakn mazdakn changed the title [WIP] cluster routing mode cluster routing mode Mar 10, 2026
// confd and BIRD program that route. When ClusterRoutingMode is Felix, it is expected that Felix will program that route.
// Felix always programs such routes for IP Pools with vxlanMode: Always or vxlanMode: CrossSubnet. [Default: BIRD]
// +optional
// +kubebuilder:validation:Enum=BIRD;Felix
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we usually put the Enum validation on the enum type so it can be re-used wherever the enum is used.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

reqLogger logr.Logger,
) (bool, error) {
updated := false
desiredValue := "Enabled"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there constants we should be using for this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, for these options in the BGPConfig and FelixConfig we use string type, and we rely on kubebuilder to validate inputs.
We use both approach though. But it's cleaner to declare constants. I'll address it separately since needs merging a few PRs.

desiredValue = "Disabled"
}

if bgpConfig.Spec.ProgramClusterRoutes == nil || *bgpConfig.Spec.ProgramClusterRoutes != desiredValue {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, we usually try to avoid overwriting end-user modifications of FelixConfig, etc using an annotation. Technically I suppose we should do it here... but we're actually pretty inconsistent about it.

I've started a prototype of how to make this more easily implemented: #4523

But I don't want to block this PR on it. So I'll just make sure to incorporate this into my changes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the input and including the case in your PR.

Comment on lines +2041 to +2048
func (r *ReconcileInstallation) setClusterRoutingOnFelixConfiguration(
ctx context.Context,
install *operatorv1.Installation,
fc *v3.FelixConfiguration,
reqLogger logr.Logger,
) (bool, error) {
updated := false
desiredValue := "Disabled"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ctx is unused in this function — can be removed.

Comment on lines +2064 to +2071
func (r *ReconcileInstallation) setClusterRoutingOnBGPConfiguration(
ctx context.Context,
install *operatorv1.Installation,
bgpConfig *v3.BGPConfiguration,
reqLogger logr.Logger,
) (bool, error) {
updated := false
desiredValue := "Enabled"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ctx is unused here too.

Also, both this and setClusterRoutingOnFelixConfiguration above have stale doc comments — they both say setDefaultOnBGPConfiguration which looks like a copy-paste leftover. Should match the actual function names.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

// In BIRD cluster routing mode, IPIP currently requires BGP to be running in order to program routes.
if birdClusterRoutingMode &&
(instance.Spec.CalicoNetwork.BGP == nil || *instance.Spec.CalicoNetwork.BGP == operatorv1.BGPDisabled) {
return fmt.Errorf("IPIP encapsulation requires that BGP is enabled")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message still just says "IPIP encapsulation requires that BGP is enabled" — now that there's a way to fix this without BGP (by setting ClusterRoutingMode: Felix), should the error message mention that option? Same for the unencapsulated case below. Otherwise users won't know about the escape hatch.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

Expect(*bgpConfig.Spec.ProgramClusterRoutes).To(Equal("Disabled"))
})

It("should set vxlanVNI to 10000 when provider is DockerEE", func() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good coverage for the happy paths. Might be worth adding a test for the case where the default BGPConfiguration doesn't already exist — to verify the create-on-patch path in PatchBGPConfiguration works correctly (or to verify we get an appropriate error if the prerequisite calico PR hasn't landed).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. Is create-on-patch a feature of Operator? I was trying to figure out how felixconfiguration is created.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see! it's part of the patching logic.

// Verify Calico settings, if specified.
if instance.Spec.CalicoNetwork != nil {
bpfDataplane := instance.Spec.CalicoNetwork.LinuxDataplane != nil && *instance.Spec.CalicoNetwork.LinuxDataplane == operatorv1.LinuxDataplaneBPF
birdClusterRoutingMode := !felixProgramsClusterRoutes(instance)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I find this a bit confusing with the negation.

Suggested change
birdClusterRoutingMode := !felixProgramsClusterRoutes(instance)
felixClusterRouting := felixProgramsClusterRoutes(instance)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

@mazdakn mazdakn requested a review from caseydavenport March 12, 2026 21:49
@mazdakn mazdakn merged commit bb0fdd1 into tigera:master Mar 13, 2026
6 checks passed
@mazdakn mazdakn deleted the felix-routing branch March 13, 2026 20:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants