Enable Network Observability on Day 0 #1908
stleerh wants to merge 13 commits into openshift:master
Conversation
Being able to manage and observe the network in an OpenShift cluster is critical to maintaining the health and integrity of the network. Without it, there's no way to verify whether your changes are working as expected or whether your network is experiencing issues.
Currently, Network Observability is an optional operator that many customers are not aware of. A majority of customers using OpenShift Networking do not have Network Observability installed, so they are missing out on features that they have already paid for.
From telemetry, what's the percentage of clusters with NOO installed today?
We don't have telemetry for the percentage of clusters installed. It's one of the top downloaded operators but in terms of usage, it's not in the top 20.
> We don't have telemetry for the percentage of clusters installed

We should be able to get that information from telemetry already (OLM reports metrics about installed operators).

Does it have the percentage of clusters installed?

We have the total number of clusters reporting to telemetry and the total number of clusters with NOO installed.
### Risks and Mitigations

* Network Observability requires CPU, memory, and storage that the customer might not be aware of.
  Mitigation: The default setting stores only metrics at a high sampling interval to minimize resource usage. If this isn't sufficient, more fine-tuning and filtering can be done in the provided default configuration (e.g., filtering on specific interfaces only).
Single node OpenShift (SNO) clusters have strict requirements in terms of resource usage: is it ok that NOO gets installed by default in this case?

Clarified the stance on this.
### Risks and Mitigations

* Network Observability requires CPU, memory, and storage that the customer might not be aware of.
Can we get an estimate of the overhead of running NOO with a "stripped-down" flow collector on a typical CI cluster (3 control plane nodes + 3 workers)?

Estimates are in the works. The target goals are listed in the Test Plan section of this document.
Summary:

* Sampling at 400
Does it depend on the in-cluster Prometheus stack? If yes, how does it play with #1880?

Yes. PM has said, "The default OpenShift experience should continue to include Prometheus, alerts, and baseline dashboards."
@simonpasquier I'm currently trying to understand how that plays with optional monitoring, but I don't think that's something tied to this EP (I mean, the question is equally relevant when netobserv or any other Red Hat operator is installed from OperatorHub).

My guess is that, when CMO is in lightweight mode like that, netobserv will show no data and potentially query failures. Given that this is an explicit choice of the user, I don't think it's too problematic; it would make sense, in that case, for the user to manually uninstall netobserv.
To rephrase my question: is it worth deploying netobserv if there's no platform monitoring stack? I understand that users create sub-optimal setups, but it might be better to prevent it in the first place?
Then when users disable the in-cluster monitoring stack (as envisioned in #1880), CNO shouldn't install NOO?
I don't think CNO should skip installing NOO when the user explicitly asks to install it. (Or are you talking more specifically about what the default behaviour should be?)
NOO still has valid usages without the monitoring stack, e.g. if configured to export flows to a 3rd-party system.
So if the user asks to install it, we shouldn't go against their will.
Are there some install pre-flight checks where we can generate warnings in such situations?
Whatever option we pick (never install NOO at day-0 when monitoring is disabled, or delegate the decision to the user), I'd rather see it documented somewhere, including what it entails in terms of features/limitations (and this EP seems to be the right place).
That should be documented in Network Observability. This EP only decides whether Network Observability is enabled or not.
### Topology Considerations

All topologies are supported where CNO is supported, so this excludes MicroShift.
Can we describe briefly how it works for HyperShift?

This proposal doesn't change how Network Observability works in a Hosted Control Plane (HCP) environment. Network Observability is supported on hosted clusters and the management cluster.
Should the management cluster admin be able to override this (e.g. disable netobserv for all hosted clusters by default)?

There's no single parameter to do that. You can prevent Network Observability from being installed/enabled on a per-cluster basis.
I was more thinking of a global switch for the admin of the management cluster running all the hosted control planes.

That could be something that's added in the future.
4. Wait for NOO to be ready and the OpenShift web console to be available.
5. Create the "netobserv" namespace if it doesn't exist.
6. Check if a FlowCollector instance exists. If yes, exit.
7. Create a FlowCollector instance.
Concretely, what gets deployed in terms of pods and services? Regarding pods, what level of customization is offered in terms of resource requests/limits and scheduling (infrastructure nodes)?

This is subject to change, but the planned deployment is in the FlowCollector Custom Resource (CR) section of this document. There is no customization for the user. If this is needed, you would just edit the FlowCollector custom resource directly.
@simonpasquier how do other in-payload operators typically do that? Through day-0 config?
I think our intent is that the default sampling settings give us a comfortable margin to not run into oomkill from the start. Later on, admins can change the sampling & adapt the limits accordingly.
For compute resources, core operators define default resource requests but no limits (https://github.com/openshift/enhancements/blob/master/CONVENTIONS.md#resources-and-limits). I can mostly speak for the cluster monitoring operator but in our case, admins would customize the CMO configuration after installation to schedule components on infra nodes or adjust resource requests for instance.
The best practice is to NOT set a CPU limit but to set a memory limit to avoid the OOM killer. Typically, you set the memory limit equal to the memory request.
Not for OCP core components, as documented in the linked pages.
Going back to your comment, you can also adjust resources after installation.
The current proposal just takes the default values from Network Observability. If those values are not right, the preference is to change them in Network Observability rather than to override the values here.
Rather than actually installing NOO and creating the FlowCollector instance, it is less risky and simpler to just display a panel or a button that lets the user install and enable Network Observability. This resolves the awareness issue. However, it would result in far fewer installs compared to enabling it by default, and it goes against the principle that networking and network observability should go hand in hand and be there from the start.
## Alternatives (Not Implemented)
Was installation via the assisted-installer considered? https://github.com/openshift/assisted-service/blob/master/docs/dev/olm-operator-plugins.md
This seems like a viable option to mitigate the drawbacks around topologies and resource constraints.
That's parallel work:
https://issues.redhat.com/browse/NETOBSERV-2486
openshift/assisted-service#8729
However, the assisted installer doesn't cover all installation cases, which is why we are considering an alternative.
deploymentModel: Service
loki:
  enable: false
namespace: netobserv
I'm not very familiar with network observability, but does netobserv here mean the namespace in which the FlowCollector resources will be installed? If yes, why not an openshift- prefixed namespace?

That's a good point; the CRD default is "netobserv", but it could make more sense to use "openshift-netobserv" now that it's baked into the installer.
This was a discussion from many years ago (NETOBSERV-373). The original description was changed, so if you look at the History link, it said:

> The current project name for Network Observability is "network-observability". The downstream version should be in a project that begins with "openshift-", which is reserved for OpenShift namespaces. The suggested new name is "openshift-network-observability".

Nevertheless, it's a good point that should be reconsidered. However, this proposal would just follow whatever Network Observability decides to do.
I was rather thinking that, on the contrary, what we do here can deviate from upstream.
If it is going to be an openshift-owned thing, deviating from upstream and using an openshift-* style namespace would be my recommendation.
- Hosted Control Plane (HCP) environment
- Add e2e tests in [OpenShift Release Tooling](https://github.com/openshift/release)

Performance testing will be done to optimize the use of resources and to determine the specific FlowCollector settings, with the goal of using less than 5% of resources (CPU and memory) and an ideal target of less than 3%.
Can we clarify that the measured overhead will be based on the resource utilization of all cluster components? E.g. ingesting more metrics and evaluating more rules will also increase the resource usage of Prometheus.

Good point. It should include external components that are affected.
3. Wait for NOO to be ready and the OpenShift web console to be available.
4. Create the "netobserv" namespace if it doesn't exist.
5. Check if a FlowCollector instance exists. If yes, exit.
6. Create a FlowCollector instance.
How would CNO signal to the user when one of the steps fails?

It will be in the log file.
What about reporting this via status conditions?

Yes, that can be done.
A sentence was added about this in the last section.
### Workflow Description

Network Observability is enabled by default on day 0 (planning stage). You don't have to configure anything when using `openshift-install`; the Network Observability Operator will be installed and a FlowCollector custom resource (CR) will be created (Listing 3).
Is NOO supported for all (including future/development) OCP releases and architectures?

It is today. The goal is to support the new OCP releases and architectures.
@openshift-bot: Closed this PR.

/reopen

@OlivierCazade: Reopened this PR.
Add installNetworkObservability field to enable network observability installation during cluster deployment (day-0).
- Add InstallNetworkObservability field to NetworkSpec
- Add NetworkObservabilityInstall feature gate enabled in DevPreview and TechPreview
- Add integration test suite for the new field
Related: openshift/enhancements#1908
Add networkObservability nested structure to configure network observability
installation during cluster deployment (day-0).
- Add NetworkObservability field to NetworkSpec with NetworkObservabilitySpec type
- Add NetworkObservabilityInstallationPolicy enum ("", "InstallAndEnable", "DoNotInstall")
- Add NetworkObservabilityInstall feature gate enabled in DevPreview and TechPreview
- Add integration test suite for the new field
When installationPolicy is set to "" (empty string) or "InstallAndEnable",
network observability will be installed and enabled. When set to "DoNotInstall",
it will not be installed.
Related: openshift/enhancements#1908
api-approvers:
- "@jotak"
- "@dave-tucker"
- {api-team}
In CNO, it adds a new controller for observability and adds it to the manager. The controller is a single Go file where the Reconciler reads the state of the `installationPolicy` field. If set to `InstallAndEnable`, it does the following:

1. Check if Network Observability Operator (NOO) is installed. If yes, exit.
How are you intending to evaluate this?
Can you elaborate please @stleerh?
I see wasNetworkObservabilityDeployed, but that's checking the Status of the network object, which doesn't seem the same as checking whether NOO was installed or not.
What's the expected behaviour when a customer who previously installed NOO via the catalog upgrades to an OCP version which now attempts to install it via CNO?

There are two different things:
- The CNO controller checks for an existing FlowCollector CRD to detect an already-installed NOO and avoid overwriting it.
- Once the CNO controller has installed NOO, we want the user to be free to own the deployment and modify anything. This is where the status is used: once the controller has successfully installed NOO, it sets the status, and any later reconciliation checks the status and exits early.
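The two guards described above can be sketched as a small decision function. This is a hypothetical illustration, not the actual CNO code; the function and parameter names are assumptions.

```go
package main

import "fmt"

// shouldInstallNOO sketches the detection logic discussed above: the CNO
// controller skips installation if the FlowCollector CRD already exists
// (NOO was installed via OLM v0, OLM v1, or a Helm/upstream install), or
// if CNO's own status already records a successful install, leaving the
// user to own the deployment afterwards.
func shouldInstallNOO(flowCollectorCRDExists, statusAlreadyDeployed bool) bool {
	if statusAlreadyDeployed {
		// CNO installed NOO once already; never reconcile it again.
		return false
	}
	if flowCollectorCRDExists {
		// NOO was installed by some other means; don't overwrite it.
		return false
	}
	return true
}

func main() {
	fmt.Println(shouldInstallNOO(false, false)) // fresh cluster: true
	fmt.Println(shouldInstallNOO(true, false))  // pre-existing NOO: false
	fmt.Println(shouldInstallNOO(false, true))  // CNO already finished: false
}
```

Checking the CRD rather than any OLM object is what makes the detection independent of the install mechanism.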
In CNO, it adds a new controller for observability and adds it to the manager. The controller is a single Go file where the Reconciler reads the state of the `installationPolicy` field. If set to `InstallAndEnable`, it does the following:

1. Check if Network Observability Operator (NOO) is installed. If yes, exit.
2. Install NOO using OLM's OperatorGroup and Subscription.
Have you considered making this fully managed by the cluster-network-operator instead of utilizing OLM?
If this is going to be considered a core component of OpenShift moving forward, it seems reasonable to move it away from being a layered product installed by OLM, so there is a tighter coupling between the OCP version and the NOO version.
If you take the OLM approach, will you be pinning NOO to a particular version to prevent automatic upgrades? Are we going to support customers modifying the OLM resources created by the CNO? Would we support running a newer version of NOO on an older version of OCP (since OLM supports over-the-air style updates and upgrades)?
There are some nuances when leveraging OLM as the deployment mechanism, and I'd like to better understand the interactions here.
We want Network Observability to be there when you have the Cluster Network Operator (CNO), but it can run independently of that in the upstream version.
We have one version of NOO that is backwards-compatible with all versions of OCP. We've been releasing a new X.Y version alongside each OCP version.
It would be helpful to document how this pattern has been used before, i.e. in CIO.
#### Hypershift / Hosted Control Planes

This proposal doesn't change how Network Observability works in a Hosted Control Plane (HCP) environment. Network Observability is supported on hosted clusters and the management cluster; therefore, it will be enabled by default.
How does the cluster-network-operator equivalent in HyperShift work? For example, I know that the HyperShift Control Plane Operator has different mechanics than the cluster-authentication-operator for authentication/authorization related things.
Will there need to be any changes to HyperShift's controller behaviors to support this new functionality?

Network Observability is already supported on HCP. Enabling NOO by default does nothing to change that.

This doesn't answer my question. Are there any changes needed on the HyperShift side to support enabling this by default?
Rather than have CNO enable Network Observability, take the existing Network Observability Operator (NOO) and install it by default in the cluster. There needs to be some logic to accept the values in openshift-install to decide whether NOO should be enabled or not.

The core components of OpenShift are operators like the Cluster Network Operator (CNO) and the Cluster Storage Operator (CSO). NOO is a much smaller component and should not reside at the top level.
I'm not sure the size of a component really dictates how it should be deployed.

Regardless, I don't believe it belongs at this level.
> I'm not sure the size of a component really dictates how it should be deployed.

I agree with this. Note that we have something called CloudNetworkConfigController, which is a core operator on cloud platforms and is very small in scope.
My preference is to ship NOO as core.
Is my understanding correct that

> There needs to be some logic to accept the values in openshift-install to decide whether NOO should be enabled or not

is the only reason why this alternative was not considered? Or am I missing something? Avoiding installer code changes doesn't seem like a valid reason.
Is NetObserv completely tied to OVN-Kubernetes? And is it never going to expand beyond being an observability solution for the components owned by CNO?
Here are the two main points.

1. If you have CNO (basically, you are running OpenShift), then you should have Network Observability. That's the basis for this feature and where it's linked to OVN-Kubernetes. In addition, my push is that as features get added to OVN-Kubernetes, they should be immediately supported by Network Observability in the same release.
2. Network Observability can also run with other container network interfaces (CNIs). So no, it's not exclusively tied to OVN-Kubernetes, but it does support unique features of OVN-Kubernetes.
> Here are the two main points.
> - If you have CNO (basically you are running OpenShift), then you should have Network Observability.

Again, why not make the same claim around "if you are running OpenShift, then you are running NOO"? Maybe I am still missing that point.

> That's the basis for this feature and where it's linked to OVN-Kubernetes. In addition, my push is that as features get added to OVN-Kubernetes, they should be immediately supported by Network Observability in the same release.

Sure, but that's still possible to do via NOO, right? Can't NOO adopt the OCP release cycle if it's part of OCP? Again, apologies if I am not familiar with the NOO lifecycle.

> - Network Observability can also run with other container network interfaces (CNI). So no, it's not exclusively tied to OVN-Kubernetes, but it does support unique features of OVN-Kubernetes.

So does Network Observability work for all the 3rd-party CNIs allowed on OCP? What happens when those CNIs are used instead of OVN-Kubernetes, for example Cilium? Will NetObserv still be running by default and doing observability for Cilium? What about Layer 7: do we plan to expand there, or is netobserv strictly restricted to L2/L3/L4?
I'm having a hard time wrapping my head around why we want this to essentially be a core feature of OpenShift without coupling it to the OCP version by including it as part of the payload.
Maybe it helps to have a sync call with appropriate stakeholders to hash this out, but my biggest concern with the currently proposed approach is that you mention:

> my push is that as features get added to OVN-Kubernetes, they should be immediately supported by Network Observability in the same release.

I could be misunderstanding here, but if the intention is that we should be tying NOO versions to specific OCP versions due to feature dependencies, continuing to use OLM as the deployment mechanism makes things more confusing for end users. They have to understand that they are responsible for keeping NOO up to date and for not accidentally installing a version that is incompatible with the version of OpenShift they are running.
IIRC, there are some safeguards in OLM to help mitigate this, but it still feels like we are putting the onus on the user to be attentive to platform requirements, rather than us being responsible for always installing a compatible version of NOO automatically and keeping it compatible without any user intervention.
Surya Seetharaman, Dave Tucker, Ben Bennett and I had a discussion on April 1 to hash this out. The conclusion is that we agreed to have CNO enable NOO, so the current proposal remains unchanged.
Just a few comments to the questions above:
- The upstream Network Observability is not guaranteed to work with all CNIs, meaning we don't formally test against other CNIs.
- Each new version of Network Observability is backwards-compatible to all supported OCP versions and adapts accordingly.
Proposed API LGTM.

@stleerh: all tests passed!
dave-tucker left a comment:
Have some questions that need answering around resource ownership, upgrades, lifecycle, etc.
In CNO, it adds a new controller for observability and adds it to the manager. The controller is a single Go file where the Reconciler reads the state of the `installationPolicy` field. If set to `InstallAndEnable`, it does the following:

1. Check if Network Observability Operator (NOO) is installed. If yes, exit.
2. Install NOO using OLM's OperatorGroup and Subscription.
You call out OLM v0 constructs, but the implementation uses OLM v1.

Also, what if someone installed NOO via OLM v0?

After discussing with the OLM team yesterday, we made the following changes to the implementation:
- To check if NOO is installed, it checks whether the FlowCollector CRD is already present; this way it works with OLM v0, OLM v1, and Helm installs (upstream deployment).
- Changed the installation to use the OLM v1 objects.

The goal was to avoid implementation details, so it will check if NOO is installed regardless of whether that was done using OLM v0 or v1. Step 2 could have avoided implementation details by just saying "Install NOO".
The actual enabling of Network Observability is done in the Cluster Network Operator (CNO). The rationale is that we want the network observability feature to be part of networking, as opposed to being part of general observability or a standalone entity. Yet, there is still a separation at the lower level so that the two can be independently developed and released at different times, particularly for bug fixes.

In CNO, it adds a new controller for observability and adds it to the manager. The controller is a single Go file where the Reconciler reads the state of the `installationPolicy` field. If set to `InstallAndEnable`, it does the following:
What if a user changes from InstallAndEnable to DoNotInstall?
Does CNO uninstall the operator, or is the user expected to clean this up manually?

No, once NOO has been installed, the CNO has finished its work; switching back to DoNotInstall does nothing.

This is why it's called "DoNotInstall" rather than "Uninstall".

It would be beneficial to document that for future reference.
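The `installationPolicy` semantics discussed above can be summarized in a small sketch. This is an illustrative assumption of the intended behaviour, not the actual CNO implementation: the empty string and `InstallAndEnable` both install, `DoNotInstall` skips, and nothing is ever uninstalled.

```go
package main

import "fmt"

// policyAllowsInstall maps the proposed installationPolicy values to
// behaviour: "" (unset) and "InstallAndEnable" trigger installation,
// while "DoNotInstall" skips it. Note "DoNotInstall" is not "Uninstall":
// once NOO is installed, flipping the field back removes nothing,
// modelled here by the alreadyInstalled guard.
func policyAllowsInstall(policy string, alreadyInstalled bool) bool {
	if alreadyInstalled {
		return false // CNO's work is done; it never uninstalls
	}
	switch policy {
	case "", "InstallAndEnable":
		return true
	default: // includes "DoNotInstall" and any unknown values
		return false
	}
}

func main() {
	fmt.Println(policyAllowsInstall("", false))                // true: default installs
	fmt.Println(policyAllowsInstall("DoNotInstall", false))    // false: opt out
	fmt.Println(policyAllowsInstall("DoNotInstall", true))     // false: nothing is undone
	fmt.Println(policyAllowsInstall("InstallAndEnable", true)) // false: already installed
}
```

Treating unknown values conservatively (no install) is an assumption; the enum validation in the real API would likely reject them earlier.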
In CNO, it adds a new controller for observability and adds it to the manager. The controller is a single Go file where the Reconciler reads the state of the `installationPolicy` field. If set to `InstallAndEnable`, it does the following:

1. Check if Network Observability Operator (NOO) is installed. If yes, exit.
2. Install NOO using OLM's OperatorGroup and Subscription.
What happens if the user mutates the resources created by CNO (i.e. the ClusterExtension)?

The user owns both the ClusterExtension and the FlowCollector.
If the user deletes the ClusterExtension, NOO is removed and the CNO does not try to reinstall it.
1. Check if Network Observability Operator (NOO) is installed. If yes, exit.
2. Install NOO using OLM's OperatorGroup and Subscription.
3. Wait for NOO to be ready.
4. Create the "netobserv" namespace if it doesn't exist.
%s/netobserv/openshift-network-observability/g?

This is a discussion we've had in the past. I'll resurface this topic; while this proposal might influence that decision, it has to be in agreement with what Network Observability does.
3. Wait for NOO to be ready.
4. Create the "netobserv" namespace if it doesn't exist.
5. Check if a FlowCollector instance exists. If yes, exit.
6. Create a FlowCollector instance.
Who owns the FlowCollector: CNO or the user? Can a user change this after install? What happens if, in a later release, we decide to enable a different set of features by default?

The FlowCollector is owned by the user. The user can change any field or even delete it to disable NOO.

Network Observability Operator (NOO) manages FlowCollector. It is very likely the user will want to change attributes of their FlowCollector instance. If the default features change, it will only affect future clusters that don't have Network Observability already enabled. We don't anticipate problems with different features on different clusters, since this is primarily a single-cluster component. Worst case, we can make a change to update the feature set, but that is something we want to avoid.

To summarize the discussion we had a few weeks ago, it was suggested that CNO deploys an empty FlowCollector, and that NOO is then responsible for establishing the default feature set, potentially writing that back to the CRD if it needs to. That way, CNO doesn't need to be aware of NOO defaults in the future.
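An "empty" FlowCollector of the kind suggested here would be little more than the following. This assumes the current `flows.netobserv.io/v1beta2` API and the cluster-scoped singleton named `cluster`; NOO's defaulting would then fill in the spec.

```yaml
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec: {}
```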
> The Reconciler leverages the existing framework and reuses the concept of client, scheme, and manager. It provides clear ownership by having a separate controller for it. If the Network CR changes, the Reconciler will repeat the above steps. Note it doesn't monitor NOO or any of NOO's components for changes, and it doesn't do any upgrades. That is still the responsibility of NOO.
> and it doesn't do any upgrades

Networking component upgrades are managed via CNO, which, AIUI, has pinned versions for every release. I think it would make sense for NOO to pin a version for a given OpenShift release too, and to have CNO edit the ClusterExtension to trigger the upgrade of NOO, for consistency with how other components are updated.

> In addition, my push is that as features get added to OVN-Kubernetes, they should be immediately supported by Network Observability in the same release.

☝️ is a good reason to do so. Let's say we roll out OVN-K8s 4.1.1, and to observe a new feature you need NOO 24.1.1 (and maybe a new feature toggled on in the FlowCollector). Managing this through CNO seems easier than asking the user to upgrade NOO from the catalog and edit the FlowCollector themselves.

I don't see an issue with NOO upgrading as needed for OVN upgrades. We can make that mostly transparent to the user. In fact, I would say it would make things more complicated if you had to go through COO, because there might be other reasons why you want to upgrade Network Observability. And if you have two ways of upgrading, either through COO or NOO, that just makes things more confusing. Having the tie-in while still being separate and independent gives it the most flexibility.

Paraphrasing from our previous call, NOO will always upgrade itself to the latest (I don't recall if that was the default, or user opt-in). CNO doesn't care; we just keep networking up-to-date.
> 1. Check if Network Observability Operator (NOO) is installed. If yes, exit.
> 2. Install NOO using OLM's OperatorGroup and Subscription.
Can you explain how CNO and NOO lifecycles are supposed to interact? Maybe take a look at https://github.com/openshift/cluster-network-operator/blob/master/docs/operands.md#cluster-network-operator-operands and how it relates to the rollout of NOO. Should CNO enter a degraded state if NOO has failed to deploy for some reason?

The new controller first installs NOO, and then creates the FlowCollector object. When the FlowCollector object is created, NOO will install the network observability stack. If something fails during this phase, NOO will enter a degraded state. But if something fails during NOO installation itself, it does not enter a degraded state; the ClusterExtension does display some errors.
Add networkObservability nested structure to configure network observability
installation during cluster deployment (day-0).
- Add NetworkObservability field to NetworkSpec with NetworkObservabilitySpec type
- Add NetworkObservabilityInstallationPolicy enum ("", "InstallAndEnable", "DoNotInstall")
- Add NetworkObservabilityInstall feature gate enabled in DevPreview and TechPreview
- Add integration test suite for the new field
When installationPolicy is set to "" (empty string) or "InstallAndEnable",
network observability will be installed and enabled. When set to "DoNotInstall",
it will not be installed.
Related: openshift/enhancements#1908
> The actual enabling of Network Observability is done in the Cluster Network Operator (CNO). The rationale is that we want the network observability feature to be part of networking, as opposed to being part of general observability or being a standalone entity. Yet there is still separation at the lower level so that the two can be independently developed and released at different times, particularly for bug fixes.
> 1. Check if Network Observability Operator (NOO) is installed. If yes, exit.
> 2. Install NOO using OLM's OperatorGroup and Subscription.

It would be helpful to document how this pattern has been used before, i.e. in CIO.
/lgtm |