NMS-19537: Add toggle to disable metadata interpolation#8311
NMS-19537: Add toggle to disable metadata interpolation#8311JSimo wants to merge 1 commit intoOpenNMS:developfrom
Conversation
8699123 to
d53051e
Compare
cgorantla
left a comment
There was a problem hiding this comment.
Metadata is now embedded as basic feature.
Lot of things like moving to environment variables in https://github.com/OpenNMS/opennms/blob/develop/opennms-base-assembly/src/main/filtered/etc/opennms-datasources.xml depend on it.
Also scv feature won't work without metadata.
Ideally we want to fix the performance issues you are seeing w.r.to metadata instead of disabling the feature itself.
I agree with fixing it at the source, unsure if I agree with this feature in an performance sense, seem tricky to get correct. In the example given for datasource, I would say that can be accomplished in other ways, standardised spring ways of managing beans and wiring in data. Rather than using the custom metadata interpolation feature. Though about how to fix it without disabling. Had some thoughts around it would ideally be some kind of batch based solution. The issue I found looking and greping around that such an change seemed very much major in nature, and not as easily done. And as it impacts both poller and collectd, we thought the least impactful fix would be to disable it so we don’t have these calls made in the first place. I acknowledge such an solution is far from ideal. But at its current state this feature does not seem ready to be used at an larger scale. Honestly I hope any more introduction of this feature in such hot paths are tested thoroughly with tracing and all. Comparing before and after performance wise. As to bes sure the performance difference is negligible and not breaking as it stands now. |
|
@cgorantla I think we need to solve the underlying issue from a performance perspective. The injection of configuration parameters using the metadata in various places becomes a crucial part that allows us to configure, deploy and manage things much easier and we gain a lot of flexibility. The problem we are phasing right now with respect to this PR is about how long does @JSimo and anyone with a larger deployment for metric collection can live with an OpenNMS instance they only can use partially. The workaround right now without this PR would be to disable thresholding completely, and that can be a showstopper for users from here on. The feature flag is just a workaround and trades in the metadata interpolation to have thresholding capabilities back working – and thresholding is critical. The PR is taking away some pressure on this topic by offering a workaround without maintaining a fork of the software. I think the acceptance is mainly depend on what could be a timing expectation fixing this performance problem. The compromise would be to introduce the property as suggested and keep Metadata interpolation enabled by default. As soon we fixed the underlying performance problem we can remove the feature flag. |
We solved one issue and we are already working on another issue w.r.to metadata. Both of these would land in the same release next month so I hope there is no need for this temporary fix. If we are not able to solve performance issues to a satisfactory level, we can consider alternatives. |
Thanks @JSimo for tracking down performance issues and really appreciate your contributions here. I'm not opposed to having this option as short term fix but users will lose lot of necessary features like having env variables and scv. Lot of our features like time series / OpenConfig completely depend on having metadata enabled. We will try to address performance issues sooner by next release otherwise we can think of other options |
Thank you for taking this seriously and investigating. Appreciate the perspective and I understand where you come from. Its far from an ideal to have an feature toggle like this, took some time to get this working, hoping to avoid an fork. We have been on custom fork for too long (prev 26 fork, later 32 fork, now 35 fork) had hoped to finally escape our fork, now that snmp poller performance changes was added. The motivation is that this is the second time we have tried to upgrade and failed due to the metadata feature. We will likely be forking OpenNMS on our side once more to get this feature flag added. Will revisit trying to come back to mainline version of OpenNMS some upcoming major. We really need to upgrade to get the fixes around event id. Re; CLA will fix commit tomorrow so correct email is used in commit. |
@cgorantla As far as I have experienced these issues, it will be very easy to test this very early in a release candidate. We can use the benchmark setup in https://github.com/opennms-forge/opennms-benchmark/wiki/Collectd-SNMP-and-Kafka-Producer with disabled local time series data persistence. If you can tell us a point in time where we can do an A/B test. Deployment A with thresholding enabled and Deployment B with thresholding disabled as @JSimo described it in his issue. You can just look at the CPU usage as a baseline for A and B to get an idea what the left over of the performance impact is. |
Adds system property 'org.opennms.metadata.interpolation.enabled' (default: true) that allows disabling metadata interpolation globally for deployments that don't use metadata expressions. When disabled, the EntityScopeProvider returns empty scopes, avoiding database queries during polling, collection, thresholding, and other operations that normally resolve metadata. - Add NoOpEntityScopeProvider that returns EmptyScope for all requests - Add EntityScopeProviderFactory to switch between implementations - Update documentation Co-authored-by: JSimo <jsimo@users.noreply.github.com> Co-authored-by: WilssoN93 <wilsson93@users.noreply.github.com>
d53051e to
017d822
Compare
|
@cgorantla One more thought; Can the interpolate evaluate what it needs to fetch, and only fetch the metadata it requires? (I.e. only doing the DB calls it requires, SCV checking, or ENV checking). Then this kind of noop behavior will have similar performance. |
Yeah, I have some ideas on fixing this at the root. Will start with Poller, https://opennms.atlassian.net/browse/NMS-19560 |
Adds system property
org.opennms.metadata.interpolation.enabled(default:true) that allows disabling metadata interpolation globally in deployments that don't use metadata expressions.When disabled, the EntityScopeProvider returns empty scopes, avoiding database queries during polling, collection, thresholding, and other operations that normally resolve metadata.
The property has been used and validated in an larger deployment that does not use the metadata features. It has resolved the performance issues previously detected inside
collectd, and as an benefit improved the performance ofpoller.Here is an flamegraph for this change;
Flamegraph 35 fork
All Contributors
External References