[logstash] Enable TSDB for metrics data streams #17401
AndersonQ merged 4 commits into elastic:main from
Conversation
Vale Linting Results — Summary: 4 warnings, 2 suggestions found
| File | Line | Rule | Message |
|---|---|---|---|
| packages/logstash/docs/README.md | 33 | Elastic.Latinisms | Latin terms and abbreviations are a common source of confusion. Use 'and so on' instead of 'etc'. |
| packages/logstash/docs/README.md | 38 | Elastic.Latinisms | Latin terms and abbreviations are a common source of confusion. Use 'for example' instead of 'e.g'. |
| packages/logstash/docs/README.md | 40 | Elastic.Latinisms | Latin terms and abbreviations are a common source of confusion. Use 'for example' instead of 'e.g'. |
| packages/logstash/docs/README.md | 75 | Elastic.DontUse | Don't use 'Note that'. |
💡 Suggestions (2)
| File | Line | Rule | Message |
|---|---|---|---|
| packages/logstash/docs/README.md | 36 | Elastic.WordChoice | Consider using 'can, might' instead of 'may', unless the term is in the UI. |
| packages/logstash/docs/README.md | 75 | Elastic.WordChoice | Consider using 'refer to (if it's a document), view (if it's a UI element)' instead of 'see', unless the term is in the UI. |
The Vale linter checks documentation changes against the Elastic Docs style guide.
To use Vale locally or report issues, refer to Elastic style guide for Vale.
| - name: agent.id | ||
| external: ecs | ||
| dimension: true |
agent.id is added so the data stream can accept the events produced by the cel input or Metricbeat when an error occurs. On error they produce an event without the metric metadata and dimensions, causing it to be rejected. The only metadata guaranteed to be there is agent.id. Even though such a document isn't really a metric, it is allowed for backwards compatibility.
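Concretely, the fallback dimension is a one-line addition in each data stream's `ecs.yml`; a minimal sketch based on the diff above (the file path is one of the files touched by this PR):

```yaml
# packages/logstash/data_stream/health_report/fields/ecs.yml (excerpt)
- name: agent.id
  external: ecs
  # Error events carry no metric metadata, so agent.id acts as the
  # guaranteed fallback dimension that lets TSDB route the document.
  dimension: true
```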
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)
Pull request overview
This PR enables Time Series Data Stream (TSDB) mode for the Logstash integration's health_report and node_cel data streams, and corrects metric type annotations across multiple data streams. The changes allow error events to be properly ingested in TSDB mode by adding agent.id as a dimension across all affected data streams, ensuring backward compatibility.
Changes:
- Enabled TSDB mode for the `health_report` and `node_cel` data streams
- Added dimension annotations to support TSDB routing (`agent.id`, node names, pipeline identifiers, host names)
- Corrected `metric_type` from counter to gauge for point-in-time measurements (thread counts, heap max, queue depths, connection counts)
- Added `metric_type` annotations to previously unannotated numeric fields
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| packages/logstash/manifest.yml | Version bump from 2.8.0 to 2.9.0 |
| packages/logstash/docs/README.md | Updated documentation tables to include agent.id field and Metric Type column for all affected data streams |
| packages/logstash/data_stream/health_report/manifest.yml | Enabled TSDB mode by adding index_mode: "time_series" |
| packages/logstash/data_stream/health_report/fields/fields.yml | Added dimensions (node.name, node.uuid, pipeline.id) and metric_type annotations for worker utilization and severity fields |
| packages/logstash/data_stream/health_report/fields/ecs.yml | Added agent.id as dimension |
| packages/logstash/data_stream/node_cel/manifest.yml | Enabled TSDB mode by adding index_mode: "time_series" |
| packages/logstash/data_stream/node_cel/fields/fields.yml | Added dimension (logstash.name), corrected metric_types (counter→gauge), and added metric_type annotations for pipeline events and queue fields |
| packages/logstash/data_stream/node_cel/fields/ecs.yml | Added agent.id as dimension |
| packages/logstash/data_stream/pipeline/fields/fields.yml | Added dimension (host.name), added metric_type annotations for info fields, corrected queues.events from counter to gauge |
| packages/logstash/data_stream/pipeline/fields/ecs.yml | Added agent.id as dimension |
| packages/logstash/data_stream/plugins/fields/fields.yml | Added dimension (host.name), corrected beats connection metrics from counter to gauge |
| packages/logstash/data_stream/plugins/fields/ecs.yml | Added agent.id as dimension |
| packages/logstash/changelog.yml | Added version 2.9.0 entry documenting TSDB enablement and metric_type fixes |
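For reference, enabling TSDB mode per data stream is a small manifest change; a minimal sketch of the setting described in the table above (surrounding manifest keys omitted):

```yaml
# packages/logstash/data_stream/health_report/manifest.yml (excerpt)
elasticsearch:
  index_mode: "time_series"
```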
Pull request overview
Copilot reviewed 13 out of 13 changed files in this pull request and generated no new comments.
Enable TSDB for the `health_report` and `node_cel` data streams in the Logstash integration and add `metric_type` annotations to `pipeline` and `plugins`.

Dimensions added:
- health_report: `service.hostname`, `logstash.node.name`, `logstash.node.uuid`, `logstash.pipeline.id`, `logstash.node.version`, `logstash.node.address`, `logstash.pipeline.diagnosis.id`, `logstash.pipeline.diagnosis.help_url`, `logstash.pipeline.impacts.id`, `logstash.pipeline.impacts.impact_areas`
- node_cel: `input.type`, `cloud.image.id`, `host.os.build`, `host.os.codename`, `service.hostname`, `logstash.node.stats.logstash.name`, `logstash.node.stats.logstash.uuid`, `logstash.node.stats.logstash.version`, `logstash.node.stats.logstash.host`, `logstash.node.stats.logstash.http_address`, `logstash.elasticsearch.cluster.id`
- pipeline: `input.type`, `cloud.image.id`, `host.os.build`, `host.os.codename`, `service.hostname`, `logstash.pipeline.host.name`, `logstash.pipeline.elasticsearch.cluster.id`, `logstash.pipeline.total.queues.type`
- plugins: `input.type`, `cloud.image.id`, `host.os.build`, `host.os.codename`, `service.hostname`, `logstash.pipeline.host.name`, `logstash.pipeline.name`, `logstash.pipeline.host.address`, `logstash.pipeline.plugin.type`, `logstash.pipeline.plugin.codec.name`, `logstash.pipeline.plugin.input.name`, `logstash.pipeline.plugin.filter.name`, `logstash.pipeline.plugin.output.name`

Add `agent.id` as a dimension so the data streams can accept events produced by the cel input or Metricbeat on error. Error events lack metric metadata and dimensions, causing them to be rejected. Since `agent.id` is guaranteed to be present, it serves as a fallback dimension. Although not strictly metrics, these documents are allowed for backwards compatibility.

Annotate numeric fields with the appropriate `metric_type` for the `health_report`, `node_cel`, `pipeline`, and `plugins` data streams.
`metric_type` corrections (counter → gauge):
- node_cel: `jvm.threads.count`, `jvm.threads.peak_count`, `jvm.mem.heap_max_in_bytes`, `queue.events_count`
- pipeline: `logstash.pipeline.queues.events`
- plugins: `beats.peak_connections`, `beats.current_connections`

Assisted by Cursor
Pull request overview
Copilot reviewed 23 out of 23 changed files in this pull request and generated 6 comments.
| dimension: true | ||
| - name: mac | ||
| external: ecs | ||
| dimension: true |
host.ip and host.mac are marked as dimension: true, but these ECS fields are commonly multi-valued arrays. Time series dimension fields must be single-valued; if a host reports multiple IPs/MACs, TSDB ingestion can be rejected. Consider removing dimension: true from these fields (and keep stable identifiers like host.id, host.name, host.hostname as dimensions).
| - name: service.id | ||
| external: ecs | ||
| dimension: true | ||
| - name: service.type | ||
| external: ecs | ||
| dimension: true | ||
| - name: service.version | ||
| external: ecs | ||
| dimension: true | ||
| - name: service.address | ||
| external: ecs | ||
| dimension: true | ||
| - name: service.name | ||
| external: ecs | ||
| dimension: true |
This data stream already uses index_mode: "time_series", and this change adds many more dimension: true fields (service/event/agent + cloud/host + pipeline/plugin identifiers). The manifest currently doesn't set index_template.settings.index.mapping.dimension_fields.limit, so once the dimension count exceeds the default (16) indexing/template creation can fail. Please update packages/logstash/data_stream/plugins/manifest.yml to set an appropriate dimension_fields.limit (or reduce the dimension set).
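The mitigation the reviewer suggests would look roughly like this in the data stream manifest; the limit value below is illustrative, not taken from the PR:

```yaml
# packages/logstash/data_stream/plugins/manifest.yml (sketch)
elasticsearch:
  index_mode: "time_series"
  index_template:
    settings:
      index:
        mapping:
          dimension_fields:
            limit: 64  # illustrative: keep above the number of declared dimensions
```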
| dimension: true | ||
| - name: mac | ||
| external: ecs | ||
| dimension: true |
host.ip and host.mac are marked as dimension: true, but these ECS fields are commonly multi-valued arrays. Time series dimension fields must be single-valued; if a host reports multiple IPs/MACs, TSDB ingestion can be rejected. Consider removing dimension: true from these fields (and keep stable identifiers like host.id, host.name, host.hostname as dimensions).
| dimension: true | ||
| - name: mac | ||
| external: ecs | ||
| dimension: true |
host.ip and host.mac are marked as dimension: true, but these ECS fields are commonly multi-valued arrays. Time series dimension fields must be single-valued; if an agent reports multiple IPs/MACs (common on dual-stack hosts), TSDB ingestion can be rejected. Consider removing dimension: true from these fields (and rely on stable identifiers like host.id, host.name, host.hostname) or otherwise ensure a single value is indexed.
| dimension: true | ||
| - name: service.address | ||
| external: ecs | ||
| dimension: true |
This data stream already uses index_mode: "time_series", and this change adds many more dimension: true fields (ECS service/event/agent + cloud/host + logstash pipeline fields). The manifest currently doesn't set index_template.settings.index.mapping.dimension_fields.limit, so once the dimension count exceeds the default (16) indexing/template creation can fail. Please update packages/logstash/data_stream/pipeline/manifest.yml to set an appropriate dimension_fields.limit (or reduce the dimension set).
mashhurs left a comment
I am still on this (setting up the stack and testing), but to move forward I have left some questions.
Also, I wonder what the integration upgrade experience would be (sorry, not much experience with the entire process):
- Before the upgrade, the index isn't in TSDS mode. How is it affected after the upgrade? Will the integration create a new index with the mapping mode changed?
- On the dashboards, will older metrics be available?
- What impact does the `metric_type` change have?
- Can you also please check if package conditions need to change?
Thank you for the great work.
| external: ecs | ||
| - name: service.id | ||
| external: ecs | ||
| dimension: true |
Do we need to add as many fields as possible to the dimensions? I was thinking whether it would be meaningful to include node.uuid + pipeline.id granularity (one series per {node vs pipeline}), if it makes sense. Also, AFAIK there is a limit (16 fields by default?) for TSDB dimensions if we need to change the mapping, but you know better than me...
Do we need to add as much as possible more fields to the dimension?
I was talking to the ES storage engine team; ideally a TSDB has only dimensions and metrics. The data we append to each event has more than that, so to try to achieve that, I added the dimensions that seem to be redundant, which will not create a new time series (tsid), thus minimising the number of non-metric, non-dimension fields. I'll ping you on Slack with the discussion with the ES team.
Also, AFAIK, there is also limitation (16 fields by default?)
The dimension limit is pretty high now, 32768, see the docs (look for index.mapping.dimension_fields.limit), which is the same for the 8.17 stack, see here
I was thinking if it will be meaningful to include node.uuid + pipeline.id granularity (one series for {node vs pipeline})
On this data stream or on all of them? I can add it
Hi, let me try to answer your questions
Yes, the index is rolled over and the new backing index is a TSDB
Yes, I haven't observed any issue, but please double check that. On my tests the visualisations/charts had data since the beginning (from before the migration). The only thing I could imagine having an impact is a metric that had its type fixed, if anything depended on the old type. However, given the old type was wrong, anything depending on the old type was already wrong as well.
It's what tells ES what type of metric it is so ES can store it in the best way possible. It's what I know from the docs.
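For illustration, the distinction plays out in `fields.yml` roughly like this; a sketch using two field names from this PR (gauge for point-in-time values, counter for monotonically increasing totals):

```yaml
- name: logstash.node.stats.jvm.threads.count
  type: long
  metric_type: gauge    # corrected from counter: a thread count can go down
- name: logstash.node.stats.pipelines.events.out
  type: long
  metric_type: counter  # a total of emitted events only ever increases
```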
I'll look into it
| dimension: true | ||
| - name: id | ||
| external: ecs |
@mashhurs, do you think host.id is stable enough? Some tsdb here use ephemeral_id as dimension, so it seems to me host.id would change as much as ephemeral_id. At the end the question is do you think the host.id can change and it'd still be the same host?
| dimension: true | ||
| - name: service.address | ||
| external: ecs | ||
| dimension: true |
There was a problem hiding this comment.
mashhurs
left a comment
There was a problem hiding this comment.
@AndersonQ, first of all thank you for the clarifications. I don't have concerns after reading the Slack thread with the storage-engine team where you clarified the dimensions.
- One thing left: please check the conditions. It doesn't seem we need to change them, but for safety check whether we need Kibana 8.18+.
- I have done a couple of tests, including upgrade and fresh setup.
One scenario didn't work (tested twice, maybe I did something wrong), which is upgrading with policies (screenshot below). I of course updated elastic-agent with the new policy, including the generated API key. Can you please check it?
Hi @mashhurs, what exactly didn't work? What steps did you follow to produce the error? What was the error you got?
I spun up the stack (with elastic-package) where the LS integration 2.8.0 was provided. I installed the integration, added a standalone agent (
Hum... I'd need at least the agent logs, ideally an agent diagnostics. I confess I'm not sure it's possible to get diagnostics from an agent that isn't installed. Let me know when you have the diagnostics/logs
I don't think there is anything wrong, but I never tried it like that. I do as I described in "How to test this PR", which is basically: install and enroll the agent, build and install the integration, upgrade the integration, check everything. And also the TSDB migration test kit, but my modified version. From the logs, I don't see any error or issue. If the data eventually appears on the dashboards, the data is there. I think the issue might be related to the integration upgrade process, or just timing between it showing as updated and the index rollover completing. But I'm guessing here. From what I observed, just installing the new version of the integration is enough to update the indices, even if you don't upgrade it on a specific policy.
@AndersonQ please rebase your PR 🙏 This one got merged first - #17009
💚 Build Succeeded
cc @AndersonQ
Package logstash - 2.10.0 containing this change is available at https://epr.elastic.co/package/logstash/2.10.0/




Proposed commit message
Data stream changes summary
1. health_report
Dimensions
`service.hostname`, `service.id`, `service.type`, `service.version`, `service.address`, `service.name`, `event.dataset`, `event.module`, `agent.id`, `logstash.node.name`, `logstash.node.version`, `logstash.node.address`, `logstash.node.uuid`, `logstash.pipeline.id`

Neither Dimension nor Metric

`@timestamp`, `process.pid`, `ecs.version`, `event.duration`, `error.message`, `logstash.node.symptom`, `logstash.node.symptom.text`, `logstash.node.status`, `logstash.pipeline.status`, `logstash.pipeline.state`, `logstash.pipeline.symptom`, `logstash.pipeline.symptom.text`, `logstash.pipeline.diagnosis.id`, `logstash.pipeline.diagnosis.cause`, `logstash.pipeline.diagnosis.cause.text`, `logstash.pipeline.diagnosis.action`, `logstash.pipeline.diagnosis.action.text`, `logstash.pipeline.diagnosis.help_url`, `logstash.pipeline.diagnosis.help_url.text`, `logstash.pipeline.impacts.id`, `logstash.pipeline.impacts.description`, `logstash.pipeline.impacts.description.text`, `logstash.pipeline.impacts.impact_areas`

Metrics Added / Changed
All are newly added (no pre-existing metric_type in this data stream).
`logstash.pipeline.impacts.severity`, `logstash.pipeline.flow.worker_utilization.current`, `logstash.pipeline.flow.worker_utilization.last_1_hour`, `logstash.pipeline.flow.worker_utilization.last_5_minutes`, `logstash.pipeline.flow.worker_utilization.last_15_minutes`, `logstash.pipeline.flow.worker_utilization.lifetime`, `logstash.pipeline.flow.worker_utilization.last_1_minute`, `logstash.pipeline.flow.worker_utilization.last_24_hours`

2. node_cel
Dimensions
`service.hostname`, `service.id`, `service.type`, `service.version`, `service.address`, `service.name`, `event.dataset`, `event.module`, `agent.id`, `input.type`, `cloud.account.id`, `cloud.availability_zone`, `cloud.instance.id`, `cloud.instance.name`, `cloud.machine.type`, `cloud.provider`, `cloud.region`, `cloud.project.id`, `cloud.image.id`, `host.architecture`, `host.domain`, `host.hostname`, `host.id`, `host.ip`, `host.mac`, `host.name`, `host.os.family`, `host.os.kernel`, `host.os.name`, `host.os.platform`, `host.os.version`, `host.type`, `host.os.build`, `host.os.codename`, `logstash.elasticsearch.cluster.id`, `logstash.node.stats.logstash.uuid`, `logstash.node.stats.logstash.version`, `logstash.node.stats.logstash.ephemeral_id`, `logstash.node.stats.logstash.host`, `logstash.node.stats.logstash.http_address`, `logstash.node.stats.logstash.name`, `logstash.node.stats.pipelines.id`, `logstash.node.stats.pipelines.hash`, `logstash.node.stats.pipelines.ephemeral_id`

Neither Dimension nor Metric

`@timestamp`, `process.pid`, `ecs.version`, `event.duration`, `error.message`, `container.id`, `container.image.name`, `container.labels`, `container.name`, `host.containerized`, `logstash.node.stats.timestamp`, `logstash.node.stats.logstash.snapshot`, `logstash.node.stats.logstash.status`, `logstash.node.stats.logstash.pipelines`, `logstash.node.stats.pipelines.queue.type`, `logstash.node.stats.os.cgroup.cpuacct.control_group`, `logstash.node.stats.os.cgroup.cpu.control_group`

Metrics Added / Changed
Changed (counter → gauge):
`logstash.node.stats.jvm.threads.count`, `logstash.node.stats.jvm.threads.peak_count`, `logstash.node.stats.jvm.mem.heap_max_in_bytes`, `logstash.node.stats.queue.events_count`

Newly added metric_type:

`logstash.node.stats.pipelines.reloads.failures`, `logstash.node.stats.pipelines.reloads.successes`, `logstash.node.stats.pipelines.queue.events_count`, `logstash.node.stats.pipelines.queue.queue_size_in_bytes`, `logstash.node.stats.pipelines.queue.max_queue_size_in_bytes`, `logstash.node.stats.pipelines.events.in`, `logstash.node.stats.pipelines.events.out`, `logstash.node.stats.pipelines.events.filtered`, `logstash.node.stats.pipelines.events.duration_in_millis`, `logstash.node.stats.pipelines.events.queue_push_duration_in_millis`

3. pipeline
Dimensions
`service.hostname`, `service.id`, `service.type`, `service.version`, `service.address`, `service.name`, `event.dataset`, `event.module`, `agent.id`, `input.type`, `cloud.account.id`, `cloud.availability_zone`, `cloud.instance.id`, `cloud.instance.name`, `cloud.machine.type`, `cloud.provider`, `cloud.region`, `cloud.project.id`, `cloud.image.id`, `host.architecture`, `host.domain`, `host.hostname`, `host.id`, `host.ip`, `host.mac`, `host.name`, `host.os.family`, `host.os.kernel`, `host.os.name`, `host.os.platform`, `host.os.version`, `host.type`, `host.os.build`, `host.os.codename`, `logstash.pipeline.name`, `logstash.pipeline.elasticsearch.cluster.id`, `logstash.pipeline.info.ephemeral_id`, `logstash.pipeline.host.name`, `logstash.pipeline.host.address`, `logstash.pipeline.total.flow.queues.type`

Neither Dimension nor Metric

`@timestamp`, `process.pid`, `ecs.version`, `event.duration`, `error.message`, `container.id`, `container.image.name`, `container.labels`, `container.name`, `host.containerized`

Metrics Added / Changed
Changed (counter → gauge):
`logstash.pipeline.total.flow.queues.events`

Newly added metric_type:

`logstash.pipeline.info.batch_size`, `logstash.pipeline.info.batch_delay`, `logstash.pipeline.info.workers`

4. plugins
Dimensions
`service.hostname`, `service.id`, `service.type`, `service.version`, `service.address`, `service.name`, `event.dataset`, `event.module`, `agent.id`, `input.type`, `cloud.account.id`, `cloud.availability_zone`, `cloud.instance.id`, `cloud.instance.name`, `cloud.machine.type`, `cloud.provider`, `cloud.region`, `cloud.project.id`, `cloud.image.id`, `host.architecture`, `host.domain`, `host.hostname`, `host.id`, `host.ip`, `host.mac`, `host.name`, `host.os.family`, `host.os.kernel`, `host.os.name`, `host.os.platform`, `host.os.version`, `host.type`, `host.os.build`, `host.os.codename`, `logstash.pipeline.name`, `logstash.pipeline.id`, `logstash.pipeline.elasticsearch.cluster.id`, `logstash.pipeline.host.name`, `logstash.pipeline.host.address`, `logstash.pipeline.plugin.type`, `logstash.pipeline.plugin.codec.name`, `logstash.pipeline.plugin.codec.id`, `logstash.pipeline.plugin.input.name`, `logstash.pipeline.plugin.input.id`, `logstash.pipeline.plugin.input.elasticsearch.cluster.id`, `logstash.pipeline.plugin.filter.name`, `logstash.pipeline.plugin.filter.id`, `logstash.pipeline.plugin.filter.elasticsearch.cluster.id`, `logstash.pipeline.plugin.output.name`, `logstash.pipeline.plugin.output.id`, `logstash.pipeline.plugin.output.elasticsearch.cluster.id`

Neither Dimension nor Metric

`@timestamp`, `process.pid`, `ecs.version`, `event.duration`, `error.message`, `container.id`, `container.image.name`, `container.labels`, `container.name`, `host.containerized`, `logstash.pipeline.plugin.input.source.column`, `logstash.pipeline.plugin.input.source.id`, `logstash.pipeline.plugin.input.source.line`, `logstash.pipeline.plugin.input.source.protocol`, `logstash.pipeline.plugin.filter.source.column`, `logstash.pipeline.plugin.filter.source.id`, `logstash.pipeline.plugin.filter.source.line`, `logstash.pipeline.plugin.filter.source.protocol`, `logstash.pipeline.plugin.output.source.column`, `logstash.pipeline.plugin.output.source.id`, `logstash.pipeline.plugin.output.source.line`, `logstash.pipeline.plugin.output.source.protocol`

Metrics Added / Changed
Changed (counter → gauge):
`logstash.pipeline.plugin.input.metrics.beats.peak_connections`, `logstash.pipeline.plugin.input.metrics.beats.current_connections`

No newly added metric_type fields in this data stream.
Open questions

Original question
As the data streams are migrated to TSDB, they no longer accept documents that
contain the error from a failed metric fetch, as those would not have the
necessary dimensions. When testing, I started the agent before Logstash was
ready, getting a few errors, which could not be migrated when testing the
migration with https://github.com/elastic/TSDB-migration-test-kit.
If we want to keep this behaviour, we'd need to add more dimensions to allow
those events to be ingested.
On my tests, the error events were present on the `health_report` and `node_cel` data streams.
Here is an example of the error documents:

```json
{"error": {"type": "illegal_argument_exception", "reason": "Error extracting routing: source didn't contain any routing fields"}, "status": 400, "document": {"cloud": {"availability_zone": "us-central1-f", "instance": {"name": "anderson-logstash", "id": "8077647130981829769"}, "provider": "gcp", "service": {"name": "GCE"}, "machine": {"type": "e2-standard-4"}, "project": {"id": "elastic-observability"}, "region": "us-central1", "account": {"id": "elastic-observability"}}, "input": {"type": "cel"}, "agent": {"name": "anderson-logstash", "id": "27ad10fc-cef3-424a-879c-23721c867517", "type": "filebeat", "ephemeral_id": "c2cfc104-c50f-4906-af6e-7ddcf911012a", "version": "8.17.10"}, "@timestamp": "2026-02-12T11:32:41.562Z", "ecs": {"version": "8.0.0"}, "data_stream": {"namespace": "default", "type": "metrics", "dataset": "logstash.node"}, "elastic_agent": {"id": "27ad10fc-cef3-424a-879c-23721c867517", "version": "8.17.10", "snapshot": false}, "host": {"hostname": "anderson-logstash", "os": {"kernel": "6.1.0-43-cloud-amd64", "codename": "bookworm", "name": "Debian GNU/Linux", "type": "linux", "family": "debian", "version": "12 (bookworm)", "platform": "debian"}, "containerized": false, "ip": ["10.128.0.72", "fe80::4001:aff:fe80:48"], "name": "anderson-logstash", "id": "370ef8b742434f90a470fd961035344e", "mac": ["42-01-0A-80-00-48"], "architecture": "x86_64"}, "error": {"message": "failed eval: ERROR: <input>:7:7: Get \"http://localhost:9600/_node/stats?graph=true&vertices=true\": dial tcp [::1]:9600: connect: connection refused\n | ? {\n | ......^"}, "event": {"agent_id_status": "verified", "ingested": "2026-02-12T11:32:50Z", "dataset": "logstash.node"}}}
```

One option could be to have `agent.id` as a dimension; that way the error events could be indexed, deriving their TSDB ID from `agent.id` + timestamp, which seems OK. However, it means additional mappings and dimensions for this corner case.
I checked and the errors are still logged and appear on the agent dashboards
which show errors, like the "concerning agents". So the errors aren't lost;
nevertheless, it's still a breaking change for anyone relying on documents with
the `error` key to know something is wrong.

Talking to the team, we agreed there is no issue adding `agent.id` as a
dimension to all data streams, so the events with errors when fetching metrics
can be indexed, having their TSDB ID from `agent.id` + timestamp. See above
for details.
Also, I confirmed with the ES team that "redundant" dimensions, dimensions that
do not "create new IDs" have a negligible impact.
Here it creates a new "metric series", but only one per agent, to allow the
errors to be ingested and keep the backwards compatibility, which seems a good
trade-off.
Checklist

- [ ] I have reviewed tips for building integrations and this pull request is aligned with them.
- [ ] I have added an entry to my package's `changelog.yml` file.
- [ ] I have verified that Kibana version constraints are current according to guidelines.
- [ ] I have verified that any added dashboard complies with Kibana's Dashboard good practices

How to test this PR locally
I used a modified version of the TSDB migration test kit
to test the migration of the data streams to TSDB. It reindexes the source index
into the destination index. This approach fails if there are documents that cannot
be ingested into the TSDB, for example the "error" events I mentioned above.
Thus, I modified it to scan the source index and use the bulk API to index the
documents, saving the failed and duplicated documents into two different files.
You may try both versions of the test kit.
click to show instructions
Agent + Logstash output setup
Let's set up 2 nodes. All paths are relative to the Logstash directory.
logstash: node 1:
agent.conf:
config/logstash.yml:
Run `cd logstash-node-1`, then `./bin/logstash -f agent.conf`.

logstash: node 2:
agent.conf:
config/logstash.yml:
Run `cd logstash-node-2`, then `./bin/logstash -f agent.conf`.

Elastic Agent
- Logstash output hosts: `["localhost:5044", "localhost:5045"]`
- Generate logs: `flog -t log -o /tmp/agent/in/log.ndjson -w -f json -l -p 1048576 -d 500ms`
- Enroll with `--namespace logstash-output`

Elastic Agent with Logstash integration
- node 1: `http://localhost:9600`, `/tmp/logstash-node-1/logs/logstash-plain*.log`, `/tmp/logstash-node-1/logs/logstash-slowlog-plain*.log`, `http://localhost:9600`
- node 2: `http://localhost:9601`, `/tmp/logstash-node-2/logs/logstash-plain*.log`, `/tmp/logstash-node-2/logs/logstash-slowlog-plain*.log`, `http://localhost:9601`
- Enroll with `--namespace monitoring`

Verify
path `/tmp/agent/in/log.ndjson`

I recommend letting it run for a good while, so there will be a good amount of data
that https://github.com/elastic/TSDB-migration-test-kit can use when testing the
TSDB migration.
`elastic-package build -v && elastic-package install -v`

Check failures are ingested as metrics
tl;dr: the cel input produces an event with `error.message` if it fails to fetch
data. The only dimension present on this event is `agent.id`. Given it has at
least one dimension present, the event should be ingested.
`{ "log.level": "warn", "message": "Failed to index 4 events in last 10s: events were dropped! Look at the event log to view the event and cause.", "log.logger": "elasticsearch" }`

Check that `error.message` exists: `error.message : *`

`error making http request: Get "http://localhost:9601/": dial tcp 127.0.0.1:9601: connect: connection refused`