feat: add distributed tracing for webhook handling and PipelineRun timing#2605
feat: add distributed tracing for webhook handling and PipelineRun timing#2605zakisk merged 1 commit intotektoncd:mainfrom
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the observability of Pipelines-as-Code by integrating OpenTelemetry distributed tracing. This allows operators and developers to gain deeper insights into the performance and flow of webhook event processing and the various stages of PipelineRun execution. By propagating trace context, it facilitates a unified view of operations across PaC and Tekton Pipelines, streamlining debugging and performance analysis. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces OpenTelemetry distributed tracing to Pipelines-as-Code. Key changes include integrating tracing into the event handling flow, propagating trace context to PipelineRuns via annotations, and emitting timing spans for PipelineRun lifecycle events. The observability configuration has been updated to include new tracing options, but the removal of existing metrics-protocol and metrics-endpoint configurations requires clarification and documentation. Additionally, an improvement opportunity was identified to ensure consistent tracing data by always setting VCS repository and revision attributes, even when empty.
I am having trouble creating individual review comments. Click here to see my feedback.
config/305-config-observability.yaml (24-25)
The metrics-protocol and metrics-endpoint configurations are being removed from the data section. This change was not explicitly mentioned in the pull request description, which focuses on adding tracing. If these metrics were actively used, their removal could be a breaking change or an unintended side effect. Please clarify if this removal is intentional and, if so, document it in the PR description or release notes.
pkg/adapter/adapter.go (212-217)
For better consistency in tracing data, consider always setting the VCSRepositoryKey and VCSRevisionKey attributes, even if l.event.URL or l.event.SHA are empty. This ensures the attribute key is always present in the span, which can simplify querying and analysis in tracing backends. You could set them to an empty string or a placeholder like "unknown" if the values are not available, instead of omitting the attribute entirely.
if l.event.URL != "" {
span.SetAttributes(tracing.VCSRepositoryKey.String(l.event.URL))
} else {
span.SetAttributes(tracing.VCSRepositoryKey.String(""))
}
if l.event.SHA != "" {
span.SetAttributes(tracing.VCSRevisionKey.String(l.event.SHA))
} else {
span.SetAttributes(tracing.VCSRevisionKey.String(""))
}393b9d3 to
3870a7f
Compare
|
/ok-to-test |
3870a7f to
bcbb64e
Compare
|
@zakisk can you have a look pls |
bcbb64e to
cf3108c
Compare
cf3108c to
501e750
Compare
|
@ci-operator for E2E run we're working on permission workaround in this PR #2611 |
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request implements distributed tracing for Pipelines-as-Code using OpenTelemetry. It adds logic to extract trace context from incoming webhook headers, propagate it to PipelineRuns via a new annotation, and emit timing spans for event processing and PipelineRun execution. The PR also includes configuration updates and new documentation. Feedback points out that an error during JSON marshalling of the span context is currently ignored and should be logged to assist with debugging.
|
this conflitcs with recently merged 0faad24 |
|
/ok-to-test |
|
@ci-operator there are still some conflicts there I think, can you please fix them |
|
due to recent changes in #2622, label is not deleted automatically because we've changed the way it should have been deleted. |
d01d86f to
f190d03
Compare
|
/ok-to-test |
|
/ok-to-test |
| // payload validation | ||
| var event map[string]any | ||
| if err := json.Unmarshal([]byte(reqBody), &event); err != nil { | ||
| return nil, &log, fmt.Errorf("invalid event body format: %w", err) | ||
| } |
There was a problem hiding this comment.
Unintentional - restored.
There was a problem hiding this comment.
I know it may have been a ⇥ tab in cursor and it delete next few lines 😄 no worries
| ) | ||
|
|
||
| span.SetAttributes( | ||
| semconv.VCSProviderNameKey.String(gitProvider.GetConfig().Name), |
There was a problem hiding this comment.
Fixed. The provider name isn't resolved until SetClient runs, which happens after handleEvent. Moved the attribute to SetupAuthenticatedClient where it's actually available.
|
for linters and go-testing, can you please setup pre-commit? |
9f9c728 to
ed26d26
Compare
|
/ok-to-test |
ed26d26 to
286cdd7
Compare
|
/label ok-to-test |
|
/ok-to-test |
|
✅ Added labels: ok-to-test. |
|
/ok-to-test |
|
/retest |
| | --- | --- | | ||
| | `vcs.provider.name` | Git provider name | | ||
| | `vcs.repository.url.full` | Repository URL | | ||
| | `vcs.ref.head.revision` | Head commit SHA | |
There was a problem hiding this comment.
https://opentelemetry.io/docs/specs/semconv/registry/attributes/vcs/#vcs-change-id
@ci-operator what about also having vcs.change.id which is I guess pull request number? I know you've added revision but it would be helpful for tracer to have pull request number.
|
@ci-operator it looks good! can you please fix linters.. |
Instrument PaC with OpenTelemetry tracing. The controller emits a ProcessEvent span for each webhook, honoring incoming W3C trace context when present. The watcher emits waitDuration and executeDuration timing spans for completed PipelineRuns. Trace context is propagated onto created PipelineRuns via annotation for end-to-end delivery traces. Tracing configuration uses the existing pipelines-as-code-config-observability ConfigMap (located via CONFIG_OBSERVABILITY_NAME env var). Label-sourced span attributes are configurable via the main pipelines-as-code ConfigMap. See docs/content/docs/operations/tracing.md for the schema.
286cdd7 to
d45452c
Compare
|
ok-to-test |
|
/ok-to-test |
|
/lgtm |
LGTM Vote Breakdown
Votes Summary:
Automated by the PAC Boussole 🧭 |
|
@ci-operator thanks for your contributions! |

Description
Add OpenTelemetry distributed tracing to Pipelines-as-Code. When tracing is
enabled via the
pipelines-as-code-config-observabilityConfigMap, PaC emitstrace spans for webhook event processing and PipelineRun lifecycle timing.
Controller: Emits a
PipelinesAsCode:ProcessEventspan for each webhookevent. Extracts W3C trace context from inbound webhook headers when present,
otherwise creates a new root span. Propagates the span context onto created
PipelineRuns via the
tekton.dev/pipelinerunSpanContextannotation, enablingend-to-end traces when downstream controllers also have tracing enabled.
Watcher: Emits
waitDuration(creation → start) andexecuteDuration(start → completion) timing spans for completed PipelineRuns, using resource
timestamps for accurate wall-clock timing.
Configuration
Tracing uses two ConfigMaps (both already exist in the PaC deployment manifests):
pipelines-as-code-config-observability— tracing protocol, endpoint, andsampling rate. The controller and watcher locate this ConfigMap via the
CONFIG_OBSERVABILITY_NAMEenvironment variable already set in their deploymentYAMLs. Changes require a pod restart — the trace exporter is created once at
startup.
pipelines-as-code— three new keys (tracing-label-action,tracing-label-application,tracing-label-component) configure whichPipelineRun labels are read for span attributes. Changes are picked up
automatically without restart.
See
docs/content/docs/operations/tracing.mdfor the full schema, attributetables, and deployment guidance.
Testing tracing on a cluster
The default
config/305-config-observability.yamlships with tracing disabled.To enable it, add
tracing-protocol,tracing-endpoint, andtracing-sampling-rateto the observability ConfigMap and restart the controllerand watcher pods. Note that
ko apply -f config/will overwrite theobservability ConfigMap with defaults — re-apply tracing fields afterward.
Linked Issue
https://redhat.atlassian.net/browse/PVO11Y-5067
Testing Strategy
Unit tests cover span attribute emission, trace context injection/extraction,
W3C parentage (incoming context honored vs. new root created), malformed
annotation handling, result enum mapping, and message truncation. Manually
verified end-to-end on a local cluster with Jaeger.
AI Assistance
Co-authored-by trailers are on each commit.
Submitter Checklist
make testandmake lintpass locally