feat(preprod): Add Datadog metrics for snapshot upload and diff lifecycle#111024
Open
NicoHinderling wants to merge 5 commits into master
Contributor
Cursor Bugbot has reviewed your changes and found 1 potential issue.
…ycle Instruments the snapshot upload endpoint and compare_snapshots task with distribution metrics to enable observability into upload patterns, image volumes, diff durations, and build quality signals on the Preprod Health dashboard. Co-Authored-By: Claude <noreply@anthropic.com>
diff_duration_s was measured from comparison.date_added, which is set on first attempt creation via get_or_create. On retries the same record is reused, so the metric would include idle time between attempts. Now measured from task_start_time captured at the top of the function. zero_changes was not accounting for renamed_pairs, causing rename-only diffs to incorrectly increment the zero_changes counter. Co-Authored-By: Claude <noreply@anthropic.com>
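For reference, a minimal sketch of the timing change this commit describes: the start time is captured at the top of the task on each attempt, so idle time between retries is not counted. The helper name `run_diff_with_timing` and the `do_diff`, `emit`, and `clock` parameters are hypothetical stand-ins, not the actual `compare_snapshots` code.

```python
import time


def run_diff_with_timing(do_diff, emit, clock=time.monotonic):
    """Run one diff attempt and emit how long this attempt took.

    All names here are illustrative assumptions; `emit` stands in for a
    metrics call such as a distribution metric.
    """
    # Captured at the top of the attempt, not read from a record that
    # may have been created by an earlier attempt.
    task_start_time = clock()
    result = do_diff()
    duration_s = clock() - task_start_time
    emit("preprod.snapshots.diff.duration_s", duration_s)
    return result
```

Because the start time lives in a local variable rather than on the reused comparison record, a retry starts its own measurement from scratch.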
The PreprodArtifact count query for bundles_per_commit sat unguarded in the critical path before downstream task dispatch. A DB timeout or error would prevent create_preprod_snapshot_status_check_task from ever being dispatched, orphaning the artifact. Wrap in try/except so a metrics failure cannot block the upload completion flow. Co-Authored-By: Claude <noreply@anthropic.com>
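A rough sketch of the guard pattern described here, assuming the count query and the dispatch are passed in as callables. `count_bundles`, `emit`, and `dispatch_status_check` are hypothetical placeholders for the real query, metrics call, and task dispatch.

```python
import logging

logger = logging.getLogger(__name__)


def emit_bundles_per_commit(count_bundles, emit):
    """Best-effort metric: swallow and log any failure from the count query."""
    try:
        bundle_count = count_bundles()  # may raise, e.g. on a DB timeout
        emit("preprod.snapshots.upload.bundles_per_commit", bundle_count)
    except Exception:
        # A metrics failure must not block the upload completion flow.
        logger.exception("Failed to emit bundles_per_commit metric")


def complete_upload(count_bundles, emit, dispatch_status_check):
    """Emit the metric if possible, then always dispatch the downstream task."""
    emit_bundles_per_commit(count_bundles, emit)
    dispatch_status_check()
```

The point of the split is that the dispatch call sits outside the `try` block, so it runs whether or not the metric was emitted.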
When all eligible image pairs error (e.g., objectstore outage, oversized images), changed_count stays 0 and added/removed/renamed are empty, causing zero_changes to fire incorrectly. Add error_count check so the metric only increments when the diff genuinely found no differences. Co-Authored-By: Claude <noreply@anthropic.com>
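Combining this fix with the earlier `renamed_pairs` fix, the condition for `zero_changes` might look roughly like the sketch below. The function name `is_zero_changes` and its exact parameters are assumptions for illustration; only the five inputs are taken from the commit messages.

```python
def is_zero_changes(changed_count, added, removed, renamed_pairs, error_count):
    """Return True only when the diff genuinely found no differences.

    Hypothetical helper: a diff with renames, or one where image pairs
    errored out, is not counted as "zero changes".
    """
    return (
        changed_count == 0
        and not added
        and not removed
        and not renamed_pairs
        and error_count == 0
    )
```

With this shape, a rename-only diff and an all-errors diff both return `False`, which matches the two bugs described in the commits.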
16280ab to 6e476b8
rbro112
reviewed
Mar 18, 2026
v["image_file_name"] for v in images.values() if v.get("image_file_name")
)
duplicate_count = sum(c - 1 for c in file_name_counts.values() if c > 1)
metrics.distribution(
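For context on what the excerpt above computes, here is a self-contained sketch of the duplicate count. It assumes `file_name_counts` is a `collections.Counter` (the excerpt does not show its construction) and that `images` maps an image hash to a manifest entry; the wrapper `count_duplicate_file_names` is hypothetical.

```python
from collections import Counter


def count_duplicate_file_names(images):
    """Count extra occurrences of each image_file_name in one manifest.

    Assumption: `images` is a dict of hash -> entry dict, and entries
    without an image_file_name are skipped.
    """
    file_name_counts = Counter(
        v["image_file_name"] for v in images.values() if v.get("image_file_name")
    )
    # A name seen n times contributes n - 1 duplicates.
    return sum(c - 1 for c in file_name_counts.values() if c > 1)


example_manifest = {
    "hash1": {"image_file_name": "home.png"},
    "hash2": {"image_file_name": "home.png"},
    "hash3": {"image_file_name": "settings.png"},
    "hash4": {},
}
print(count_duplicate_file_names(example_manifest))  # prints 1
```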
Member
How often would we actually get duplicate images? This seems like a strange thing to record metrics on
Contributor
Author
I think max added that in my requests doc
Contributor
Author
I'll add a comment as a reminder to consider removing this
Contributor
Author
Actually, I'm just going to kill this for now
rbro112
reviewed
Mar 18, 2026
rbro112
approved these changes
Mar 18, 2026

Adds `metrics.distribution` and `metrics.incr` instrumentation to the snapshot upload endpoint and `compare_snapshots` task so the Preprod Health dashboard can track snapshot usage and build quality signals.

Metrics added

On upload (`ProjectPreprodSnapshotEndpoint.post`):
- `preprod.snapshots.upload.image_count`: number of images per upload, tagged `has_vcs` to distinguish CI builds from standalone uploads
- `preprod.snapshots.upload.duplicate_image_file_names`: count of `image_file_name` collisions within a single manifest (proxy for same screen uploaded under multiple hashes)
- `preprod.snapshots.upload.bundles_per_commit`: how many snapshot bundles have been uploaded for the same commit (only emitted when `has_vcs=True`)

On diff completion (`compare_snapshots` task):
- `preprod.snapshots.diff.duration_s`: time from comparison record creation to diff task completion
- `preprod.snapshots.e2e_duration_s`: time from artifact upload to diff completion, mirrors the existing `preprod.size_analysis.results_e2e` pattern
- `preprod.snapshots.image.avg_size_bytes`: average byte size of images actually fetched for pixel diff (excludes added/removed)
- `preprod.snapshots.diff.zero_changes`: incremented when a diff completes with no changed, added, or removed images

All metrics use `sample_rate=1.0`. No Amplitude analytics events are included; this is Datadog/tracemetrics only, targeting the existing Preprod Health dashboard.