Skip to content

Comments

feat(preprod): Add snapshots subcommand#3110

Draft
lcian wants to merge 28 commits intomasterfrom
lcian/feat/snapshots
Draft

feat(preprod): Add snapshots subcommand#3110
lcian wants to merge 28 commits intomasterfrom
lcian/feat/snapshots

Conversation

@lcian
Copy link
Member

@lcian lcian commented Jan 28, 2026

Updated version of #3049 to discuss and iterate on things.

Notable changes:

  • Removed shard_index parameter from the command. I'm not sure what the purpose of that was originally.
  • This uses the new many (batch) API from objectstore_client. All uploads are executed as batch requests, reducing network overhead. Unfortunately, with they way things are implemented now, we will still have to buffer all files in memory before sending the request, as we need to hash their contents to determine the filename. If we could just use the filename as the key in objectstore, it would be much better because that way we could stream the files over.

Note that auth enforcement still needs to be enabled for objectstore, so that's currently blocking this to be used for anything but internal testing.

Ref FS-233

@github-actions

This comment was marked as outdated.

use anyhow::{Context, Result};

/// Given an org and project slugs or IDs, returns the IDs of both.
pub fn get_org_project_id(api: impl AsRef<Api>, org: &str, project: &str) -> Result<(u64, u64)> {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I know, for all commands a user can provide --project and --org as slugs or IDs.
These utils are needed to get the corresponding IDs, so that objects from the same org/proj all have the same paths in objectstore regardless if the user passed in slugs or IDs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be weird to take in Api and it might be possible that these functions should live somewhere else, IDK.
They certainly don't belong to the Api struct as its methods seem to all map 1to1 to Sentry API calls.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the backend interpret the slugs in to ids instead of the frontend?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the backend interpret the slugs in to ids instead of the frontend?

Indeed, I agree that this would be preferable. The current code adds two additional API calls that could be avoided if the backend resolves the slugs/IDs appropriately.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what backend and frontend here refers to.
How objectstore works is that the scope (org and project in this case) is an arbitrary sequence of key-value pairs. We recommend using org and project ID, or at least org ID, but that's not mandatory.
Therefore it's not a responsibility of objectstore or its endpoint in the monolith to normalize org and project, we simply take what the user provides. It's responsibility of the user to normalize.

Cargo.toml Outdated
lazy_static = "1.4.0"
libc = "0.2.139"
log = { version = "0.4.17", features = ["std"] }
objectstore-client = { git = "https://github.com/getsentry/objectstore.git", branch = "lcian/feat/rust-batch-client" }
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This indirectly adds a dependency to reqwest and we need to double check what the implications of that are, especially in regard to rustls vs native-tls which could be problematic.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also adds a bunch of deps but I think most of them are inevitable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's discuss the TLS stuff next week, because I am uncertain what problems you're concerned about.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lcian and I discussed offline today. We agreed we should use native-tis and native-tls-vendored (the latter possibly only on Linux), as these should pull in the same underlying dependencies as curl, thus limiting the increase in binary size.

@linear
Copy link

linear bot commented Jan 28, 2026

@lcian lcian marked this pull request as ready for review February 5, 2026 18:32
@lcian lcian requested review from a team and szokeasaurusrex as code owners February 5, 2026 18:32
@szokeasaurusrex
Copy link
Member

Hey @lcian, are you ready for me to review this, or are you still planning to check the feedback from Bugbot and/or iterate further?

Copy link
Member

rbro112 commented Feb 10, 2026

This stack of pull requests is managed by Graphite. Learn more about stacking.

rbro112 added a commit to getsentry/sentry that referenced this pull request Feb 11, 2026
Adds the initial snapshots POST API. This api does a few things and is
intended to be invoked by the CLI:
- Creates `PreprodArtifact`, `PreprodSnapshotMetrics` and
`CommitComparison` DB models
- Creates image metadata based on what's uploaded from CLI
- Stores metadata in objectstore

Notably images are uploaded directly to objectstore from CLI

Tested E2E locally with objectstore and WIP CLI branch
(getsentry/sentry-cli#3110)

Resolves EME-773
jaydgoss pushed a commit to getsentry/sentry that referenced this pull request Feb 12, 2026
Adds the initial snapshots POST API. This api does a few things and is
intended to be invoked by the CLI:
- Creates `PreprodArtifact`, `PreprodSnapshotMetrics` and
`CommitComparison` DB models
- Creates image metadata based on what's uploaded from CLI
- Stores metadata in objectstore

Notably images are uploaded directly to objectstore from CLI

Tested E2E locally with objectstore and WIP CLI branch
(getsentry/sentry-cli#3110)

Resolves EME-773
Copy link
Member

@szokeasaurusrex szokeasaurusrex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments. Most of them are minor and optional, but I would call your attention to the comment about memory usage and the potential for leaking tokens.

Let's discuss further next week

Cargo.toml Outdated
lazy_static = "1.4.0"
libc = "0.2.139"
log = { version = "0.4.17", features = ["std"] }
objectstore-client = { git = "https://github.com/getsentry/objectstore.git", branch = "lcian/feat/rust-batch-client" }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's discuss the TLS stuff next week, because I am uncertain what problems you're concerned about.

Cargo.toml Outdated
lazy_static = "1.4.0"
libc = "0.2.139"
log = { version = "0.4.17", features = ["std"] }
objectstore-client = { git = "https://github.com/getsentry/objectstore.git", branch = "lcian/feat/rust-batch-client" }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

h: Before we merge these changes, we should ensure we are depending on a version that's been published to crates.io, not a Git branch.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lcian will let you take this one as it's more objectstore related

rbro112 and others added 7 commits February 13, 2026 17:09
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.


// SOF markers: C0-CF except C4 (DHT), C8 (JPG extension), and CC (DAC)
if (0xC0..=0xCF).contains(&marker) && marker != 0xC4 && marker != 0xC8 && marker != 0xCC {
if i + 7 < data.len() {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Off-by-one in JPEG dimension bounds check

Low Severity

The bounds check i + 7 < data.len() is one byte too strict. The highest index accessed is i + 6, which requires i + 7 <= data.len(). The current check requires data.len() >= i + 8, meaning valid JPEG files where the SOF segment ends exactly at the data boundary will incorrectly return None instead of the parsed dimensions.

Fix in Cursor Fix in Web

debug!("Processing image: {}", image.path.display());

let contents = fs::read(&image.path)
.with_context(|| format!("Failed to read image: {}", image.path.display()))?;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double file read causes TOCTOU hash-content mismatch

Medium Severity

Each image file is read twice: once in collect_image_info to compute the SHA256 hash and dimensions, and again in upload_images to get the upload content. The hash from the first read becomes the objectstore key, but the content from the second read is what gets uploaded. If a file changes between reads, the manifest hash won't match the uploaded content. Storing the file contents in ImageInfo from the first read would fix both the correctness issue and avoid redundant I/O.

Additional Locations (1)

Fix in Cursor Fix in Web

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

debug!("Processing image: {}", image.path.display());

let contents = fs::read(&image.path)
.with_context(|| format!("Failed to read image: {}", image.path.display()))?;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Files read twice causing potential hash-content mismatch

Medium Severity

Each image file is read from disk twice: once in collect_image_info to compute the SHA-256 hash and dimensions, then again in upload_images to get the content for upload. The hash stored in ImageInfo comes from the first read, but the content uploaded to objectstore comes from the second fs::read. If a file is modified between the two reads, the obj_key (derived from the first read's hash) won't match the actually uploaded content, causing a silent data integrity mismatch in the manifest. Storing the file contents in ImageInfo during the first read would fix both the correctness issue and avoid the redundant I/O.

Additional Locations (1)

Fix in Cursor Fix in Web


// SOF markers: C0-CF except C4 (DHT), C8 (JPG extension), and CC (DAC)
if (0xC0..=0xCF).contains(&marker) && marker != 0xC4 && marker != 0xC8 && marker != 0xCC {
if i + 7 < data.len() {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JPEG dimension bounds check off by one

Low Severity

The bounds check i + 7 < data.len() is one byte too strict. The highest index accessed is i + 6 (for the width bytes), so the correct guard is i + 7 <= data.len() (equivalently, i + 6 < data.len()). This causes read_jpeg_dimensions to return None for valid JPEG data where the SOF segment ends exactly at the buffer boundary.

Fix in Cursor Fix in Web

@lcian lcian marked this pull request as draft February 20, 2026 17:30
@lcian
Copy link
Member Author

lcian commented Feb 20, 2026

Converting to draft until CI passes and this is reviewable again

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants