
Parallelize chunk uploads in record tests for better throughput #1268

Open
Konboi wants to merge 1 commit into v1 from improve-v1-performance


@Konboi Konboi commented Apr 2, 2026

Why

When uploading millions of test records (e.g. with the raw profile), the record tests command is too slow because event chunks are posted sequentially. With the default chunk size of 1000, uploading 1M records requires ~1000 HTTP round-trips, each blocking until the response arrives. The main thread idles during network I/O instead of parsing the next chunk.
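The round-trip count above is simple ceiling division. A minimal sketch of the arithmetic follows; the helper names and the per-request latency are illustrative assumptions, not values measured for this PR, and the estimate ignores parsing time entirely.

```python
def round_trips(records: int, chunk_size: int = 1000) -> int:
    """Number of POSTs needed when each request carries up to chunk_size records."""
    return -(-records // chunk_size)  # ceiling division without floats


def sequential_wait_seconds(records: int, latency_s: float, chunk_size: int = 1000) -> float:
    """Time spent blocked on the network when chunks are posted one at a time.

    latency_s is a hypothetical per-request round-trip time.
    """
    return round_trips(records, chunk_size) * latency_s


print(round_trips(1_000_000))  # 1000 round-trips at the default chunk size
```

Under this model the blocked time grows linearly with the record count, which is why overlapping requests helps most on the largest uploads.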

What

Use concurrent.futures.ThreadPoolExecutor (3 workers) to POST event chunks concurrently instead of sequentially:

  • The first chunk is sent synchronously to handle no-build mode, where the response sets the session ID for subsequent requests
  • All subsequent chunks are submitted to the thread pool and uploaded in parallel
  • Parsing continues on the main thread while uploads happen in the background
  • After all chunks are submitted, we wait for all futures to complete and propagate any exceptions

This overlaps network I/O with parsing and, with 3 workers, reduces wall-clock time for large uploads by up to roughly 3x.
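The steps above can be sketched roughly as follows. This is a minimal illustration of the first-chunk-sync pattern, not the code in this PR: `chunked`, `upload_all`, and the `send` callable are hypothetical names, and `send` stands in for whatever function POSTs one chunk.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def chunked(events: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Lazily yield lists of at most `size` events."""
    it = iter(events)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def upload_all(
    events: Iterable[dict],
    send: Callable[[List[dict]], object],  # hypothetical: POSTs one chunk
    chunk_size: int = 1000,
    max_workers: int = 3,
) -> int:
    """Upload all events and return the number of chunks sent."""
    chunks = chunked(events, chunk_size)
    first = next(chunks, None)
    if first is None:
        return 0

    # First chunk goes out synchronously so any session state set by the
    # response exists before worker threads start.
    send(first)
    sent = 1

    futures = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Chunking (standing in for parsing) stays on the main thread while
        # earlier chunks upload in the background.
        for chunk in chunks:
            futures.append(pool.submit(send, chunk))
        # Wait for every upload; result() re-raises a worker's exception.
        for future in futures:
            future.result()
            sent += 1
    return sent
```

One trade-off in this sketch: submitted chunks queue up without bound, so if parsing runs far ahead of the network, pending payloads stay in memory until a worker picks them up.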

Please Review Here

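  • Thread safety: payload(), which mutates count, runs only on the main thread. send() runs in worker threads, but it only reads self.session (set after the first synchronous call) and writes is_observation (the same value on every response). A single requests.Session is shared across the workers; it is commonly used this way for concurrent requests, although the requests documentation does not formally guarantee thread safety, so please check this point in particular.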
  • The first-chunk-sync pattern ensures no-build mode works correctly since the response establishes the session ID used by all subsequent requests.
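To make the review points concrete, here is a small sketch of the invariants described above. `ChunkUploader`, `run`, the injected `post` callable, and the response keys `session` and `is_observation` are all hypothetical stand-ins chosen for illustration; the actual class and response shape in the codebase may differ.

```python
from concurrent.futures import ThreadPoolExecutor


class ChunkUploader:
    """Hypothetical stand-in for the uploader; not the PR's actual class."""

    def __init__(self, post):
        self._post = post            # assumed: callable(payload, session) -> dict
        self.session = None          # written once, after the first sync call
        self.count = 0               # mutated only by payload(), main thread only
        self.is_observation = False  # workers write the same value every time

    def payload(self, chunk):
        # Main thread only: this is the one place that mutates count.
        self.count += len(chunk)
        return {"events": chunk}

    def send(self, payload):
        # May run in a worker thread: reads session, never writes it.
        response = self._post(payload, self.session)
        self.is_observation = response["is_observation"]
        return response


def run(uploader, chunks, max_workers=3):
    chunks = iter(chunks)
    first = next(chunks, None)
    if first is None:
        return
    # Synchronous first call; the session is fixed before any worker starts.
    response = uploader.send(uploader.payload(first))
    uploader.session = response["session"]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # payload() is evaluated here on the main thread; only send() is
        # handed to the pool.
        futures = [pool.submit(uploader.send, uploader.payload(c)) for c in chunks]
        for future in futures:
            future.result()  # propagate worker exceptions
```

In this shape, the only state shared with workers is either read-only after the first call (`session`) or overwritten with an identical value (`is_observation`), which is the property the review notes rely on.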

Use ThreadPoolExecutor to POST event chunks concurrently instead of
sequentially. The first chunk is sent synchronously to handle no-build
mode (where the response sets the session ID), then subsequent chunks
are uploaded in parallel with up to 3 workers. This significantly
reduces wall-clock time when uploading millions of test records by
overlapping network I/O with parsing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Konboi Konboi changed the base branch from main to v1 April 3, 2026 01:18