Multiprocess Parallel Random Data Generation for Benchmark Serving. #1038

Open

Duyi-Wang wants to merge 1 commit into SemiAnalysisAI:main from Duyi-Wang:mp_benchmark

Conversation

@Duyi-Wang

Summary

Accelerate random prompt generation in benchmark_serving.py by parallelizing the sample_random_requests() function with Python's multiprocessing.Pool. This addresses a bottleneck where generating large numbers of long prompts (e.g., 20K+ prompts at 8K+ input tokens) takes tens of minutes due to sequential tokenizer encode/decode operations.

Problem

When running benchmarks with high concurrency and long input sequences, the data preparation phase dominates total wall time. For example:

  • 2048 concurrency × 10 = 20,480 prompts @ 8,192 input tokens: the original serial path would take ~25 minutes just to generate prompt data before any actual benchmarking begins.

The root cause is that each prompt requires multiple tokenizer.decode()/tokenizer.encode() round-trips (up to 10 retries) to calibrate the token length, and this entire loop runs sequentially in a single process.
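A toy sketch of that calibration loop, using a whitespace tokenizer as a stand-in (the real benchmark uses a HuggingFace tokenizer; the function names here are illustrative assumptions, not the benchmark's actual code):

```python
import random

def toy_encode(text):        # stand-in for tokenizer.encode()
    return text.split()

def toy_decode(tokens):      # stand-in for tokenizer.decode()
    return " ".join(tokens)

def calibrate_prompt(token_ids, target_len, vocab, max_retries=10):
    """Decode/re-encode until the prompt round-trips to exactly target_len tokens."""
    prompt = toy_decode(token_ids)
    for _ in range(max_retries):
        re_encoded = toy_encode(prompt)
        if len(re_encoded) == target_len:
            break
        if len(re_encoded) < target_len:
            # pad with random tokens and retry
            pad = [random.choice(vocab) for _ in range(target_len - len(re_encoded))]
            prompt = toy_decode(re_encoded + pad)
        else:
            prompt = toy_decode(re_encoded[:target_len])
    return prompt

vocab = [f"tok{i}" for i in range(100)]
prompt = calibrate_prompt([random.choice(vocab) for _ in range(5)], 8, vocab)
assert len(toy_encode(prompt)) == 8
```

With a real tokenizer each decode/encode pair is expensive, so running this loop once per prompt, serially, for tens of thousands of long prompts dominates the wall time.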

Solution

  • Added multiprocessing support to sample_random_requests() via multiprocessing.Pool
  • Each worker process initializes its own tokenizer instance once (via Pool(initializer=...))
  • The prompt generation workload is split into chunks and distributed across workers
  • Added --random-num-workers CLI argument (grouped with other --random-* options):
    • 0 (default): auto-select min(cpu_count, 8) workers
    • 1: force serial execution (original behavior, full backward compatibility)
    • N: use exactly N worker processes
  • New parameters (tokenizer_id, tokenizer_mode, trust_remote_code, num_workers) added to sample_random_requests() function signature; all are optional with backward-compatible defaults
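The worker-pool structure described above can be sketched as follows. This is a minimal sketch, not the PR's exact code: `_init_worker`, `_generate_chunk`, and the placeholder tokenizer string are assumptions, and a real initializer would load the tokenizer (e.g., via `AutoTokenizer.from_pretrained`) once per process.

```python
import multiprocessing as mp

_worker_tokenizer = None  # one tokenizer instance per worker process

def _init_worker(tokenizer_id):
    """Runs once per worker via Pool(initializer=...)."""
    global _worker_tokenizer
    # Placeholder; the real code would construct an actual tokenizer here.
    _worker_tokenizer = f"tokenizer:{tokenizer_id}"

def _generate_chunk(chunk):
    """Generate one chunk of prompts using the worker-local tokenizer."""
    start, count, seed = chunk
    return [(start + i, seed) for i in range(count)]

def sample_parallel(num_prompts, num_workers, tokenizer_id="dummy/model"):
    # split the workload into contiguous chunks, one fixed seed per chunk
    chunk_size = (num_prompts + num_workers - 1) // num_workers
    chunks = [(i, min(chunk_size, num_prompts - i), 1000 + i)
              for i in range(0, num_prompts, chunk_size)]
    with mp.Pool(num_workers, initializer=_init_worker,
                 initargs=(tokenizer_id,)) as pool:
        results = pool.map(_generate_chunk, chunks)  # chunk order preserved
    return [item for chunk in results for item in chunk]

if __name__ == "__main__":
    out = sample_parallel(10, 3)
    assert [i for i, _ in out] == list(range(10))  # results stay in order
```

Initializing the tokenizer once per worker avoids paying the tokenizer construction cost on every task, and `pool.map()` keeps results in submission order regardless of which worker finishes first.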

Test Results

Tested with DeepSeek-R1 tokenizer (vocab_size=128,000), input_len=8192, output_len=1024, range_ratio=0.8:

Correctness Verification (2,048 prompts, serial vs parallel)

| Metric | Result |
| --- | --- |
| Prompt length exact match | 2048/2048 (100.0%) |
| Output length exact match | 2048/2048 (100.0%) |
| Prompt text exact match | 1949/2048 (95.2%) |
| Prompt length mean diff | 0.00 |
| Output lengths identical | True |
| Overall | PASS |

Note: ~5% prompt text difference is expected — the retry loop uses random token padding, and multiprocessing workers use independent RNG states. However, all prompt/output lengths match exactly, which is what matters for benchmark accuracy.

Performance (8 worker processes)

| Scenario | Serial | Parallel (8 workers) | Speedup |
| --- | --- | --- | --- |
| 2,048 prompts × 8K input | 150.88s | 24.74s | 6.10x |
| 20,480 prompts × 8K input | ~1,508s (est.) | 228.37s | ~6.6x |

Statistical Consistency

Serial   prompt_len: mean=7379.3  std=478.8  min=6553  max=8192
Parallel prompt_len: mean=7379.3  std=478.8  min=6553  max=8192

Serial   output_len: mean=920.6  std=60.2  min=819  max=1024
Parallel output_len: mean=920.6  std=60.2  min=819  max=1024

Files Changed

  • utils/bench_serving/benchmark_serving.py — Added multiprocessing support for prompt generation (+152/-28 lines)

Usage

# Default: auto-parallel with up to 8 workers (no change needed to existing scripts)
python benchmark_serving.py --dataset-name random --random-input-len 8192 --num-prompts 20480 ...

# Explicit worker count
python benchmark_serving.py --dataset-name random --random-num-workers 16 ...

# Force serial (original behavior)
python benchmark_serving.py --dataset-name random --random-num-workers 1 ...

Reproducibility Verification

The parallel path is fully deterministic: given the same --seed and --random-num-workers, multiple runs produce byte-identical results.

Verified by running 3 consecutive executions with seed=0, num_workers=4, num_prompts=200, input_len=1024 and computing MD5 over all prompt texts and lengths:

Run 1: hash=28004a3db05c9f6b98cc8169405f83c0, prompt_lens_sum=184378
Run 2: hash=28004a3db05c9f6b98cc8169405f83c0, prompt_lens_sum=184378
Run 3: hash=28004a3db05c9f6b98cc8169405f83c0, prompt_lens_sum=184378
All identical: True

This is guaranteed because:

  1. The np.random.seed() call in the main process makes input_lens, output_lens, offsets, and the per-worker seeds all deterministic
  2. Each worker creates an independent np.random.RandomState(seed) with its assigned fixed seed
  3. pool.map() returns results in chunk order (not completion order)
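Steps 1 and 2 can be illustrated with a small sketch (the exact seed-derivation scheme here is an assumption, not the PR's code):

```python
import numpy as np

def derive_worker_seeds(seed, num_workers):
    """Seed the main process, then draw one fixed seed per worker."""
    np.random.seed(seed)
    # input_lens/output_lens/offsets would be drawn here too, all deterministic
    return np.random.randint(0, 2**31 - 1, size=num_workers)

def worker_draw(worker_seed, n):
    """Each worker builds an independent RNG from its assigned seed."""
    rng = np.random.RandomState(worker_seed)
    return rng.randint(0, 100, size=n)

seeds_a = derive_worker_seeds(0, 4)
seeds_b = derive_worker_seeds(0, 4)
assert (seeds_a == seeds_b).all()  # same --seed -> same per-worker seeds
assert (worker_draw(seeds_a[0], 5) == worker_draw(seeds_b[0], 5)).all()
```

Because each worker's RandomState is constructed from a seed fixed in the main process, no worker's output depends on scheduling or on any other worker's RNG state.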

Note: changing --random-num-workers will change per-worker seed assignments, so results will differ from serial mode or a different worker count. However, the prompt/output length distributions remain statistically identical across any worker configuration.
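The hash check above can be reproduced with a sketch like this (the prompts here are stand-ins for the generated benchmark prompts):

```python
import hashlib

def digest(prompts):
    """MD5 over all prompt texts and lengths, plus the total length sum."""
    h = hashlib.md5()
    for p in prompts:
        h.update(p.encode("utf-8"))
        h.update(str(len(p)).encode("utf-8"))
    return h.hexdigest(), sum(len(p) for p in prompts)

run1 = digest(["alpha", "beta", "gamma"])
run2 = digest(["alpha", "beta", "gamma"])
assert run1 == run2  # identical generation -> identical hash and length sum
```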

Backward Compatibility

  • Default behavior changes from serial to parallel, but results are statistically equivalent
  • --random-num-workers 1 preserves exact original behavior
  • No changes to benchmark output format or metrics calculation
  • No new package dependencies (uses stdlib multiprocessing)


@claude (bot) left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

