
scalar: Install prefetch packfiles in parallel#876

Open
derrickstolee wants to merge 2 commits into microsoft:vfs-2.53.0 from derrickstolee:prefetch-parallel

Conversation


@derrickstolee derrickstolee commented Apr 7, 2026

When using Scalar clones with microsoft/git against Azure DevOps and GVFS Cache Servers, git fetch will download potentially multiple precomputed prefetch packfiles. The current mechanism indexes these files sequentially.

Let's make those git index-pack processes run somewhat in parallel.

For now, I've chosen a maximum of four parallel processes to limit the potential load on the disk. Even so, this already yields significant gains. When testing against an internal monorepo (in Codespaces, for easy Linux testing) after deleting a few days of recent prefetch packfiles, the end-to-end git fetch time improved as follows:

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| new | 40.306 ± 1.598 | 37.564 | 42.383 | 1.00 |
| old | 85.213 ± 2.389 | 82.402 | 89.207 | 2.11 ± 0.10 |

When downloading fewer prefetch packfiles, the improvement is still relevant:

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| new | 6.411 ± 0.800 | 5.559 | 7.553 | 1.00 |
| old | 13.906 ± 2.848 | 10.941 | 17.697 | 2.17 ± 0.52 |

I should mention that I first tried streaming data directly from the curl download into a sequence of git index-pack processes, but that did not make any serious difference in performance. Based on these numbers, we are clearly blocked on the CPU time spent computing deltas and evaluating object hashes, not on the "download to disk, then index from disk" I/O.

I think it would be worthwhile to do some performance testing on Windows, at minimum, before merging this change. I'd like to get some feedback on the concept before going through those actions.

Another question to ask is whether it is worth making this behavior configurable: should it be possible to disable parallel indexing in favor of a sequential process if a certain config option is set? Should we allow increasing the parallelism via config?

Refactor install_prefetch() to process prefetch packs in two
distinct phases:

Phase 1 (extraction): Read the multipack stream sequentially,
copying each packfile to its own temp file and recording its
checksum and timestamp in a prefetch_entry array.  This must be
sequential because the multipack is a single byte stream.

Phase 2 (indexing): Run 'git index-pack' on each extracted temp
file and finalize it into the ODB.  Today this still runs
sequentially, but the separation makes it straightforward to
parallelize in a subsequent commit.

The new extract_packfile_from_multipack() only does I/O against
the multipack fd plus temp-file creation.  The new
index_and_finalize_packfile() only does the index-pack and rename
work.  Neither depends on the other's state, so they can operate
on different entries concurrently once the extraction phase
completes.

No behavioral change; this is a pure refactor.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Replace the sequential index-pack loop in install_prefetch() with
run_processes_parallel(), spawning up to four concurrent 'git
index-pack' workers.

The packfiles are already ordered by timestamp (oldest first) in
the multipack response.  In the common fresh-clone scenario the
oldest pack is by far the largest, so it starts indexing
immediately on the first worker while the remaining workers cycle
through the smaller daily and hourly packs.

Note that this works for the GVFS prefetch endpoint as all prefetch
packfiles are non-thin packs. The bundle URI feature uses thin bundles
that must be unpacked sequentially.

Worker count is min(np, PREFETCH_MAX_WORKERS) where
PREFETCH_MAX_WORKERS is 4, so we never create more workers than
there are packfiles.  When there is only a single packfile the
parallel infrastructure is skipped entirely and index-pack runs
directly.

The default grouped mode of run_processes_parallel() is used so
that child-process completion is detected via poll() on stderr
pipes rather than the ungroup mode's aggressive
mark-all-slots-WAIT_CLEANUP approach, which can misfire on slots
that never started a process.

The run_processes_parallel() callbacks are always invoked from the
main thread, so finalize_prefetch_packfile() (which renames files
into the ODB) needs no locking.  If any index-pack fails, the
error is recorded and remaining tasks still complete so that
successfully-indexed packs are not lost.

I performed manual performance testing on Linux using an internal
monorepo. I deleted a set of recent prefetch packfiles, leading to a
download of a couple daily packfiles and several hourly packfiles. This
led to an improvement from 85.2 seconds to 40.3 seconds.

Signed-off-by: Derrick Stolee <stolee@gmail.com>

tyrielv commented Apr 8, 2026

Excellent change.

From what I've learned implementing similar behavior in VFSForGit, adding just 1 parallel thread provides the vast majority of the gain because of the structure of the pack files provided by the cache servers - there is 1 very large file followed by perhaps a dozen much smaller files. 1 separate worker handling the smaller files can typically index them all while the large file is still in its single-threaded analysis phase. If the cache server format changes in the future (e.g., to limit files to 1GB instead of having one "everything older than 3 months" file) then more parallelism could have a bigger effect.


derrickstolee commented Apr 9, 2026

> From what I've learned implementing similar behavior in VFSForGit, adding just 1 parallel thread provides the vast majority of the gain because of the structure of the pack files provided by the cache servers - there is 1 very large file followed by perhaps a dozen much smaller files.

While that structure is true for fresh clones, I was showing performance gains even when fetching only the last few days of packfiles. This will improve even incremental git fetch performance (assuming for some reason that background prefetches were halted, perhaps by credentials expiring).

And you're right that we may benefit from some further gains by rearranging our prefetch packfile structure, such as breaking the "everything" packfile into smaller chunks. Perhaps monthly or yearly packs as a "maximum time" interval.

