Tecnix eval caching #16

Draft
joshheinrichs-shopify wants to merge 12 commits into tecnix-gcs from tecnix-eval-caching

@joshheinrichs-shopify
No description provided.

joshheinrichs-shopify and others added 12 commits February 28, 2026 15:46
…paths

Previously, fetchToStore2 skipped the fingerprint cache entirely when a
PathFilter was present, and had no caching for on-disk store paths that
lacked an accessor-level fingerprint.

Two changes:

1. Always call getFingerprint() regardless of filter. When a filter is
   present, prefix the cache key with "filtered:" to separate filtered
   and unfiltered results. This allows filtered paths with stable
   fingerprints (e.g., git-backed zones) to cache across evaluations.

2. For paths without an accessor fingerprint, check if the physical path
   is an immutable store path. If so, use "storePath:<physPath>" as a
   stable fingerprint for the SQLite cache. This avoids re-hashing
   on-disk nixpkgs store subpaths on every evaluation.
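
The key selection described in the two changes above can be sketched roughly as follows. This is a minimal illustration, not the patch itself: `PathInfo`, its fields, and `makeCacheKey` are hypothetical names, and applying the "filtered:" prefix to the store-path fallback as well is an assumption.

```cpp
#include <optional>
#include <string>

// Hypothetical bundle of the inputs fetchToStore2 has at hand; the field
// names are illustrative and not taken from the patch.
struct PathInfo {
    std::optional<std::string> accessorFingerprint; // result of getFingerprint()
    std::optional<std::string> physicalPath;        // on-disk location, if any
    bool isImmutableStorePath = false;              // physical path lies inside the store
    bool hasFilter = false;                         // a PathFilter is present
};

// Returns the fingerprint-cache key, or nullopt when the path cannot be cached.
std::optional<std::string> makeCacheKey(const PathInfo & info)
{
    std::optional<std::string> fingerprint = info.accessorFingerprint;

    // Change 2: fall back to the immutable store path as a stable fingerprint.
    if (!fingerprint && info.physicalPath && info.isImmutableStorePath)
        fingerprint = "storePath:" + *info.physicalPath;

    if (!fingerprint)
        return std::nullopt;

    // Change 1: keep filtered and unfiltered results in separate key spaces.
    // (Assumption: the prefix is applied to the fallback fingerprint too.)
    return info.hasFilter ? "filtered:" + *fingerprint : *fingerprint;
}
```

With this shape, a path that has neither an accessor fingerprint nor an immutable store location still yields no key and takes the uncached path.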

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add a virtual getGitTreeHash() method to SourceAccessor that returns the
git tree/blob SHA1 for a path, if available. Implement it in:

- GitSourceAccessor: returns the OID from the git tree entry
- GitExportIgnoreSourceAccessor: returns a synthetic hash that incorporates
  an "exportIgnore:" prefix to distinguish it from raw tree hashes
- FilteringSourceAccessor: returns nullopt (arbitrary filters invalidate
  the tree hash)
- MountedSourceAccessor: propagates through mounts
- UnionSourceAccessor: returns first non-null result
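
A stripped-down sketch of how such a virtual method might behave across wrappers. `Accessor`, `FilteringAccessor`, `UnionAccessor`, and `FixedGitAccessor` are simplified stand-ins invented for this example; the real classes carry far more state, and only the filtering and union behaviour listed above is modelled.

```cpp
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for SourceAccessor.
struct Accessor {
    virtual ~Accessor() = default;
    // Default: no git tree hash is known for this path.
    virtual std::optional<std::string> getGitTreeHash(const std::string &)
    {
        return std::nullopt;
    }
};

// An arbitrary filter can hide entries, so the underlying tree hash no
// longer describes what this accessor exposes.
struct FilteringAccessor : Accessor {
    std::shared_ptr<Accessor> next;
    explicit FilteringAccessor(std::shared_ptr<Accessor> next) : next(std::move(next)) {}
    std::optional<std::string> getGitTreeHash(const std::string &) override
    {
        return std::nullopt;
    }
};

// Ask each member in order and return the first non-null answer.
struct UnionAccessor : Accessor {
    std::vector<std::shared_ptr<Accessor>> members;
    std::optional<std::string> getGitTreeHash(const std::string & path) override
    {
        for (auto & member : members)
            if (auto hash = member->getGitTreeHash(path))
                return hash;
        return std::nullopt;
    }
};

// Hypothetical git-backed accessor that reports one fixed OID for every path.
struct FixedGitAccessor : Accessor {
    std::string oid;
    explicit FixedGitAccessor(std::string oid) : oid(std::move(oid)) {}
    std::optional<std::string> getGitTreeHash(const std::string &) override
    {
        return oid;
    }
};
```

Returning nullopt by default keeps every accessor that has no git backing on the existing code path without further changes.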

Use this in fetchToStore2 as a third cache tier: when the fingerprint
cache misses (e.g., first run after cache clear), look up the git tree
hash in the treeHashToNarHash SQLite cache. This maps git SHA1 tree OIDs
to NAR SHA256 hashes, avoiding expensive NAR serialization when the
mapping is already known from a previous evaluation.
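
The lookup order described here could look roughly like the sketch below. The two `std::map`s stand in for the SQLite-backed caches, `resolveNarHash` and `serializeAndHash` are hypothetical names, and writing the result back into both caches is an assumption about the real code.

```cpp
#include <functional>
#include <map>
#include <optional>
#include <string>

// In-memory stand-ins for the persistent caches (assumption: same keying).
struct Caches {
    std::map<std::string, std::string> fingerprintToNarHash;
    std::map<std::string, std::string> treeHashToNarHash;
};

// Look up an optional key; a missing key is always a miss.
std::optional<std::string> lookup(
    const std::map<std::string, std::string> & cache,
    const std::optional<std::string> & key)
{
    if (!key)
        return std::nullopt;
    auto it = cache.find(*key);
    if (it == cache.end())
        return std::nullopt;
    return it->second;
}

std::string resolveNarHash(
    Caches & caches,
    const std::optional<std::string> & fingerprint,
    const std::optional<std::string> & gitTreeHash,
    const std::function<std::string()> & serializeAndHash)
{
    // First try the fingerprint cache.
    if (auto hit = lookup(caches.fingerprintToNarHash, fingerprint))
        return *hit;

    // On a miss, try the git tree OID -> NAR hash mapping.
    auto narHash = lookup(caches.treeHashToNarHash, gitTreeHash);

    // Only if both miss, pay for NAR serialization.
    if (!narHash)
        narHash = serializeAndHash();

    // Record the result so later evaluations hit a cheaper tier.
    if (fingerprint)
        caches.fingerprintToNarHash[*fingerprint] = *narHash;
    if (gitTreeHash)
        caches.treeHashToNarHash[*gitTreeHash] = *narHash;
    return *narHash;
}
```

After the fingerprint cache is cleared, a second call with the same tree hash returns without invoking `serializeAndHash`, which is the case the commit message calls out.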

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

DirtyOverlaySourceAccessor (used for tectonix zones with uncommitted
changes) had no getFingerprint() override, so it always returned nullopt.
Combined with StorePath::random() generating a different virtual path
each evaluation, the fetchToStore cache could never identify the same
dirty zone across runs, causing expensive NAR serialization every time.

Add a getFingerprint() that combines the base git accessor's fingerprint
with the actual content of dirty files. Since dirty files are few and
small, reading them is much cheaper than NAR-serializing the entire zone.
The fingerprint changes when any dirty file is modified, ensuring cache
correctness.
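
A rough sketch of that fingerprint, under stated assumptions: `dirtyFingerprint` and the path-to-content map are hypothetical, and FNV-1a is used only as a small deterministic stand-in for a cryptographic hash.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// FNV-1a, a stand-in for a real cryptographic hash. It is NOT collision
// resistant and is used here only to keep the example self-contained.
struct Fnv1a {
    std::uint64_t state = 14695981039346656037ULL;
    void update(const std::string & data)
    {
        for (unsigned char c : data) {
            state ^= c;
            state *= 1099511628211ULL;
        }
    }
};

// dirtyFiles maps a zone-relative path to its current on-disk content
// (hypothetical representation of the overlay's dirty set).
std::optional<std::string> dirtyFingerprint(
    const std::optional<std::string> & baseFingerprint,
    const std::map<std::string, std::string> & dirtyFiles)
{
    // Without a stable base fingerprint there is nothing to anchor to.
    if (!baseFingerprint)
        return std::nullopt;

    Fnv1a hasher;
    // std::map iterates in sorted key order, so the result is deterministic.
    for (const auto & [path, content] : dirtyFiles) {
        // Length-prefix each field so ("ab", "c") and ("a", "bc") differ.
        hasher.update(std::to_string(path.size()) + ":" + path);
        hasher.update(std::to_string(content.size()) + ":" + content);
    }
    return *baseFingerprint + ";dirty:" + std::to_string(hasher.state);
}
```

Two evaluations with the same base commit and the same dirty content produce the same string, while editing any dirty file changes it.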

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Three changes to the dirty overlay accessor:

1. readFile/readLink now route clean files through the git ODB (base
   accessor) instead of always reading from disk. Only dirty files are
   read from disk. This is important because we can't always trust
   on-disk content for clean files (e.g. sparse checkouts).

2. Removed getPhysicalPath override — clean files should not expose
   disk paths since they're served from the git ODB.

3. Fingerprint computation now uses a HashSink and caches the result.
   The fingerprint is the base accessor's fingerprint (git tree SHA)
   plus a hash of dirty file paths and content. Caching it avoids
   redundant recomputation across multiple fetchToStore calls in the
   same eval.
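
The read routing and the memoized fingerprint can be illustrated with a toy model. `DirtyOverlay` below is a hypothetical struct, the maps stand in for the git ODB and the working tree, and the fingerprint is a plain concatenation rather than a HashSink digest.

```cpp
#include <map>
#include <optional>
#include <set>
#include <string>

struct DirtyOverlay {
    std::map<std::string, std::string> gitObjects; // stand-in for the git ODB
    std::map<std::string, std::string> diskFiles;  // stand-in for the working tree
    std::set<std::string> dirtyPaths;              // paths with uncommitted changes

    mutable std::optional<std::string> cachedFingerprint;
    mutable int fingerprintComputations = 0;       // instrumentation for the example

    // Dirty files come from disk; everything else comes from the git ODB.
    std::string readFile(const std::string & path) const
    {
        const auto & source = dirtyPaths.count(path) ? diskFiles : gitObjects;
        return source.at(path);
    }

    // Compute once, then reuse for later calls in the same evaluation.
    std::string getFingerprint() const
    {
        if (!cachedFingerprint) {
            ++fingerprintComputations;
            std::string fingerprint = "base";
            for (const auto & path : dirtyPaths)
                fingerprint += "|" + path + "=" + diskFiles.at(path);
            cachedFingerprint = fingerprint;
        }
        return *cachedFingerprint;
    }
};
```

In this model a clean file that is absent from `diskFiles`, as in a sparse checkout, is still readable because it never touches the disk map.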

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

This wires tracking into our source accessor, which lets us see exactly
which files affect a given target.
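
The commit does not say how the tracking is implemented. One possible shape, purely as an assumption, is a wrapper that records every path read through it; `TrackingReader` and its members are hypothetical names.

```cpp
#include <functional>
#include <set>
#include <string>
#include <utility>

// Wraps a read function and records every path requested through it.
class TrackingReader {
    std::function<std::string(const std::string &)> inner;
    std::set<std::string> accessed;

public:
    explicit TrackingReader(std::function<std::string(const std::string &)> inner)
        : inner(std::move(inner))
    {
    }

    std::string readFile(const std::string & path)
    {
        accessed.insert(path); // a set keeps each path once
        return inner(path);
    }

    // The set of files that influenced whatever was evaluated via this reader.
    const std::set<std::string> & accessedPaths() const
    {
        return accessed;
    }
};
```

Evaluating one target through a fresh reader and then inspecting `accessedPaths()` would give the list of files that target depends on.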