fix(md-exports): Fix 100% cache miss rate on Vercel by using content-only cache keys by BYK · Pull Request #16313 · getsentry/sentry-docs

BYK · 2026-02-10T00:54:30Z

Problem

The md-exports script converts pre-rendered HTML pages to markdown files for LLM consumption. It uses a file-based cache keyed on the MD5 hash of the HTML content to avoid re-processing unchanged pages. While cache hit rates were near-perfect locally (99.99%), Vercel builds consistently showed 0% cache hits — every single page was re-processed on every deploy, adding ~70 extra seconds to each build.
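For orientation, a file-based cache keyed on an MD5 hash of the input can be sketched as follows. This is a minimal illustration, not the code in scripts/generate-md-exports.mjs: the function name `convertWithCache`, the `.md` cache file layout, and the `convert` callback are all assumptions, and it is written as CommonJS so it runs standalone.

```javascript
const { createHash } = require('node:crypto');
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: look up the converted markdown by the MD5 of the HTML,
// and only run the (expensive) conversion when no cache file exists yet.
function convertWithCache(html, cacheDir, convert) {
  const key = createHash('md5').update(html).digest('hex');
  const cacheFile = path.join(cacheDir, `${key}.md`);

  if (fs.existsSync(cacheFile)) {
    return { markdown: fs.readFileSync(cacheFile, 'utf8'), hit: true };
  }

  const markdown = convert(html);
  fs.mkdirSync(cacheDir, { recursive: true });
  fs.writeFileSync(cacheFile, markdown);
  return { markdown, hit: false };
}
```

With this shape, any byte-level difference in the hashed input produces a new key, which is why unstable markup in the input turns every lookup into a miss.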

Root Cause

The cache key was computed from the full stripped HTML, which still contained Emotion CSS hashes (css-o2ofml, etc.) in <style data-emotion> tags and class attributes throughout the page. These hashes change between Vercel builds even for the same commit, invalidating every cache entry.

Other build-specific artifacts in the layout shell (sidebar HTML from merged PRs, font variable classes, CSS module hashes) also contributed to instability, but Emotion CSS was the primary culprit since it wasn't covered by the existing normalization.
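The effect can be reproduced in isolation. The two HTML strings below are made-up stand-ins for two builds of the same page; only the Emotion hash differs (`css-1x9k2ab` is an invented value), yet the MD5 of the full HTML changes.

```javascript
const { createHash } = require('node:crypto');

const md5 = (text) => createHash('md5').update(text).digest('hex');

// Hypothetical renders of one page from two builds of the same commit.
const buildA =
  '<style data-emotion="css o2ofml">.css-o2ofml{color:red}</style>' +
  '<div id="main"><p class="css-o2ofml">Hello</p></div>';
const buildB =
  '<style data-emotion="css 1x9k2ab">.css-1x9k2ab{color:red}</style>' +
  '<div id="main"><p class="css-1x9k2ab">Hello</p></div>';

// The visible content is identical, but the cache keys are not.
const sameKey = md5(buildA) === md5(buildB);
console.log(sameKey); // false
```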

Solution

Instead of trying to strip/normalize all unstable patterns from the full HTML, compute the cache key from only the content the pipeline actually uses:

- <title>: becomes the H1 heading
- <link rel="canonical">: used for link rewriting
- <div id="main">: becomes the markdown body

Everything else (header, sidebar, footer, scripts, styles, fonts) is excluded from the cache key entirely since it's irrelevant for markdown output. Within div#main, Emotion classes and CSS module hashes are still normalized since code block components use those inside the content area.

This approach is fundamentally more robust than pattern-matching unstable elements — any new source of non-determinism in the layout shell is automatically ignored.
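A content-only key along these lines might look like the sketch below. It is an assumption-laden illustration rather than the merged implementation: the function names, the regex-based extraction, the placeholder `X`, and the exact normalization patterns are all hypothetical, and the regex extraction assumes well-formed markup with double-quoted attributes.

```javascript
const { createHash } = require('node:crypto');

// Return the whole <div id="main">...</div> element, tracking nested <div> depth.
function extractMainDiv(html) {
  const start = html.search(/<div\b[^>]*\bid="main"/);
  if (start === -1) return '';
  const divTag = /<(\/?)div\b[^>]*>/g;
  divTag.lastIndex = start;
  let depth = 0;
  let match;
  while ((match = divTag.exec(html)) !== null) {
    depth += match[1] ? -1 : 1;
    if (depth === 0) return html.slice(start, divTag.lastIndex);
  }
  return html.slice(start); // unbalanced markup: fall back to the rest of the page
}

// Collect only the three inputs the markdown pipeline reads.
function extractCacheKeyContent(html) {
  const title = (/<title[^>]*>([\s\S]*?)<\/title>/i.exec(html) || [])[1] || '';
  const canonical = (/<link\b[^>]*\brel="canonical"[^>]*>/i.exec(html) || [])[0] || '';
  const main = extractMainDiv(html)
    .replace(/\bcss-[a-z0-9]+\b/g, 'css-X') // Emotion class names
    .replace(/\b(\w+?)__[\w-]{5}\b/g, '$1__X'); // CSS module hash suffixes
  return [title, canonical, main].join('\n');
}

function contentOnlyCacheKey(html) {
  return createHash('md5').update(extractCacheKeyContent(html)).digest('hex');
}
```

Because the normalized string is only hashed and never converted, it does not matter if the patterns also rewrite ordinary text inside div#main; the worst case is that two pages differing only in such text share a key.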

Results

Vercel build with warm cache:

Worker[3]: Cache stats: 2362 hits, 0 misses (0.0% miss rate)
Worker[2]: Cache stats: 2362 hits, 0 misses (0.0% miss rate)
Worker[1]: Cache stats: 2362 hits, 0 misses (0.0% miss rate)
Worker[0]: Cache stats: 2361 hits, 1 misses (0.0% miss rate)

9447/9448 cache hits (99.99%) — the single miss was a legitimate content change (updated SDK registry data). The md-exports step dropped from ~80s to ~10s.
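The aggregate figure can be checked by summing the per-worker lines. The helper name `summarizeCacheStats` is made up for this check; the log lines are copied from the output above.

```javascript
const workerLogs = [
  'Worker[3]: Cache stats: 2362 hits, 0 misses (0.0% miss rate)',
  'Worker[2]: Cache stats: 2362 hits, 0 misses (0.0% miss rate)',
  'Worker[1]: Cache stats: 2362 hits, 0 misses (0.0% miss rate)',
  'Worker[0]: Cache stats: 2361 hits, 1 misses (0.0% miss rate)',
];

// Sum hits and misses across workers and derive the overall hit rate.
function summarizeCacheStats(lines) {
  let hits = 0;
  let misses = 0;
  for (const line of lines) {
    const match = /(\d+) hits, (\d+) misses/.exec(line);
    if (!match) continue;
    hits += Number(match[1]);
    misses += Number(match[2]);
  }
  const total = hits + misses;
  const hitRate = total === 0 ? 0 : (hits / total) * 100;
  return { hits, misses, total, hitRate: hitRate.toFixed(2) };
}

console.log(summarizeCacheStats(workerLogs));
// { hits: 9447, misses: 1, total: 9448, hitRate: '99.99' }
```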

…e keys

The previous fix stripped script/link/style tags but missed build-specific hashes embedded in the HTML body itself:

- next/font variable classes on <body> (e.g., __variable_c58dd6)
- CSS module class name hashes (e.g., style_sidebar__iEJoR, 60+ occurrences)
- /_next/static/media/ content hashes (e.g., sentry-logo-dark.fc8e1eeb.svg)

These change on every Next.js rebuild even when content is unchanged, causing 100% cache miss rates.

Verified locally: back-to-back builds now achieve 99.99% cache hit rate (9447/9448 files), with the single miss being a legitimate content change.

Co-Authored-By: Claude <noreply@anthropic.com>
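A normalization pass for these three patterns could be sketched like this. The regexes are inferred from the example values quoted in the commit message (a 6-character hex font suffix, a 5-character CSS module suffix, an 8-character hex media hash) and are assumptions, as is the function name `normalizeBuildHashes`.

```javascript
// Replace build-specific hashes with fixed placeholders before hashing.
// The output is meant for cache key computation only, not for conversion.
function normalizeBuildHashes(html) {
  return html
    // next/font variable classes, e.g. __variable_c58dd6
    .replace(/__variable_[0-9a-f]{6}/g, '__variable_X')
    // CSS module class names, e.g. style_sidebar__iEJoR
    .replace(/\b(\w+?)__[\w-]{5}\b/g, '$1__X')
    // media content hashes, e.g. sentry-logo-dark.fc8e1eeb.svg
    .replace(/(\/_next\/static\/media\/[^"'\s]+?)\.[0-9a-f]{8}(\.[a-z0-9]+)/g, '$1$2');
}

const sample =
  '<body class="__variable_c58dd6"><div class="style_sidebar__iEJoR">' +
  '<img src="/_next/static/media/sentry-logo-dark.fc8e1eeb.svg"></div></body>';

console.log(normalizeBuildHashes(sample));
// <body class="__variable_X"><div class="style_sidebar__X"><img src="/_next/static/media/sentry-logo-dark.svg"></div></body>
```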

vercel · 2026-02-10T00:54:37Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
develop-docs	Ready	Preview, Comment	Feb 10, 2026 9:22pm
sentry-docs	Ready	Preview, Comment	Feb 10, 2026 9:22pm

scripts/generate-md-exports.mjs

…add cache miss diagnostics

- Split stripUnstableElements() into two functions: stripUnstableElements() for safe tag-level removal (used as pipeline input) and normalizeForCacheKey() for hash normalization (used only for cache key computation). This fixes the CSS module regex corrupting actual page content like Sentry__Debug -> Sentry__X.
- Add temporary diagnostic logging on cache misses that logs per-section hashes (head, main, layout) for well-known files, enabling cross-build comparison on Vercel to identify why cache hit rate is 0%.
- Bump CACHE_VERSION 5 -> 6 for the changed cache key computation.

Co-Authored-By: Claude <noreply@anthropic.com>
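The reason for the split is easier to see in a small example. The function names come from the commit message, but the bodies below are guesses: in particular, limiting the link removal to stylesheet and preload tags is an assumption made here so the canonical link survives, and the CSS module regex is a stand-in for whatever pattern the script uses.

```javascript
// Safe tag-level removal: the result is what the markdown pipeline reads.
function stripUnstableElements(html) {
  return html
    .replace(/<script\b[\s\S]*?<\/script>/gi, '')
    .replace(/<style\b[\s\S]*?<\/style>/gi, '')
    .replace(/<link\b[^>]*\brel="(?:stylesheet|preload)"[^>]*>/gi, '');
}

// Hash normalization: the result is only ever hashed, never converted.
function normalizeForCacheKey(html) {
  return stripUnstableElements(html).replace(/\b(\w+?)__[\w-]{5}\b/g, '$1__X');
}

const html =
  '<link rel="stylesheet" href="/_next/static/css/app.css">' +
  '<div class="style_sidebar__iEJoR"><code>Sentry__Debug</code></div>' +
  '<script>window.__x = 1;</script>';

const pipelineInput = stripUnstableElements(html);
const keyInput = normalizeForCacheKey(html);

console.log(pipelineInput.includes('Sentry__Debug')); // true: page content is untouched
console.log(keyInput.includes('Sentry__X')); // true: the rewrite is confined to the key input
```

The same regex that stabilizes `style_sidebar__iEJoR` also matches `Sentry__Debug`, so it is only safe on a string that is hashed and then discarded.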

…f full HTML normalization

Root cause identified: Emotion CSS hashes (css-o2ofml, etc.) in <style data-emotion> tags and class attributes change between Vercel builds even for the same commit. These were not being stripped or normalized, causing 100% cache miss rate.

Instead of trying to normalize all unstable patterns in the full HTML, this change extracts only the three elements the pipeline actually uses (title, canonical URL, div#main content) and hashes just those. This makes the cache key immune to:

- Layout/sidebar/header changes (from merged PRs)
- Emotion CSS hash changes
- Font variable class changes
- CSS module hash changes in layout elements
- Any other build-specific variation in the HTML shell

Within div#main, we still normalize Emotion classes and CSS module hashes since code block components use those inside the content area.

Bumps CACHE_VERSION 6 -> 7.

Co-Authored-By: Claude <noreply@anthropic.com>
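This thread does not show how CACHE_VERSION is consumed. One common arrangement, sketched here purely as an assumption, is to mix the version into the hashed input so that bumping it invalidates every entry written under the old key scheme. The name `versionedCacheKey` is hypothetical.

```javascript
const { createHash } = require('node:crypto');

// Hypothetical constant mirroring the CACHE_VERSION named in the commit message.
const CACHE_VERSION = 7;

// Prefix the hashed content with the version so old and new keys never collide.
function versionedCacheKey(content, version = CACHE_VERSION) {
  return createHash('md5').update(`v${version}\n`).update(content).digest('hex');
}

const content = '<div id="main"><p>Hello</p></div>';
console.log(versionedCacheKey(content, 6) === versionedCacheKey(content, 7)); // false
```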

scripts/generate-md-exports.mjs

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

scripts/generate-md-exports.mjs

Cache miss root cause identified and fixed: Emotion CSS hashes were the culprit. The content-only extraction approach achieves 99.99% cache hit rate on Vercel (9447/9448 hits). Diagnostic logging is no longer needed.

Co-Authored-By: Claude <noreply@anthropic.com>

sergical

you're the best!

vercel bot deployed to Preview – develop-docs February 10, 2026 00:59 View deployment

vercel bot deployed to Preview – sentry-docs February 10, 2026 01:04 View deployment

cursor bot reviewed Feb 10, 2026

BYK requested a review from sergical February 10, 2026 12:14

vercel bot deployed to Preview – develop-docs February 10, 2026 14:34 View deployment

vercel bot deployed to Preview – sentry-docs February 10, 2026 14:38 View deployment

sentry bot reviewed Feb 10, 2026

vercel bot deployed to Preview – develop-docs February 10, 2026 16:30 View deployment

cursor bot reviewed Feb 10, 2026

vercel bot deployed to Preview – sentry-docs February 10, 2026 16:34 View deployment

BYK changed the title ~~fix(md-exports): Normalize CSS module and font hashes for stable cache keys~~ fix(md-exports): Fix 100% cache miss rate on Vercel by using content-only cache keys Feb 10, 2026

BYK enabled auto-merge (squash) February 10, 2026 21:17

sergical approved these changes Feb 10, 2026

vercel bot deployed to Preview – develop-docs February 10, 2026 21:19 View deployment

vercel bot deployed to Preview – sentry-docs February 10, 2026 21:22 View deployment

BYK merged commit a51f343 into master Feb 10, 2026
14 checks passed

BYK deleted the byk/fix-md-cache-hash-normalization branch February 10, 2026 21:22

BYK merged 4 commits into master from byk/fix-md-cache-hash-normalization
