---
published: 2026-02-04
authors:
- Manuel Schiller
- Florian Pellet
title: 'From 3000ms to 14ms: CPU profiling of TanStack Start SSR under heavy load'
---

## Executive summary

We improved TanStack Router's SSR performance dramatically. Under sustained load (100 concurrent connections, 30 seconds):

- **Throughput**: 477 req/s → 1,041 req/s (**2.2x**)
- **Average latency**: 3,171ms → 14ms (**231x faster**)
- **p95 latency**: 10,001ms (timeout) → 29ms (**343x faster**)
- **Success rate**: 75% → 100% (the server stopped failing under load)

For SSR-heavy deployments, this translates directly to lower hosting costs, the ability to handle traffic spikes without scaling, and eliminating user-facing errors.

We did it with a repeatable process, not a single clever trick:

- **Measure under load**, not in microbenchmarks.
- **Use CPU profiling** to find the highest-impact work.
- **Remove entire categories of cost** from the server hot path:
- avoid `URL` construction/parsing when it is not required
- avoid reactivity work during SSR (subscriptions, structural sharing, batching)
- add server-only fast paths behind a build-time `isServer` flag
- avoid `delete` in performance-sensitive code

The changes span ~20 PRs; we highlight the highest-impact patterns below. This article focuses on methodology and mechanisms you can reuse in any SSR framework.

## What we optimized (and what we did not)

This work started after `v1.154.4` and targets server-side rendering performance. The goal was to increase throughput and reduce server CPU time per request while keeping correctness guarantees.

We are not claiming that any single line of code is "the" reason. This work spanned more than 20 PRs, with more to come, and every change was validated by:

- a stable load test
- a CPU profile (flamegraph)
- a before/after comparison on the same benchmark endpoint

## Methodology: feature-focused endpoints + flamegraphs

### Why feature-focused endpoints

We did not benchmark "a representative app page". We used endpoints that exaggerate a feature so the profile is unambiguous:

- **`links-100`**: renders ~100 links to stress link rendering and location building.
- **`layouts-26-with-params`**: deep nesting + params to stress matching and path/param work.
- **`empty`**: minimal route to establish a baseline for framework overhead.

This is transferable: isolate the subsystem you want to improve, and benchmark that.
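
For example, the link-stress endpoint can be as small as a route that renders a hundred links. Here is a hedged sketch with illustrative paths, not the actual benchmark code from the repo:

```tsx
// Hypothetical /bench/links-100 route: ~100 <Link>s per render stress
// href building and location resolution on every request.
import { createFileRoute, Link } from '@tanstack/react-router'

export const Route = createFileRoute('/bench/links-100')({
  component: LinksBench,
})

function LinksBench() {
  return (
    <nav>
      {Array.from({ length: 100 }, (_, i) => (
        <Link key={i} to="/items/$id" params={{ id: String(i) }}>
          Item {i}
        </Link>
      ))}
    </nav>
  )
}
```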

### Load generation with `autocannon`

We used `autocannon` to generate a 30s sustained load. We tracked:

- req/s
- latency distribution (avg, p95, p99)

Example command (adjust concurrency and route):

```bash
autocannon -d 30 -c 100 --warmup [ -d 2 -c 20 ] http://localhost:3000/bench/links-100
```

### CPU profiling with `@platformatic/flame`

While the server handled load, we recorded CPU profiles using `@platformatic/flame`.

How we read the flamegraph:

- Focus on **self time** first. That is where the CPU is actually spent, not just where time is waiting on children.
- Fix one hotspot, re-run, and re-profile.
- Prefer changes that remove work in the steady state, not just shift it.

Placeholders you should replace with real screenshots:

- `<!-- FLAMEGRAPH: links-100 before -->`
- `<!-- FLAMEGRAPH: links-100 after -->`
- `<!-- FLAMEGRAPH: layouts-26-with-params before -->`
- `<!-- FLAMEGRAPH: layouts-26-with-params after -->`

### Reproducing these benchmarks

**Environment:**

Our benchmarks were stable enough to produce very similar results across a range of setups. For reference, here are the exact environment details we used:

- Node.js: v24.12.0
- Hardware: MacBook Pro (M3)
- OS: macOS 15.7

**Running the benchmark:**

For fast iteration, we set up a single `pnpm bench` command that would concurrently:

- start the built server through `@platformatic/flame` to profile it:
  ```sh
  flame run ./dist/server.mjs
  ```
- run `autocannon` to stress the server by firing many requests at it:
  ```sh
  autocannon -d 30 -c 100 --warmup [ -d 2 -c 20 ] http://localhost:3000/bench/links-100
  ```
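
Wired together, the script might look roughly like this (a sketch assuming the `concurrently` package; the actual `pnpm bench` wiring may differ):

```sh
# hypothetical "bench" script: profile the server while autocannon hammers it
concurrently --kill-others \
  "flame run ./dist/server.mjs" \
  "sleep 2 && autocannon -d 30 -c 100 --warmup [ -d 2 -c 20 ] http://localhost:3000/bench/links-100"
```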

## Finding 1: `URL` is expensive in server hot paths

### The mechanism

In our SSR profiles, `URL` construction/parsing showed up as significant self-time in the hot path on link-heavy endpoints. The cost comes from doing real work (parsing/normalization) and allocating objects. When you do it once, it does not matter. When you do it per link, per request, it dominates.

### The transferable pattern

Use cheap predicates first, then fall back to heavyweight parsing only when needed.

- If a value is clearly internal (e.g. it starts with `/`, `./`, or `../`), don't try to parse it as an absolute URL.
- If a feature is only needed in edge cases (e.g. rewrite logic), keep it off the default path.

### What we changed

```typescript
// Before: always parse
const url = new URL(to, base)

// After: check first, parse only if needed
if (safeInternalUrl(to)) {
  // fast path: internal navigation, no parsing needed
} else {
  const url = new URL(to, base)
  // ...external URL handling
}
```

The `safeInternalUrl` check can be orders of magnitude cheaper than constructing a `URL` object[^url-cost], as long as we accept some false negatives: anything the predicate cannot cheaply classify simply falls back to the full parse.
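
For illustration, such a predicate could look roughly like this (a hedged sketch, not the code from the PRs linked below):

```typescript
// Conservative by design: anything ambiguous returns false and falls
// back to full URL parsing, so a false negative only costs a parse.
function safeInternalUrl(to: string): boolean {
  // root-relative paths ("/a/b") are internal, but protocol-relative
  // URLs ("//host/path") are not
  if (to.startsWith('/')) return !to.startsWith('//')
  // relative paths ("./a", "../a") are internal by construction
  return to.startsWith('./') || to.startsWith('../')
}
```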

See: [#6442](https://github.com/TanStack/router/pull/6442), [#6447](https://github.com/TanStack/router/pull/6447), [#6516](https://github.com/TanStack/router/pull/6516)

### How we proved it internally

This claim should be backed by your flamegraphs and measurements, not by opinion.

- `<!-- EVIDENCE: flamegraph shows URL construction/parsing as top self-time hotspot before -->`
- `<!-- EVIDENCE: same hotspot reduced/removed after -->`

## Finding 2: SSR does not need reactivity

### The mechanism

SSR renders once per request.[^ssr-streaming] There is no ongoing UI to reactively update, so on the server:

- store subscriptions add overhead but provide no benefit
- structural sharing[^structural-sharing] (replace-equal) reduces re-renders, but SSR does not re-render
- batching reactive notifications is irrelevant if nothing is subscribed

### The transferable pattern

If you have a runtime that supports both client reactivity and SSR, separate them:

- on the server: compute a snapshot and return it
- on the client: subscribe and use structural sharing to reduce render churn

This is the difference between "server = a function" and "client = a reactive system".
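
In code, the split can be as direct as branching before any subscription is created. Here is a hedged sketch with illustrative names: `isServer` and `replaceEqualDeep` are discussed elsewhere in this article, while `Store` and `bindSelector` are hypothetical stand-ins for the real internals.

```typescript
declare const isServer: boolean
declare function replaceEqualDeep<T>(prev: T | undefined, next: T): T

interface Store<T> {
  state: T
  subscribe(listener: () => void): () => void
}

function bindSelector<T, U>(
  store: Store<T>,
  selector: (state: T) => U,
  onChange: (value: U) => void,
): U {
  if (isServer) {
    // server: compute once and return; no listener bookkeeping,
    // no structural sharing, nothing to tear down after the request
    return selector(store.state)
  }
  // client: subscribe, and use structural sharing so consumers can
  // skip updates when the selected value is deep-equal to the last one
  let prev = selector(store.state)
  store.subscribe(() => {
    const next = replaceEqualDeep(prev, selector(store.state))
    if (next !== prev) onChange((prev = next))
  })
  return prev
}
```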

### What we changed

```typescript
// Before: same code path for client and server
store.subscribe(() => {
  /* ... */
}) // overhead on server
const next = replaceEqualDeep(prev, value) // unnecessary structural sharing

// After: server gets a simple snapshot
if (isServer) {
  return computeSnapshot() // no subscriptions, no structural sharing
}
```

See: [#6497](https://github.com/TanStack/router/pull/6497), [#6502](https://github.com/TanStack/router/pull/6502)

## Finding 3: server-only fast paths are worth it (when gated correctly)

### The mechanism

Client code cares about bundle size. Server code cares about CPU time per request. Those constraints are different.

If you can guard a branch with a **build-time constant** like `isServer`, you can:

- add server-only fast paths for common cases
- keep the general algorithm for correctness and edge cases
- allow bundlers to delete the server-only branch from client builds

In TanStack Router, `isServer` is provided via build-time resolution (client: `false`, server: `true`, dev/test: `undefined` with fallback). Modern bundlers like Vite, Rollup, and esbuild perform dead code elimination (DCE)[^dce], removing unreachable branches when the condition is a compile-time constant.
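
For illustration, one way to obtain such a constant is a bundler-level `define` (a sketch using Vite; TanStack Router's actual `isServer` comes from build-time module resolution, but either way the bundler ends up seeing a literal):

```typescript
// vite.config.ts (hypothetical): inline a boolean per build target
import { defineConfig } from 'vite'

export default defineConfig(({ isSsrBuild }) => ({
  define: {
    // client build: false, server (SSR) build: true
    __IS_SERVER__: JSON.stringify(!!isSsrBuild),
  },
}))
```

After replacement, `if (__IS_SERVER__)` becomes `if (false)` in the client bundle, and minification removes the dead branch entirely.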

### The transferable pattern

Write two implementations:

- **fast path** for the common case
- **general path** for correctness

And gate them behind a build-time constant so you don't ship server-only logic to clients.

### What we changed

```typescript
// isServer is resolved at build time:
// - Vite/bundler replaces it with `true` (server) or `false` (client)
// - Dead code elimination removes the unused branch

if (isServer) {
  // server-only fast path (removed from client bundle)
  return fastServerPath(input)
}
// general algorithm (used on client, fallback on server in dev)
return generalPath(input)
```

See: [#4648](https://github.com/TanStack/router/pull/4648), [#6505](https://github.com/TanStack/router/pull/6505), [#6506](https://github.com/TanStack/router/pull/6506)

## Finding 4: `delete` can be expensive

### The mechanism

Modern engines optimize property access using object "shapes" (e.g. V8 HiddenClasses / JSC Structures) and inline caches. `delete` changes an object's shape and can force a slower internal representation (e.g. dictionary/slow properties), which can disable or degrade those optimizations and deopt optimized code.[^v8-fast-properties][^webkit-delete-ic]
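
A tiny illustration of the effect (engine-dependent; the comments describe typical V8 behavior rather than a guarantee):

```typescript
const a: { x: number; y?: number } = { x: 1, y: 2 }
const b: { x: number; y?: number } = { x: 1, y: 2 }
// `a` and `b` start out sharing one shape, so a call site reading `.x`
// from either object can hit the same inline cache

delete a.y
// `a` likely transitions to a slower dictionary-like representation,
// while `b` keeps the fast shape; a call site that sees both objects
// now deals with two shapes and its inline cache degrades
```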

### The transferable pattern

Avoid `delete` in hot paths. Prefer patterns that don't mutate object shapes in-place:

- set a property to `undefined` (when semantics allow)
- create a new object without the key (object rest destructuring) when you need a "key removed" shape

### What we changed

```typescript
// Before: mutates shape
delete linkProps.activeProps
delete linkProps.inactiveProps

// After: create new object without keys
const { activeProps, inactiveProps, ...rest } = linkProps
return rest
```

See: [#6456](https://github.com/TanStack/router/pull/6456), [#6515](https://github.com/TanStack/router/pull/6515)

## Results

Benchmark: placeholder text, should link to Matteo's article.

### Summary

| Metric       |    Before |      After | Improvement  |
| ------------ | --------: | ---------: | ------------ |
| Success rate |    75.52% |       100% | +32%         |
| Throughput   | 477 req/s | 1041 req/s | +118% (2.2x) |
| Avg latency  |   3,171ms |     13.7ms | 231x faster  |
| p90 latency  |  10,001ms |     23.0ms | 435x faster  |
| p95 latency  |  10,001ms |     29.1ms | 343x faster  |

The "before" numbers show a server under severe stress: 25% of requests failed (likely timeouts), and p90/p95 hit the 10s timeout ceiling. After the optimizations, the server handles the same load comfortably with sub-30ms tail latency and zero failures.

To be clear: TanStack Router was not broken before these changes. Under normal traffic, SSR worked fine. These numbers reflect behavior under _sustained heavy load_—the kind you see during traffic spikes or load testing. The optimizations ensure the server degrades gracefully instead of falling over.

### Flamegraph evidence slots

- `<!-- FLAMEGRAPH: links-100 before -->`
- `<!-- FLAMEGRAPH: links-100 after -->`
- `<!-- FLAMEGRAPH: layouts-26-with-params before -->`
- `<!-- FLAMEGRAPH: layouts-26-with-params after -->`

## Conclusion

The biggest gains came from removing whole categories of work from the server hot path. The general lesson is simple: throughput improves when you eliminate repeated work, allocations, and unnecessary generality in the steady state.

There were many other improvements (client and server) not covered here. SSR performance work is ongoing.

## Fill-in checklist before publishing

- [x] Replace throughput placeholders with final numbers.
- [x] Replace latency placeholders (avg/p90/p95) with final numbers.
- [ ] Insert flamegraph screenshots and annotate the "before" hotspots and "after" removal.
- [ ] Ensure every external claim has a citation and every internal claim has evidence.
- [ ] Add `layouts-26-with-params` benchmark results (if desired).

## References

[^v8-fast-properties]: V8 team, "Fast properties in V8" `https://v8.dev/blog/fast-properties`

[^webkit-delete-ic]: WebKit, "A Tour of Inline Caching with Delete" `https://webkit.org/blog/10298/inline-caching-delete/`

[^structural-sharing]: Structural sharing is a pattern from immutable data libraries (Immer, React Query, TanStack Store) where unchanged portions of data structures are reused by reference to minimize allocation and enable cheap equality checks.

[^ssr-streaming]: With streaming SSR and Suspense, the server may render multiple chunks, but each chunk is still a single-pass render with no reactive updates.

[^url-cost]: The WHATWG URL Standard requires significant parsing work: scheme detection, authority parsing, path normalization, query string handling, and percent-encoding. See the [URL parsing algorithm](https://url.spec.whatwg.org/#url-parsing) for the full state machine.

[^dce]: Dead code elimination is a standard compiler optimization. See esbuild's documentation on [tree shaking](https://esbuild.github.io/api/#tree-shaking) and Rollup's [tree-shaking guide](https://rollupjs.org/introduction/#tree-shaking).