docs: CI performance and warm Docker CI research by paddymul · Pull Request #613 · buckaroo-data/buckaroo

paddymul · 2026-03-01T17:51:09Z

Summary

- CI-performance.md: analysis of current Depot CI (latency breakdown, runner tier comparison at 2/4/8 CPU, per-job timing, path-gated optimization proposals)
- warm-docker-ci.md: research into replacing Depot with a persistent Hetzner server running warm Docker containers (framework comparison, Dockerfile structure, sidecar pattern, CPU contention analysis, Hetzner Cloud vs Dedicated, provisioning automation)

Context

Research/brainstorming docs, no code changes. Captures findings for future reference when implementing a faster CI setup.

🤖 Generated with Claude Code

Research into current Depot CI performance (latency breakdown, runner tier comparison, path-gated optimizations) and a proposed warm Docker CI setup on Hetzner (sidecar containers, lockfile-hash caching, Playwright parallelism, Cloud vs Dedicated comparison). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions · 2026-03-01T17:52:52Z

📦 TestPyPI package published

pip install --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo==0.12.12.dev22636557718

or with uv:

uv pip install --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo==0.12.12.dev22636557718

MCP server for Claude Code

claude mcp add buckaroo-table -- uvx --from "buckaroo[mcp]==0.12.12.dev22636557718" --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo-table

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 18a7fbd4de

chatgpt-codex-connector · 2026-03-01T17:53:48Z

docs/llm/research/warm-docker-ci.md

+  pytest -vv tests/unit/ &
+  (cd packages/buckaroo-js-core && pnpm test) &
+  wait


Propagate background test failures in trigger script

This script backgrounds both test commands and then calls wait with no job IDs; in Bash, help wait states that when no IDs are given it waits for all children and returns status 0, so a failing pytest or pnpm test can still produce an overall success status. In the CI-trigger context here, that creates false-green runs and can let broken commits appear healthy.
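
One way to address this, sketched under the assumption that the trigger script is a POSIX-style shell script: record each background PID and wait on it individually, so that any non-zero exit status is carried into the script's own result. The helper name `run_parallel` is hypothetical and does not come from the PR.

```shell
# Hypothetical helper (not from the PR): run each argument as a background
# command, then wait on every PID so that one failure fails the whole step.
run_parallel() {
  rp_pids=""
  for rp_cmd in "$@"; do
    sh -c "$rp_cmd" &
    rp_pids="$rp_pids $!"
  done
  rp_status=0
  for rp_pid in $rp_pids; do
    # `wait PID` returns that job's exit status, unlike a bare `wait`.
    wait "$rp_pid" || rp_status=1
  done
  return "$rp_status"
}

# In the trigger script this might be used roughly as:
# run_parallel "pytest -vv tests/unit/" "cd packages/buckaroo-js-core && pnpm test"
```

With this shape the caller can end the script with `run_parallel ... || exit 1`, so a failing pytest or pnpm test run is reported as a failed CI trigger.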

chatgpt-codex-connector · 2026-03-01T17:53:48Z

docs/llm/research/warm-docker-ci.md

+
+# 1. Activate rescue system (~5s API call)
+curl -s -u "$AUTH" "$API/boot/$SERVER_NUM/rescue" \
+  -d "os=linux&authorized_key[]=$SSH_FINGERPRINT"


Define SSH key variable before invoking Robot rescue API

The rebuild script uses authorized_key[]=$SSH_FINGERPRINT but never initializes SSH_FINGERPRINT, so running the snippet as written sends an empty key and the later SSH wait loops cannot authenticate to the rescue system. This makes the documented wipe/reprovision flow fail unless callers add hidden external setup.
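
A minimal guard for this, assuming the rebuild script runs under a POSIX shell: check the required variables up front and stop before any API call if one is unset or empty. `require_vars` is a hypothetical helper; the variable names are the ones used in the snippet above, and how `SSH_FINGERPRINT` is obtained is not shown in the source, so it is left to the caller.

```shell
# Hypothetical helper (not from the PR): return non-zero if any named
# variable is unset or empty, printing which one is missing.
require_vars() {
  for rv_name in "$@"; do
    eval "rv_value=\${$rv_name:-}"
    if [ -z "$rv_value" ]; then
      echo "error: $rv_name must be set before calling the rescue API" >&2
      return 1
    fi
  done
  return 0
}

# In the rebuild script, before the first curl call:
# require_vars AUTH API SERVER_NUM SSH_FINGERPRINT || exit 1
```

For a single variable, the built-in expansion `: "${SSH_FINGERPRINT:?must be set}"` has the same fail-fast effect without a helper.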

- Pin uv/node/pnpm versions (don't track releases, bump when needed)
- Bump Node 20 → 22 LTS
- Add HETZNER_SERVER_ID/IP to .env.example
- Add development verification section (how Claude tests each script locally)
- Add monitoring & alerting section (health endpoint, systemd watchdog, disk hygiene, dead man's switch)
- Expand testing & ongoing verification (Depot as canary, deprecation criteria)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Adds ci/hetzner/ with everything needed to run CI on a persistent CCX33:
- Dockerfile: Ubuntu 24.04, uv 0.6.6, Python 3.11-3.14, Node 22 LTS, pnpm 9.10.0, all deps pre-installed, Playwright chromium
- docker-compose.yml: warm sidecar container (sleep infinity), bind-mounts repo + logs, named volume for Playwright browsers
- webhook.py: Flask on :9000, HMAC-SHA256, per-branch cancellation via pkill, /health + /logs/<sha> endpoints, systemd watchdog
- run-ci.sh: 5-phase orchestrator (parallel lint+test-js+test-py-3.13 → build-wheel → sequential py 3.11/3.12/3.14 → parallel mcp+smoke → sequential playwright) with lockfile-aware dep skipping
- lib/status.sh: GitHub commit status API helpers
- lib/lockcheck.sh: SHA256 lockfile comparison, rebuilds deps only on change
- cloud-init.yml: one-shot CCX33 provisioning
- .env.example: template for required secrets
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add lib/status.sh (GitHub commit status API) and lib/lockcheck.sh (lockfile hash comparison for warm dep skipping). Unblock them from the lib/ gitignore rule which was intended for Python venv dirs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
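
The lockfile-hash idea can be sketched roughly as follows. This is not the PR's actual lib/lockcheck.sh, only an illustration under stated assumptions: GNU coreutils `sha256sum` is available (as on the Ubuntu image), and the function names and stamp-file convention are hypothetical.

```shell
# Print the SHA256 of a file (hash only, without the file name).
lock_hash() {
  sha256sum "$1" | cut -d ' ' -f 1
}

# Succeed (exit 0) when deps need rebuilding: either no stamp exists yet,
# or the lockfile's hash differs from the one recorded at the last install.
# $1 = lockfile, $2 = stamp file (hypothetical location chosen by the caller)
lockfile_changed() {
  if [ ! -f "$2" ]; then
    return 0
  fi
  [ "$(lock_hash "$1")" != "$(cat "$2")" ]
}

# Record the current hash; call this only after a successful install.
record_lockfile() {
  lock_hash "$1" > "$2"
}
```

Writing the stamp only after the install succeeds means an interrupted rebuild is retried on the next run instead of being skipped.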

- Remove owner:ci:ci from write_files (ci user doesn't exist yet at that stage)
- Fix echo runcmd entry with colon causing YAML dict parse error
- status.sh: skip GitHub API calls gracefully when GITHUB_TOKEN unset
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…it branch fix
- Add build-essential + libffi-dev + libssl-dev so cffi can compile
- cloud-init: clone --branch main (not default), add safe.directory
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…e unused import
- Dockerfile: git config --system safe.directory /repo so git checkout works inside the container (bind-mount owned by ci on host, root in container)
- test_playwright_jupyter.sh: add --allow-root so JupyterLab starts as root
- webhook.py: remove unused import signal
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… SHA Dockerfile COPYs ci/hetzner/run-ci.sh and lib/ into /opt/ci-runner/. run-ci.sh sources lib from CI_RUNNER_DIR (/opt/ci-runner/) instead of /repo/ci/hetzner/lib/, so they survive `git checkout <sha>` even when the SHA has no ci/hetzner/ directory (e.g. commits on main branch). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

job_lint_python was running uv sync --dev --no-install-project on the 3.13 venv, which strips --all-extras packages (e.g. pl-series-hash) because optional extras require the project to be installed. This ran in parallel with job_test_python_3.13, causing a race condition that randomly removed pl-series-hash from the venv before tests ran. ruff is already installed in the venv from the image build — no sync needed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

JupyterLab refuses to start as root without --allow-root. Rather than patching every test script, bake c.ServerApp.allow_root = True into /root/.jupyter/jupyter_lab_config.py in the image. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- mp_timeout tests: forkserver subprocess spawn takes >1s in Docker (timeout)
- test_server_killed_on_parent_death: SIGKILL propagation differs in containers
- Python 3.14.0a5: segfaults on pytest startup (CPython pre-release bug)
All three disabled with a note to revisit once timing/stability is known.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Documents all 9 bugs fixed during bringup, known Docker-incompatible tests (disabled), and final timing: 8m59s wall time, all jobs passing. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Each version has its own venv at /opt/venvs/3.11-3.14 — no shared state, safe to run concurrently. Saves ~70-80s wall time on CCX33. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Run 7 (warm, sequential Phase 3): 8m23s
Run 8 (warm, parallel Phase 3): 7m21s, saves 1m07s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

All 5 jobs bind to distinct ports (6006/8701/2718/8765/8889) — no port conflicts. Redirect PLAYWRIGHT_HTML_OUTPUT_DIR per job to avoid playwright-report/ write collisions. Expected saving: ~3m. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Every run should collect mpstat data so we can correlate flakes with CPU contention. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Start pw-jupyter alongside other wheel-dependent jobs instead of waiting for pw-server/marimo/wasm-marimo to finish. With early warmup (exp 28) + window.jupyterapp kernel check (exp 21), pw-jupyter should be reliable under CPU contention. Also adds mpstat CPU sampling to every CI run. Expected: 2m25s → ~1m44s if pw-jupyter passes under contention. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

mpstat not installed in container, vmstat is. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

pw-jupyter passes 7/7 under CPU contention (40-75%) with window.jupyterapp + early warmup. Heavyweight PW gate was unnecessary — total CI drops from 2m25s to 1m43s (-42s). Also fixes CPU monitoring to use vmstat (mpstat not in container). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

With window.jupyterapp kernel check + early warmup + no heavyweight gate, CPU during pw-jupyter is only 6-20%. Increase from P=4 (3 batches: 4+4+1) to P=9 (1 batch: all 9 at once). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Revert PARALLEL=9 → 4 (P=9 too many processes, Exp 31 confirmed)
- Move pw-wasm-marimo from Wave 0 to wheel-dependent (needs widget.js)
- Only test-python-3.13 in Wave 0 for fast signal
- Delay 3.11/3.12/3.14 by 5s after wheel-dependent jobs start to reduce CPU contention during PW job startup
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Exp 31: PARALLEL=9 still too slow (4m+), confirmed P=4 optimal. Exp 32: lean Wave 0 + defer pytest = 1m51s median, +8s vs Exp 30 (1m43s). Exp 30 remains best config. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- pw-jupyter starts first (critical path) at PARALLEL=6
- Other PW jobs staggered every 5s: marimo → wasm-marimo → server → pytest
- Single JUPYTER_PARALLEL variable controls concurrency
- Fine-grain CPU monitoring via /proc/stat at 100ms intervals
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…chdog
- Extract warmup_one_kernel to top-level so it's available between batches
- After shutdown_kernels_on_port, re-warm next batch's servers via WebSocket nudge (fixes batch 2 hang: kernels stuck in "starting" without nudge)
- Add timeout 120 on pw-jupyter to prevent infinite hangs
- Add 210s CI watchdog (kill -TERM 0) to cap total CI time
- Add Exp 34 (early pnpm install) to future experiments
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

… files Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Revert PARALLEL back to 6 and BASE_PORT to 8889. Add pre-run cleanup, between-batch re-warmup, and 120s/210s timeouts as permanent improvements. P=9 failed all 4 attempts (0s/1s/2s stagger, port 8900) due to CPU starvation: 9 servers + 9 kernels + 9 Chromiums = ~27 processes on 16 vCPU. P=6 batched (6+3) passes 9/9 notebooks in 66s. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
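
The batch arithmetic behind these numbers (9 notebooks at P=6 run as 6+3, at P=4 as 4+4+1) can be illustrated with a small sketch. `batch_sizes` is a hypothetical function written for illustration and is not part of run-ci.sh.

```shell
# Hypothetical illustration: print the batch sizes that result from running
# TOTAL items with at most PARALLEL of them at a time.
batch_sizes() {
  bs_total=$1
  bs_parallel=$2
  if [ "$bs_parallel" -le 0 ]; then
    echo "error: PARALLEL must be positive" >&2
    return 1
  fi
  bs_out=""
  while [ "$bs_total" -gt 0 ]; do
    if [ "$bs_total" -lt "$bs_parallel" ]; then
      bs_n=$bs_total
    else
      bs_n=$bs_parallel
    fi
    # Join sizes with "+", without a leading separator on the first batch.
    bs_out="${bs_out:+$bs_out+}$bs_n"
    bs_total=$((bs_total - bs_n))
  done
  echo "$bs_out"
}

batch_sizes 9 6   # prints 6+3
batch_sizes 9 4   # prints 4+4+1
```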

paddymul temporarily deployed to testpypi March 1, 2026 17:52 — with GitHub Actions Inactive

chatgpt-codex-connector bot reviewed Mar 1, 2026

paddymul and others added 2 commits March 1, 2026 13:23

WIP, questions for claude

963b499

paddymul temporarily deployed to testpypi March 1, 2026 18:30 — with GitHub Actions Inactive

paddymul and others added 3 commits March 1, 2026 13:55

paddymul had a problem deploying to testpypi March 1, 2026 19:16 — with GitHub Actions Error

fix: Dockerfile needs build-essential for cffi/cryptography; cloud-in…

5ee2550

paddymul temporarily deployed to testpypi March 1, 2026 19:19 — with GitHub Actions Inactive

paddymul temporarily deployed to testpypi March 1, 2026 19:43 — with GitHub Actions Inactive

paddymul temporarily deployed to testpypi March 1, 2026 20:12 — with GitHub Actions Inactive

paddymul temporarily deployed to testpypi March 1, 2026 20:37 — with GitHub Actions Inactive

paddymul temporarily deployed to testpypi March 1, 2026 20:58 — with GitHub Actions Inactive

paddymul temporarily deployed to testpypi March 1, 2026 21:01 — with GitHub Actions Inactive

docs: update hetzner-ci-bringup with final clean run results

a373b9b

paddymul temporarily deployed to testpypi March 2, 2026 02:26 — with GitHub Actions Inactive

perf: parallelize Phase 3 Python tests (3.11/3.12/3.14)

f05e4d7

paddymul temporarily deployed to testpypi March 2, 2026 02:44 — with GitHub Actions Inactive

docs: add warm cache and parallel Phase 3 timing results

1773af1

paddymul temporarily deployed to testpypi March 2, 2026 02:55 — with GitHub Actions Inactive

paddymul temporarily deployed to testpypi March 3, 2026 16:18 — with GitHub Actions Inactive

paddymul and others added 2 commits March 3, 2026 11:23

docs: add CPU monitoring requirement for CI experiments

d24bbc4

paddymul temporarily deployed to testpypi March 3, 2026 16:27 — with GitHub Actions Inactive

fix: use vmstat instead of mpstat for CPU monitoring

526a120

paddymul temporarily deployed to testpypi March 3, 2026 16:35 — with GitHub Actions Inactive

paddymul temporarily deployed to testpypi March 3, 2026 16:47 — with GitHub Actions Inactive

paddymul temporarily deployed to testpypi March 3, 2026 16:52 — with GitHub Actions Inactive

paddymul temporarily deployed to testpypi March 3, 2026 17:01 — with GitHub Actions Inactive

paddymul and others added 2 commits March 3, 2026 12:08

paddymul temporarily deployed to testpypi March 3, 2026 17:11 — with GitHub Actions Inactive

paddymul temporarily deployed to testpypi March 3, 2026 17:35 — with GitHub Actions Inactive

fix: remove local outside function in batch re-warmup loop

076f40f

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

paddymul temporarily deployed to testpypi March 3, 2026 17:41 — with GitHub Actions Inactive

feat: exp 33 — try PARALLEL=9 for pw-jupyter (all 9 notebooks at once)

0e98e13

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

paddymul temporarily deployed to testpypi March 3, 2026 17:48 — with GitHub Actions Inactive

feat: exp 33 — stagger PARALLEL=9 Chromium launches by 1s

75a81b2

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

paddymul temporarily deployed to testpypi March 3, 2026 17:53 — with GitHub Actions Inactive

feat: exp 33 — try 2s stagger for PARALLEL=9

b566296

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

paddymul temporarily deployed to testpypi March 3, 2026 17:57 — with GitHub Actions Inactive

paddymul and others added 2 commits March 3, 2026 13:00

feat: exp 33 — BASE_PORT=8900 for PARALLEL=9 (test port theory)

553bea0

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix: add pre-run cleanup to run-ci.sh — kill stale processes, rm temp…

9dcc5e0

paddymul temporarily deployed to testpypi March 3, 2026 18:04 — with GitHub Actions Inactive

paddymul deployed to testpypi March 3, 2026 18:11 — with GitHub Actions Active

paddymul wants to merge 131 commits into main from docs/ci-research
