This repository is a D-first compiler-benchmark prototype for DMD. It measures release-to-release compile behaviour, attributes compile cost with -ftime-trace, runs targeted experiments, and packages the output into review-ready evidence.
The project uses a dual-track workflow:
- `latest20`: a literal "latest ~20 releases" window from downloads.dlang.org
- `compatible20`: a host-compatible release window for stable regression scoring on this macOS arm64 machine
Dennis asked for literal latest-release evidence and concrete regression findings. On this host, those goals conflict if they are forced into one dataset.
`latest20` keeps compatibility reality visible, including failures. `compatible20` keeps the timing dataset stable enough for meaningful regression analysis.
This repo has one simple loop:
- pick the release window or experiment path,
- run the benchmark or experiment,
- generate CSVs, charts, and Markdown reports,
- fold the strongest evidence into the submission docs.
```mermaid
flowchart TD
    A["Pick a path\nlatest20 | compatible20 | experiments | verification"] --> B["Run the driver\nMakefile | shell scripts | dmdbench"]
    B --> C["Generate raw artifacts\nCSV | trace JSON | per-task logs"]
    C --> D["Reduce and analyse\nPython reports | D-native reports | SVG/PNG charts"]
    D --> E["Review-facing docs\nREADME + submission/*.md"]
```
The repository is organized into four layers:
- D workloads:
  `benchmark.d` is the primary compile-time workload. `benchmarks/d/ctfe.d`, `benchmarks/d/mixed.d`, `benchmarks/d/semantics.d`, and `benchmarks/d/templates.d` provide alternate benchmark shapes. `benchmarks/dub_pgo_workspace/` is a local multi-package D workspace used for `dub`/PGO experiments.
- Shell orchestration:
  `bench_releases.sh`, `run_trace.sh`, `linux_gap_close.sh`, `build_parser_threaded_dmd.sh`, and `parser_threading_compare.sh` drive reproducible end-to-end runs. `Makefile` exposes standard entry points for common workflows.
- Python analysis:
  `analyze_results.py` builds summary tables, regression tables, reports, and PNG plots. `trace_phase.py` reduces `-ftime-trace` JSON into phase and event summaries. `switch_case_experiment.py` and `not_done_experiments.py` run targeted idea-validation experiments.
- D-native CLI:
  `tools/dmdbench` mirrors the core sweep, analyze, trace, switch-scale, and native not-done workflows in D.
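Because `-ftime-trace` emits Chrome trace-event JSON, the reduction that `trace_phase.py` performs can be approximated in a few lines. This is a minimal sketch under that format assumption, not the script itself; the `summarize_phases` name and the choice to bucket by raw event name are illustrative:

```python
import json
from collections import defaultdict

def summarize_phases(trace_path):
    """Aggregate -ftime-trace durations by event name (Chrome trace-event JSON).

    Returns (name, total_microseconds, percent) tuples, largest first.
    """
    with open(trace_path) as f:
        events = json.load(f).get("traceEvents", [])
    totals = defaultdict(int)
    for ev in events:
        # "X" (complete) events carry a "dur" field in microseconds.
        if ev.get("ph") == "X":
            totals[ev.get("name", "unknown")] += ev.get("dur", 0)
    grand = sum(totals.values()) or 1
    return sorted(
        ((name, dur, 100.0 * dur / grand) for name, dur in totals.items()),
        key=lambda row: -row[1],
    )
```

The real script also distinguishes phase-level from event-level rows and supports granularity sweeps; this sketch only shows the core bucketing step.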
```mermaid
flowchart TD
    subgraph Inputs
        A["benchmark.d\nprimary compile-time workload"]
        A2["benchmarks/d/*.d\nalternate workload shapes"]
        A3["benchmarks/dub_pgo_workspace/\nlocal dub PGO workspace"]
        A4["versions_latest20.txt\nlatest release snapshot"]
        A5["versions_compatible20.txt\nhost-compatible window"]
    end
    subgraph Orchestration
        B["Makefile\nentry-point matrix"]
        C["bench_releases.sh\nrelease sweeps"]
        D["run_trace.sh\nftime-trace capture"]
        E["switch_case_experiment.py\nswitch-width benchmark"]
        F["not_done_experiments.py\nidea-validation suite"]
        G["parser_threading_compare.sh\nbaseline vs threaded compiler"]
        H["tools/dmdbench\nD-native CLI mirror"]
    end
    subgraph Generated_Artifacts
        I["artifacts/latest20 + compatible20\nresults_raw.csv results_summary.csv regression_table.csv"]
        J["artifacts/trace_*\nphase summary event summary granularity sweep"]
        K["artifacts/switch_scaling/*\nresults_summary plot report"]
        L["artifacts/not_done/*\nstatus runtime reports task outputs"]
        M["artifacts/parser_thread_compare/*\ncomparison speedup diagnostics"]
        N["DataAnalysisExpert/*.csv *.svg *.md\nverification summaries and chart indexes"]
    end
    subgraph Review_Docs
        O["README.md\nrepo map + workflow"]
        P["submission/*.md\nmentor packet runbooks findings manifest"]
    end
    A --> C
    A2 --> C
    A3 --> F
    A4 --> C
    A5 --> C
    B --> C
    B --> D
    B --> E
    B --> F
    B --> G
    B --> H
    C --> I
    D --> J
    E --> K
    F --> L
    G --> M
    H --> I
    H --> J
    H --> K
    H --> L
    B --> N
    I --> O
    J --> O
    K --> O
    L --> P
    M --> P
    N --> P
```
- Release-sweep raw data:
  `artifacts/latest20/results_raw.csv`, `artifacts/compatible20/results_raw.csv`
- Regression summaries:
  `artifacts/<track>/results_summary.csv`, `artifacts/<track>/regression_table.csv`, `artifacts/<track>/regression_table_advanced.csv`
- Multi-track consensus:
  `artifacts/regression_consensus_advanced.csv`, `artifacts/report_consensus.md`
- Trace outputs:
  `artifacts/trace.json`, `artifacts/trace_phase_summary.csv`, `artifacts/trace_event_summary.csv`, `artifacts/trace_granularity_sweep.csv`
- Experiment outputs:
  `artifacts/switch_scaling/*`, `artifacts/not_done/*`, `artifacts/parser_thread_compare/*`
- Command-matrix outputs:
  `DataAnalysisExpert/command_run_summary.csv`, `DataAnalysisExpert/manual_smoke_summary.csv`, and the generated SVG charts
- Mentor-facing docs:
  `submission/*.md`
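To illustrate how the raw sweep CSVs feed the summaries, here is a hedged sketch of collapsing repeated timing rows into one median per release. The `version` and `compile_ms` column names are assumptions for illustration, not the actual schema of `results_raw.csv`:

```python
import csv
from statistics import median

def release_medians(raw_csv_path):
    """Collapse repeated per-release timing rows into one median each.

    Assumes illustrative 'version' and 'compile_ms' columns.
    """
    times = {}
    with open(raw_csv_path, newline="") as f:
        for row in csv.DictReader(f):
            times.setdefault(row["version"], []).append(float(row["compile_ms"]))
    return {version: median(samples) for version, samples in times.items()}
```

From a mapping like this, the fastest and slowest releases in a sweep fall out of a simple `min`/`max` over the values.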
- Build reproducible release-history compile-time evidence for DMD.
- Separate host-compatibility reality from regression-quality timing analysis.
- Attribute compile-time cost by compiler phase with `-ftime-trace`.
- Validate targeted hypotheses such as switch-scaling behavior, parser-threading behavior, runtime-library kernels, and `dub`/PGO workflow gaps.
- Package the results into review-ready notes for mentors and upstream contributors.
A local source snapshot, excluding external/, local toolchains, generated artifacts, and command-log outputs, currently looks like this:
- D: 11 files, 5,625 lines
- Python: 5 files, 5,625 lines
- Shell: 10 files, 1,777 lines
- Markdown: 10 files, 788 lines
Conclusion: the repository is still centered on D workloads and a D-native CLI, with Python and shell used for orchestration, analysis, and reporting.
The current verification run covered both the D-native path and the shell/Python path.
- Benchmark compilation:
  `benchmark.d` and every file in `benchmarks/d/*.d` now compile cleanly with DMD v2.112.0. The local benchmark executable runs successfully and prints `rows=6653 aggregate=4230614`.
- `dub` verification:
  `tools/dmdbench` builds successfully with workspace-local `dub`. The checked-in `dub` workspace packages all pass `dub test` when `DUB_HOME=.tmp-dub-home` is set.
- Smoke verification summary:
  `make verify-smoke` now writes `DataAnalysisExpert/smoke_command_summary.csv`. The current smoke matrix records 11 command runs, all passing. It covers release sweep smoke, analysis, trace, switch scaling, runtime-library kernels, cached `dub` PGO, delegated Linux workflows, and parser-threading checks.
- D-native CLI smoke summary:
  `DataAnalysisExpert/manual_smoke_summary.csv` remains a focused D-native/manual reference with 11 passing command runs. Verified commands include `dmdbench analyze`, `dmdbench trace`, `dmdbench switch-scale`, `dmdbench not-done --list-tasks`, `dmdbench not-done --native`, and a compatible-track release sweep smoke run.
- Current compatible-track sweep smoke:
  `artifacts/verification_20260320/bench_smoke/results_raw.csv` contains 20 successful `compatible20` rows. Fastest release in the smoke run: `2.090.1` at 1867 ms. Slowest release in the smoke run: `2.096.0` at 5283 ms.
- Trace smoke summary:
  `artifacts/verification_20260320/run_trace/trace_phase_summary.csv` shows `semantic_analysis` as the top phase at 68.52%, followed by `ctfe` at 19.27%.
- Full verification matrix:
  `make verify-full` now writes `DataAnalysisExpert/command_run_summary.csv`. The current full matrix records 16 command runs, all passing. The slowest command in the current full run on March 20, 2026 was `broader-gist` at 693 s.
The current verification refresh generated these named chart files:
- `DataAnalysisExpert/command_status_counts.svg`
- `DataAnalysisExpert/command_duration_by_target.svg`
- `DataAnalysisExpert/full_status_counts.svg`
- `DataAnalysisExpert/full_duration_by_target.svg`
- `DataAnalysisExpert/smoke_status_counts.svg`
- `DataAnalysisExpert/smoke_duration_by_target.svg`
- `DataAnalysisExpert/manual_smoke_status_counts.svg`
- `DataAnalysisExpert/manual_smoke_duration_by_target.svg`
- `artifacts/verification_20260320/dmdbench_analyze/compile_time_trend.svg`
- `artifacts/verification_20260320/dmdbench_analyze/artifact_size_trend.svg`
- `artifacts/verification_20260320/run_trace/trace_phase_bar.png`
- `artifacts/verification_20260320/python_switch/compile_time_vs_cases.png`
The graph indexes are recorded in:
- `DataAnalysisExpert/chart_index.md`
- `DataAnalysisExpert/full_chart_index.md`
- `DataAnalysisExpert/smoke_chart_index.md`
- `DataAnalysisExpert/manual_smoke_chart_index.md`
The repository now exposes two verification tiers and one maintenance tier:
- `make verify-smoke`: fast local readiness checks with cached inputs and delegated CI summaries for Linux-only workflows
- `make verify-full`: broader command coverage with chart generation for the full matrix summary
- `TIMEOUT_SCALE=<n> make verify-full`: increase full-matrix ceilings when you want a longer local run without editing the script
- `make refresh-latest-snapshot`: refreshes `versions_latest20.txt` explicitly
- `make bootstrap-external-cache`: populates the release-archive cache and the cached `dlang/dub` source checkout for offline runs
Current verification summary files:
- Smoke summary: `DataAnalysisExpert/smoke_command_summary.csv`
- Full summary: `DataAnalysisExpert/command_run_summary.csv`
- Manual smoke reference: `DataAnalysisExpert/manual_smoke_summary.csv`
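For intuition, a verification summary CSV like these can be tallied in a few lines. This sketch assumes illustrative `target`, `status`, and `duration_s` columns; the real schemas may differ:

```python
import csv
from collections import Counter

def status_counts(summary_csv):
    """Tally how many command runs landed in each status bucket."""
    with open(summary_csv, newline="") as f:
        return Counter(row["status"] for row in csv.DictReader(f))

def slowest_target(summary_csv):
    """Return the target name of the longest-running command."""
    with open(summary_csv, newline="") as f:
        rows = list(csv.DictReader(f))
    return max(rows, key=lambda row: float(row["duration_s"]))["target"]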
Current verification chart prefixes:
- Smoke charts: `DataAnalysisExpert/smoke_status_counts.svg`, `DataAnalysisExpert/smoke_duration_by_target.svg`
- Full charts: `DataAnalysisExpert/full_status_counts.svg`, `DataAnalysisExpert/full_duration_by_target.svg`
- Manual smoke charts: `DataAnalysisExpert/manual_smoke_status_counts.svg`, `DataAnalysisExpert/manual_smoke_duration_by_target.svg`
- macOS for the current local workflow, plus `bash`, `curl`, `tar`, and `python3` with `matplotlib` if you want PNG plots from the Python path
- A local D toolchain such as `./.locald/dmd-nightly/osx/bin/dmd`
- For sandboxed or restricted environments, set a local `dub` home:

```sh
export DUB_HOME="$PWD/.tmp-dub-home"
mkdir -p "$DUB_HOME"
```

Recommended Python setup:

```sh
python3 -m venv .venv
./.venv/bin/pip install matplotlib
```

```sh
# 1) Refresh the pinned latest snapshot only when you want to update it
make refresh-latest-snapshot

# 2) Bootstrap caches for offline release sweeps and dub PGO source
make bootstrap-external-cache

# 3) Run both tracks with the shell workflow
./bench_releases.sh --track both --latest-source snapshot --archive-source cache

# 4) Analyze both tracks with Python
./.venv/bin/python ./analyze_results.py \
  --input-dir artifacts \
  --tracks latest20,compatible20 \
  --out-dir artifacts

# 5) Install a workspace-local D toolchain if needed
curl -fsSL https://dlang.org/install.sh | bash -s -- -p ./.locald install dmd-nightly

# 6) Run trace attribution
./run_trace.sh \
  --python-bin ./.venv/bin/python \
  --dmd-bin ./.locald/dmd-nightly/osx/bin/dmd \
  --granularity 1 \
  --granularity-sweep 1,10,50,100

# 7) Run the switch-scaling experiment
./.venv/bin/python ./switch_case_experiment.py \
  --compiler ./.locald/dmd-nightly/osx/bin/dmd \
  --case-counts 100,1000,10000 \
  --runs 7 \
  --warmups 2 \
  --out-dir artifacts/switch_scaling

# 8) Run the broader not-done suite with the cached dub source
./.venv/bin/python ./not_done_experiments.py \
  --out-dir artifacts/not_done \
  --dub-upstream-source cached

# 9) Run the fast readiness matrix and charts
make verify-smoke

# 10) Run the broader matrix when you want more coverage
make verify-full
```

The repository includes a D-native CLI that covers the core workflow.
```sh
export DUB_HOME="$PWD/.tmp-dub-home"
mkdir -p "$DUB_HOME"

# Build
(cd tools/dmdbench && ../../.locald/dmd-nightly/osx/bin/dub build)

# Sweep
./tools/dmdbench/bin/dmdbench sweep --track compatible20 --latest-source snapshot --archive-source cache

# Prepare caches only
./tools/dmdbench/bin/dmdbench sweep --track latest20 --latest-source snapshot --archive-source cache --prepare-cache-only

# Analyze
./tools/dmdbench/bin/dmdbench analyze \
  --input-dir artifacts \
  --tracks latest20,compatible20 \
  --out-dir artifacts

# Trace
./tools/dmdbench/bin/dmdbench trace \
  --dmd-bin ./.locald/dmd-nightly/osx/bin/dmd \
  --granularity 1 \
  --granularity-sweep 1,10,50,100

# Switch scaling
./tools/dmdbench/bin/dmdbench switch-scale \
  --compiler ./.locald/dmd-nightly/osx/bin/dmd \
  --case-counts 100,1000,10000

# Native not-done subset
./tools/dmdbench/bin/dmdbench not-done --list-tasks
./tools/dmdbench/bin/dmdbench not-done \
  --native \
  --tasks zero_cost,gc_kernels,aa_kernels,linker_strip,float_to_string_kernels,phobos_sections
```

Benchmark-suite selection for release sweeps is available through `--bench-suite core|ctfe|templates|semantics|mixed`.
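For intuition, the core measurement a sweep performs amounts to repeated timed `dmd -c` invocations. This is a hedged sketch, not the driver's actual code; the function name, run count, and output path are illustrative:

```python
import subprocess
import time

def time_compile(dmd_bin, source="benchmark.d", runs=3):
    """Time repeated `dmd -c` compiles; return per-run wall time in ms.

    Uses -c so the measurement covers compiler work only, matching the
    repo's stated methodology of excluding linker noise.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([dmd_bin, "-c", source, "-of/tmp/bench.o"], check=True)
        timings.append(1000.0 * (time.perf_counter() - start))
    return timings
```

The real drivers additionally record artifact sizes and emit CSV rows per release; this sketch only shows the timing loop.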
- Compile benchmarking uses `-c` mode so the metric focuses on compiler work instead of linker noise.
- Artifact size is compile-output object size, not final linked executable size.
- Regression flags remain intentionally conservative: a percentage jump plus non-overlapping confidence intervals.
- The parser-prototype frontend change is preserved in `patches/external_dmd_parser_parallel_prototype.patch`, and the helper/CI path pins `external/dmd` to upstream commit `4faeee39cf33c1e3491b7e1da83a71111f05606f` before applying it.
- Linux-only workflows (`strict-perf-probe`, `linux-gap-close`) now return delegated CI pass summaries on non-Linux hosts instead of host-mismatch failures.
- `latest20` is snapshot-first by default. Normal runs use the pinned `versions_latest20.txt` file and only refresh when you explicitly request `--latest-source refresh` or `make refresh-latest-snapshot`.
- Release sweeps are cache-first by default. Normal runs use the local archive cache and only download archives during explicit bootstrap flows such as `make bootstrap-external-cache`.
- The `dub_pgo` workflow is cache-first by default and reuses `artifacts/cache/dub_pgo/dlang__dub` unless you explicitly bootstrap or point it at another checkout.
- The parser prototype has both `coarse` and `narrow` lock modes. `narrow` is the real split parse/commit path, but it is still performance-partial on this host.
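The conservative regression rule noted above (a percentage jump plus non-overlapping confidence intervals) can be sketched as follows. The normal-approximation CI and the 5% threshold are illustrative assumptions, not the exact parameters the analysis scripts use:

```python
from statistics import mean, stdev

def confidence_interval(samples, z=1.96):
    """Approximate 95% normal CI for the mean of repeated timings."""
    m = mean(samples)
    half = z * stdev(samples) / len(samples) ** 0.5
    return m - half, m + half

def flag_regression(prev, curr, min_jump_pct=5.0):
    """Flag only when the % jump is large AND the CIs do not overlap.

    Requiring both conditions keeps noisy single-run spikes from
    being reported as regressions.
    """
    jump = 100.0 * (mean(curr) - mean(prev)) / mean(prev)
    lo_prev, hi_prev = confidence_interval(prev)
    lo_curr, hi_curr = confidence_interval(curr)
    return jump >= min_jump_pct and lo_curr > hi_prev
```

A clear slowdown with tight timings trips both tests; a small or noisy shift trips neither, which is the intended conservative behavior.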
Running `not_done_experiments.py` writes:

- `artifacts/not_done/zero_cost_ldc/*`: `std.range`/`std.algorithm` vs `foreach` with `ldc2 -O3`
- `artifacts/not_done/libphobos_sections/*`: section-size sort for `libphobos2.a`
- `artifacts/not_done/linker_strip_unused_data/*`: linker dead-strip behavior
- `artifacts/not_done/c_vs_d_assembly/*`: `clang` vs `ldc2` assembly comparisons
- `artifacts/not_done/large_char_array_4gb/*`: `char[]` larger-than-4GB truncation probe
- `artifacts/not_done/compiler_fuzz/*`: random mutation fuzz runs over `dmd/compiler/test` seeds
- `artifacts/not_done/allocator_compare/*`: allocator swap comparison
- `artifacts/not_done/ast_field_order/*`: AST field-order experiment artifacts
- `artifacts/not_done/dmd_profile_compare/*`: `dmd -profile` comparison artifacts
- `artifacts/not_done/large_non_zero_init_structs/*`: non-zero-init struct scan results
- `artifacts/not_done/lexer_parser_parallel/*`: parser-threading prototype artifacts
- `artifacts/not_done/perfetto/*`: Perfetto capture helpers and outputs
- `artifacts/not_done/status.md`: checklist-style done/blocked summary