
New build system using nix #1304

Draft
mvachhar wants to merge 33 commits into main from pr/mvachhar/new-build-system

@mvachhar mvachhar commented Feb 24, 2026

This PR is a continuation of the work started by @daniel-noland to move to a proper Nix-based build system.

Most of this PR builds on #1275 and on work done with Claude Code using Opus 4.6, so it should be reviewed carefully. I tried to do the work in small chunks with the AI so that each step got some review along the way, but I am not a Nix expert and had to rely somewhat on the AI's judgement about the best approach for certain things.

TODO:

  • Make failing new sanitizer runs optional - the sanitizers found real bugs we need to fix in separate PRs
    • These got commented out; the GitHub Actions work needed to make them optional is too hard for this PR
  • ~~Create cachix "githedgehog" cache so that these runs come from the cache~~ DONE
  • Have @Fredi-raspall, @qmonnet, and @daniel-noland rebase on this branch to make sure their workflows still work
  • Careful manual review of this PR before signing off
  • Copilot review of this PR before sign-off. DONE
  • Remove scripts/todo.sh. DONE
  • Remove scripts/install-real-nix.sh. DONE
  • Make sure the proper just targets for building and pushing containers are there (I believe we are good, but I want to confirm)

@mvachhar mvachhar requested a review from a team as a code owner February 24, 2026 16:06
@mvachhar mvachhar self-assigned this Feb 24, 2026
@mvachhar mvachhar requested review from sergeymatov and removed request for a team February 24, 2026 16:06
@mvachhar mvachhar marked this pull request as draft February 24, 2026 16:06
@mvachhar mvachhar requested review from Copilot and removed request for sergeymatov February 24, 2026 16:07

Copilot AI left a comment

Pull request overview

This PR continues the migration to a Nix-based build and CI workflow, replacing the prior compile-env/docker-based approach and wiring sysroot/toolchain configuration through Nix shells and Nix builds.

Changes:

  • Replaces the legacy compile-env + fake-nix workflow with default.nix/overlays, nix-shell, and updated just recipes.
  • Updates CI (dev.yml) to build/test via Nix targets and introduces new Nix packaging pieces (FRR packaging, platform/profile plumbing).
  • Refactors sysroot usage in Rust build scripts and updates docs to match the new Nix-first workflow.

Reviewed changes

Copilot reviewed 55 out of 56 changed files in this pull request and generated 12 comments.

| File | Description |
| --- | --- |
| testing.md | Updates testing instructions to assume nix-shell tooling. |
| sysfs/build.rs | Removes sysroot build script logic. |
| sysfs/Cargo.toml | Drops dpdk-sysroot-helper build-dependency. |
| shell.nix | Switches shell entrypoint to default.nix devenv. |
| scripts/update-doc-headers.sh | Bumps KaTeX version used in docs. |
| scripts/todo.sh | Adds a Nix-based build/test “checklist” script. |
| scripts/test-runner.sh | Removes legacy docker-based test runner wrapper. |
| scripts/rust.env | Removes legacy RUSTFLAGS/profile env file. |
| scripts/k8s-crd.env | Updates gateway CRD ref env file (now likely legacy). |
| scripts/installl-real-nix.sh | Adds helper to replace “fake nix” with real Nix install. |
| scripts/dpdk-sys.env | Updates pinned dpdk-sys commit. |
| scripts/doc/custom-header.html | Updates KaTeX CDN links and integrity hashes. |
| rust-toolchain.toml | Removes rustup toolchain file in favor of Nix toolchain sourcing. |
| routing/Cargo.toml | Cleans tokio features and adds dev tokio “full”. |
| npins/sources.json | Updates Nix pins (crane, frr, gateway, nixpkgs, rust, rust-overlay). |
| nix/profiles.nix | Adjusts compile/link/security profile flags and profile mapping. |
| nix/platforms.nix | Adds platform name mapping for bluefield2 → bluefield. |
| nix/pkgs/frr/patches/yang-hack.patch | Adds FRR/libyang-related patch. |
| nix/pkgs/frr/patches/xrelifo.py.fix.patch | Adds FRR python/xrelfo patch. |
| nix/pkgs/frr/default.nix | Introduces FRR derivation with configurable protocol support. |
| nix/pkgs/frr/clippy-helper.nix | Adds split derivation for FRR “clippy” tool for cross builds. |
| nix/pkgs/dpdk/default.nix | Simplifies DPDK build params and uses platform-provided properties. |
| nix/overlays/llvm.nix | Reworks LLVM+Rust toolchain overlay to source versions from pins. |
| nix/overlays/frr.nix | Adds overlay customizing dependencies for FRR static/cross builds. |
| nix/overlays/default.nix | Registers new overlays (rust/llvm/dataplane/frr). |
| nix/overlays/dataplane.nix | Wires platform/profile into DPDK build and tweaks deps. |
| nix/overlays/dataplane-dev.nix | Uses llvmPackages’ stdenv and adds a static-leaning gdb override. |
| net/src/buffer/test_buffer.rs | Cleans doc-only import; adds explicit PacketBuffer doc link. |
| mgmt/tests/reconcile.rs | Adds VM-runner attribute to a test. |
| mgmt/src/tests/mgmt.rs | Removes unused imports and disables a VM test during refactor. |
| mgmt/Cargo.toml | Adds n-vm + tracing-subscriber for tests. |
| k8s-intf/build.rs | Refactors CRD generation to OUT_DIR and env-driven inputs. |
| k8s-intf/Cargo.toml | Swaps build deps to dpdk-sysroot-helper. |
| justfile | Replaces compile-env/sterile/docker flows with Nix build/test/container commands. |
| init/build.rs | Switches to dpdk_sysroot_helper::use_sysroot() behind feature gate. |
| init/Cargo.toml | Introduces sysroot feature and makes sysroot helper optional. |
| hardware/src/os/mod.rs | Fixes a typo in a clippy lint comment. |
| hardware/build.rs | Switches to centralized use_sysroot(). |
| dpdk/src/lcore.rs | Updates lcore ID call to rte_lcore_id(). |
| dpdk/build.rs | Switches to centralized use_sysroot(). |
| dpdk-sysroot-helper/src/lib.rs | Changes sysroot discovery to DATAPLANE_SYSROOT and adds use_sysroot(). |
| dpdk-sys/build.rs | Updates bindgen/sysroot handling and link libs list. |
| development/code/running-tests.md | Updates test-running docs to Nix-first commands. |
| default.nix | Major Nix build definition: dev shell env, profiles, test archives, container tars. |
| dataplane/src/drivers/dpdk.rs | Gates DPDK driver file behind dpdk feature. |
| dataplane/build.rs | Switches to centralized use_sysroot() behind dpdk feature. |
| dataplane/Cargo.toml | Makes dpdk deps optional behind a dpdk feature (default on). |
| cli/build.rs | Removes sysroot build script logic. |
| cli/Cargo.toml | Drops dpdk-sysroot-helper build-dependency. |
| README.md | Updates developer setup/docs to nix-shell workflow. |
| Cargo.toml | Updates workspace version and dependency versions. |
| Cargo.lock | Updates lockfile to match dependency/version changes. |
| .github/workflows/dev.yml.old | Keeps old workflow as .old (new file added). |
| .github/workflows/dev.yml | Reworks CI to use Nix builds and archives. |
| .envrc | Simplifies direnv env vars for the new devroot/sysroot layout. |
| .cargo/config.toml | Updates env vars and rustflags for sysroot/devroot-based builds. |
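
Several rows above replace per-crate sysroot logic with a single `use_sysroot()` call that discovers the sysroot through `DATAPLANE_SYSROOT`. The summary does not show the helper's body, so the following is only a minimal sketch of what such a helper might do. The names `sysroot_from`, `sysroot_directives`, and `use_sysroot_sketch`, and the assumption that libraries live under `<sysroot>/lib`, are hypothetical; only the variable name comes from the summary.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Environment variable named in the file summary.
const SYSROOT_VAR: &str = "DATAPLANE_SYSROOT";

/// Hypothetical helper: turn the raw variable value into a path.
/// An unset or empty value yields `None`.
fn sysroot_from(value: Option<String>) -> Option<PathBuf> {
    value.filter(|v| !v.is_empty()).map(PathBuf::from)
}

/// Hypothetical helper: the Cargo directives a build script could print.
/// The `lib` subdirectory is an assumption, not taken from the PR.
fn sysroot_directives(sysroot: &Path) -> Vec<String> {
    vec![
        // Re-run the build script whenever the variable changes.
        format!("cargo:rerun-if-env-changed={}", SYSROOT_VAR),
        // Let the linker search the sysroot's library directory.
        format!(
            "cargo:rustc-link-search=native={}",
            sysroot.join("lib").display()
        ),
    ]
}

/// Hypothetical stand-in for `use_sysroot()`: print the directives
/// if the variable is set, otherwise do nothing.
fn use_sysroot_sketch() {
    if let Some(root) = sysroot_from(env::var(SYSROOT_VAR).ok()) {
        for line in sysroot_directives(&root) {
            println!("{}", line);
        }
    }
}
```

A `build.rs` would call `use_sysroot_sketch()` from its `main`. Crates that gate the call behind a Cargo feature, as the summary describes for `init` and `dataplane`, would presumably place a `#[cfg(feature = "...")]` attribute on that call.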

@mvachhar mvachhar force-pushed the pr/mvachhar/new-build-system branch 7 times, most recently from d2a1beb to cddb251 February 24, 2026 21:12
@daniel-noland daniel-noland force-pushed the pr/mvachhar/new-build-system branch from cddb251 to 3591e49 February 24, 2026 21:27
@mvachhar mvachhar force-pushed the pr/mvachhar/new-build-system branch from 3591e49 to 921adf0 February 24, 2026 21:49
@mvachhar mvachhar added ci:+vlab Enable VLAB tests labels Feb 24, 2026
@mvachhar mvachhar closed this Feb 24, 2026
@mvachhar mvachhar reopened this Feb 24, 2026
@daniel-noland daniel-noland force-pushed the pr/mvachhar/new-build-system branch from e3be498 to eb71953 February 24, 2026 22:25
@mvachhar mvachhar added the ci:-upgrade Disable VLAB upgrade tests label Feb 24, 2026
@daniel-noland daniel-noland force-pushed the pr/mvachhar/new-build-system branch 2 times, most recently from bae29e6 to 6a688dd February 24, 2026 23:09
@mvachhar mvachhar force-pushed the pr/mvachhar/new-build-system branch 2 times, most recently from 81e9456 to 0059740 February 24, 2026 23:19
mvachhar and others added 12 commits March 17, 2026 14:29
De-duplicate tokio feature flags in routing/Cargo.toml and add tokio
with full features to dev-dependencies for test support.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Daniel Noland <daniel@githedgehog.com>

Update mgmt tests for compatibility with the nix build environment:
add n_vm test dependencies, simplify test_sample_config, and add

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Daniel Noland <daniel@githedgehog.com>

Update npins sources (crane, frr, gateway, nixpkgs, rust,
rust-overlay) and refresh Cargo.lock.  Bump workspace version and
update dependency versions in Cargo.toml.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Daniel Noland <daniel@githedgehog.com>
Signed-off-by: Manish Vachharajani <manish@githedgehog.com>

Update KaTeX version in custom-header.html and update-doc-headers.sh.
Fix a doc typo in hardware/src/os/mod.rs and clean up an unnecessary
include in net/src/buffer/test_buffer.rs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Daniel Noland <daniel@githedgehog.com>

Nix now sets up the entire environment, so rust.env is not needed.
Remove all docker/compile-env recipes and variables that are dead code
after the migration to nix-based builds. Rewrite build-container and
push recipes to use nix build and skopeo directly, and update remaining
recipes to call cargo without the old wrapper.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace all references to the old docker/compile-env workflow with the
new nix-shell based development environment across README.md, testing.md,
and development/code/running-tests.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add a justfile recipe to create devroot and sysroot symlinks via nix
build, making it easy to set up the local development environment.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Use continue-on-error with a per-matrix optional flag so that
sanitize/address and sanitize/thread failures show as warnings
instead of blocking the workflow.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Manish Vachharajani <manish@githedgehog.com>

Previously, we were using the committed generated file
without updating it.  This fixes that so that we now
generate the kopium gateway_agent_crd.rs file in the
target directory and properly use it.

A big change here is that the gateway agent version
now comes from npins/sources.json and not
scripts/k8s-crd.env.  The procedure to update the CRD
is now in README.md.

We also must not exclude json files from the nix sources or the
npins files are not available within nix build.
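
The generate-into-the-target-directory pattern described above can be sketched as follows. This is a minimal illustration, not the actual `k8s-intf/build.rs`: the function `write_generated` is hypothetical, and the `generated` string stands in for the kopium output, whose production is out of scope here. Only the file name `gateway_agent_crd.rs` comes from the commit message.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// File name taken from the commit message.
const GENERATED_FILE: &str = "gateway_agent_crd.rs";

/// Hypothetical helper: write generated bindings into `out_dir`
/// and return the path of the file that was written.
fn write_generated(out_dir: &Path, generated: &str) -> io::Result<PathBuf> {
    let dest = out_dir.join(GENERATED_FILE);
    fs::write(&dest, generated)?;
    Ok(dest)
}

// In a build script, `out_dir` would come from the OUT_DIR variable
// that Cargo sets for build scripts:
//     let out_dir = PathBuf::from(std::env::var("OUT_DIR").unwrap());
// The crate can then pull the generated file in at compile time with:
//     include!(concat!(env!("OUT_DIR"), "/gateway_agent_crd.rs"));
```

Because the file lives under `OUT_DIR` rather than in the source tree, nothing generated needs to be committed, which matches the intent stated in the commit message.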

Co-authored-by: Daniel Noland <daniel@githedgehog.com>
Signed-off-by: Manish Vachharajani <manish@githedgehog.com>

The earlier series of commits adds the address and thread sanitizer
to the dev workflows.  These fail due to real bugs that need to be
addressed.  However, that is for later commits.

While the sanitizer jobs are marked as optional and do not cause
build failure, the summary job still sees them as failed and fails.
A future commit should make the summary job somehow look at the
optional flag and not fail.
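
The rule the summary job would need can be modelled as a small function. This is a sketch of the decision logic only, written in Rust for illustration; it is not workflow YAML and is not taken from the PR. The `JobResult` type and both function names are hypothetical.

```rust
/// Outcome of one matrix job, reduced to what the summary needs.
struct JobResult {
    name: &'static str,
    succeeded: bool,
    /// Mirrors the per-matrix optional flag described above.
    optional: bool,
}

/// Names of jobs that failed and are not marked optional.
fn blocking_failures(jobs: &[JobResult]) -> Vec<&'static str> {
    jobs.iter()
        .filter(|j| !j.succeeded && !j.optional)
        .map(|j| j.name)
        .collect()
}

/// The summary passes when no required job has failed.
fn summary_passes(jobs: &[JobResult]) -> bool {
    blocking_failures(jobs).is_empty()
}
```

Under this rule a failing `sanitize/address` or `sanitize/thread` job marked optional would not fail the summary, while any failing required job still would.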

Signed-off-by: Manish Vachharajani <manish@githedgehog.com>

Remove install-real-nix.sh and todo.sh as these are not
needed.

Signed-off-by: Manish Vachharajani <manish@githedgehog.com>

@daniel-noland daniel-noland force-pushed the pr/mvachhar/new-build-system branch from c0b7fa2 to 9d20a83 March 17, 2026 20:30
@daniel-noland daniel-noland added ci:+release Enable VLAB release tests ci:+hlab Enable hybrid VLAB tests and removed ci:-upgrade Disable VLAB upgrade tests labels Mar 17, 2026
@daniel-noland daniel-noland force-pushed the pr/mvachhar/new-build-system branch from fb032ab to 106d5e0 March 17, 2026 22:35
@daniel-noland daniel-noland removed ci:+release Enable VLAB release tests ci:+hlab Enable hybrid VLAB tests labels Mar 18, 2026

Add npins entries for new FRR component dependencies:
- dplane-plugin (githedgehog/dplane-plugin, master)
- dplane-rpc (githedgehog/dplane-rpc, master)
- frr-agent (githedgehog/frr-agent, master)
- frr: updated revision (stable/10.5)

Update scripts/gen-pins.sh with corresponding npins add commands.

Update npins pins:
- gateway: v0.42.0 -> v0.43.5
- nixpkgs: channel update
- perftest: updated revision
- rust: 1.93.1 -> 1.94.0
- rust-overlay: updated revision
- Remove "man" and "dev" outputs from libmd (not needed for this build)
- Remove ethtool and iproute2 from rdma-core override inputs

Add nix package definitions for FRR container components:
- nix/pkgs/dplane-rpc: C library (cmake build) for dplane RPC
- nix/pkgs/dplane-plugin: C library (cmake build) for FRR dplane plugin
- nix/pkgs/frr-agent: Rust package for FRR agent
- nix/pkgs/frr-config: FRR configuration files package including:
  - daemon configuration (etc/frr/daemons, vtysh.conf, zebra.conf)
  - user/group definitions (etc/passwd, etc/group)
  - nsswitch.conf for DNS resolution
  - container entrypoint script (libexec/frr/docker-start)

Rework nix/overlays/frr.nix:
- Add removeReferencesTo/nukeReferences to strip compiler references
  from all FRR dependencies via reworked dep function
- Rework FRR build LDFLAGS: add readline, json-c, libatomic linking;
  use --push-state/--pop-state for static linking control
- Switch to --disable-static-bin (dynamically linked FRR binaries)
- Add preFixup reference stripping with nuke-refs for FRR builds
- Add reference stripping to json_c and readline builds
- Add readline static+shared configure flags
- Switch libelf to shared
- Wire in new packages: frr-agent, frr-config, dplane-rpc, dplane-plugin

Update nix/pkgs/frr/default.nix:
- Add nukeReferences and removeReferencesTo to build inputs
- Add commented-out preFixup reference stripping code
- Remove unused inputs (nixosTests, xz)
- Remove nixosTests passthru

Rework dataplane-tar (formerly min-tar):
- Merge the two-stage min-tar + dataplane-tar build into a single
  dataplane-tar derivation
- Add busybox symlinks and workspace binary symlinks directly
- Add dontPatchShebangs, dontFixup, dontPatchElf
- Use local libc binding, remove bash/ncurses/readline deps
- Add seccomp filter comment cleanup

Add docs-builder:
- New docs-builder function for building rustdoc documentation
- Add docs attribute set with per-package and all-docs targets

Add tag parameter:
- New tag parameter (default "dev") for container image tagging
- Add VERSION=tag to cargo build environment
- Flip reference-stripping flags to removeReferencesTo* (now removing)

Rework container definitions:
- Rename package-builder -> workspace-builder
- Rename containers.libc -> containers.dataplane with proper ghcr.io
  name and production contents (busybox, fakeNss, workspace binaries)
- Update dataplane-debugger name and tag inheritance
- Add containers.frr.dataplane with full FRR stack
- Add containers.frr.host with FRR host packages
- Add Entrypoint/Cmd config to all containers
- Export docs, frr-pkgs, dataplane-tar (replacing min-tar)

Change k8s-intf/build.rs get_gateway_version() to read VERSION from
the environment instead of parsing npins/sources.json (old code
commented out for reference).
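
Reading the version from the environment could look roughly like the sketch below. The real signature and error handling of `get_gateway_version()` are not shown in this PR text, so the return type, the trimming, and the helper `version_from` are assumptions made for illustration. The variable name `VERSION` comes from the commit message, and an earlier item notes that `VERSION=tag` is added to the cargo build environment.

```rust
use std::env;

/// Hypothetical pure helper, kept separate so the rule can be
/// checked without touching the process environment.
fn version_from(value: Option<String>) -> Result<String, String> {
    match value {
        Some(v) if !v.trim().is_empty() => Ok(v.trim().to_string()),
        _ => Err("VERSION is not set or is empty".to_string()),
    }
}

/// Sketch of a `get_gateway_version()` that reads VERSION from the
/// environment. The `Result<String, String>` return type is assumed.
fn get_gateway_version() -> Result<String, String> {
    // Ask Cargo to re-run the build script when VERSION changes.
    println!("cargo:rerun-if-env-changed=VERSION");
    version_from(env::var("VERSION").ok())
}
```

With the default tag mentioned above, `VERSION=dev` in the build environment would make this sketch return `Ok("dev")`; an unset variable produces an error instead of a silent default.
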
- Add frr.dataplane and frr.host to nix build matrix
- Comment out cargo deny check (TODO: re-enable before merge)
- Reformat sanitizer comments
- Split build/test/push into separate CI steps
- Add per-target push-container steps using just recipes
- Add push container for vlab step
- Use ${UPSTREAM_REGISTRY} for oci_repo instead of hardcoded ghcr.io
- Add FRR version bumping to vlab prebuild
- Remove refresh-compile-env step
- Add FRR version bumping in fabricator bump job

Add local vlab development environment tooling:
- Dockerfile for vlab container (ubuntu + docker, qemu, etc.)
- run.sh to set up TLS certs, zot OCI registry, and hhfab vlab
- control.sh helper to exec into vlab via SSH
- Zot registry configuration (cert.ini, config.json)
- .gitignore entries for TLS artifacts (*.pem, *.crt, *.key, *.csr,
  creds.json)

- Use /usr/bin/env bash instead of ${SHELL:-bash} for shell interpreter
- Add docker_sock variable and _setup_docker_env_ helper
- Change default docker socket to /var/run/docker.sock
- Simplify docker env setup: always set DOCKER_HOST and DOCKER_SOCK
  unconditionally with unix:// prefix
- Add FRR OCI image variables (oci_frr_prefix, oci_image_frr_dataplane,
  oci_image_frr_host)
- Change default oci_repo to 192.168.19.1:30000 (vlab default)
- Add sanitizer component to version string
- Remove bolero-specific sanitizers variable
- Give build recipe a default target (dataplane-tar), add mkdir -p
  results, --out-link, --print-build-logs, --show-trace, --argstr tag
- Add tag arg and quoting fixes to setup-roots recipe
- Add test recipe (builds test archive, runs with cargo nextest)
- Add docs recipe
- Rework build-container and push-container into multi-target dispatch
  supporting dataplane, dataplane-debugger, frr.dataplane, frr.host
- Add --src-daemon-host to skopeo copy calls in push-container
- Remove load-container (merged into build-container)
- Remove bolero fuzz recipes (list-fuzz-tests, fuzz, fuzz-afl)

@daniel-noland daniel-noland force-pushed the pr/mvachhar/new-build-system branch from c68bc0e to cd18b19 March 18, 2026 03:10
Labels

ci:+vlab Enable VLAB tests

3 participants