Problem
Obol Stack currently runs exclusively on k3d (k3s-in-Docker). This works well for most users, but introduces friction in several real-world scenarios:
- Bare-metal and VPS deployments — running k3s-in-Docker on cloud VMs adds overhead and complexity that isn't necessary on a Linux host
- Performance-sensitive workloads — the Docker abstraction adds I/O overhead for persistent volumes and networking, which matters for blockchain nodes syncing large amounts of data
- Environments where Docker isn't available or desirable — some server setups, CI runners, or minimal Linux installs don't have Docker and shouldn't need it just to run a local Kubernetes cluster
- Debugging and observability — with k3d, logs, networking, and storage sit behind a Docker layer that makes troubleshooting harder; native k3s exposes everything directly on the host
TEE requirement: k3d fundamentally cannot support Trusted Execution Environments
The most critical driver for native k3s support is Confidential Computing / TEE workloads. k3d runs k3s inside Docker containers, adding an extra virtualization layer that blocks hardware TEE access entirely. This means:
- AMD SEV-SNP and Intel TDX require direct hardware access — the Docker container boundary in k3d prevents the guest kernel from negotiating with the host's TEE firmware
- Kubernetes CoCo (Confidential Containers) with `kata-qemu-snp` or `kata-qemu-tdx` runtimes requires bare-metal k3s
- GPU TEE workloads (NVIDIA H100/H200 Confidential Computing) similarly need direct hardware access through bare-metal k3s + NVIDIA GPU Operator
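To make the CoCo requirement concrete, here is a sketch of how a bare-metal k3s cluster exposes a confidential runtime and how a pod opts into it. The names `kata-qemu-snp` and the RuntimeClass/handler pairing follow the CoCo operator's conventions; the pod and image names are illustrative, not part of Obol Stack.

```yaml
# RuntimeClass made available by the CoCo operator on bare-metal k3s
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata-qemu-snp
handler: kata-qemu-snp
---
# A workload selects the confidential runtime per pod (illustrative names)
apiVersion: v1
kind: Pod
metadata:
  name: confidential-inference
spec:
  runtimeClassName: kata-qemu-snp
  containers:
    - name: model-server
      image: example.org/inference:latest
```

Because the handler must launch a hardware-backed QEMU guest, this only works when the kubelet runs directly on the host, which is exactly what the k3s backend provides and k3d cannot.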
This creates a natural two-profile architecture for Obol Stack:
| Environment | Runtime | TEE Hardware | Use Case |
|---|---|---|---|
| Local dev (k3d) | Standard containers | None | Business logic, x402, routing |
| TEE dev (k3s) | `kata-qemu-coco-dev` | None (virtualization only) | Test CoCo workflow, no real security |
| TEE production (k3s) | `kata-qemu-snp` | AMD EPYC (SEV-SNP) | Real confidential inference |
| GPU TEE production (k3s) | CoCo + NVIDIA Operator | H100/H200 + AMD EPYC | High-throughput confidential inference |
This maps directly to the consumer/provider split in the marketplace design — consumers run the standard k3d stack locally, while inference providers run bare-metal k3s with TEE hardware for verifiable private inference.
Proposal
Introduce a pluggable backend system that lets users choose between k3d (default, Docker-based) and k3s (native, bare-metal) when initializing their stack:
```sh
# Docker-based (default, unchanged)
obol stack init

# Native k3s (new)
obol stack init --backend k3s
```
The backend choice is persisted in `.stack-backend`, so all subsequent commands (`up`, `down`, `purge`) work transparently regardless of backend.
Key design considerations
- Backend interface — a common `Backend` interface (`Init`, `Up`, `Down`, `Destroy`, `IsRunning`, `DataDir`) so the stack lifecycle code is backend-agnostic
- k3d remains the default — no breaking changes for existing users; stacks without a `.stack-backend` file fall back to k3d
- k3s process management — native k3s runs as a root process via `sudo`, requiring PID tracking, process group signals, and proper cleanup (`k3s-killall.sh`)
- DataDir divergence — k3d mounts host paths into Docker containers (always `/data` inside), while k3s uses host paths directly. The `DataDir()` method abstracts this so helmfile templates work on both
- Shared infrastructure — both backends use the same helmfile, charts, and values templates; only the cluster lifecycle differs
- TEE-ready foundation — the k3s backend provides the bare-metal Kubernetes surface needed for CoCo runtime classes, kata containers, and GPU TEE operators in future work
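The design considerations above can be sketched as a small Go interface with one trivial implementation per backend. This is a sketch under the assumption that the CLI is Go: the method set mirrors the list above, but the struct names, the hard-coded `/data` mount, and the sample host path are illustrative only.

```go
package main

import "fmt"

// Backend abstracts the cluster lifecycle so stack commands
// (init, up, down, purge) do not care which implementation runs.
type Backend interface {
	Init() error              // create the cluster
	Up() error                // start a stopped cluster
	Down() error              // stop the cluster, keeping state
	Destroy() error           // remove the cluster and its state
	IsRunning() (bool, error) // report cluster status
	// DataDir returns the path helmfile templates should use for
	// persistent volumes: a fixed in-container mount for k3d,
	// a real host path for native k3s.
	DataDir() string
}

// K3dBackend runs k3s inside Docker containers via k3d.
type K3dBackend struct{}

func (K3dBackend) Init() error              { return nil }
func (K3dBackend) Up() error                { return nil }
func (K3dBackend) Down() error              { return nil }
func (K3dBackend) Destroy() error           { return nil }
func (K3dBackend) IsRunning() (bool, error) { return true, nil }
func (K3dBackend) DataDir() string          { return "/data" } // in-container mount

// K3sBackend manages a native k3s process on the host.
type K3sBackend struct{ hostDataDir string }

func (K3sBackend) Init() error                { return nil }
func (K3sBackend) Up() error                  { return nil }
func (K3sBackend) Down() error                { return nil }
func (K3sBackend) Destroy() error             { return nil }
func (K3sBackend) IsRunning() (bool, error)   { return true, nil }
func (b K3sBackend) DataDir() string          { return b.hostDataDir } // host path

func main() {
	backends := []Backend{K3dBackend{}, K3sBackend{hostDataDir: "/var/lib/obol"}}
	for _, b := range backends {
		fmt.Println(b.DataDir())
	}
}
```

Keeping `DataDir()` on the interface is what lets a single set of helmfile values templates serve both backends: the lifecycle code asks the backend for the path instead of assuming Docker's mount layout.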