re-implement compose logic by ndeloof · Pull Request #13641 · docker/compose

ndeloof · 2026-03-17T08:25:53Z

re-implement compose logic as a reconciliation between observed and desired state, producing a plan with pending operations.

build observed state
compute reconciliation plan by comparing observed vs desired state
execute plan

This allow to decorelate reconciliation (aka "convergence") logic from docker API, and write simpler and efficient tests to cover various scenarios

This PR was created with AI assistance

What I did

told Claude about the architecture I had in mind, and expectations regarding tests structure

codecov · 2026-03-17T09:04:10Z

Codecov Report

❌ Patch coverage is 89.57107% with 124 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
pkg/compose/plan_executor.go	75.00%	74 Missing and 11 partials ⚠️
pkg/compose/reconcile.go	96.60%	13 Missing and 11 partials ⚠️
pkg/compose/create.go	82.19%	8 Missing and 5 partials ⚠️
pkg/compose/observed_state.go	96.82%	1 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

Copilot

Pull request overview

This PR re-implements the Docker Compose "up/create" convergence logic as a reconciliation between observed and desired state, producing an explicit plan with pending operations. The old convergence struct is removed in favor of a pure Reconcile function that computes a ReconciliationPlan, and a separate ExecutePlan method that executes it.

Changes:

Introduces ObservedState, Reconcile(), and ReconciliationPlan types to separate state inspection, diff computation, and execution
Replaces the convergence struct with executionState for resolving service references during plan execution
Adds comprehensive unit tests for the reconciliation logic, with cross-references to existing e2e tests

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 1 comment.

Show a summary per file

File	Description
pkg/compose/reconcile.go	New pure reconciliation logic computing operation plans from observed vs desired state
pkg/compose/plan_executor.go	New DAG-based concurrent plan executor and display utilities
pkg/compose/observed_state.go	New types and Docker API queries for capturing current project state
pkg/compose/create.go	Rewired `create()` to use InspectState → Reconcile → ExecutePlan pipeline
pkg/compose/convergence.go	Removed old `convergence` struct, `mustRecreate`, `recreateContainer`, etc.
pkg/compose/run.go	Updated to use `executionState` instead of old `convergence`
pkg/compose/reconcile_test.go	Comprehensive unit tests for reconciliation logic
pkg/compose/observed_state_test.go	Tests for observed state types and plan utilities
pkg/e2e/*.go	Added TODO comments cross-referencing new unit tests
pkg/compose/publish.go	Minor `fmt.Fprintf` cleanup
pkg/compose/progress.go	Fixed misleading comment
cmd/compose/compose.go	Unrelated: use `consts.ComposeProfiles` for default profile env var
AI_POLICY.md	New AI usage policy document

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

pkg/compose/reconcile.go

- build observed state - compute reconciliation plan by comparing observed vs desired state - execute plan this decorelates reconciliation (aka "convergence") logic from docker API, and write simpler and efficient tests to cover various scenarios Signed-off-by: Nicolas De Loof <nicolas.deloof@gmail.com>

Signed-off-by: Nicolas De Loof <nicolas.deloof@gmail.com>

Copilot

Pull request overview

This PR refactors Compose “up”/create behavior into a pure reconciliation phase (observed vs desired state) that produces an execution plan (DAG of operations), then executes that plan, with extensive new unit tests for reconcile decision logic.

Changes:

Add ObservedState + Reconcile(...) to compute a deterministic reconciliation plan (networks/volumes/containers/orphans/dependency edges).
Add a DAG-based ExecutePlan(...) executor to run operations concurrently while respecting dependencies.
Add large reconcile_test.go suite and annotate some e2e tests with TODOs pointing to the new unit coverage.

Reviewed changes

Copilot reviewed 16 out of 17 changed files in this pull request and generated 8 comments.

Show a summary per file

File	Description
pkg/compose/reconcile.go	New reconciliation planner (ops, dependency edges, recreate/scale logic).
pkg/compose/plan_executor.go	New plan executor + execution-state used for resolving service references during execution.
pkg/compose/observed_state.go	New “observed state” snapshot builder from Docker engine (containers/networks/volumes/orphans).
pkg/compose/observed_state_test.go	Unit tests for observed-state types and plan helpers (`Roots`, `String`, `ContainerTouched`).
pkg/compose/reconcile_test.go	Large new unit test suite for reconcile scenarios (scale, recreate, networks/volumes/orphans, dependency edges).
pkg/compose/create.go	Switch create flow to InspectState → Reconcile → ExecutePlan; add external network validation; emit events for untouched containers (when plan non-empty).
pkg/compose/convergence.go	Remove old convergence/apply logic (scale/recreate/start logic) and related helpers.
pkg/compose/run.go	Replace convergence-based service-reference resolution with `executionState`.
pkg/compose/publish.go	Minor `strings.Builder` formatting change (`fmt.Fprintf` instead of `WriteString(fmt.Sprintf(...))`).
pkg/compose/progress.go	Comment tweak.
cmd/compose/compose.go	Default `--profile` values now come from env (`COMPOSE_PROFILES`) via compose-go consts.
pkg/e2e/volumes_test.go	Add TODO linking e2e volume recreate coverage to new reconcile unit tests.
pkg/e2e/up_test.go	Add TODO linking e2e scale/no-recreate coverage to new reconcile unit tests.
pkg/e2e/scale_test.go	Add TODOs linking e2e scale behaviors to reconcile unit tests.
pkg/e2e/recreate_no_deps_test.go	Add TODO linking force-recreate/no-deps coverage to reconcile unit tests.
pkg/e2e/orphans_test.go	Add TODO linking orphan removal coverage to reconcile unit tests.
pkg/e2e/networks_test.go	Add TODOs linking network recreate/change detection coverage to reconcile unit tests.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-23T15:20:54Z

pkg/compose/reconcile.go

+	// Precompute which containers are obsolete to avoid repeated hashing in the sort comparator.
+	obsolete := make(map[string]bool, len(containers))
+	for _, ctr := range containers {
+		recreate, _, _ := needsRecreate(service, ctr, networkIDs, volumeNames, policy)


The obsolete precomputation ignores the error from needsRecreate (recreate, _, _ := ...). If ServiceHash/hash checks fail, the sort will silently treat the container as non-obsolete and proceed, which can mask real errors and produce inconsistent plans. Suggest handling the error (e.g., return it from reconcileServiceContainers) instead of discarding it.

Suggested change

recreate, _, _ := needsRecreate(service, ctr, networkIDs, volumeNames, policy)

recreate, _, err := needsRecreate(service, ctr, networkIDs, volumeNames, policy)

if err != nil {

return err

}

Copilot · 2026-03-23T15:20:55Z

pkg/compose/plan_executor.go

+// topologicalSort returns operations in dependency order using Kahn's algorithm.
+func topologicalSort(plan *ReconciliationPlan) []*Operation {
+	inDegree := make(map[string]int, len(plan.Operations))
+	for _, op := range plan.Operations {
+		inDegree[op.ID] = len(op.DependsOn)
+	}
+
+	// Start with nodes that have no dependencies
+	var queue []string
+	for _, op := range plan.Operations {
+		if inDegree[op.ID] == 0 {
+			queue = append(queue, op.ID)
+		}
+	}
+	sort.Strings(queue) // deterministic ordering
+
+	var sorted []*Operation
+	for len(queue) > 0 {
+		id := queue[0]
+		queue = queue[1:]
+		sorted = append(sorted, plan.Operations[id])
+
+		var next []string
+		for _, depID := range plan.Dependents[id] {
+			inDegree[depID]--
+			if inDegree[depID] == 0 {
+				next = append(next, depID)
+			}
+		}
+		sort.Strings(next)
+		queue = append(queue, next...)
+	}
+
+	return sorted
+}


topologicalSort (and ExecutePlan in general) doesn't detect cycles or missing dependencies. With a cycle, topologicalSort will return a partial list and ExecutePlan can deadlock waiting for ops whose dependencies can never be satisfied. Suggested fix: validate the plan before execution (e.g., ensure every DependsOn target exists and that a topo-sort returns len(sorted)==len(Operations)), and return an explicit error if a cycle is detected.

Copilot · 2026-03-23T15:20:55Z

pkg/compose/create.go

+	if plan.IsEmpty() {
+		return nil
 	}
+
+	s.emitUntouchedContainerEvents(project, observed, plan)
+


create() returns early when plan.IsEmpty(), which means no progress events are emitted for already-running, up-to-date containers. Previously (via convergence) running containers would still emit runningEvent(...). If callers rely on those events (e.g., UX/progress output consistency), consider calling emitUntouchedContainerEvents even when the plan is empty, or intentionally document/adjust the behavior change.

Suggested change

if plan.IsEmpty() {

return nil

}

s.emitUntouchedContainerEvents(project, observed, plan)

s.emitUntouchedContainerEvents(project, observed, plan)

if plan.IsEmpty() {

return nil

}

Copilot · 2026-03-23T15:20:55Z

pkg/compose/create.go

+	// Validate external networks exist before reconciling
+	if err := s.validateExternalNetworks(ctx, project, options.Services); err != nil {
 		return err
 	}


create() now validates external networks before reconciling, but there’s no equivalent validation for external volumes. Previously, ensureProjectVolumes called ensureVolume(...) for all project volumes, which returned a clear error when an external volume was missing. With the new plan-based flow, external volumes are skipped by reconcileVolumes, so a missing external volume will only surface later during container create (and potentially with a less specific error). Consider adding a validateExternalVolumes step similar to validateExternalNetworks, limited to volumes referenced by the targeted services.

Copilot · 2026-03-23T15:20:56Z

pkg/compose/reconcile.go

+			// Stop the container (needed before disconnect)
+			stopID := fmt.Sprintf("stop-container:%s", ctrName)
+			if _, exists := plan.Operations[stopID]; !exists {
+				plan.Operations[stopID] = &Operation{
+					ID:          stopID,
+					Type:        OpStopContainer,
+					ServiceName: ctr.Labels[api.ServiceLabel],
+					Resource:    ctrName,
+					ContainerOp: &ContainerOperation{
+						ContainerName: ctrName,
+						Existing:      &ctr,
+						Timeout:       opts.Timeout,
+					},
+					Reason: fmt.Sprintf("network %q is being recreated", n.Name),


Stop ops created during network recreation don’t populate ContainerOp.Service (and therefore executePlanStopContainer passes service=nil to stopContainer). This skips service-level pre_stop hooks and other service-aware stop behavior, and also risks missing timeout propagation. Consider looking up the service config from project.Services[ctr.Labels[api.ServiceLabel]] and setting ContainerOp.Service (and Timeout) so the stop behaves consistently with other stop paths.

Copilot · 2026-03-23T15:20:56Z

pkg/compose/reconcile.go

+			// Stop the container
+			stopID := fmt.Sprintf("stop-container:%s", ctrName)
+			if _, exists := plan.Operations[stopID]; !exists {
+				plan.Operations[stopID] = &Operation{
+					ID:          stopID,
+					Type:        OpStopContainer,
+					ServiceName: ctr.Labels[api.ServiceLabel],
+					Resource:    ctrName,
+					ContainerOp: &ContainerOperation{
+						ContainerName: ctrName,
+						Existing:      &ctr,
+					},
+					Reason: fmt.Sprintf("volume %q is being recreated", v.Name),
+				}
+			}
+
+			// Remove the container (volumes require container removal, not just stop)
+			removeID := fmt.Sprintf("remove-container:%s", ctrName)
+			if _, exists := plan.Operations[removeID]; !exists {
+				plan.Operations[removeID] = &Operation{
+					ID:          removeID,
+					Type:        OpRemoveContainer,
+					ServiceName: ctr.Labels[api.ServiceLabel],
+					Resource:    ctrName,
+					ContainerOp: &ContainerOperation{
+						ContainerName: ctrName,
+						Existing:      &ctr,
+					},
+					DependsOn: []string{stopID},
+					Reason:    fmt.Sprintf("volume %q is being recreated", v.Name),
+				}


reconcileVolumes generates stop/remove container ops for containers using a recreated volume, but it can’t propagate ReconcileOptions.Timeout or service-level stop behavior because it doesn’t take opts and it doesn’t set ContainerOp.Service. This changes semantics vs other stop/remove paths (no pre_stop hooks; daemon default timeout). Suggested fix: pass opts into reconcileVolumes and set ContainerOp.Service (lookup by service label) and Timeout on the generated container ops.

Copilot · 2026-03-23T15:20:56Z

pkg/compose/reconcile.go

+	readyOpsByService := indexReadyOps(plan)
+
+	for _, op := range plan.Operations {
+		if (op.Type == OpCreateContainer || op.Type == OpStartContainer) && op.ContainerOp != nil {


buildDependencyEdges applies addOperationDependencies to all OpCreateContainer ops, including the temp container create used in a recreate chain (ContainerOp.Existing != nil). When a network is being recreated, this makes the temp create depend on create-network:<name>, while create-network depends on remove-network -> disconnect-network -> stop-container, and that stop-container is also reused by the recreate chain (often depending on the temp create). This can introduce a dependency cycle and/or force the temp container to be created on a network that is about to be removed (network remove can fail because the temp container wasn't part of findContainersOnNetwork). Suggested fix: only add network/volume/service dependency edges to “final” create/start ops (e.g., OpCreateContainer where Existing == nil, and OpStartContainer), or otherwise explicitly exclude temp recreate-create ops from addOperationDependencies and handle the recreate+network-recreate interaction as a special case.

Suggested change

if (op.Type == OpCreateContainer || op.Type == OpStartContainer) && op.ContainerOp != nil {

if op.ContainerOp == nil {

continue

}

// Only add dependency edges for "final" create operations (fresh containers)

// and for start operations. Temporary recreate creates (Existing != nil)

// should not get network/volume/service dependency edges, to avoid cycles

// and incorrect ordering with network recreation.

if op.Type == OpCreateContainer && op.ContainerOp.Existing != nil {

continue

}

if op.Type == OpCreateContainer || op.Type == OpStartContainer {

Copilot · 2026-03-23T15:20:56Z

pkg/compose/reconcile.go

+			if len(op.DependsOn) == 1 {
+				if dep := plan.Operations[op.DependsOn[0]]; dep != nil && dep.Type == OpRenameContainer {
+					readyOpsByService[op.ServiceName] = append(readyOpsByService[op.ServiceName], id)


indexReadyOps only treats a service as “ready” when its OpStartContainer has exactly one dependency and that dependency is a rename op. Start ops can legitimately have additional deps (e.g., reconnects when a container is connected to multiple recreated networks, or other future edges), which would make the service disappear from readyOpsByService and break depends_on ordering. Suggest checking for the presence of any OpRenameContainer in DependsOn (or another explicit marker on the op) rather than requiring len(DependsOn) == 1.

Suggested change

if len(op.DependsOn) == 1 {

if dep := plan.Operations[op.DependsOn[0]]; dep != nil && dep.Type == OpRenameContainer {

readyOpsByService[op.ServiceName] = append(readyOpsByService[op.ServiceName], id)

for _, depID := range op.DependsOn {

if dep := plan.Operations[depID]; dep != nil && dep.Type == OpRenameContainer {

readyOpsByService[op.ServiceName] = append(readyOpsByService[op.ServiceName], id)

break

Signed-off-by: Nicolas De Loof <nicolas.deloof@gmail.com>

ndeloof force-pushed the reconciliation branch 2 times, most recently from 6129bb0 to 53c93f1 Compare March 17, 2026 08:51

ndeloof marked this pull request as ready for review March 17, 2026 09:10

ndeloof requested a review from a team as a code owner March 17, 2026 09:10

ndeloof requested review from Copilot and glours March 17, 2026 09:10

Copilot started reviewing on behalf of ndeloof March 17, 2026 09:11 View session

ndeloof force-pushed the reconciliation branch from 53c93f1 to d30635e Compare March 17, 2026 09:13

Copilot AI reviewed Mar 17, 2026

View reviewed changes

pkg/compose/reconcile.go Outdated Show resolved Hide resolved

ndeloof force-pushed the reconciliation branch from d30635e to 23fb362 Compare March 19, 2026 07:17

more tests

137c312

Signed-off-by: Nicolas De Loof <nicolas.deloof@gmail.com>

ndeloof force-pushed the reconciliation branch 3 times, most recently from 428b020 to f1743e1 Compare March 23, 2026 13:40

ndeloof requested a review from Copilot March 23, 2026 15:13

Copilot started reviewing on behalf of ndeloof March 23, 2026 15:13 View session

Copilot AI reviewed Mar 23, 2026

View reviewed changes

more tests

9c24b89

Signed-off-by: Nicolas De Loof <nicolas.deloof@gmail.com>

ndeloof force-pushed the reconciliation branch from f1743e1 to 9c24b89 Compare March 24, 2026 06:33

ndeloof mentioned this pull request Mar 24, 2026

fix: treat container exit code 0 as success in --wait mode #13661

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

re-implement compose logic#13641

re-implement compose logic#13641
ndeloof wants to merge 3 commits intodocker:mainfrom
ndeloof:reconciliation

ndeloof commented Mar 17, 2026

Uh oh!

codecov bot commented Mar 17, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Mar 23, 2026

Uh oh!

Copilot AI Mar 23, 2026

Uh oh!

Copilot AI Mar 23, 2026

Uh oh!

Copilot AI Mar 23, 2026

Uh oh!

Copilot AI Mar 23, 2026

Uh oh!

Copilot AI Mar 23, 2026

Uh oh!

Copilot AI Mar 23, 2026

Uh oh!

Copilot AI Mar 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

-		if (op.Type == OpCreateContainer || op.Type == OpStartContainer) && op.ContainerOp != nil {
+		if op.ContainerOp == nil {
+			continue
+		}
+		// Only add dependency edges for "final" create operations (fresh containers)
+		// and for start operations. Temporary recreate creates (Existing != nil)
+		// should not get network/volume/service dependency edges, to avoid cycles
+		// and incorrect ordering with network recreation.
+		if op.Type == OpCreateContainer && op.ContainerOp.Existing != nil {
+			continue
+		}
+		if op.Type == OpCreateContainer || op.Type == OpStartContainer {

Conversation

ndeloof commented Mar 17, 2026

Uh oh!

codecov bot commented Mar 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

codecov bot commented Mar 17, 2026 •

edited

Loading