Conversation
Force-pushed 6129bb0 to 53c93f1
Pull request overview
This PR re-implements the Docker Compose "up/create" convergence logic as a reconciliation between observed and desired state, producing an explicit plan with pending operations. The old convergence struct is removed in favor of a pure Reconcile function that computes a ReconciliationPlan, and a separate ExecutePlan method that executes it.
Changes:
- Introduces `ObservedState`, `Reconcile()`, and `ReconciliationPlan` types to separate state inspection, diff computation, and execution
- Replaces the `convergence` struct with `executionState` for resolving service references during plan execution
- Adds comprehensive unit tests for the reconciliation logic, with cross-references to existing e2e tests
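The three-phase pattern described above (inspect state, compute a plan, execute it) can be sketched as a pure reconcile function over simplified, hypothetical types — this is an illustration of the pattern, not the actual compose API:

```go
package main

import "fmt"

// Hypothetical, simplified types illustrating the observed-vs-desired
// reconciliation pattern; the real compose types are richer.
type ObservedState struct {
	Containers map[string]int // service name -> running replicas
}

type DesiredState struct {
	Containers map[string]int // service name -> desired replicas
}

type Operation struct {
	Type    string // "create" or "remove"
	Service string
}

// Reconcile is a pure function: it compares observed and desired state
// and returns a plan of pending operations, without touching any API.
// This is what makes the logic unit-testable without a Docker engine.
func Reconcile(observed ObservedState, desired DesiredState) []Operation {
	var plan []Operation
	for svc, want := range desired.Containers {
		have := observed.Containers[svc]
		for i := have; i < want; i++ {
			plan = append(plan, Operation{Type: "create", Service: svc})
		}
		for i := want; i < have; i++ {
			plan = append(plan, Operation{Type: "remove", Service: svc})
		}
	}
	return plan
}

func main() {
	observed := ObservedState{Containers: map[string]int{"web": 1}}
	desired := DesiredState{Containers: map[string]int{"web": 3}}
	fmt.Println(len(Reconcile(observed, desired))) // 2 "create" operations
}
```

A separate executor can then apply the plan against the Docker API, which is the split the PR describes.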
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| pkg/compose/reconcile.go | New pure reconciliation logic computing operation plans from observed vs desired state |
| pkg/compose/plan_executor.go | New DAG-based concurrent plan executor and display utilities |
| pkg/compose/observed_state.go | New types and Docker API queries for capturing current project state |
| pkg/compose/create.go | Rewired create() to use InspectState → Reconcile → ExecutePlan pipeline |
| pkg/compose/convergence.go | Removed old convergence struct, mustRecreate, recreateContainer, etc. |
| pkg/compose/run.go | Updated to use executionState instead of old convergence |
| pkg/compose/reconcile_test.go | Comprehensive unit tests for reconciliation logic |
| pkg/compose/observed_state_test.go | Tests for observed state types and plan utilities |
| pkg/e2e/*.go | Added TODO comments cross-referencing new unit tests |
| pkg/compose/publish.go | Minor fmt.Fprintf cleanup |
| pkg/compose/progress.go | Fixed misleading comment |
| cmd/compose/compose.go | Unrelated: use consts.ComposeProfiles for default profile env var |
| AI_POLICY.md | New AI usage policy document |
- build observed state
- compute reconciliation plan by comparing observed vs desired state
- execute plan

This decouples reconciliation (aka "convergence") logic from the Docker API, allowing simpler, more efficient tests covering various scenarios.

Signed-off-by: Nicolas De Loof <nicolas.deloof@gmail.com>
Signed-off-by: Nicolas De Loof <nicolas.deloof@gmail.com>
Force-pushed 428b020 to f1743e1
Pull request overview
This PR refactors Compose “up”/create behavior into a pure reconciliation phase (observed vs desired state) that produces an execution plan (DAG of operations), then executes that plan, with extensive new unit tests for reconcile decision logic.
Changes:
- Add `ObservedState` + `Reconcile(...)` to compute a deterministic reconciliation plan (networks/volumes/containers/orphans/dependency edges).
- Add a DAG-based `ExecutePlan(...)` executor to run operations concurrently while respecting dependencies.
- Add a large `reconcile_test.go` suite and annotate some e2e tests with TODOs pointing to the new unit coverage.
Reviewed changes
Copilot reviewed 16 out of 17 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/compose/reconcile.go | New reconciliation planner (ops, dependency edges, recreate/scale logic). |
| pkg/compose/plan_executor.go | New plan executor + execution-state used for resolving service references during execution. |
| pkg/compose/observed_state.go | New “observed state” snapshot builder from Docker engine (containers/networks/volumes/orphans). |
| pkg/compose/observed_state_test.go | Unit tests for observed-state types and plan helpers (Roots, String, ContainerTouched). |
| pkg/compose/reconcile_test.go | Large new unit test suite for reconcile scenarios (scale, recreate, networks/volumes/orphans, dependency edges). |
| pkg/compose/create.go | Switch create flow to InspectState → Reconcile → ExecutePlan; add external network validation; emit events for untouched containers (when plan non-empty). |
| pkg/compose/convergence.go | Remove old convergence/apply logic (scale/recreate/start logic) and related helpers. |
| pkg/compose/run.go | Replace convergence-based service-reference resolution with executionState. |
| pkg/compose/publish.go | Minor strings.Builder formatting change (fmt.Fprintf instead of WriteString(fmt.Sprintf(...))). |
| pkg/compose/progress.go | Comment tweak. |
| cmd/compose/compose.go | Default --profile values now come from env (COMPOSE_PROFILES) via compose-go consts. |
| pkg/e2e/volumes_test.go | Add TODO linking e2e volume recreate coverage to new reconcile unit tests. |
| pkg/e2e/up_test.go | Add TODO linking e2e scale/no-recreate coverage to new reconcile unit tests. |
| pkg/e2e/scale_test.go | Add TODOs linking e2e scale behaviors to reconcile unit tests. |
| pkg/e2e/recreate_no_deps_test.go | Add TODO linking force-recreate/no-deps coverage to reconcile unit tests. |
| pkg/e2e/orphans_test.go | Add TODO linking orphan removal coverage to reconcile unit tests. |
| pkg/e2e/networks_test.go | Add TODOs linking network recreate/change detection coverage to reconcile unit tests. |
```go
// Precompute which containers are obsolete to avoid repeated hashing in the sort comparator.
obsolete := make(map[string]bool, len(containers))
for _, ctr := range containers {
	recreate, _, _ := needsRecreate(service, ctr, networkIDs, volumeNames, policy)
```
The obsolete precomputation ignores the error from needsRecreate (recreate, _, _ := ...). If ServiceHash/hash checks fail, the sort will silently treat the container as non-obsolete and proceed, which can mask real errors and produce inconsistent plans. Suggest handling the error (e.g., return it from reconcileServiceContainers) instead of discarding it.
Suggested change:

```go
recreate, _, err := needsRecreate(service, ctr, networkIDs, volumeNames, policy)
if err != nil {
	return err
}
```
```go
// topologicalSort returns operations in dependency order using Kahn's algorithm.
func topologicalSort(plan *ReconciliationPlan) []*Operation {
	inDegree := make(map[string]int, len(plan.Operations))
	for _, op := range plan.Operations {
		inDegree[op.ID] = len(op.DependsOn)
	}

	// Start with nodes that have no dependencies
	var queue []string
	for _, op := range plan.Operations {
		if inDegree[op.ID] == 0 {
			queue = append(queue, op.ID)
		}
	}
	sort.Strings(queue) // deterministic ordering

	var sorted []*Operation
	for len(queue) > 0 {
		id := queue[0]
		queue = queue[1:]
		sorted = append(sorted, plan.Operations[id])

		var next []string
		for _, depID := range plan.Dependents[id] {
			inDegree[depID]--
			if inDegree[depID] == 0 {
				next = append(next, depID)
			}
		}
		sort.Strings(next)
		queue = append(queue, next...)
	}

	return sorted
}
```
topologicalSort (and ExecutePlan in general) doesn't detect cycles or missing dependencies. With a cycle, topologicalSort will return a partial list and ExecutePlan can deadlock waiting for ops whose dependencies can never be satisfied. Suggested fix: validate the plan before execution (e.g., ensure every DependsOn target exists and that a topo-sort returns len(sorted)==len(Operations)), and return an explicit error if a cycle is detected.
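A validation step along the lines suggested above could look like the following sketch, using simplified, hypothetical types rather than the actual compose plan structs. It checks that every `DependsOn` target exists and that a Kahn-style traversal visits every node (anything less means a cycle):

```go
package main

import "fmt"

// op is a simplified stand-in for a plan operation.
type op struct {
	ID        string
	DependsOn []string
}

// validatePlan returns an error if any dependency target is missing or
// the graph contains a cycle. A cycle is detected when the topological
// traversal cannot reach every operation.
func validatePlan(ops map[string]*op) error {
	inDegree := make(map[string]int, len(ops))
	dependents := make(map[string][]string)
	for id, o := range ops {
		inDegree[id] = len(o.DependsOn)
		for _, dep := range o.DependsOn {
			if _, ok := ops[dep]; !ok {
				return fmt.Errorf("operation %q depends on unknown operation %q", id, dep)
			}
			dependents[dep] = append(dependents[dep], id)
		}
	}

	var queue []string
	for id, deg := range inDegree {
		if deg == 0 {
			queue = append(queue, id)
		}
	}

	visited := 0
	for len(queue) > 0 {
		id := queue[0]
		queue = queue[1:]
		visited++
		for _, dep := range dependents[id] {
			if inDegree[dep]--; inDegree[dep] == 0 {
				queue = append(queue, dep)
			}
		}
	}
	if visited != len(ops) {
		return fmt.Errorf("dependency cycle detected: only %d of %d operations reachable", visited, len(ops))
	}
	return nil
}

func main() {
	cyclic := map[string]*op{
		"a": {ID: "a", DependsOn: []string{"b"}},
		"b": {ID: "b", DependsOn: []string{"a"}},
	}
	fmt.Println(validatePlan(cyclic) != nil) // true: cycle detected
}
```

Running this check before `ExecutePlan` would turn a potential deadlock into an immediate, explicit error.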
pkg/compose/create.go (Outdated)
```go
if plan.IsEmpty() {
	return nil
}

s.emitUntouchedContainerEvents(project, observed, plan)
```
create() returns early when plan.IsEmpty(), which means no progress events are emitted for already-running, up-to-date containers. Previously (via convergence) running containers would still emit runningEvent(...). If callers rely on those events (e.g., UX/progress output consistency), consider calling emitUntouchedContainerEvents even when the plan is empty, or intentionally document/adjust the behavior change.
Suggested change:

```go
s.emitUntouchedContainerEvents(project, observed, plan)
if plan.IsEmpty() {
	return nil
}
```
pkg/compose/create.go (Outdated)
```go
// Validate external networks exist before reconciling
if err := s.validateExternalNetworks(ctx, project, options.Services); err != nil {
	return err
}
```
create() now validates external networks before reconciling, but there’s no equivalent validation for external volumes. Previously, ensureProjectVolumes called ensureVolume(...) for all project volumes, which returned a clear error when an external volume was missing. With the new plan-based flow, external volumes are skipped by reconcileVolumes, so a missing external volume will only surface later during container create (and potentially with a less specific error). Consider adding a validateExternalVolumes step similar to validateExternalNetworks, limited to volumes referenced by the targeted services.
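A hypothetical `validateExternalVolumes` along those lines could be sketched as follows, with simplified types and a stand-in `volumeExists` callback in place of the Docker API lookup (the real implementation would inspect volumes via the engine client):

```go
package main

import "fmt"

// volumeConfig is a simplified stand-in for a compose volume definition.
type volumeConfig struct {
	Name     string
	External bool
}

// validateExternalVolumes fails fast with a clear error when an external
// volume referenced by the project does not exist, mirroring the
// pre-reconcile validation done for external networks.
func validateExternalVolumes(volumes []volumeConfig, volumeExists func(name string) bool) error {
	for _, v := range volumes {
		if v.External && !volumeExists(v.Name) {
			return fmt.Errorf("external volume %q not found", v.Name)
		}
	}
	return nil
}

func main() {
	exists := func(name string) bool { return name == "data" }
	volumes := []volumeConfig{
		{Name: "data", External: true},
		{Name: "cache", External: true},
	}
	fmt.Println(validateExternalVolumes(volumes, exists)) // error about "cache"
}
```

As the comment notes, this should be limited to volumes actually referenced by the targeted services.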
```go
// Stop the container (needed before disconnect)
stopID := fmt.Sprintf("stop-container:%s", ctrName)
if _, exists := plan.Operations[stopID]; !exists {
	plan.Operations[stopID] = &Operation{
		ID:          stopID,
		Type:        OpStopContainer,
		ServiceName: ctr.Labels[api.ServiceLabel],
		Resource:    ctrName,
		ContainerOp: &ContainerOperation{
			ContainerName: ctrName,
			Existing:      &ctr,
			Timeout:       opts.Timeout,
		},
		Reason: fmt.Sprintf("network %q is being recreated", n.Name),
```
Stop ops created during network recreation don’t populate ContainerOp.Service (and therefore executePlanStopContainer passes service=nil to stopContainer). This skips service-level pre_stop hooks and other service-aware stop behavior, and also risks missing timeout propagation. Consider looking up the service config from project.Services[ctr.Labels[api.ServiceLabel]] and setting ContainerOp.Service (and Timeout) so the stop behaves consistently with other stop paths.
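The suggested service lookup could be sketched like this, using hypothetical, simplified stand-ins for the compose types (the real code would read `project.Services[ctr.Labels[api.ServiceLabel]]`):

```go
package main

import "fmt"

// serviceConfig is a simplified stand-in for a compose service definition.
type serviceConfig struct {
	Name        string
	StopTimeout int // seconds; 0 means "use the default"
}

// containerOperation is a simplified stand-in for the stop op payload.
type containerOperation struct {
	ContainerName string
	Service       *serviceConfig
	Timeout       int
}

// buildStopOp looks up the service config by the container's service
// label so service-aware stop behavior (pre_stop hooks, stop timeout)
// is preserved instead of stopping with a nil service.
func buildStopOp(ctrName, serviceLabel string, services map[string]serviceConfig, defaultTimeout int) containerOperation {
	op := containerOperation{ContainerName: ctrName, Timeout: defaultTimeout}
	if svc, ok := services[serviceLabel]; ok {
		op.Service = &svc
		if svc.StopTimeout > 0 {
			op.Timeout = svc.StopTimeout
		}
	}
	return op
}

func main() {
	services := map[string]serviceConfig{"db": {Name: "db", StopTimeout: 30}}
	op := buildStopOp("proj-db-1", "db", services, 10)
	fmt.Println(op.Service != nil, op.Timeout) // true 30
}
```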
```go
// Stop the container
stopID := fmt.Sprintf("stop-container:%s", ctrName)
if _, exists := plan.Operations[stopID]; !exists {
	plan.Operations[stopID] = &Operation{
		ID:          stopID,
		Type:        OpStopContainer,
		ServiceName: ctr.Labels[api.ServiceLabel],
		Resource:    ctrName,
		ContainerOp: &ContainerOperation{
			ContainerName: ctrName,
			Existing:      &ctr,
		},
		Reason: fmt.Sprintf("volume %q is being recreated", v.Name),
	}
}

// Remove the container (volumes require container removal, not just stop)
removeID := fmt.Sprintf("remove-container:%s", ctrName)
if _, exists := plan.Operations[removeID]; !exists {
	plan.Operations[removeID] = &Operation{
		ID:          removeID,
		Type:        OpRemoveContainer,
		ServiceName: ctr.Labels[api.ServiceLabel],
		Resource:    ctrName,
		ContainerOp: &ContainerOperation{
			ContainerName: ctrName,
			Existing:      &ctr,
		},
		DependsOn: []string{stopID},
		Reason:    fmt.Sprintf("volume %q is being recreated", v.Name),
	}
```
reconcileVolumes generates stop/remove container ops for containers using a recreated volume, but it can’t propagate ReconcileOptions.Timeout or service-level stop behavior because it doesn’t take opts and it doesn’t set ContainerOp.Service. This changes semantics vs other stop/remove paths (no pre_stop hooks; daemon default timeout). Suggested fix: pass opts into reconcileVolumes and set ContainerOp.Service (lookup by service label) and Timeout on the generated container ops.
```go
readyOpsByService := indexReadyOps(plan)

for _, op := range plan.Operations {
	if (op.Type == OpCreateContainer || op.Type == OpStartContainer) && op.ContainerOp != nil {
```
buildDependencyEdges applies addOperationDependencies to all OpCreateContainer ops, including the temp container create used in a recreate chain (ContainerOp.Existing != nil). When a network is being recreated, this makes the temp create depend on create-network:<name>, while create-network depends on remove-network -> disconnect-network -> stop-container, and that stop-container is also reused by the recreate chain (often depending on the temp create). This can introduce a dependency cycle and/or force the temp container to be created on a network that is about to be removed (network remove can fail because the temp container wasn't part of findContainersOnNetwork). Suggested fix: only add network/volume/service dependency edges to “final” create/start ops (e.g., OpCreateContainer where Existing == nil, and OpStartContainer), or otherwise explicitly exclude temp recreate-create ops from addOperationDependencies and handle the recreate+network-recreate interaction as a special case.
Suggested change:

```go
if op.ContainerOp == nil {
	continue
}
// Only add dependency edges for "final" create operations (fresh containers)
// and for start operations. Temporary recreate creates (Existing != nil)
// should not get network/volume/service dependency edges, to avoid cycles
// and incorrect ordering with network recreation.
if op.Type == OpCreateContainer && op.ContainerOp.Existing != nil {
	continue
}
if op.Type == OpCreateContainer || op.Type == OpStartContainer {
```
```go
if len(op.DependsOn) == 1 {
	if dep := plan.Operations[op.DependsOn[0]]; dep != nil && dep.Type == OpRenameContainer {
		readyOpsByService[op.ServiceName] = append(readyOpsByService[op.ServiceName], id)
```
indexReadyOps only treats a service as “ready” when its OpStartContainer has exactly one dependency and that dependency is a rename op. Start ops can legitimately have additional deps (e.g., reconnects when a container is connected to multiple recreated networks, or other future edges), which would make the service disappear from readyOpsByService and break depends_on ordering. Suggest checking for the presence of any OpRenameContainer in DependsOn (or another explicit marker on the op) rather than requiring len(DependsOn) == 1.
Suggested change:

```go
for _, depID := range op.DependsOn {
	if dep := plan.Operations[depID]; dep != nil && dep.Type == OpRenameContainer {
		readyOpsByService[op.ServiceName] = append(readyOpsByService[op.ServiceName], id)
		break
```
Signed-off-by: Nicolas De Loof <nicolas.deloof@gmail.com>
re-implement compose logic as a reconciliation between observed and desired state, producing a plan with pending operations.
This allows decoupling reconciliation (aka "convergence") logic from the Docker API, and writing simpler, more efficient tests covering various scenarios.
This PR was created with AI assistance
What I did
told Claude about the architecture I had in mind and my expectations regarding test structure