fix: [#342] resolve Docker BuildKit 'image already exists' error in CI#344

Merged
josecelano merged 8 commits into main from
342-docker-buildkit-image-exists-error
Feb 13, 2026

Conversation

@josecelano
Member

@josecelano josecelano commented Feb 12, 2026

Description

Fixes #342

Resolves Docker BuildKit "image already exists" errors in GitHub Actions CI caused by a race condition during parallel test execution.

Problem

When cargo test runs multiple integration tests in parallel, each test calls build_if_missing() to ensure the Docker image exists. The race condition:

  1. Test A and Test B both call build_if_missing() simultaneously
  2. Both call image_exists() → both get false (no image yet)
  3. Both start docker build in parallel (~60s each)
  4. Test A finishes first, tags dependency-installer-test:ubuntu-24.04 → success
  5. Test B finishes, all Docker steps complete but the final export/tagging step fails:
    #8 ERROR: image "docker.io/library/dependency-installer-test:ubuntu-24.04": already exists
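The interleaving above is a check-then-act race, and it can be reproduced without Docker. The following is a minimal sketch, assuming a hypothetical in-memory `Registry` that rejects a second tag the way the export step does. `Registry`, `simulate_race`, and the barrier are illustrative names and are not taken from the repository.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Barrier, Mutex};
use std::thread;

/// Hypothetical stand-in for the local image store (not the real Docker API).
#[derive(Default)]
struct Registry {
    tags: Mutex<HashSet<String>>,
}

impl Registry {
    fn image_exists(&self, tag: &str) -> bool {
        self.tags.lock().unwrap().contains(tag)
    }

    /// Mimics the export/tagging step: the second writer is rejected.
    fn tag(&self, tag: &str) -> Result<(), String> {
        if self.tags.lock().unwrap().insert(tag.to_string()) {
            Ok(())
        } else {
            Err(format!("image \"{tag}\": already exists"))
        }
    }
}

/// Runs two "tests" that both check for the image before either one tags it.
fn simulate_race(tag: &str) -> Vec<Result<(), String>> {
    let registry = Arc::new(Registry::default());
    let barrier = Arc::new(Barrier::new(2));
    let handles: Vec<_> = (0..2)
        .map(|_| {
            let (registry, barrier, tag) = (registry.clone(), barrier.clone(), tag.to_string());
            thread::spawn(move || {
                // Step 2: both threads observe that the image is missing.
                let missing = !registry.image_exists(&tag);
                // Force both checks to finish before any tagging happens.
                barrier.wait();
                // Steps 4 and 5: the first tag wins, the second is rejected.
                if missing { registry.tag(&tag) } else { Ok(()) }
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Calling `simulate_race("dependency-installer-test:ubuntu-24.04")` yields one `Ok(())` and one `Err` containing "already exists". The barrier only makes the unlucky interleaving deterministic; in CI the same ordering arises from timing alone.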
    

Solution

When a Docker build fails with "already exists", treat it as success — it means another concurrent test already built and tagged the exact same image, which is now available for use.
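A minimal sketch of that decision, assuming the builder has the build's success flag and captured output at hand. `BuildOutcome`, `classify_build`, and `ALREADY_EXISTS_MARKER` are hypothetical names for illustration; the actual `build_if_missing()` code may be structured differently.

```rust
/// Outcome of a finished `docker build`, reduced to what matters here.
#[derive(Debug, PartialEq)]
enum BuildOutcome {
    /// This process built and tagged the image.
    Built,
    /// The build failed only because a concurrent process tagged it first.
    BuiltByConcurrentProcess,
    /// Any other failure, with the captured output attached.
    Failed(String),
}

/// Marker text assumed from the CI log quoted in the problem description.
const ALREADY_EXISTS_MARKER: &str = "already exists";

fn classify_build(success: bool, stdout: &str, stderr: &str) -> BuildOutcome {
    if success {
        return BuildOutcome::Built;
    }
    // BuildKit may report the error on either stream, so check both.
    if stdout.contains(ALREADY_EXISTS_MARKER) || stderr.contains(ALREADY_EXISTS_MARKER) {
        return BuildOutcome::BuiltByConcurrentProcess;
    }
    BuildOutcome::Failed(format!("stdout: {stdout}\nstderr: {stderr}"))
}
```

A caller would map both `Built` and `BuiltByConcurrentProcess` to `Ok(())`. Substring matching is deliberately coarse: any unrelated failure whose output happens to contain the same phrase would also be treated as success.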

Why this is correct

  • The "already exists" error only occurs at the export/tagging step, after all build steps complete successfully
  • It means the identical image was already built by a concurrent process
  • The image is immediately available for container creation
  • No data loss or corruption possible — Docker tags are atomic pointers

Approaches tried and why they failed

| Attempt | Approach | Result |
|---|---|---|
| 1 | docker rmi -f before building | ❌ Worse race conditions (removing image while another test uses it) |
| 2 | Extended docker rmi for fully-qualified names | ❌ Same problem |
| 3 | Remove docker rmi, trust BuildKit atomicity | ❌ BuildKit does not handle this silently |
| 4 (final) | Treat "already exists" as success | CI passes |

Changes

  • packages/dependency-installer/tests/containers/image_builder.rs:

    • build_if_missing() now detects "already exists" in build output and returns Ok(()) instead of Err
    • Enhanced error reporting with tracing::{error, info} for structured logging
    • Added --force-rm flag for intermediate container cleanup
    • Updated documentation to reflect the concurrent build handling
  • src/testing/e2e/containers/image_builder.rs:

    • Same "already exists" detection in build() method
    • Added --force-rm flag
    • Updated comments about BuildKit concurrency behavior
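For orientation, the build invocation with the added flag might be assembled roughly as follows. This is a sketch only: `docker_build_command` and its parameters are hypothetical, and the Dockerfile and context paths are placeholders rather than the ones used in the repository.

```rust
use std::path::Path;
use std::process::Command;

/// Assembles (but does not run) a `docker build` invocation.
fn docker_build_command(tag: &str, dockerfile: &Path, context: &Path) -> Command {
    let mut cmd = Command::new("docker");
    cmd.arg("build")
        .arg("--force-rm") // always remove intermediate containers
        .arg("-t")
        .arg(tag)
        .arg("-f")
        .arg(dockerfile)
        .arg(context);
    cmd
}
```

Returning the `Command` instead of running it keeps the argument list inspectable in unit tests through `get_args()`, without needing a Docker daemon.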

Testing

  • ✅ All CI checks passing (Container, Coverage, E2E, Linting)
  • ✅ All local pre-commit checks pass
  • cargo clippy, cargo machete, cargo fmt clean

@josecelano josecelano self-assigned this Feb 13, 2026
Apply the same Docker BuildKit fix to the second ImageBuilder in src/testing/e2e/containers/image_builder.rs that was already applied to packages/dependency-installer/tests/containers/image_builder.rs.

This ensures both image builders force remove stale images before building and use the --force-rm flag to clean up intermediate containers.

…building

BuildKit uses fully-qualified image names (docker.io/library/...) during the
export phase, which was causing 'already exists' errors even after removing
the short image name. Now both name formats are removed to ensure clean builds.

Include both stdout and stderr in error messages, and add structured logging
with exit codes and output lengths. This will help diagnose the actual failure
reason when Docker builds time out or fail in CI environments.
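
The kind of message this commit describes could be assembled as in the sketch below. It assumes the builder holds a `std::process::Output`; `describe_build_failure` is a hypothetical helper, and the PR itself routes these fields through `tracing` rather than returning a plain string.

```rust
use std::process::Output;

/// Formats a failed build into one message carrying both output streams.
fn describe_build_failure(image: &str, output: &Output) -> String {
    let stdout = String::from_utf8_lossy(&output.stdout);
    let stderr = String::from_utf8_lossy(&output.stderr);
    // `code()` is `None` when the process was terminated by a signal.
    let exit_code = output
        .status
        .code()
        .map_or_else(|| "none (signal)".to_string(), |c| c.to_string());
    format!(
        "docker build failed for {image} (exit code: {exit_code}, \
         stdout: {} bytes, stderr: {} bytes)\n--- stdout ---\n{stdout}\n--- stderr ---\n{stderr}",
        output.stdout.len(),
        output.stderr.len(),
    )
}
```

Reporting the byte lengths alongside the text makes it easier to tell an empty stream from one that was dropped somewhere in the CI log pipeline.
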
The docker rmi commands were creating race conditions when multiple tests
tried to build the same image concurrently:
- Test A: check image doesn't exist -> start building
- Test B: check image doesn't exist -> start building
- Test A: remove existing images (might remove Test B's partial build!)
- Test B: remove existing images (might remove Test A's partial build!)
- Both builds complete -> conflict when tagging

Docker BuildKit handles concurrent builds to the same tag atomically with
internal locking, so we don't need to manually remove images. The
image_exists() check at the start is sufficient for the common case.

This reverts the image removal logic from commits 74f2151, 3b506dd, and
c719c92 which introduced the race condition while trying to fix BuildKit
export errors.

…nt tests

When parallel tests build the same Docker image simultaneously, the
second build may complete all steps successfully but fail at the final
export/tagging step with 'already exists' because the first build
already claimed the tag. This is not a real failure - the image is
available for use.

Instead of failing the test, detect this specific error message and
treat it as success in both image builders (dependency-installer and
e2e containers).

Documents the race condition where parallel tests build the same Docker
image simultaneously, causing 'already exists' errors at the tagging
step. Records the chosen solution (treat as success), the image
staleness caveat for development, and the four alternatives considered
and rejected (unique tags, file locks, pre-building, docker rmi).

Adds a new Agent Skill at .github/skills/create-adr/ that guides AI
agents through the complete ADR creation workflow: template usage, file
naming, index registration, validation, and commit conventions.

Registered in AGENTS.md skills table.
@josecelano
Member Author

ACK ff71199

@josecelano josecelano added the bug (Something isn't working), ci, and testing labels Feb 13, 2026
@josecelano josecelano merged commit ed6c082 into main Feb 13, 2026
39 checks passed

Development

Successfully merging this pull request may close these issues.

Fix Docker BuildKit "image already exists" error in dependency-installer tests

1 participant