Auto-start Cosmos emulator via Testcontainers#37999

Open
Copilot wants to merge 13 commits into main from copilot/automate-cosmos-emulator-startup

Contributor

Copilot AI commented Mar 25, 2026

Follow up to #37110

Replace explicit Docker scripts for the Cosmos emulator with automatic Testcontainers.CosmosDb lifecycle management, matching the pattern from #37809 for SQL Server. Also auto-detect the Linux emulator instead of requiring Test__Cosmos__EmulatorType=linux.

TestEnvironment connection initialization

3-path logic in TestEnvironment.InitializeAsync() (called from CosmosTestStore.InitializeAsync()):

  1. Configured endpoint: Test__Cosmos__DefaultConnection env var or config file → use directly
  2. Localhost probe: probe https://localhost:8081 for an already-running emulator (e.g. Windows emulator or manually-started Docker container) → use if reachable
  3. Testcontainer: start CosmosDbContainer with vnext-preview image; dispose on process exit

The localhost probe enables a convenient developer workflow on Windows, where an already-running emulator is picked up without setting an environment variable. When the testcontainer fails to start, the exception is wrapped with a message listing the ways to fix it: run the Cosmos DB emulator on localhost:8081, install and start Docker, or set the Test__Cosmos__DefaultConnection environment variable to connect to an existing emulator or Cosmos DB instance.
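The resolution order described above can be sketched roughly as follows. This is only an illustration, not the code in TestEnvironment.cs: the CosmosEndpointResolver class, its ResolveAsync method, and the probeAsync/startContainerAsync delegates are hypothetical names that stand in for the real probe and the real Testcontainers startup.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical sketch of the three-path resolution order; not the PR's actual code.
public static class CosmosEndpointResolver
{
    public const string LocalEmulatorEndpoint = "https://localhost:8081";

    public static async Task<(string Endpoint, bool IsTestcontainer)> ResolveAsync(
        string? configuredEndpoint,
        Func<string, Task<bool>> probeAsync,      // assumed: returns true if an emulator answers
        Func<Task<string>> startContainerAsync)   // assumed: starts a container, returns its endpoint
    {
        // 1. A configured endpoint (env var or config file) always wins.
        if (!string.IsNullOrEmpty(configuredEndpoint))
        {
            return (configuredEndpoint, false);
        }

        // 2. Reuse an emulator that is already listening on localhost.
        if (await probeAsync(LocalEmulatorEndpoint).ConfigureAwait(false))
        {
            return (LocalEmulatorEndpoint, false);
        }

        // 3. Fall back to a testcontainer, wrapping any failure with guidance.
        try
        {
            return (await startContainerAsync().ConfigureAwait(false), true);
        }
        catch (Exception e)
        {
            throw new InvalidOperationException(
                "Could not start the Cosmos DB emulator testcontainer. Run an emulator on "
                + LocalEmulatorEndpoint + ", make sure Docker is installed and running, or set "
                + "Test__Cosmos__DefaultConnection to an existing emulator or Cosmos DB instance.",
                e);
        }
    }
}
```

Passing the probe and the container startup as delegates keeps the ordering testable without Docker; the real implementation presumably calls them directly.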

DefaultConnection has a sensible default initial value (from config or https://localhost:8081) so it can be read safely from constructors before initialization. InitializeAsync() is called explicitly from CosmosTestStore.InitializeAsync() rather than implicitly from property getters, to avoid accidentally starting an emulator from unit tests. After initialization, CosmosTestStore updates its ConnectionUri and ConnectionString to reflect any changes (e.g., testcontainer startup).

Linux emulator auto-detection

IsLinuxEmulator is now true when:

  • Testcontainer is running (always Linux image), or
  • OS is not Windows (existing heuristic), or
  • EmulatorType config explicitly set to "linux" (backward compat)

No env var needed for the common case.
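As a rough sketch, the three conditions combine into a single boolean expression. The LinuxEmulatorDetection class and its parameter names are hypothetical, and the case-insensitive comparison of the EmulatorType value is an assumption rather than something the PR states.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical sketch of the detection rule; names are illustrative only.
public static class LinuxEmulatorDetection
{
    public static bool IsLinuxEmulator(bool testcontainerRunning, string? emulatorType)
        // The testcontainer always runs the Linux image.
        => testcontainerRunning
            // Existing heuristic: a non-Windows host implies the Linux emulator.
            || !RuntimeInformation.IsOSPlatform(OSPlatform.Windows)
            // Backward compatibility: explicit opt-in via configuration
            // (case-insensitive matching is an assumption of this sketch).
            || string.Equals(emulatorType, "linux", StringComparison.OrdinalIgnoreCase);
}
```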

HttpClient handling

When the testcontainer is active, ApplyConfiguration uses the container's HttpMessageHandler (a URI rewriter that routes to the mapped port over HTTP), captured once during initialization. Otherwise it keeps the existing DangerousAcceptAnyServerCertificateValidator for local HTTPS emulators. A shared static HttpMessageHandler and a static Func<HttpClient> factory delegate avoid ManyServiceProvidersCreatedWarning, while the factory creates a new HttpClient(handler, disposeHandler: false) per invocation so that each CosmosClient gets its own client that it can freely configure (e.g. set Timeout).
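A minimal sketch of the shared-handler pattern, assuming a plain SocketsHttpHandler in place of the container's handler. The SharedHandlerClientFactory class is a hypothetical name, not the type used in CosmosDbContextOptionsBuilderExtensions.cs.

```csharp
using System;
using System.Net.Http;

// Hypothetical sketch: one shared handler, one shared delegate, a fresh client per call.
public static class SharedHandlerClientFactory
{
    // Stand-in for the handler captured during initialization (assumption).
    private static readonly HttpMessageHandler SharedHandler = new SocketsHttpHandler();

    // The same delegate instance is handed out every time, so options that
    // compare the factory by reference see no change between contexts.
    public static readonly Func<HttpClient> Factory =
        () => new HttpClient(SharedHandler, disposeHandler: false);
}
```

Each call to SharedHandlerClientFactory.Factory() returns a distinct HttpClient whose Timeout can be set independently, and disposing one client leaves the shared handler usable by the others because of disposeHandler: false.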

CI changes

  • helix.proj: removed Docker pre/post commands for Cosmos on Ubuntu; Ubuntu XL runs Cosmos tests via testcontainer auto-start. Cosmos tests with SkipConnectionCheck=true run on Windows.Server2025.Amd64. Cosmos tests are excluded from Windows.11.
  • copilot-setup-steps.yml: explicitly sets DefaultConnection, EmulatorType, and SkipConnectionCheck to avoid starting a second emulator alongside the service container
  • cosmosConfig.json: DefaultConnection set to null to enable auto-detection

Other

  • Added Testcontainers.CosmosDb 4.11.0 to test/Directory.Packages.props
  • Updated .agents/skills/cosmos-provider/SKILL.md
  • Removed eng/testing/run-cosmos-container.{sh,ps1} (replaced by testcontainer)

Copilot AI and others added 2 commits March 25, 2026 02:31
- Add Testcontainers.CosmosDb package reference
- Rewrite TestEnvironment.cs with lazy connection init that auto-starts
  a testcontainer when no emulator is configured or reachable
- Use testcontainer's HttpMessageHandler for URI rewriting
- Auto-detect Linux emulator (always true for testcontainers)
- Set cosmosConfig.json DefaultConnection to null for auto-detection
- Simplify helix.proj (remove manual Docker pre/post commands)
- Remove EmulatorType env var from copilot-setup-steps.yml
- Update SKILL.md with new testcontainer approach

Co-authored-by: AndriySvyryd <6539701+AndriySvyryd@users.noreply.github.com>
Agent-Logs-Url: https://github.com/dotnet/efcore/sessions/655b1ac0-f385-43cf-9276-b3c6b2501432
Member

@AndriySvyryd AndriySvyryd left a comment

@copilot Remove the run-cosmos-container scripts

…re helix/copilot env vars, delete scripts

- Make TestEnvironment initialization fully async (InitializeAsync)
- Set DefaultConnection and HttpMessageHandler values during async init
- If SkipConnectionCheck=true, let testcontainer failures throw
- Create HttpClient instance outside the lambda in ApplyConfiguration
- Inline IsTestContainer (removed property)
- Restore copilot-setup-steps.yml env vars (EmulatorType, DefaultConnection, SkipConnectionCheck)
- Restore helix.proj Windows.11 block and SkipConnectionCheck for Ubuntu
- Delete eng/testing/run-cosmos-container.{sh,ps1}
- Update SKILL.md

Co-authored-by: AndriySvyryd <6539701+AndriySvyryd@users.noreply.github.com>
Agent-Logs-Url: https://github.com/dotnet/efcore/sessions/7f2ba766-d1b7-484b-899c-4b6d320a97e5
Contributor Author

Copilot AI commented Mar 25, 2026

> @copilot Remove the run-cosmos-container scripts

Done in 6df36b8. Deleted both eng/testing/run-cosmos-container.sh and eng/testing/run-cosmos-container.ps1.

Copilot AI left a comment

Pull request overview

This PR updates EF Core Cosmos functional test infrastructure to automatically provision the Azure Cosmos DB emulator via Testcontainers.CosmosDb, replacing explicit Docker scripts and simplifying Linux emulator detection for CI and local runs.

Changes:

  • Add async Cosmos connection initialization with a configured-endpoint → localhost probe → testcontainer fallback sequence.
  • Switch Cosmos test HTTP handling to optionally use the testcontainer-provided HttpMessageHandler.
  • Remove Helix/Docker Cosmos emulator scripts and update CI/setup configuration and internal skill docs accordingly.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| test/EFCore.Cosmos.FunctionalTests/cosmosConfig.json | Sets DefaultConnection to null to enable auto-detection rather than forcing localhost. |
| test/EFCore.Cosmos.FunctionalTests/TestUtilities/TestEnvironment.cs | Implements async initialization, testcontainer startup, and Linux emulator auto-detection. |
| test/EFCore.Cosmos.FunctionalTests/TestUtilities/CosmosTestStore.cs | Ensures TestEnvironment.InitializeAsync() runs before connection availability checks. |
| test/EFCore.Cosmos.FunctionalTests/TestUtilities/CosmosDbContextOptionsBuilderExtensions.cs | Uses testcontainer handler when available; adjusts HttpClient creation/factory usage. |
| test/EFCore.Cosmos.FunctionalTests/EFCore.Cosmos.FunctionalTests.csproj | Adds Testcontainers.CosmosDb dependency for Cosmos functional tests. |
| test/Directory.Packages.props | Adds centralized package version for Testcontainers.CosmosDb. |
| eng/testing/run-cosmos-container.sh | Removes Linux Docker script (superseded by testcontainers). |
| eng/testing/run-cosmos-container.ps1 | Removes Windows Docker script (superseded by testcontainers). |
| eng/helix.proj | Removes Ubuntu Cosmos Docker pre/post commands; relies on testcontainer auto-start. |
| .github/workflows/copilot-setup-steps.yml | Pins Cosmos env vars to use the workflow’s service container and avoid starting a second emulator. |
| .agents/skills/cosmos-provider/SKILL.md | Updates Cosmos test/emulator documentation to match the new testcontainer-based flow. |

…:false, add DefaultConnection initial value

Co-authored-by: AndriySvyryd <6539701+AndriySvyryd@users.noreply.github.com>
Agent-Logs-Url: https://github.com/dotnet/efcore/sessions/2469bf2b-2857-4f34-89ee-e1c963e35d9d
…ILL.md suggestion

Co-authored-by: AndriySvyryd <6539701+AndriySvyryd@users.noreply.github.com>
Agent-Logs-Url: https://github.com/dotnet/efcore/sessions/655b5153-de78-4ee4-988b-30f981564c60
@AndriySvyryd AndriySvyryd marked this pull request as ready for review March 25, 2026 20:45
@AndriySvyryd AndriySvyryd requested a review from a team as a code owner March 25, 2026 20:45
Copilot AI review requested due to automatic review settings March 25, 2026 20:45
Copilot AI requested a review from AndriySvyryd March 25, 2026 20:45
@AndriySvyryd AndriySvyryd assigned roji and unassigned AndriySvyryd and Copilot Mar 25, 2026
…ate new HttpClient per call

Agent-Logs-Url: https://github.com/dotnet/efcore/sessions/565fa6df-e6bf-48c9-a394-dd19442ad795

Co-authored-by: AndriySvyryd <6539701+AndriySvyryd@users.noreply.github.com>
Contributor Author

Copilot AI commented Mar 26, 2026

> @copilot Investigate the https://github.com/dotnet/efcore/actions/runs/23611924610/job/68770523478?pr=37999 check failure

The CI run had 4974 test failures, all with InvalidOperationException: "This instance has already started one or more requests. Properties can only be modified before sending the first request" thrown from HttpClient.set_Timeout. The Cosmos SDK sets Timeout on the HttpClient it is given, and because a single static HttpClient instance was shared, the second CosmosClient to use it crashed.

Fixed in a322cbe: share a static HttpMessageHandler and a static Func<HttpClient> delegate (same reference → no ManyServiceProvidersCreatedWarning), but the factory creates a new HttpClient(handler, disposeHandler: false) per invocation so each CosmosClient gets its own HttpClient it can freely configure.
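The failure mode can be reproduced in isolation without Cosmos. The sketch below uses a hypothetical StubHandler that answers every request locally, so no network or emulator is involved; it only shows the HttpClient behavior behind the error, not the SDK code path.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stub: returns 200 OK without touching the network.
public sealed class StubHandler : HttpMessageHandler
{
    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
        => Task.FromResult(new HttpResponseMessage(HttpStatusCode.OK));
}

public static class TimeoutAfterFirstRequestDemo
{
    // Returns true if changing Timeout after the first request throws.
    public static async Task<bool> SettingTimeoutThrowsAfterRequestAsync()
    {
        using var client = new HttpClient(new StubHandler());
        using var response = await client.GetAsync("http://localhost/").ConfigureAwait(false);

        try
        {
            // A second consumer reconfiguring an already-used client ends up here.
            client.Timeout = TimeSpan.FromSeconds(5);
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }
}
```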

@AndriySvyryd AndriySvyryd marked this pull request as ready for review March 26, 2026 21:30
Copilot AI review requested due to automatic review settings March 26, 2026 21:30

Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Comment on lines +178 to +183
await TestEnvironment.InitializeAsync().ConfigureAwait(false);

// Update connection details in case InitializeAsync changed them (e.g., testcontainer started).
ConnectionUri = TestEnvironment.DefaultConnection;
ConnectionString = TestEnvironment.ConnectionString;

Copilot AI Mar 26, 2026

Starting the emulator/testcontainer is currently tied to CosmosTestStore.InitializeAsync() via TestEnvironment.InitializeAsync(). However, in CI the Helix queues set Test__Cosmos__SkipConnectionCheck=true, and CosmosTestStore.IsConnectionAvailableAsync() returns true without calling TryConnectAsync() (which is what triggers initialization). That means tests which use CosmosTestStore.Create(...) and then execute real Cosmos operations (e.g. ConnectionSpecificationTest.Can_specify_connection_string_in_OnConfiguring) can run before any store initialization and fail because no emulator/container has been started. Consider updating IsConnectionAvailableAsync() so that when SkipConnectionCheck is true it still calls TestEnvironment.InitializeAsync() (letting failures propagate), ensuring the emulator is started once up-front for the assembly.
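One way to read this suggestion is sketched below. The ConnectionCheck class and the two delegates are hypothetical stand-ins for TestEnvironment.InitializeAsync() and TryConnectAsync(); the real members of CosmosTestStore may look different.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical sketch of the suggested change; not the actual CosmosTestStore code.
public static class ConnectionCheck
{
    public static async Task<bool> IsConnectionAvailableAsync(
        bool skipConnectionCheck,
        Func<Task> initializeAsync,          // stands in for TestEnvironment.InitializeAsync()
        Func<Task<bool>> tryConnectAsync)    // stands in for TryConnectAsync()
    {
        if (skipConnectionCheck)
        {
            // Still make sure an emulator or container exists; failures propagate.
            await initializeAsync().ConfigureAwait(false);
            return true;
        }

        // Normal path: the connection attempt triggers initialization itself.
        return await tryConnectAsync().ConfigureAwait(false);
    }
}
```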

….11 → Windows.Server2025.Amd64, remove Cosmos from Windows.11

Agent-Logs-Url: https://github.com/dotnet/efcore/sessions/6c604a0f-b531-4054-9bf3-957523bbab30

Co-authored-by: AndriySvyryd <6539701+AndriySvyryd@users.noreply.github.com>
return;
}

// Try to connect to the default emulator endpoint.
Member

My preferred flow here would be simpler:

  • If the environment variable is defined, we connect to that (no testcontainer)
  • Otherwise, testcontainer

This way, if I want to have an always-on emulator running in docker (or the Windows emulator), it's simply up to me to define the environment variable, rather than rely on probing here with a hard-coded port etc.

Am I missing something, why this more complicated logic is needed?

Contributor Author

Done in d2b56b1. Removed localhost probing entirely. The flow is now: DefaultConnection env var/config set → use it, otherwise → start testcontainer. No intermediate probing.

Member

The above approach was simpler for the developer. If they had the emulator running they didn't need to even know about the environment variable.

Member

Sure, but I'm pretty sure that if someone is taking the trouble to install the emulator locally, just adding an environment variable wouldn't represent an issue (plus if the testcontainer works well, there won't be a huge reason to run a permanent instance anyway). It also doesn't help that the TCP port here is very generic (8081), and some other service could be running on it, generating a weird and unexpected error (i.e. if this were some very standard port reserved for Cosmos I'd be more inclined to like it).

Member

But I'll add that if you have strong feelings here I'm fine going either way.

Member

Until the Linux emulator gets feature parity this will be the main way to test Cosmos locally on Windows.
@copilot revert the code back to probing

Contributor Author

Done in 2aa85dd. Restored localhost probing: the flow is now (1) configured endpoint from env var/config, (2) probe localhost:8081 for an already-running emulator, (3) start testcontainer with helpful error on failure.

Member

> Until the Linux emulator gets feature parity this will be the main way to test Cosmos locally on Windows.

That's still basically just you :) And adding an environment variable for that seems very reasonable. But no problem.

}
catch when (!SkipConnectionCheck)
{
// Any failure (Docker not installed, daemon not running, image pull failure, etc.)
Member

I don't think we should do this. As above, things should IMHO be super simple, and we shouldn't try to fall back from failures to a default endpoint - that only makes things more difficult to figure out if there's a failure. My preference would be to just always connect to the environment variable endpoint if the variable is defined (failing if there's an issue), or if the variable isn't defined, just always start a testcontainer (failing if there's an issue).

Contributor Author

Done in d2b56b1. Removed the catch when (!SkipConnectionCheck) fallback. If the env var is set, we connect to it (failing if there's an issue). If not, we start a testcontainer (failing if there's an issue). No fallback to a default endpoint.

Member

We should still add a helpful message saying that either an emulator or docker needs to be running, or an external connection string should be provided.

Member

Yep, agreed. @copilot can you add that please?

Contributor Author

Done in b0ae4a8. Wrapped the testcontainer startup in a try/catch that throws an InvalidOperationException with a message explaining that Docker must be installed and running, or the Test__Cosmos__DefaultConnection environment variable should be set to connect to an existing emulator.

…remove localhost probing and failure fallback

Agent-Logs-Url: https://github.com/dotnet/efcore/sessions/ab54e2d2-c2e8-4385-8da4-f514a5ab5f11

Co-authored-by: roji <1862641+roji@users.noreply.github.com>
Copilot finished work on behalf of roji March 28, 2026 07:46

_container = container;

AppDomain.CurrentDomain.ProcessExit += (_, _) =>
Member

We'd ideally use an xunit assembly fixture to start and stop the container.

Agent-Logs-Url: https://github.com/dotnet/efcore/sessions/a26908c4-620d-4aab-becc-397ea330aed7

Co-authored-by: AndriySvyryd <6539701+AndriySvyryd@users.noreply.github.com>
4 participants