Allow cosmos tests to run in parallel v2#37766
Conversation
Use guid for store name
…e collection calls and docs
…cosmos-test-parallelism
There was a problem hiding this comment.
Pull request overview
This PR aims to enable parallel execution of EF Core Cosmos functional tests by introducing emulator-/cloud-specific CosmosTestStore implementations and centralizing test-store creation/initialization, while also simplifying Northwind seeding to avoid slow custom imports.
Changes:
- Introduces
CosmosEmulatorTestStore/CosmosCloudTestStoreand updatesCosmosTestStoreFactory/CosmosTestStoreto route behavior based onTestEnvironment.IsEmulator. - Enables assembly-level parallelization for Cosmos functional tests and adjusts various tests/fixtures for isolation and stability.
- Removes the Northwind JSON-based import path and updates affected Northwind-related tests/fixtures.
Reviewed changes
Copilot reviewed 26 out of 27 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| test/EFCore.Cosmos.Tests/Extensions/CosmosDbContextOptionsExtensionsTests.cs | Switches test store creation to CosmosTestStoreFactory. |
| test/EFCore.Cosmos.FunctionalTests/Update/CosmosBulkExecutionTest.cs | Adjusts store naming/fixture usage for parallel execution. |
| test/EFCore.Cosmos.FunctionalTests/Update/CosmosBulkEndToEndTest.cs | Sets explicit store name for isolation. |
| test/EFCore.Cosmos.FunctionalTests/Update/CosmosBulkEndToEndTestNoBatching.cs | Sets explicit store name for isolation. |
| test/EFCore.Cosmos.FunctionalTests/Update/CosmosBulkConcurrencyTest.cs | Renames fixture and sets explicit store name for isolation. |
| test/EFCore.Cosmos.FunctionalTests/TestUtilities/TestEnvironment.cs | Updates parsing/typing of Cosmos test environment configuration. |
| test/EFCore.Cosmos.FunctionalTests/TestUtilities/CosmosTestStoreFactory.cs | Centralizes store creation, selecting emulator vs cloud store implementations. |
| test/EFCore.Cosmos.FunctionalTests/TestUtilities/CosmosTestStore.cs | Refactors to an abstract base store and adjusts initialization/cleanup behavior. |
| test/EFCore.Cosmos.FunctionalTests/TestUtilities/CosmosEmulatorTestStore.cs | Adds emulator-specific locking and container-count concurrency control for parallel tests. |
| test/EFCore.Cosmos.FunctionalTests/TestUtilities/CosmosCloudTestStore.cs | Adds cloud-specific throttling of concurrent initialization. |
| test/EFCore.Cosmos.FunctionalTests/TestUtilities/CosmosDbConfiguredConditionAttribute.cs | Moves connection-availability probing out of CosmosTestStore into the condition attribute. |
| test/EFCore.Cosmos.FunctionalTests/TestUtilities/CosmosNorthwindTestStoreFactory.cs | Removes Cosmos-specific Northwind store factory (and its JSON import path). |
| test/EFCore.Cosmos.FunctionalTests/Query/NorthwindQueryCosmosFixture.cs | Switches Northwind fixture to use the general CosmosTestStoreFactory. |
| test/EFCore.Cosmos.FunctionalTests/Query/FromSqlQueryCosmosTest.cs | Updates an expected SQL predicate and result count following seeding changes. |
| test/EFCore.Cosmos.FunctionalTests/Query/OwnedQueryCosmosTest.cs | Removes redundant theory attributes from various overrides. |
| test/EFCore.Cosmos.FunctionalTests/Properties/TestAssemblyCondition.cs | Enables parallelism via CollectionBehavior(MaxParallelThreads = -1). |
| test/EFCore.Cosmos.FunctionalTests/Storage/CosmosDatabaseCreatorTest.cs | Updates test store usage and adds emulator locking helper. |
| test/EFCore.Cosmos.FunctionalTests/PartitionKeyTest.cs | Removes redundant EnsureCreatedAsync calls (relying on store lifecycle). |
| test/EFCore.Cosmos.FunctionalTests/HierarchicalPartitionKeyTest.cs | Removes redundant EnsureCreatedAsync calls (relying on store lifecycle). |
| test/EFCore.Cosmos.FunctionalTests/EndToEndCosmosTest.cs | Updates initialized store creation and removes redundant EnsureCreatedAsync calls. |
| test/EFCore.Cosmos.FunctionalTests/EmbeddedDocumentsTest.cs | Switches fixture store creation to CosmosTestStoreFactory. |
| test/EFCore.Cosmos.FunctionalTests/CosmosTriggersTest.cs | Removes redundant EnsureCreatedAsync call prior to trigger creation. |
| test/EFCore.Cosmos.FunctionalTests/CosmosSessionTokensTest.cs | Adds collection scoping and adjusts session token storage plumbing for parallel safety. |
| test/EFCore.Cosmos.FunctionalTests/ConnectionSpecificationTest.cs | Switches test store creation to CosmosTestStoreFactory. |
| test/EFCore.Cosmos.FunctionalTests/ConfigPatternsCosmosTest.cs | Switches initialized store creation to factory and routes container creation via test store. |
| test/EFCore.Cosmos.FunctionalTests/EFCore.Cosmos.FunctionalTests.csproj | Removes Northwind.json copy-to-output configuration. |
Comments suppressed due to low confidence (7)
test/EFCore.Cosmos.FunctionalTests/TestUtilities/CosmosEmulatorTestStore.cs:239
- WaitChangeContainerCountAsync performs awaited DeleteContainersAsync work inside GetWaitTask, which is awaited while holding _concurrencyControlSemaphore. This can block other wait/release operations for long periods (and increases deadlock risk). Consider computing required actions under the lock, releasing it, then performing container deletion/waits outside the lock and looping as needed.
// Re enter the queue.
if (_databaseCreated)
{
await DeleteContainersAsync(dbContext); // @TODO: Does this have to happen inside the lock?
}
test/EFCore.Cosmos.FunctionalTests/TestUtilities/CosmosTestStore.cs:145
- CreatedContainersAsync ignores the provided cancellationToken by passing CancellationToken.None to EnsureCreatedAsync. This prevents callers from canceling long-running initialization; pass through the cancellationToken instead.
public virtual async Task<bool> CreatedContainersAsync(DbContext context, bool skipTokenCredential = false, CancellationToken cancellationToken = default)
{
if (!TestEnvironment.UseTokenCredential || skipTokenCredential)
{
return await context.Database.EnsureCreatedAsync(CancellationToken.None).ConfigureAwait(false);
}
test/EFCore.Cosmos.FunctionalTests/TestUtilities/CosmosTestStore.cs:49
- The 'shared' parameter in CreateName is unused. Either remove it or incorporate it into naming (if shared stores need stable names while non-shared need uniqueness), to avoid confusion about intended behavior.
private static string CreateName(string name, bool shared)
{
return !TestEnvironment.IsEmulator ? name + _runId : name;
}
test/EFCore.Cosmos.FunctionalTests/TestUtilities/CosmosCloudTestStore.cs:11
- Typo in field name '_aquired' (should be '_acquired') which deviates from the codebase naming conventions and makes the code harder to read/search.
private bool _aquired;
test/EFCore.Cosmos.FunctionalTests/TestUtilities/TestEnvironment.cs:36
- UseTokenCredential is parsed with a case-sensitive string comparison to "True". This will treat common values like "true"/"FALSE" (e.g., from env vars) as false; consider using bool.TryParse or a case-insensitive comparison to avoid misconfiguration.
public static bool UseTokenCredential { get; } = Config["UseTokenCredential"] == "True";
test/EFCore.Cosmos.FunctionalTests/TestUtilities/CosmosEmulatorTestStore.cs:16
- ContainerCrudSemaphore is created with a max count of int.MaxValue (new SemaphoreSlim(1)). Using new SemaphoreSlim(1, 1) helps catch accidental over-release and preserves mutual exclusion guarantees.
/// <summary>
/// The emulator becomes unstable with parallel CRUD container requests, use this lock to create containers.
/// </summary>
public static readonly SemaphoreSlim ContainerCrudSemaphore = new(1);
test/EFCore.Cosmos.FunctionalTests/TestUtilities/CosmosCloudTestStore.cs:9
- The SemaphoreSlim is constructed with only an initial count (max defaults to int.MaxValue). Consider specifying a maxCount (e.g., new SemaphoreSlim(Environment.ProcessorCount, Environment.ProcessorCount)) to catch accidental over-release bugs.
private static readonly SemaphoreSlim _semaphore = new(Environment.ProcessorCount);
test/EFCore.Cosmos.FunctionalTests/Storage/CosmosDatabaseCreatorTest.cs
Outdated
Show resolved
Hide resolved
test/EFCore.Cosmos.FunctionalTests/TestUtilities/CosmosEmulatorTestStore.cs
Show resolved
Hide resolved
| { | ||
| if (_initialized | ||
| && _dataFilePath == null) | ||
| if (_initialized && !Shared) |
There was a problem hiding this comment.
DisposeAsync no longer deletes databases for Shared test stores. For runs against a real Cosmos DB account this can leak databases/containers across test runs and incur cost/quota issues. Consider deleting the shared database when the fixture/test run completes (e.g., reference-count shared stores like CosmosEmulatorTestStore does, or otherwise ensure shared stores are cleaned up deterministically).
| if (_initialized && !Shared) | |
| if (_initialized) |
There was a problem hiding this comment.
Do we want a similar tracking mechanism for real cosmos db runs to delete the database if no longer used? I thought it would just be simpler to delete the cosmos account in the portal
There was a problem hiding this comment.
We want to keep the account to keep the read-only databases (Northwind) around, so let's delete the other dbs as before
There was a problem hiding this comment.
@AndriySvyryd I think query / Shared test stores are always read only. For example the association fixtures. So this currently deletes all non-read-only databases. Do you only want to keep Northwind around when running against cloud? Note that locally I actually delete everything now (when no tests are using it anymore), but I've sped up the creation of the Northwind3 db greatly
There was a problem hiding this comment.
If Northwind creation is fast now, you can delete it as well, just make sure it uses a unique name when running against cloud
|
Think I will wait for #37681 with this |
|
I'm holding off on this for #37844 , might be significant |
…cosmos-test-paralellism-v2
|
#37681 makes this a bit more cumbersome. In order for this to work as it stands, the NonSharedStoreName has to be the same as the SharedStoreName for all cosmos tests (that have both a shared and non-shared store). Or it always has to be SharedStoreName + "_NonShared" and I can add a special check so those use the same test store. This could be done by sealing the override QueryTestBase.NonSharedStoreName If neither of these are done or adhered to, I think this could deadlock where the SharedStore is created and creates 10 containers, and then the NonSharedStore is initialized in the same test class. Alternative would be adding a IAsyncLifetime to every cosmos test and cleaning the containers after each test method. Seems bulky. Makes me think that after #37681 this isn't feasible anymore, or at-least with the current method. Also taking into account the downsides of this method in general listed above in the description. Unless we seal QueryTestBase.NonSharedStoreName? @roji Thoughts on this? |
|
FYI, I submitted #37110 which in theory could allow to start X number of emulator containers and execute X test classes in parallel. The downside is that it currently doesn't support many of the features we are using. |
With some spare time, I've been experimenting around with the cosmos functional tests to see if I can get them to run in parallel. With an old attempt (See: #37696) being possibly unstable in the future or with a certain combination of tests, this PR aims to allow parallel test execution by manually controlling parallelism taking into account limitations of the emulator when needed.
The biggest challenge for parallel execution appeared to be the cosmos db emulator, which can become unstable in certain scenario's.
After some testing I found that the create container implementation in the emulator has a race condition, and can crash the emulator when ran in parallel. Other CRUD requests for containers appear to have no direct effect on the running of the emulator, but parallel requests can cause the emulator to have to clear the data upon restart.
In addition, the docs state that the emulator could show performance degradation with more than 10 concurrent containers.
On top of that, my tests have shown the emulator will stop responding to create container requests at some point when creating many containers. On my local machine this happens consistently after the creation of the 25th container.
This means that when running the tests against the emulator we can't execute CRUD container requests in parallel and ideally no more than 10 containers exist at the same time, and never more than 25.
This could be fixed by just decreasing the parallelism using
CollectionBehavior(MaxParallelThreads = N), but this is risky as it will cause errors or slow tests when many tests are added that create a lot of containers and those happen to run in parallel. See: #37696This PR implements manual parallelism control in CosmosEmulatorTestStore based on the number of containers each test store is attempting to create. This is both faster and prevents possible errors, but there are a couple of downsides, being mainly:
Some other challenges were:
There was a custom import for a cosmos specific Northwind database, which didn't appear to be needed. The custom import was very slow compared to using savechange's new batching, so I removed it and had to slightly change 1 test.
This PR decreases total test run time:
locally:
from 26 to 12.5 minutes
pipeline:
From ~30+ minutes to ~20 minutes total run time
Run against real cosmos db: Done