Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ServiceProvider.GetServices() Returns Results Based on Last GetKeyedServices() Call #111795

Closed
h-fotakiev opened this issue Jan 24, 2025 · 20 comments · Fixed by #113343
Closed

ServiceProvider.GetServices() Returns Results Based on Last GetKeyedServices() Call #111795

h-fotakiev opened this issue Jan 24, 2025 · 20 comments · Fixed by #113343
Assignees
Labels
area-Extensions-DependencyInjection help wanted [up-for-grabs] Good issue for external contributors in-pr There is an active PR which will close this issue when it is merged
Milestone

Comments

@h-fotakiev
Copy link

Description

The ServiceProvider.GetServices<T>() method in Microsoft.Extensions.DependencyInjection exhibits inconsistent behaviour when used in conjunction with .GetKeyedServices(key). Specifically, after calling .GetKeyedServices(key). for a particular key more than once and awaiting a slight delay, subsequent calls to GetServices() return the same result as the last GetKeyedServices(key) call, instead of only returning non-keyed services as expected. You can also no longer get the IEnumerable of services of the specified type that have been registered without a key since the method keeps returning the same result as the last GetKeyedServices(key) call. The behaviour only occurs if the .GetKeyedServices<T>(key) returns 1 or more results - it doesn't work with keys with no dependencies registered of the requested type <T>.

Reproduction Steps


var services = new ServiceCollection();

services.AddKeyedTransient<MyClass>("test", (ctx, key) => new MyClass { Key = (string)key });
services.AddKeyedTransient<MyClass>("anotherTest", (ctx, key) => new MyClass { Key = (string)key });
services.AddTransient<MyClass>(ctx => new MyClass { Key = "null" });

var provider = services.BuildServiceProvider();

Console.WriteLine(provider.GetKeyedServices<MyClass>("test").ElementAt(0).Key);           // Output: "test"
await Task.Delay(100);
Console.WriteLine(provider.GetServices<MyClass>().ElementAt(0).Key);                      // Expected: "null", Actual: "null"
await Task.Delay(100);
Console.WriteLine(provider.GetKeyedServices<MyClass>("anotherTest").ElementAt(0).Key);    // Output: "anotherTest"
await Task.Delay(100);
Console.WriteLine(provider.GetServices<MyClass>().ElementAt(0).Key);                      // Expected: "null", Actual: "null"
await Task.Delay(100);
Console.WriteLine(provider.GetKeyedServices<MyClass>("test").ElementAt(0).Key);           // Output: "test"
await Task.Delay(100);
Console.WriteLine(provider.GetServices<MyClass>().ElementAt(0).Key);                      // Expected: "null", Actual: "test"
await Task.Delay(100);
Console.WriteLine(provider.GetKeyedServices<MyClass>("anotherTest").ElementAt(0).Key);    // Output: "anotherTest"
await Task.Delay(100);
Console.WriteLine(provider.GetServices<MyClass>().ElementAt(0).Key);                      // Expected: "null", Actual: "anotherTest"

class MyClass()
{
    public string? Key { get; set; };
}

Expected behavior

provider.GetServices<MyClass>() should only return an enumerable containing the implementation defined as services.AddTransient<MyClass>( ctx => new MyClass { Key = "null" });

Actual behavior

provider.GetServices<MyClass>() returns the result of the last provider.GetKeyedServices<MyClass>(key) call.

Regression?

I tested it only on net8.0 but tried all library versions supporting keyed services.

Known Workarounds

No known workarounds.

Configuration

.NET Version: .NET 8.0
Tested OSs: macOs Sonoma 14.4 (with intel chip), Ubuntu
Architecture: x64
Is it specific to that configuration: No, I presume it will work (or not work, depending on how you look at it) on any configuration.

Other information

If you call GetKeyedServices(key) at least twice, it will replace the result of all subsequent GetServices() with its own result. The replacement doesn't take effect instantly after the 2nd GetKeyedServices(key) call though, but a few milliseconds after it. When I tested it with versions 8.0.0, 8.0.1 and 9.0.0, 10 milliseconds delay was more than enough. With 9.0.1 sometimes it takes more than 10 ms for the effect to take place, which is why the example uses 100 ms.

You don't necessarily need an asynchronous operation for the bug to occur. You can use synchronous calls to reproduce the bug, for example:

var services = new ServiceCollection();
services.AddKeyedTransient<MyClass>("key");
var serviceProvider = services.BuildServiceProvider();

serviceProvider.GetKeyedServices<MyClass>("key");
serviceProvider.GetKeyedServices<MyClass>("key");

var count = 0;
var iterations = 0;
do{
    count  = serviceProvider.GetServices<MyClass>().Count();
    iterations++;
}while(count!=1);

Console.WriteLine(iterations);

This synchronous code will at some point exit the while loop but the number of iterations it takes is inconsistent (I got as low as 18000 and as high as 49 000), so it's easier to just use awat Task.Delay(10) instead of a loop, and still get the same behaviour.

@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Jan 24, 2025
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-extensions-dependencyinjection
See info in area-owners.md if you want to be subscribed.

@KalleOlaviNiemitalo
Copy link

The race condition comes from the ThreadPool.UnsafeQueueUserWorkItem call in DynamicServiceProviderEngine.RealizeService:

if (Interlocked.Increment(ref callCount) == 2)
{
// Don't capture the ExecutionContext when forking to build the compiled version of the
// resolve function
_ = ThreadPool.UnsafeQueueUserWorkItem(_ =>
{
try
{
_serviceProvider.ReplaceServiceAccessor(callSite, base.RealizeService(callSite));
}
catch (Exception ex)
{
DependencyInjectionEventSource.Log.ServiceRealizationFailed(ex, _serviceProvider.GetHashCode());
Debug.Fail($"We should never get exceptions from the background compilation.{Environment.NewLine}{ex}");
}
},
null);
}

The bug is that, when CallSiteFactory.TryCreateEnumerable creates an IEnumerableCallSite, it does not set ServiceCallSite.Key, which remains null:

return _callSiteCache[callSiteKey] = new IEnumerableCallSite(resultCache, itemType, callSites);

public IEnumerableCallSite(ResultCache cache, Type itemType, ServiceCallSite[] serviceCallSites) : base(cache)
{
Debug.Assert(!ServiceProvider.VerifyAotCompatibility || !itemType.IsValueType, "If VerifyAotCompatibility=true, an IEnumerableCallSite should not be created with a ValueType.");
ItemType = itemType;
ServiceCallSites = serviceCallSites;
}

ServiceCallSite.Key being null then causes ServiceProvider.ReplaceServiceAccessor to update the wrong item in _serviceAccessors, assigning the new keyed service accessor to an unkeyed service:

internal void ReplaceServiceAccessor(ServiceCallSite callSite, Func<ServiceProviderEngineScope, object?> accessor)
{
_serviceAccessors[new ServiceIdentifier(callSite.Key, callSite.ServiceType)] = new ServiceAccessor
{
CallSite = callSite,
RealizedService = accessor
};
}

In contrast, CallSiteFactory.TryCreateExact sets ServiceCallSite.Key before it stores the ServiceCallSite to the ConcurrentDictionary:

callSite.Key = descriptor.ServiceKey;
return _callSiteCache[callSiteKey] = callSite;

CallSiteFactory.TryCreateOpenGeneric doesn't set ServiceCallSite.Key either. That is also a bug and can be tested as follows:

using System;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;

var services = new ServiceCollection();

services.AddTransient(typeof(IService<>), typeof(UnkeyedService<>));
services.AddKeyedTransient(typeof(IService<>), "1", typeof(KeyedService1<>));

var provider = services.BuildServiceProvider();

Console.WriteLine(provider.GetRequiredService<IService<int>>());           // Output: "UnkeyedService`1[System.Int32]"
Console.WriteLine(provider.GetRequiredKeyedService<IService<int>>("1"));   // Output: "KeyedService1`1[System.Int32]"
Console.WriteLine(provider.GetRequiredKeyedService<IService<int>>("1"));   // Output: "KeyedService1`1[System.Int32]"
await Task.Delay(100);
Console.WriteLine(provider.GetRequiredService<IService<int>>());           // Expected: "UnkeyedService`1[System.Int32]", Actual: "KeyedService1`1[System.Int32]"

interface IService<T>
{
}

class UnkeyedService<T> : IService<T>
{
}

class KeyedService1<T> : IService<T>
{
}

@KalleOlaviNiemitalo
Copy link

There was a "dirty fix for ReplaceServiceAccessor" in e741b3e (part of #87183) but it covered only CallSiteFactory.TryCreateExact; it fixed neither CallSiteFactory.TryCreateEnumerable nor CallSiteFactory.TryCreateOpenGeneric.

@KalleOlaviNiemitalo
Copy link

So to fix this, I think the "dirty fix for ReplaceServiceAccessor" to set ServiceCallSite.Key must be implemented in CallSiteFactory.TryCreateEnumerable here:

return _callSiteCache[callSiteKey] = new IEnumerableCallSite(resultCache, itemType, callSites);

and in CallSiteFactory.TryCreateOpenGeneric here:

return _callSiteCache[callSiteKey] = CreateConstructorCallSite(lifetime, serviceIdentifier, closedType, callSiteChain);

Alternatively, it would be possible to change CallSiteFactory.CreateConstructorCallSite instead of CallSiteFactory.TryCreateOpenGeneric. That change would however be larger, as CallSiteFactory.CreateConstructorCallSite has three return new ConstructorCallSite statements and each of them would need to be changed. It would also be a bit redundant, as CallSiteFactory.CreateConstructorCallSite is called also from CallSiteFactory.TryCreateExact, which already sets ServiceCallSite.Key.

For automated testing, I don't see a clean way to wait until the work item enqueued by DynamicServiceProviderEngine.RealizeService has finished executing. AFAICT, ThreadPool.PendingWorkItemCount is decremented already when the work item is dequeued and has not yet started executing, so the test cannot just poll until ThreadPool.PendingWorkItemCount goes to zero. ThreadPool.CompletedWorkItemCount incrementing is a signal, but if the process is running other tests in parallel, then the increment could be caused by some unrelated work item.

@KalleOlaviNiemitalo
Copy link

KalleOlaviNiemitalo commented Jan 26, 2025

I presume a non-"dirty" fix would be to make ServiceCallSite.Key a readonly property and initialise it in constructors. The key would be a required parameter of the constructors (either as is or as part of a ServiceIdentifier parameter), and it would not be possible to forget to set it. However, if the fix needs to be backported to .NET 8.0, then I expect a smaller change would be more easily approved.

@KalleOlaviNiemitalo
Copy link

KalleOlaviNiemitalo commented Jan 27, 2025

Known Workarounds

If you add the following to the start of the program, then the bug does not occur, but DI containers may become slower:

AppContext.SetSwitch("Microsoft.Extensions.DependencyInjection.DisableDynamicEngine", true);

IIUC, Native AOT compilation likewise disables the dynamic engine and thus avoids the bug:

#if NETFRAMEWORK || NETSTANDARD2_0
engine = CreateDynamicEngine();
#else
if (RuntimeFeature.IsDynamicCodeCompiled && !DisableDynamicEngine)
{
engine = CreateDynamicEngine();
}
else
{
// Don't try to compile Expressions/IL if they are going to get interpreted
engine = RuntimeServiceProviderEngine.Instance;
}
#endif

The existence of this workaround makes the fix less important to backport to .NET 8.0 and 9.0.

@rjgotten
Copy link

The existence of this workaround makes the fix less important to backport to .NET 8.0 and 9.0.

Given that some of Microsoft's own extension packages use keyed services, it probably would be a good idea to disable the dynamic engine by default. Because the current state is basically a set-up to have everyone that has a mildly complicated project, sent through a debugging hell-hole before finally (hopefully...) by some miraculous event arriving at this thread here -- to learn about the busted functionality in the dynamic engine and that the fix for them is basically "upgrade to .NET 10 or disable it; here's how."

@steveharter
Copy link
Member

@KalleOlaviNiemitalo thanks for your analysis. I have not prototyped the suggestion:

I presume a non-"dirty" fix would be to make ServiceCallSite.Key a readonly property and initialise it in constructors. The key would be a required parameter of the constructors (either as is or as part of a ServiceIdentifier parameter), and it would not be possible to forget to set it. However, if the fix needs to be backported to .NET 8.0, then I expect a smaller change would be more easily approved.

but am flagging this as "Help Wanted" if someone wants to pick this up and ideally add tests that would verify.

@steveharter steveharter added this to the 10.0.0 milestone Jan 27, 2025
@steveharter steveharter added help wanted [up-for-grabs] Good issue for external contributors and removed untriaged New issue has not been triaged by the area owner labels Jan 27, 2025
@KalleOlaviNiemitalo
Copy link

the fix for them is basically "upgrade to .NET 10 or disable it; here's how."

Upgrading only the Microsoft.Extensions.DependencyInjection package should suffice, if it remains compatible with the older .NET Runtime. On the main branch, the project targets net8.0; and I expect version 10.0.0 will keep targeting net8.0, like version 8.0.0 still targeted net6.0.

<!-- The minimum supported .NET version. -->
<NetCoreAppMinimum>net8.0</NetCoreAppMinimum>

<TargetFrameworks>$(NetCoreAppCurrent);$(NetCoreAppPrevious);$(NetCoreAppMinimum);netstandard2.1;netstandard2.0;$(NetFrameworkMinimum)</TargetFrameworks>

However, it could indeed be difficult for developers suffering from this bug to discover that there is a fix.

@IliaShuliatikov
Copy link
Contributor

IliaShuliatikov commented Feb 21, 2025

Hi @steveharter, as mentioned in previous comments, the problem is that the ServiceCallSite.Key property is only set for an instance created in the TryCreateExact method, for all the other cases it is not the case. This results in unkeyed entry being updated (overwritten) by the keyed one.

The fix proposed by @KalleOlaviNiemitalo to make the ServiceCallSite.Key property readonly and make it a required constructor parameter is indeed working - it solves the problem of this property not being initialized in some cases, as well as ensures that we won't have this problem in case we have a new class derived from ServiceCallSite.
In case we need to backport the fix to .NET 8, we won't have any breaking changes, since we're not changing the public API: ServiceCallSite and all existing ServiceCallSite derived classes are internal classes and we're making all code changes within the Microsoft.Extensions.DependencyInjection project.

If all that sounds good, I'd be happy to help and implement the fix as well as cover these cases with new tests. I see the issue has the help wanted label, is it still available for contribution? If so, could you please assign it to me? Thank you.

@KalleOlaviNiemitalo
Copy link

Re automated testing of the fix:

ThreadPool.CompletedWorkItemCount incrementing is a signal, but if the process is running other tests in parallel, then the increment could be caused by some unrelated work item.

Some tests use RemoteExecutor to run code in a separate process. If the test did that, I think it could then rely on polling ThreadPool.CompletedWorkItemCount.

Or perhaps it could limit the thread pool to one worker thread only (ThreadPool.SetMaxThreads), post another work item after DynamicServiceProviderEngine.RealizeService has posted its, and wait for the second work item to finish. I don't know how reliably the thread pool stays within the specified max thread limit, though.

@steveharter
Copy link
Member

If all that sounds good, I'd be happy to help and implement the fix as well as cover these cases with new tests. I see the issue has the help wanted label, is it still available for contribution? If so, could you please assign it to me? Thank you.

Done. Thanks.

@steveharter steveharter self-assigned this Mar 7, 2025
@steveharter steveharter marked this as a duplicate of #113022 Mar 7, 2025
@IliaShuliatikov
Copy link
Contributor

@steveharter, while it's relatively easy to fix the problem, covering these cases with unit tests is tricky since the DI cache update happens inside the DynamicServiceProviderEngine method in a background task.

One way to do this is to simply use Task.Delay, but this might not be completely reliable. I've created tests verifying this behavior for both Generic and Enumerable cases, ran them with different delay (with the current implementation, i.e. without applying the fix) to see if we could get some acceptable rate and it seems that 10000 ms is pretty reliable (though I assume the environment can affect the results):

OS Runtime False Positive Rate with 100 ms Delay False Positive Rate with 1000 ms Delay False Positive Rate with 10000 ms Delay
Windows 11 24H2 net10.0 ~53% (16 out of 30 runs) ~2.7% (44 out of 1600 runs) 0% (0 out of 500 runs)
Windows 11 24H2 net462 ~56% (17 out of 30 runs) ~4.3% (69 out of 1600 runs) 0% (0 out of 500 runs)
Ubuntu 24.04.1 LTS net10.0 ~40% (40 out of 100 runs) 0% (0 out of 200 runs) 0% (0 out of 1000 runs)

Please let me know if this is acceptable, and we can use the 10000 ms delay to wait for the cache to be updated.

If this is still not reliable enough, I can see several ways to do it reliably, but so far I haven't found a reliable way to do it without any additional code changes (e.g. by using the reflection in tests), so in this case we need to make changes to the DynamicServiceProviderEngine class just for tests (e.g. via a new internal event or by converting background update to an internal Action field in the DynamicServiceProviderEngine class). Given that the logic of this class has not been changed frequently, I would prefer to avoid this.

Could you please share your opinion on which way seems preferable to you?

@steveharter
Copy link
Member

I suggested in the PR to make these tests Outerloop. Can you verify they still repro when running as outerloop. Sample commands to run those:

cd {YOURPATH}\src\libraries\Microsoft.Extensions.DependencyInjection\tests\DI.Tests
dotnet build /t:test /p:outerloop=true

@steveharter
Copy link
Member

Also I have an open PR that also fixes some caching issues: #113137

@IliaShuliatikov
Copy link
Contributor

I suggested in the PR to make these tests Outerloop.

Thank you, done.

Can you verify they still repro when running as outerloop.

Yes, they reproduce when running outerloop tests as you suggested.

@KalleOlaviNiemitalo
Copy link

KalleOlaviNiemitalo commented Mar 27, 2025

This fix is not included in the Microsoft.Extensions.DependencyInjection 10.0.0-preview.2.25163.2 package, but it is on the release/10.0-preview3 branch. I assume a preview package that includes it will be released in April 2025 and can then be used on .NET 8 and .NET Framework too.

@rjgotten
Copy link

rjgotten commented Mar 27, 2025

@KalleOlaviNiemitalo
This fix is not included in the Microsoft.Extensions.DependencyInjection 10.0.0-preview.2.25163.2 package, but it is on the release/10.0-preview3 branch. I assume a preview package that includes it will be released in April 2025 and can then be used on .NET 8 and .NET Framework too.

So- I take it this then won't be backported and fixed for those users that remain on the 8.0.x series of packages and their closed scope of features, that originally went with .NET 8 ?

@KalleOlaviNiemitalo
Copy link

@rjgotten, I don't know whether the fix will be backported to 9.0 and/or 8.0. The fix looks pretty straightforward and has tests, so a backport would have only a low risk of causing new bugs. The public API did not change.

I have been commenting here as a user of Microsoft.Extensions.DependencyInjection who has interest in strange bugs. This particular bug has not affected any software developed by me, so I'm not going to ask the maintainers of .NET to backport the fix.

If this bug is affecting your application and you want a backport, I guess the maintainers may be interested in why you cannot use the DisableDynamicEngine workaround.

@rjgotten
Copy link

rjgotten commented Mar 27, 2025

@KalleOlaviNiemitalo
I guess the maintainers may be interested in why you cannot use the DisableDynamicEngine workaround.

Oh, I could use that workaround in potential affected apps. It's just a matter of knowing whether I should expect to need to keep it disabled - because as I understood it, having it disabled involves a few performance downsides as well. That's all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-Extensions-DependencyInjection help wanted [up-for-grabs] Good issue for external contributors in-pr There is an active PR which will close this issue when it is merged
Projects
None yet
5 participants