Fix race condition in FeatureReferences<T>.Fetch#66533
Fix race condition in FeatureReferences<T>.Fetch#66533eocrawford wants to merge 2 commits intodotnet:mainfrom
Conversation
|
Thanks for your PR, @eocrawford. Someone from the team will get assigned to your PR shortly and we'll get it reviewed. |
There was a problem hiding this comment.
Pull request overview
This PR targets a reported race in FeatureReferences<TCache>.Fetch by removing an intermediate cached = null! write that can be observed by concurrent readers during feature cache invalidation.
Changes:
- Removed the explicit
cached = null!assignment whenIFeatureCollection.Revisionchanges, relying on the later cache flush logic instead.
Comments suppressed due to low confidence (1)
src/Extensions/Features/src/FeatureReferences.cs:106
- With the removal of
cached = null!, a revision mismatch no longer forcesUpdateCachedwhen the per-feature cached field is already non-null. That meansflush = truebecomes ineffective in the common case (cached value exists), soCachewon’t be cleared andRevisionwon’t be updated, and callers can observe stale features afterIFeatureCollection.Revisionchanges. If the goal is to avoid the intermediate null write, you still need to forceUpdateCachedwhenflushis true (e.g., branch onflushbefore thecached ?? ...fast-path).
// Collection changed, clear whole feature cache
flush = true;
}
return cached ?? UpdateCached(ref cached!, state, factory, revision, flush);
| if (Revision != revision) | ||
| { | ||
| // Clear cached value to force call to UpdateCached | ||
| cached = null!; | ||
| // Collection changed, clear whole feature cache | ||
| flush = true; | ||
| } |
There was a problem hiding this comment.
This change is addressing a very subtle concurrency/revision-invalidation path, but there’s no unit coverage around FeatureReferences<TCache>.Fetch behavior when Revision changes while a cached feature field is already non-null. Consider adding a regression test in src/Extensions/Features/test that mutates a FeatureCollection to bump Revision and asserts a subsequent Fetch re-reads/refreshes the requested feature (and/or clears the cache) even when the cached ref starts non-null.
There was a problem hiding this comment.
Added in this PR: Fetch_RevisionChanged_RefreshesCache in FeatureReferencesFetchTests.cs covers this — sets a feature, calls Fetch, bumps the revision via a new Set, calls Fetch again, and asserts the refreshed value is returned.
0ecb8a0 to
14aadbf
Compare
|
Is the HttpContext being used concurrently from different threads. |
Almost certainly yes, with our current implemention |
|
Why? It's not thread safe. |
Two changes to eliminate a race where concurrent callers of Fetch see a transiently null cached feature reference: 1. In Fetch: replace `cached = null!` + `cached ?? UpdateCached(...)` with `(flush ? null : cached) ?? UpdateCached(...)`. The null is on the evaluation stack, never written to the shared ref field. 2. In UpdateCached: use a local variable for the return value instead of reading back from the ref field. A concurrent `Cache = default` can zero the ref field between the write and the return; returning from a local avoids this. Fixes dotnet#42040
0d83339 to
ef25534
Compare
|
The concurrent access comes from OData batch processing. OData's We don't control this — it's how Separately, IIS can bump the feature collection revision during request processing (e.g., when server variables are lazily populated), which can trigger the revision-change path in |
|
Commenter does not have sufficient privileges for PR 66533 in repo dotnet/aspnetcore |
Under concurrent load,
FeatureReferences<T>.Fetchintermittently returnsnull, causingNullReferenceExceptioninDefaultHttpRequest.set_RouteValuesduring endpoint routing.Why the current code is incorrect
There are two race windows in
FeatureReferences<TCache>, a mutable struct with no synchronization:Race 1 —
Fetchwrites null to a shared ref field (line 103)When
Fetchdetects a revision change, it writescached = null!to force a cache refresh.cachedis arefparameter aliasing a field in theCachestruct. The null is visible to any concurrent caller reading the same field beforeUpdateCachedrepopulates it on line 120. The!null-forgiving operator at the call site does nothing at runtime, so the null propagates and is dereferenced.Race 2 —
UpdateCachedreturns from the ref field afterCache = default(line 117, line 135)UpdateCachedwrites the resolved feature tocached(the ref field), then returnscached. A concurrent caller'sCache = defaultcan zero the ref field between the write and the return, causingUpdateCachedto return null.Fix
Race 1: Replace
cached = null!+cached ?? UpdateCached(...)with(flush ? null : cached) ?? UpdateCached(...). The null is a value on the evaluation stack, never written to the shared ref field.Race 2: In
UpdateCached, use a local variable (value) for the resolved feature. Write to the ref field to populate the cache, but return from the local. A concurrentCache = defaultcan zero the ref field, but the returned local is unaffected.Test
Added
FeatureReferencesFetchTestswith a concurrent regression test: 8 threads, 200K iterations, half bumping the revision while the other half callFetch. AssertsFetchnever returns null. This test fails against the unpatched code and passes with the fix.Evidence
Microsoft.AspNetCore.Routing.dllnarrowed the crash site:SelectEndpointWithPoliciesAsync→DefaultEndpointSelector.Select→DefaultHttpRequest.set_RouteValues→FeatureReferences.FetchRequest.RouteValuesreads as non-null in the catch handler — the race resolves between the throw and the catchFixes #42040
Related: OData/AspNetCoreOData#1263, #56276, #41924