
Fix race condition in FeatureReferences<T>.Fetch #66533

Open
eocrawford wants to merge 2 commits into dotnet:main from eocrawford:fix/feature-references-fetch-race

@eocrawford commented Apr 30, 2026

Under concurrent load, FeatureReferences<T>.Fetch intermittently returns null, causing a NullReferenceException in DefaultHttpRequest.set_RouteValues during endpoint routing.

System.NullReferenceException: Object reference not set to an instance of an object.
   at Microsoft.AspNetCore.Http.DefaultHttpRequest.set_RouteValues(RouteValueDictionary value)
   at Microsoft.AspNetCore.Routing.Matching.DefaultEndpointSelector.Select(HttpContext httpContext, Span`1 candidateState)
   at Microsoft.AspNetCore.Routing.Matching.DfaMatcher.<SelectEndpointWithPoliciesAsync>d__13.MoveNext()

Why the current code is incorrect

There are two race windows in FeatureReferences<TCache>, a mutable struct with no synchronization:

Race 1 — Fetch writes null to a shared ref field (line 103)

When Fetch detects a revision change, it writes cached = null! to force a cache refresh. cached is a ref parameter aliasing a field in the Cache struct. The null is visible to any concurrent caller reading the same field before UpdateCached repopulates it on line 120. The ! null-forgiving operator at the call site does nothing at runtime, so the null propagates and is dereferenced.

Race 2 — UpdateCached returns from the ref field after Cache = default (line 117, line 135)

UpdateCached writes the resolved feature to cached (the ref field), then returns cached, which reads the ref field again. A concurrent caller's Cache = default can zero the field between the write and that read, causing UpdateCached to return null.

Fix

Race 1: Replace cached = null! + cached ?? UpdateCached(...) with (flush ? null : cached) ?? UpdateCached(...). The null is a value on the evaluation stack, never written to the shared ref field.

Race 2: In UpdateCached, use a local variable (value) for the resolved feature. Write to the ref field to populate the cache, but return from the local. A concurrent Cache = default can zero the ref field, but the returned local is unaffected.
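The two interleavings can be reproduced deterministically with a simplified, single-threaded model. This is a sketch under stated assumptions, not the ASP.NET Core source: ToyFeatureCollection, ToyCache, ToyFeatureReferences, ToyRouteValues, and InterleavingDemo are hypothetical, and the Interleave hook (also hypothetical) runs another caller's `Cache = default` at the exact point between the write to the ref field and the return, instead of waiting for the scheduler to hit that window.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical, simplified model of FeatureReferences<TCache>; not the ASP.NET Core source.
public sealed class ToyRouteValues { }

public sealed class ToyFeatureCollection
{
    private readonly Dictionary<Type, object> _features = new();
    public int Revision { get; private set; }

    public T? Get<T>() where T : class =>
        _features.TryGetValue(typeof(T), out var f) ? (T)f : null;

    public void Set<T>(T feature) where T : class
    {
        _features[typeof(T)] = feature;
        Revision++; // every Set bumps the revision
    }
}

public struct ToyCache { public ToyRouteValues? RouteValues; }

// Mutable struct with no synchronization, like the type under discussion.
public struct ToyFeatureReferences
{
    public ToyFeatureCollection Collection;
    public int Revision;
    public ToyCache Cache;
    public Action? Interleave; // hypothetical hook: "another caller runs here"

    // Shape before the PR: null is stored in the shared field (Race 1),
    // and UpdateCached returns by re-reading that field (Race 2).
    public ToyRouteValues? FetchOriginal(ref ToyRouteValues? cached)
    {
        var revision = Collection.Revision;
        var flush = Revision != revision;
        if (flush) cached = null;
        return cached ?? UpdateCachedOriginal(ref cached, revision, flush);
    }

    private ToyRouteValues? UpdateCachedOriginal(ref ToyRouteValues? cached, int revision, bool flush)
    {
        if (flush) Cache = default;
        cached = Resolve(revision, flush);
        Interleave?.Invoke();
        return cached;
    }

    // Shape proposed in the PR: the forced null never touches the field,
    // and the return value comes from a local.
    public ToyRouteValues FetchPatched(ref ToyRouteValues? cached)
    {
        var revision = Collection.Revision;
        var flush = Revision != revision;
        return (flush ? null : cached) ?? UpdateCachedPatched(ref cached, revision, flush);
    }

    private ToyRouteValues UpdateCachedPatched(ref ToyRouteValues? cached, int revision, bool flush)
    {
        if (flush) Cache = default;
        var value = Resolve(revision, flush);
        cached = value; // still populates the cache for later calls
        Interleave?.Invoke();
        return value;
    }

    // Look the feature up, registering a new one if absent, and record the revision.
    private ToyRouteValues Resolve(int revision, bool flush)
    {
        var value = Collection.Get<ToyRouteValues>();
        if (value is null)
        {
            value = new ToyRouteValues();
            Collection.Set(value);
            Revision = Collection.Revision;
        }
        else if (flush) Revision = revision;
        return value;
    }
}

public sealed class InterleavingDemo
{
    private ToyFeatureReferences _refs; // struct stored in a field of a class

    public static (ToyRouteValues? Original, ToyRouteValues Patched) Run()
    {
        var features = new ToyFeatureCollection();
        var demo = new InterleavingDemo
        {
            _refs = new ToyFeatureReferences { Collection = features, Revision = features.Revision }
        };
        // Simulate a concurrent `Cache = default` between the field write and the return.
        demo._refs.Interleave = () => demo._refs.Cache = default;

        features.Set(new ToyRouteValues()); // revision bump: Fetch takes the flush path
        var original = demo._refs.FetchOriginal(ref demo._refs.Cache.RouteValues);
        features.Set(new ToyRouteValues());
        var patched = demo._refs.FetchPatched(ref demo._refs.Cache.RouteValues);
        return (original, patched);
    }
}
```

InterleavingDemo.Run() returns a null Original and a non-null Patched: the original shape hands back whatever the field holds after the interleaved reset, while the patched shape returns the local it just resolved. The hook forces the ordering on one thread, so it says nothing about memory-model effects a real multi-threaded run might add.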

Test

Added FeatureReferencesFetchTests with a concurrent regression test: 8 threads, 200K iterations, half bumping the revision while the other half call Fetch. Asserts Fetch never returns null. This test fails against the unpatched code and passes with the fix.
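The regression test described above (threads split between bumping the revision and calling Fetch, counting null results) can be sketched with a generic harness. RaceHarness and CountNullFetches are hypothetical names, not the PR's FeatureReferencesFetchTests; the bump and fetch delegates are assumptions that would need to be wired to a real IFeatureCollection and a shared FeatureReferences<T> instance.

```csharp
#nullable enable
using System;
using System.Threading;

// Hypothetical harness: even-numbered threads call `bump`, odd-numbered threads call `fetch`
// and count how many times it returned null.
public static class RaceHarness
{
    public static int CountNullFetches(int threads, int iterations, Action bump, Func<object?> fetch)
    {
        var nulls = 0;
        using var start = new Barrier(threads); // release all workers together to widen overlap
        var workers = new Thread[threads];
        for (var t = 0; t < threads; t++)
        {
            var isBumper = t % 2 == 0;
            workers[t] = new Thread(() =>
            {
                start.SignalAndWait();
                for (var i = 0; i < iterations; i++)
                {
                    if (isBumper)
                        bump();
                    else if (fetch() is null)
                        Interlocked.Increment(ref nulls);
                }
            });
            workers[t].Start();
        }
        foreach (var worker in workers)
            worker.Join(); // Join also makes the final `nulls` value visible here
        return nulls;
    }
}
```

A test would call it with the thread and iteration counts from the description and assert the result is 0. Because the race is timing-dependent, a zero count from one run is evidence rather than proof; the Barrier only makes overlap between bumpers and fetchers more likely.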

Evidence

  • Reproduced at a rate of roughly 0.02–0.04% (3–6 NREs per ~14,000 requests) with 35 concurrent threads against ASP.NET Core 9.0.15 on IIS
  • 4 instrumented builds of Microsoft.AspNetCore.Routing.dll narrowed the crash site: SelectEndpointWithPoliciesAsync → DefaultEndpointSelector.Select → DefaultHttpRequest.set_RouteValues → FeatureReferences.Fetch
  • Request.RouteValues reads as non-null in the catch handler — the race resolves between the throw and the catch
  • Production: 19K+ errors across multiple deployments; 138 errors across 95 tenants in 7 days

Fixes #42040

Related: OData/AspNetCoreOData#1263, #56276, #41924

Copilot AI review requested due to automatic review settings April 30, 2026 00:15
@github-actions github-actions Bot added the needs-area-label Used by the dotnet-issue-labeler to label those issues which couldn't be triaged automatically label Apr 30, 2026
@dotnet-policy-service dotnet-policy-service Bot added the community-contribution Indicates that the PR has been added by a community member label Apr 30, 2026
@dotnet-policy-service
Contributor

Thanks for your PR, @eocrawford. Someone from the team will get assigned to your PR shortly and we'll get it reviewed.

Contributor

Copilot AI left a comment


Pull request overview

This PR targets a reported race in FeatureReferences<TCache>.Fetch by removing an intermediate cached = null! write that can be observed by concurrent readers during feature cache invalidation.

Changes:

  • Removed the explicit cached = null! assignment when IFeatureCollection.Revision changes, relying on the later cache flush logic instead.
Comments suppressed due to low confidence (1)

src/Extensions/Features/src/FeatureReferences.cs:106

  • With the removal of cached = null!, a revision mismatch no longer forces UpdateCached when the per-feature cached field is already non-null. That means flush = true becomes ineffective in the common case (cached value exists), so Cache won’t be cleared and Revision won’t be updated, and callers can observe stale features after IFeatureCollection.Revision changes. If the goal is to avoid the intermediate null write, you still need to force UpdateCached when flush is true (e.g., branch on flush before the cached ?? ... fast-path).
            // Collection changed, clear whole feature cache
            flush = true;
        }

        return cached ?? UpdateCached(ref cached!, state, factory, revision, flush);

Comment on lines 100 to 104
if (Revision != revision)
{
    // Clear cached value to force call to UpdateCached
    cached = null!;
    // Collection changed, clear whole feature cache
    flush = true;
}

Copilot AI Apr 30, 2026


This change is addressing a very subtle concurrency/revision-invalidation path, but there’s no unit coverage around FeatureReferences<TCache>.Fetch behavior when Revision changes while a cached feature field is already non-null. Consider adding a regression test in src/Extensions/Features/test that mutates a FeatureCollection to bump Revision and asserts a subsequent Fetch re-reads/refreshes the requested feature (and/or clears the cache) even when the cached ref starts non-null.

Copilot generated this review using guidance from repository custom instructions.
Author


Added in this PR: Fetch_RevisionChanged_RefreshesCache in FeatureReferencesFetchTests.cs covers this — sets a feature, calls Fetch, bumps the revision via a new Set, calls Fetch again, and asserts the refreshed value is returned.

@eocrawford force-pushed the fix/feature-references-fetch-race branch 2 times, most recently from 0ecb8a0 to 14aadbf on April 30, 2026 00:35
@davidfowl
Member

Is the HttpContext being used concurrently from different threads?

@eocrawford
Author

> Is the HttpContext being used concurrently from different threads?

Almost certainly yes, with our current implementation.

@davidfowl
Member

Why? It's not thread safe.

Two changes to eliminate a race where concurrent callers of Fetch
see a transiently null cached feature reference:

1. In Fetch: replace `cached = null!` + `cached ?? UpdateCached(...)`
   with `(flush ? null : cached) ?? UpdateCached(...)`. The null is
   on the evaluation stack, never written to the shared ref field.

2. In UpdateCached: use a local variable for the return value instead
   of reading back from the ref field. A concurrent `Cache = default`
   can zero the ref field between the write and the return; returning
   from a local avoids this.

Fixes dotnet#42040
@eocrawford force-pushed the fix/feature-references-fetch-race branch from 0d83339 to ef25534 on April 30, 2026 21:55
@eocrawford
Author

eocrawford commented Apr 30, 2026

The concurrent access comes from OData batch processing. OData's ODataBatchMiddleware creates sub-request contexts that share the parent's IFeatureCollection. When multiple batch sub-requests are dispatched through the middleware pipeline, they share feature state. The revision counter on the collection changes as features are set during request processing, and concurrent Fetch calls on different sub-requests race through the revision-change path.

We don't control this: it's how Microsoft.AspNet.OData.Batch.ODataBatchMiddleware works. The sub-request contexts are created via ODataBatchReaderExtensions.CreateHttpContext(), which copies features from the parent into a new FeatureCollection, but the cached FeatureReferences struct on DefaultHttpRequest still points into shared state.

Separately, IIS can bump the feature collection revision during request processing (e.g., when server variables are lazily populated), which can trigger the revision-change path in Fetch even on a single-request codepath if the timing aligns with middleware pipeline execution.

@azure-pipelines

Commenter does not have sufficient privileges for PR 66533 in repo dotnet/aspnetcore



Development

Successfully merging this pull request may close these issues.

Throw a better exception when the HttpContext is accessed concurrently and will result in a null ref

3 participants