.NET 9 ThreadPool Starvation - High QueueSize with Many Idle Worker Threads #117877

@vrecluse

Problem Description
We've encountered a critical performance issue with the .NET 9.0 thread pool (using SDK 9.0.303, a stable release) in our high-load production environment.

Under heavy load, the application becomes extremely slow or unresponsive. Upon diagnosis with dotnet-stack, we observed a pathological state:

The ThreadPool's global work item queue (QueueSize) was backlogged with thousands of items (in our log, the count was 6759).

Simultaneously, dozens of ThreadPool worker threads (PortableThreadPool+WorkerThread) were completely idle, waiting on LowLevelLifoSemaphore.Wait.

Expected Behavior:
When there are pending work items in the global queue, idle worker threads should be woken up immediately to process them.

Actual Behavior:
Work items are severely backlogged in the global queue while a large number of worker threads remain idle, leading to effective thread pool starvation.

Analysis
This issue appears to be related to the internal scheduling or signaling/wakeup mechanism of the managed Portable Thread Pool used by .NET 9. The stack traces strongly suggest that the mechanism responsible for waking idle threads is failing even though thousands of work items are pending.

Our application is an Orleans-based service that uses ASP.NET Core Kestrel, MongoDB, Nacos, OpenTelemetry, and other libraries, involving a high degree of asynchronous I/O.

Evidence: dotnet-stack Trace
Below are the key parts of the dotnet-stack log we captured. The full log file can be provided if needed.

Log Summary:

QueueSize: 6759

Typical stack trace for the numerous idle threads:

Thread (0x3D6B):
CPU_TIME
System.Private.CoreLib!System.Threading.LowLevelLifoSemaphore.WaitNative(...)
System.Private.CoreLib!System.Threading.LowLevelLifoSemaphore.WaitForSignal(int32)
System.Private.CoreLib!System.Threading.LowLevelLifoSemaphore.Wait(int32,bool)
System.Private.CoreLib!System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart()

At the same time, at least one worker thread was active, processing an HTTP/2 frame write in Kestrel.
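
For anyone who wants to check a captured log for the same pattern, here is a rough Python sketch that counts how many threads in a dotnet-stack text report look like idle pool workers. It assumes the report uses the `Thread (0x...):` header layout shown in the excerpt above, and it treats a thread as idle when its stack contains both `LowLevelLifoSemaphore.Wait` and `PortableThreadPool+WorkerThread.WorkerThreadStart`. The function names and the second thread in the sample (`0x1A2B`, `MyApp!Hypothetical.BusyWork()`) are made up for illustration and are not from our log.

```python
import re

# Header layout assumed from the excerpt above: "Thread (0x3D6B):"
THREAD_HEADER = re.compile(r"^\s*Thread \((0x[0-9A-Fa-f]+)\):\s*$")


def parse_threads(report_text):
    """Map each thread id to the list of frame lines that follow its header."""
    threads = {}
    current = None
    for line in report_text.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            current = header.group(1)
            threads[current] = []
        elif current is not None and line.strip():
            threads[current].append(line.strip())
    return threads


def is_idle_worker(frames):
    """Heuristic: a pool worker that is parked on the LIFO semaphore."""
    waiting = any("LowLevelLifoSemaphore.Wait" in f for f in frames)
    worker = any(
        "PortableThreadPool+WorkerThread.WorkerThreadStart" in f for f in frames
    )
    return waiting and worker


def count_idle_workers(report_text):
    """Return (idle pool workers, total threads) found in the report text."""
    threads = parse_threads(report_text)
    idle = [tid for tid, frames in threads.items() if is_idle_worker(frames)]
    return len(idle), len(threads)


# First thread copied from the excerpt above; second thread is hypothetical.
SAMPLE = """\
Thread (0x3D6B):
  CPU_TIME
  System.Private.CoreLib!System.Threading.LowLevelLifoSemaphore.WaitNative(...)
  System.Private.CoreLib!System.Threading.LowLevelLifoSemaphore.WaitForSignal(int32)
  System.Private.CoreLib!System.Threading.LowLevelLifoSemaphore.Wait(int32,bool)
  System.Private.CoreLib!System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart()
Thread (0x1A2B):
  MyApp!Hypothetical.BusyWork()
  System.Private.CoreLib!System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart()
"""

if __name__ == "__main__":
    idle, total = count_idle_workers(SAMPLE)
    print(f"{idle} idle pool workers out of {total} threads")
```

To run it against a real capture, read the log file into a string and pass it to `count_idle_workers`. A single snapshot only shows where threads were parked at that instant, so the count is a hint rather than proof of a missed wakeup.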

We believe this is a critical bug within the .NET 9.0 runtime. We hope this information is helpful for the investigation.

Panda.Silo.Server.stack_log_20250721_161744.txt
