
System.Text.RegularExpressions.Tests: multiple tests OOM on x86 checked coreclr #126003

@danmoseley

Build Information

Build: https://dev.azure.com/dnceng-public/cbb18261-c48f-4abb-8651-8cdcb5474649/_build/results?buildId=1348372
Build error leg: Libraries Test Run checked coreclr windows x86 Debug
Pull request: #125928

Error Message

{
  "ErrorMessage": ["RegularExpressions.Tests", "OutOfMemoryException"],
  "BuildRetry": false,
  "ExcludeConsoleLog": false
}

Description

Four System.Text.RegularExpressions.Tests tests fail with OutOfMemoryException on the x86 checked coreclr leg (net11.0-windows-Debug-x86-coreclr_checked). The first three share the same root cause: the SourceGenerated regex engine path runs full Roslyn compilations at test time, and these exhaust the ~2 GB x86 address space. The fourth fails through a different mechanism in the NonBacktracking engine, described under its entry in the list of failing tests.

Contributing factors

  • Large batch sizes: SourceGenRegexAsync compiles all patterns per test class in a single Roslyn invocation — up to ~2,903 patterns for RegexPcre2Tests. Each becomes a [GeneratedRegex] method with BDD tables and character class logic, producing enormous generated source that Roslyn must analyze and emit.
  • Parallel test collections (4 threads): Multiple large Roslyn compilations can overlap simultaneously.
  • Checked coreclr overhead: Extra validation data structures reduce the effective addressable memory.

Failing tests

1. MonoTests.ValidateRegex (~1,008 patterns)

OOM in Roslyn emit phase (BlobBuilder):

System.OutOfMemoryException : Exception of type 'System.OutOfMemoryException' was thrown.
   at System.Reflection.Metadata.BlobBuilder..ctor(Int32 capacity)
   at System.Reflection.Metadata.Ecma335.MetadataBuilder.HeapBlobBuilder..ctor(Int32 capacity)
   at System.Reflection.Metadata.Ecma335.MetadataBuilder.HeapBlobBuilder.AllocateChunk(Int32 minimalSize)
   at System.Reflection.Metadata.BlobBuilder.Expand(Int32 newLength)
   at System.Reflection.Metadata.BlobBuilder.WriteBytesUnchecked(ReadOnlySpan`1 buffer)
   at System.Reflection.Metadata.BlobBuilder.WriteUTF16(ReadOnlySpan`1 value)
   at System.Reflection.Metadata.BlobBuilder.WriteUserString(String value)
   at System.Reflection.Metadata.Ecma335.MetadataBuilder.GetOrAddUserString(String value)
   at Microsoft.Cci.MetadataWriter.GetOrAddUserString(String str)
   ...
   at Microsoft.Cci.PeWriter.WritePeToStream(...)
   at System.Text.RegularExpressions.Tests.RegexGeneratorHelper.SourceGenRegexAsync(...)
   at System.Text.RegularExpressions.Tests.MonoTests.ValidateRegex_MemberData()

2. RegexPcre2Tests.IsMatchTests (~2,903 patterns)

OOM in Roslyn semantic analysis (DefiniteAssignmentPass):

System.OutOfMemoryException : Exception of type 'System.OutOfMemoryException' was thrown.
   at System.Array.Resize[T](T[]& array, Int32 newSize)
   at Microsoft.CodeAnalysis.CSharp.DefiniteAssignmentPass.Normalize(LocalState& state)
   at Microsoft.CodeAnalysis.CSharp.DefiniteAssignmentPass.Join(LocalState& self, LocalState& other)
   at Microsoft.CodeAnalysis.CSharp.AbstractFlowPass`2.ResolveBranch(...)
   at Microsoft.CodeAnalysis.CSharp.AbstractFlowPass`2.ResolveBranches(...)
   ...
   at Microsoft.CodeAnalysis.CSharp.MethodCompiler.CompileMethodBodies(...)
   at System.Text.RegularExpressions.Tests.RegexGeneratorHelper.SourceGenRegexAsync(...)

3. RegexGroupTests.Groups (~631 patterns)

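OOM during source generator compilation, reported as an ArgumentException whose message carries the generated source (elided here as [generated source]):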

System.ArgumentException : [generated source]
   at System.Text.RegularExpressions.Tests.RegexGeneratorHelper.SourceGenRegexAsync(...)
   at System.Text.RegularExpressions.Tests.RegexHelpers.GetRegexesAsync(...)
   at System.Text.RegularExpressions.Tests.RegexHelpers.GetRegexes(...)
   at System.Text.RegularExpressions.Tests.RegexGroupTests.Groups_MemberData()

4. RegexMatchTests.CharClassSubtraction_DeepNesting_DoesNotStackOverflow (engine: NonBacktracking)

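Different mechanism: this test does not go through Roslyn. The OOM occurs while building the BDD for a character class nested 1,000 levels deep, [a-[a-[...[a]...]]], where RegexNodeConverter.CreateBDDFromSetString recurses once per nesting level and calls String.Substring at each step: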

System.OutOfMemoryException : Exception of type 'System.OutOfMemoryException' was thrown.
   at System.String.Substring(Int32 startIndex)
   at System.Text.RegularExpressions.Symbolic.RegexNodeConverter.<CreateBDDFromSetString>g__Compute|5_0(String set)
   at System.Text.RegularExpressions.Symbolic.RegexNodeConverter.CreateBDDFromSetString(String set)
   at System.Text.RegularExpressions.Symbolic.RegexNodeConverter.<CreateBDDFromSetString>g__Compute|5_0(String set)
   at System.Text.RegularExpressions.Symbolic.RegexNodeConverter.CreateBDDFromSetString(String set)
   ... (repeated ~1000 times)
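
To make the shape of that input concrete, here is a minimal sketch of how such a pattern could be constructed. The class and method names are hypothetical, and it assumes "depth" counts bracket levels; the actual test may build its pattern differently.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the test suite.
public static class NestedCharClass
{
    // Builds "[a-[a-[...[a]...]]]" with `depth` bracket levels,
    // e.g. depth 1 gives "[a]" and depth 3 gives "[a-[a-[a]]]".
    public static string Build(int depth)
    {
        if (depth < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(depth));
        }

        var sb = new StringBuilder(depth * 4);

        // Each outer level opens a class and starts a subtraction.
        for (int i = 0; i < depth - 1; i++)
        {
            sb.Append("[a-");
        }

        // Innermost class has nothing subtracted from it.
        sb.Append("[a]");

        // Close every outer level.
        sb.Append(']', depth - 1);

        return sb.ToString();
    }
}
```

The result has 4 * depth - 1 characters, so a depth of 1000 yields a 3,999-character pattern. Passing it to new Regex(pattern, RegexOptions.NonBacktracking) would be one way to exercise the recursive path shown in the stack trace above, although the exact construction used by the failing test is not shown in this issue.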

Possible mitigations

  • Cap batch size in SourceGenRegexAsync (e.g., 200 patterns per compilation) — there is already a #if false block for compiling individually
  • Reduce maxParallelThreads for this test assembly on x86/checked legs
  • Skip SourceGenerated engine on x86 checked builds if the constraint is fundamental
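
The first mitigation could look roughly like the sketch below. It only shows the splitting step; PatternBatching, SplitIntoBatches, and the constant are hypothetical names, the value 200 is taken from the suggestion above, and how each batch would be fed to SourceGenRegexAsync is left open because that helper's signature is not shown in this issue.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of RegexGeneratorHelper.
public static class PatternBatching
{
    // Assumed cap, following the "e.g., 200 patterns" suggestion.
    public const int MaxPatternsPerCompilation = 200;

    // Splits the patterns into consecutive batches of at most `batchSize`
    // items, preserving order, so that each batch can be compiled in its
    // own Roslyn invocation instead of one very large one.
    public static IReadOnlyList<string[]> SplitIntoBatches(
        IReadOnlyList<string> patterns,
        int batchSize = MaxPatternsPerCompilation)
    {
        if (patterns is null)
        {
            throw new ArgumentNullException(nameof(patterns));
        }

        if (batchSize <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(batchSize));
        }

        // Enumerable.Chunk (available since .NET 6) yields arrays of up to
        // `batchSize` elements; the last array holds the remainder.
        return patterns.Chunk(batchSize).ToList();
    }
}
```

With a cap of 200, the ~2,903 RegexPcre2Tests patterns would be split into 15 compilations (14 full batches and one of 103), the ~1,008 MonoTests patterns into 6, and the ~631 RegexGroupTests patterns into 4. Whether peak memory per compilation drops enough on x86 would still need to be measured.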

Known issue validation

Build: 🔎 https://dev.azure.com/dnceng-public/public/_build/results?buildId=1348372
Error message validated: [RegularExpressions.Tests OutOfMemoryException]
Result validation: ✅ Known issue matched with the provided build.
Validation performed at: 3/24/2026 2:17:43 AM UTC

Report

Build: 1348372
Definition: dotnet/runtime
Test: System.Text.RegularExpressions.Tests.RegexMatchTests.CharClassSubtraction_DeepNesting_DoesNotStackOverflow
Pull request: #125928

Summary

24-Hour Hit Count: 1
7-Day Hit Count: 1
1-Month Count: 1

Labels

Known Build Error, area-System.Text.RegularExpressions, blocking-clean-ci, untriaged
