[mono][sgen] Make color visible to client permanent #121247

BrzVlad · 2025-10-31T14:29:01Z

A color (SCC) that doesn't contain any bridge objects is made visible to the client if xrefs_in * xrefs_out is greater than 60. Later in bridge processing, we need to build the callback to pass to .NET Android. During this stage, we reduce the full set of SCCs to the SCCs that should be visible to the client (those containing bridge objects or satisfying the condition above). If an SCC has an xref to a color that is not visible to the client, we need to do a recursive traversal to find all neighbors that are visible to the client. The problem is that this process can end up making an SCC no longer visible to the client, leading to inconsistencies in the computation. Consider a color (C1) that has a neighbor (C2) that is not visible to the client. In this final stage, we compute the neighbors of C1 by traversing recursively through the neighbors of C2. If C2 ends up pointing to colors that were already neighbors of C1, then after this computation C1 ends up with fewer xrefs_out, which can make the color no longer visible to the client. This makes subsequent checks incorrect, resulting in an incorrect graph being built for the client.
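
To make the instability concrete, here is a minimal C sketch of the visibility predicate exactly as the description states it. The `Color` struct, its field names, `HEAVY_COMBINED_REFS_MIN`, and `recomputed_visible_to_client` are hypothetical simplifications, not the real `ColorData` layout or code in sgen-tarjan-bridge.c, and the runtime may apply additional conditions. The point is only that a predicate recomputed from live xref counts can flip once those counts shrink.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of a color (SCC). The real ColorData in
 * sgen-tarjan-bridge.c is laid out differently. */
typedef struct {
    int bridge_count; /* bridge objects contained in the SCC */
    int xrefs_in;     /* incoming cross-color references */
    int xrefs_out;    /* outgoing cross-color references */
} Color;

/* Threshold taken from the description: xrefs_in * xrefs_out > 60. */
#define HEAVY_COMBINED_REFS_MIN 60

/* Visibility recomputed from the current xref counts on every call,
 * mirroring the pre-fix behavior as described (not the runtime's code). */
static bool recomputed_visible_to_client(const Color *c)
{
    if (c->bridge_count > 0)
        return true;
    return c->xrefs_in * c->xrefs_out > HEAVY_COMBINED_REFS_MIN;
}
```

With `xrefs_in = 8` and `xrefs_out = 8` the predicate holds (64 > 60); dropping `xrefs_out` to 7 makes it fail (56), even though the color's role in the graph has not changed.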

This scenario seems rare in practice; otherwise we would have gotten many more reports. We fix this by pinning the visible_to_client property of a color the first time it is satisfied, so the number of actual xrefs the color has no longer matters.
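
A minimal sketch of the pinning idea, again over a hypothetical simplified struct. `PinnedColor`, its fields, and `pinned_visible_to_client` are illustrative only; the PR adds a `visible_to_client` flag to Mono's `ColorData` (and `visibleToClient` in CoreCLR), but the surrounding runtime code is not reproduced here. The sketch assumes the check is first evaluated while the original xref counts still hold, which is when the flag gets pinned.

```c
#include <stdbool.h>

/* Hypothetical, simplified color with a sticky visibility flag, modeled
 * on the visible_to_client field the PR adds to ColorData. */
typedef struct {
    int bridge_count;
    int xrefs_in;
    int xrefs_out;
    bool visible_to_client; /* set once, never cleared */
} PinnedColor;

#define HEAVY_COMBINED_REFS_MIN 60

static bool pinned_visible_to_client(PinnedColor *c)
{
    /* A color already reported as visible stays visible, no matter how
     * its xref counts change afterwards. */
    if (c->visible_to_client)
        return true;
    if (c->bridge_count > 0 ||
        c->xrefs_in * c->xrefs_out > HEAVY_COMBINED_REFS_MIN)
        c->visible_to_client = true;
    return c->visible_to_client;
}
```

The design trades one flag per color for stability: a later drop in xrefs_out can no longer retract a decision that other parts of the computation have already relied on.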

Fixes assertions like:

* Assertion at /home/vbrezae/Xamarin/repos/runtime/src/mono/mono/metadata/sgen-tarjan-bridge.c:1151, condition `color_visible_to_client (cd)' not met

BrzVlad · 2025-10-31T14:29:23Z

graph TD;
    BL0-->NBMID;
    BL1-->NBMID;
    BL2-->NBMID;
    BL3-->NBMID;
    BL4-->NBMID;
    BL5-->NBMID;
    BL6-->NBMID;
    BL7-->NBMID;
    NBMID-->BR0;
    NBMID-->BR1;
    NBMID-->BR2;
    NBMID-->BR3;
    NBMID-->BR4;
    NBMID-->BR5;
    NBMID-->NBR7;
    NBMID-->BR6;
    NBR7-->BR5;
    NBR7-->BR6;

Consider the graph above, which is identical to the one from the added test case. The B prefix denotes bridge objects and NB denotes normal (non-bridge) objects. Every object is its own SCC in this scenario. The optimization that passes SCCs containing only non-bridge objects to the client is meant to prevent the addition of excessive links on the Java side. With it, this graph leads to the addition of 8 + 7 = 15 refs on the Java side. If we didn't allow passing the NBMID SCC over to the Java side, we would need to add 7 refs for each of the BLx objects, totalling 56 refs!
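
The reference counts above generalize for a fully connected fan-in/fan-out shape like this one: with L objects pointing into a non-bridge SCC that in turn points to R visible colors, exposing the SCC costs L + R Java-side references, while linking around it costs L × R. A small C sketch; the function names are hypothetical, for illustration only.

```c
/* Hypothetical helpers for a fan-in/fan-out shape: `left` objects point to
 * one non-bridge SCC, which points to `right` visible colors. */

/* SCC exposed to the client: one ref per incoming and per outgoing edge. */
static int refs_through_scc(int left, int right)
{
    return left + right;
}

/* SCC hidden: every left object must link directly to every right color. */
static int refs_bypassing_scc(int left, int right)
{
    return left * right;
}
```

For the test graph, `refs_through_scc(8, 7)` is 15 and `refs_bypassing_scc(8, 7)` is 56, matching the numbers above.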

NBMID is initially considered a color visible to the client, because it has 8 xrefs_in and 8 xrefs_out (8 × 8 = 64 > 60). However, when computing the final xrefs for this color, NBR7 is not included (because it is a color that doesn't contain any bridges and doesn't have enough links: 1 × 2 = 2). It is traversed instead, yielding the BR5 and BR6 xrefs that were already present. This means that NBMID ends up with only 7 xrefs_out, and since 8 × 7 = 56 is not greater than 60, it would no longer satisfy the condition of being a color visible to the client.
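
The following toy C model replays the final-stage neighbor computation on the graph above: out-edges to visible colors are collected with de-duplication, and out-edges to non-visible colors are followed recursively. The `Graph` type, `collect_visible`, and `nbmid_final_xrefs_out` are hypothetical and only approximate the behavior described here; they are not the runtime's traversal.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_NODES 32

/* Hypothetical adjacency-matrix graph of colors (one object per SCC). */
typedef struct {
    int n;
    bool edge[MAX_NODES][MAX_NODES];
    bool visible[MAX_NODES];
} Graph;

/* Collect visible colors reachable from `node`, passing through
 * non-visible colors; `seen` de-duplicates results. Returns new entries. */
static int collect_visible(const Graph *g, int node,
                           bool seen[MAX_NODES], bool visited[MAX_NODES])
{
    int added = 0;
    for (int t = 0; t < g->n; t++) {
        if (!g->edge[node][t])
            continue;
        if (g->visible[t]) {
            if (!seen[t]) {
                seen[t] = true;
                added++;
            }
        } else if (!visited[t]) {
            visited[t] = true;
            added += collect_visible(g, t, seen, visited);
        }
    }
    return added;
}

/* Builds BL0..BL7 -> NBMID -> {BR0..BR6, NBR7}, NBR7 -> {BR5, BR6}
 * and returns the final xrefs_out computed for NBMID. */
static int nbmid_final_xrefs_out(void)
{
    enum { BL0 = 0, NBMID = 8, BR0 = 9, NBR7 = 16, N = 17 };
    Graph g;
    memset(&g, 0, sizeof g);
    g.n = N;
    for (int i = 0; i < 8; i++)
        g.edge[BL0 + i][NBMID] = true;
    for (int i = 0; i < 7; i++)
        g.edge[NBMID][BR0 + i] = true;
    g.edge[NBMID][NBR7] = true;
    g.edge[NBR7][BR0 + 5] = true;
    g.edge[NBR7][BR0 + 6] = true;

    /* Bridges are visible; NBMID is heavy (8 * 8 > 60); NBR7 is not (1 * 2). */
    for (int i = 0; i < 8; i++)
        g.visible[BL0 + i] = true;
    for (int i = 0; i < 7; i++)
        g.visible[BR0 + i] = true;
    g.visible[NBMID] = true;

    bool seen[MAX_NODES] = { false };
    bool visited[MAX_NODES] = { false };
    return collect_visible(&g, NBMID, seen, visited);
}
```

Under these assumptions `nbmid_final_xrefs_out()` returns 7: BR5 and BR6 are reached both directly and through NBR7 but counted once, and 8 × 7 = 56 then fails the > 60 check.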

dotnet-policy-service · 2025-10-31T14:30:33Z

Tagging subscribers to this area: @BrzVlad
See info in area-owners.md if you want to be subscribed.

Copilot

Pull Request Overview

This PR fixes a bug in the GC bridge processing where a color's visibility to the client could change during processing, causing incorrect behavior. The fix introduces a visibleToClient flag that pins a color as visible once detected, preventing it from becoming invisible later even if it loses xrefs.

Key changes:

* Added a visibleToClient flag to ColorData structures in both CoreCLR and Mono runtimes
* Modified ColorVisibleToClient/color_visible_to_client functions to cache visibility status
* Added a test case BridgelessHeavyColorChanging to verify the fix using inline arrays

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File	Description
src/coreclr/gc/gcbridge.cpp	Added `visibleToClient` flag to `ColorData` and updated `ColorVisibleToClient` to cache visibility determination
src/mono/mono/metadata/sgen-tarjan-bridge.c	Added `visible_to_client` flag to `ColorData` and updated `color_visible_to_client` to cache visibility determination
src/tests/GC/Features/Bridge/Bridge.cs	Added `InlineData` struct, updated `NonBridge14` to use it, and added `BridgelessHeavyColorChanging` test method

src/coreclr/gc/gcbridge.cpp

filipnavara

Generally LGTM and makes it much easier to reason about the code, just one nit about the data structure layout.

lateralusX

LGTM!

BrzVlad · 2025-11-05T13:42:37Z

/ba-g android infra issue

srxqds · 2025-11-10T08:20:14Z

Is the release/9.0 branch not affected by this?

BrzVlad · 2025-11-10T11:27:43Z

For .NET 9 I've backported only #121376. Rather than disabling the Tarjan GC bridge, users on .NET 9 can set MONO_GC_PARAMS=disable-non-bridge-scc.

…121483) Backport #121247 and #121243 to release/10.0.

This fixes assertions like:

```
* Assertion at /home/vbrezae/Xamarin/repos/runtime/src/mono/mono/metadata/sgen-tarjan-bridge.c:1151, condition `color_visible_to_client (cd)' not met
```

## Customer Impact

- [x] Customer reported
- [ ] Found internally

Some applications on maui-android can randomly crash during GC when using the default GC bridge (the Tarjan bridge). We merged a few fixes for the Tarjan bridge a few months ago, but this one issue remains. The workaround used by customers is to fall back to an older GC bridge, which has worse performance. For some, this performance impact is not acceptable.

This backport also fixes the same issue in the CoreCLR gcbridge implementation. CoreCLR doesn't have a fallback GC bridge implementation, so this fix is essential for the successful use of CoreCLR/NativeAOT on Android, at least for some customers.

## Regression

- [ ] Yes
- [x] No

## Testing

Tested with our own GC bridge tests, using the scenario that causes the issue.

## Risk

The GC bridge is a sensitive area and fixes here typically carry some risk. This fix, however, is quite straightforward: it simply pins the value of a property inside an SCC node, rather than having it recomputed with an unstable value that was leading to problems. No changes are made to the core algorithm. Low risk.

BrzVlad added 3 commits October 31, 2025 12:42

[mono][sgen] Use color_visible_to_client method for clarity

cb73f6d

Does the exact same thing.

[tests] Add gc bridge test that is failing

6571c29

BrzVlad requested a review from steveisok as a code owner October 31, 2025 14:29

Copilot AI review requested due to automatic review settings October 31, 2025 14:29

BrzVlad requested a review from vitek-karas as a code owner October 31, 2025 14:29

github-actions bot added the needs-area-label label Oct 31, 2025

dotnet-policy-service bot assigned BrzVlad Oct 31, 2025

BrzVlad added area-GC-mono and removed needs-area-label labels Oct 31, 2025

BrzVlad requested a review from filipnavara October 31, 2025 14:29

Copilot AI reviewed Oct 31, 2025

src/coreclr/gc/gcbridge.cpp (outdated)

filipnavara reviewed Oct 31, 2025

src/coreclr/gc/gcbridge.cpp

filipnavara reviewed Oct 31, 2025

Apply fixes to coreclr gcbridge

0e33c69

BrzVlad force-pushed the fix-gcbridge-non-bridge-scc branch from f16c430 to 0e33c69 October 31, 2025 15:01

build-analysis bot mentioned this pull request Oct 31, 2025

The Operation will be canceled. The next steps may not contain expected logs. dotnet/dnceng#3008

Open

3 tasks

lateralusX self-requested a review November 4, 2025 20:39

lateralusX approved these changes Nov 4, 2025

build-analysis bot mentioned this pull request Nov 5, 2025

Timeout in HostFactoryResolverTests.NoSpecialEntryPointPatternCanRunInParallel #114704

Open

BrzVlad merged commit fe17c2c into dotnet:main Nov 5, 2025
144 of 149 checks passed

dotnet-maestro bot mentioned this pull request Nov 6, 2025

[main] Source code updates from dotnet/runtime dotnet/dotnet#3258

Merged

BrzVlad mentioned this pull request Nov 10, 2025

[release/10.0] [mono][sgen] Make color visible to client permanent #121483

Merged

4 tasks

4 participants
