-
Notifications
You must be signed in to change notification settings - Fork 5.2k
[mono][sgen] Make color visible to client permanent #121247
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Does the exact same thing.
A color (SCC) that isn't containing any bridge objects is made visible to client if xrefs_in * xrefs_out is greater than 60. Later on in bridge processing, we need to build the callback to pass for .net android. During this stage, we reduce from the full set of SCCs to SCCs that should be visible to client (containing bridge objects or satisfying the above condition). If an SCC has an xref to a color that is not visible to client, we need to do a recursive traversal to find all neighbors that are visible to client. The problem is that this process can end up making an SCC no longer visible to client, leading to inconsistencies in the computation. Consider a color(C1) that has a neighbor that is not visible to client(C2). In this final stage, we compute the neighbors of C1 by traversing recursively through the neighbors of C2. If C2 ends up pointing to colors that were already neighbors of C1, then, following this computation, C1 would end up with fewer xrefs_out, making the color no longer visible to client. This make future checks incorrect, resulting in building incorrect graph for client. This scenario seems rare in practice, we should have gotten way more reports otherwise. We fix this by pinning the visible_to_client property for a color once it first satisfies it, so it will no longer matter how many actual xrefs the color has.
graph TD;
BL0-->NBMID;
BL1-->NBMID;
BL2-->NBMID;
BL3-->NBMID;
BL4-->NBMID;
BL5-->NBMID;
BL6-->NBMID;
BL7-->NBMID;
NBMID-->BR0;
NBMID-->BR1;
NBMID-->BR2;
NBMID-->BR3;
NBMID-->BR4;
NBMID-->BR5;
NBMID-->NBR7;
NBMID-->BR6;
NBR7-->BR5;
NBR7-->BR6;
Consider the following graph, that is identical to the one from the added testcase. B prefix is for bridge objects, NB is for normal objects. Every object is an SCC in this scenario. The optimization passing SCCs containing non bridge objects is meant to prevent the addition of excessive links on the java side. This graph leads to the addition of 8+7 = 15 refs on java. If we didn't allow to pass
|
|
Tagging subscribers to this area: @BrzVlad |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes a bug in the GC bridge processing where a color's visibility to the client could change during processing, causing incorrect behavior. The fix introduces a visibleToClient flag that pins a color as visible once detected, preventing it from becoming invisible later even if it loses xrefs.
Key changes:
- Added a
visibleToClientflag toColorDatastructures in both CoreCLR and Mono runtimes - Modified
ColorVisibleToClient/color_visible_to_clientfunctions to cache visibility status - Added a test case
BridgelessHeavyColorChangingto verify the fix using inline arrays
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/coreclr/gc/gcbridge.cpp | Added visibleToClient flag to ColorData and updated ColorVisibleToClient to cache visibility determination |
| src/mono/mono/metadata/sgen-tarjan-bridge.c | Added visible_to_client flag to ColorData and updated color_visible_to_client to cache visibility determination |
| src/tests/GC/Features/Bridge/Bridge.cs | Added InlineData struct, updated NonBridge14 to use it, and added BridgelessHeavyColorChanging test method |
filipnavara
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally LGTM and makes it much easier to reason about the code, just one nit about the data structure layout.
f16c430 to
0e33c69
Compare
lateralusX
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
|
/ba-g android infra issue |
A color (SCC) that isn't containing any bridge objects is made visible to client if xrefs_in * xrefs_out is greater than 60. Later on in bridge processing, we need to build the callback to pass for .net android. During this stage, we reduce from the full set of SCCs to SCCs that should be visible to client (containing bridge objects or satisfying the above condition). If an SCC has an xref to a color that is not visible to client, we need to do a recursive traversal to find all neighbors that are visible to client. The problem is that this process can end up making an SCC no longer visible to client, leading to inconsistencies in the computation. Consider a color(C1) that has a neighbor that is not visible to client(C2). In this final stage, we compute the neighbors of C1 by traversing recursively through the neighbors of C2. If C2 ends up pointing to colors that were already neighbors of C1, then, following this computation, C1 would end up with fewer xrefs_out, making the color no longer visible to client. This make future checks incorrect, resulting in building incorrect graph for client. This scenario seems rare in practice, we should have gotten way more reports otherwise. We fix this by pinning the visible_to_client property for a color once it first satisfies it, so it will no longer matter how many actual xrefs the color has. Fixes assertions like: ``` * Assertion at /home/vbrezae/Xamarin/repos/runtime/src/mono/mono/metadata/sgen-tarjan-bridge.c:1151, condition `color_visible_to_client (cd)' not met ```
|
Does the release/9.0 branch not have this impact? |
|
For .net9 I've backported only #121376. Rather that disabling tarjan gc bridge, on .net9 users can set |
…121483) Backport #121247 and #121243 to release/10.0. This fixes assertions like ``` * Assertion at /home/vbrezae/Xamarin/repos/runtime/src/mono/mono/metadata/sgen-tarjan-bridge.c:1151, condition `color_visible_to_client (cd)' not met ``` ## Customer Impact - [x] Customer reported - [ ] Found internally Some applications on maui-android can randomly crash during GC, when using the default gc bridge (the tarjan bridge). We've had a few fixes for the tarjan bridge merged a few months ago, but there is still this one issue. The workaround used by customers is to fallback to an older GC bridge which has worse performance. For some this performance impact is not acceptable. This backport also fixes the same issue in the CoreCLR gcbridge implementation. CoreCLR doesn't have a fallback GC bridge implementation, so this fix is essential for the successful use of CoreCLR/NativeAOT on android, at least for some customers. ## Regression - [ ] Yes - [x] No ## Testing Tested on our own gc bridge tests, with scenario causing the issue. ## Risk The GC bridge is a sensitive area and fixes here typically have some carried risk. This fix however is quite straightforward, it simply pins the value of a property inside an SCC node, rather than having it recomputed with unstable value that was leading to problems. No changes are done to the core algorithm. Low risk.
A color (SCC) that isn't containing any bridge objects is made visible to client if xrefs_in * xrefs_out is greater than 60. Later on in bridge processing, we need to build the callback to pass for .net android. During this stage, we reduce from the full set of SCCs to SCCs that should be visible to client (containing bridge objects or satisfying the above condition). If an SCC has an xref to a color that is not visible to client, we need to do a recursive traversal to find all neighbors that are visible to client. The problem is that this process can end up making an SCC no longer visible to client, leading to inconsistencies in the computation. Consider a color(C1) that has a neighbor that is not visible to client(C2). In this final stage, we compute the neighbors of C1 by traversing recursively through the neighbors of C2. If C2 ends up pointing to colors that were already neighbors of C1, then, following this computation, C1 would end up with fewer xrefs_out, making the color no longer visible to client. This make future checks incorrect, resulting in building incorrect graph for client.
This scenario seems rare in practice, we should have gotten way more reports otherwise. We fix this by pinning the visible_to_client property for a color once it first satisfies it, so it will no longer matter how many actual xrefs the color has.
Fixes assertions like: