Fix the intermittent segfault issue due to GC in Admin tests #2129
base: master
🎉 All Contributor License Agreements have been signed. Ready to merge.
Pull Request Overview
This PR addresses a segmentation fault in the Admin tests by disabling Python's garbage collector while callbacks execute on librdkafka's background threads. The fix prevents the garbage collector from destroying AdminClient objects on one of librdkafka's own threads, which the library forbids.
Key Changes:
- Added garbage collector disabling/enabling logic around callback execution in multiple admin result handlers
- Removed extraneous blank lines for code cleanup
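Why a collection on librdkafka's thread is dangerous comes down to a CPython detail: the cyclic garbage collector runs on whichever thread performs the collection, so finalizers of unrelated unreachable objects execute on that thread. The standard-library-only sketch below illustrates this. FakeClient is a hypothetical stand-in for an AdminClient, and an explicit gc.collect() on a named worker thread stands in for an automatic collection triggered inside a librdkafka callback.

```python
import gc
import threading


class FakeClient:
    """Hypothetical stand-in for an AdminClient; it records which thread
    ends up running its finalizer."""

    finalized_on = []

    def __init__(self):
        self.self_ref = self  # reference cycle: only the cyclic GC can free it

    def __del__(self):
        FakeClient.finalized_on.append(threading.current_thread().name)


def demo():
    was_enabled = gc.isenabled()
    gc.disable()  # stop automatic collection so the main thread can't free it first
    try:
        FakeClient()  # the instance is immediately unreachable cyclic garbage
        # Collect on a "background" thread, as could happen inside a library
        # callback; the cycle's finalizer therefore runs on that thread.
        worker = threading.Thread(target=gc.collect, name="background-callback")
        worker.start()
        worker.join()
    finally:
        if was_enabled:
            gc.enable()
    return FakeClient.finalized_on
```

Under CPython, demo() returns a list containing "background-callback": the finalizer ran on the worker thread, not on the thread that created the object. Note that gc.disable() suppresses only automatic collections; an explicit gc.collect() still runs, which is why the sketch can collect on the worker thread at all.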
import gc
gc_was_enabled = gc.isenabled()
gc.disable()
Copilot (AI), Nov 12, 2025:
Importing gc inside each callback function creates unnecessary overhead when these callbacks are invoked repeatedly. Consider moving the import gc statement to the module level at the top of the file to avoid repeated import lookups.
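A minimal sketch of this suggestion, assuming callbacks shaped like the diff lines above. Here gc is imported once at module level, and the disable/re-enable pair is wrapped in try/finally so the collector is restored even if result handling raises. It is re-enabled only if it was enabled on entry. make_result_callback and handle_result are hypothetical names, not the client's actual callback factories.

```python
import gc  # imported once at module level instead of inside every callback


def make_result_callback(handle_result):
    """Hypothetical factory: wraps handle_result so it runs with automatic GC off."""

    def callback(future):
        gc_was_enabled = gc.isenabled()
        gc.disable()
        try:
            return handle_result(future)
        finally:
            # Restore the state we found; never force-enable a disabled GC.
            if gc_was_enabled:
                gc.enable()

    return callback
```

One caveat of this pattern: gc.disable() is process-wide, not per-thread. Automatic collection is therefore paused for every thread while a callback runs. If two callbacks overlap on different threads, the first to finish can also re-enable GC while the second is still running.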
# Disable GC during callback to prevent AdminClient destruction from librdkafka thread.
# This callback runs in librdkafka's background thread, and if GC runs here, it may
# try to destroy AdminClient objects, which librdkafka forbids from its own threads.
Copilot (AI), Nov 12, 2025:
This detailed explanation of why GC is disabled should be added to all other callback methods that use the same pattern. Currently, only _make_consumer_group_offsets_result has this documentation, while _make_topics_result, _make_resource_result, _make_consumer_groups_result, _make_acls_result, _make_futmap_result_from_list, and _make_futmap_result lack explanation for the same GC disabling logic.
This is a good point by Copilot. Maybe we put the description at the import and a small comment at each gc.disable() saying to see the import comment on preventing null-resource segfault errors?
good point!
"""
try:
    import gc

Copilot (AI), Nov 12, 2025:
[nitpick] The blank line at line 237, between the import statement and the explanatory comment, is inconsistent with the other methods, where the GC setup code immediately follows the import. Consider removing this blank line for consistency.
# Disable GC during callback to prevent AdminClient destruction from librdkafka thread.
# This callback runs in librdkafka's background thread, and if GC runs here, it may
# try to destroy AdminClient objects, which librdkafka forbids from its own threads.
629643e to 6cafc2c
1246593 to f30945f
What
Bug:
A test (test_list_consumer_group_offsets_api) finishes, but afterwards the callback function (_make_consumer_group_offsets_result), still running on librdkafka's background thread, triggers garbage collection. If that collection destroys an AdminClient, rd_kafka_destroy is invoked from librdkafka's own thread. librdkafka detects this (https://github.com/confluentinc/librdkafka/blob/616fb6e8074e82e0302f1c7d2896d7e577f8eb9b/src/rdkafka.c#L1115) and sends the ABORT signal.
Fix:
Call gc.disable() in tests that make fire-and-forget calls.
Checklist
References
JIRA: https://confluentinc.atlassian.net/browse/NONJAVACLI-4105
Test & Review
Open questions / Follow-ups