@toddbaert (Member) commented Dec 10, 2025

This PR adds an experimental option that rebuilds the entire gRPC connection if the provider goes into an error state (after the grace period). This can be useful for troubleshooting network issues, especially in the context of service meshes/proxies.

I added unit tests and also tested locally; it works as expected. The feature can be enabled through config or with an env var: FLAGD_REINITIALIZE_ON_ERROR. It defaults to off.
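
For readers skimming the thread, the "defaults to off" behavior of the env var can be pictured roughly as below. This is only a sketch: the class and the `isEnabled` helper are hypothetical names made up for illustration, and the provider's actual parsing may differ.

```java
import java.util.Map;

public class ReinitializeOnErrorEnv {
    // Env var name as given in the PR description.
    static final String ENV_VAR = "FLAGD_REINITIALIZE_ON_ERROR";

    // Hypothetical helper: a missing or non-"true" value resolves to false (off).
    static boolean isEnabled(Map<String, String> env) {
        return Boolean.parseBoolean(env.get(ENV_VAR));
    }

    public static void main(String[] args) {
        System.out.println(ENV_VAR + " enabled: " + isEnabled(System.getenv()));
    }
}
```

`Boolean.parseBoolean` returns false for null input, so an unset variable leaves the feature disabled without any extra null check.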

@gemini-code-assist

Summary of Changes

Hello @toddbaert, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a robust mechanism to automatically rebuild gRPC connections when the provider encounters an error, particularly useful for maintaining stability in environments with intermittent network issues or complex service mesh configurations. The feature is opt-in, providing administrators with a powerful tool for enhanced resilience without altering default behavior.

Highlights

  • New Configuration Option: Introduced a new reinitializeOnError boolean option in FlagdOptions that, when enabled, triggers a full gRPC connection rebuild if the provider enters an error state after its grace period. This option is off by default and can be configured via code or an environment variable (FLAGD_REINITIALIZE_ON_ERROR).
  • Connection Reinitialization Logic: Implemented the core logic for reinitializing gRPC channel components within SyncStreamQueueSource. This includes a new initializeChannelComponents method to set up the channel and stubs, and a reinitializeChannelComponents method that creates new components and gracefully shuts down the old ones when an error occurs and the option is enabled.
  • Error Handling Integration: Integrated the new reinitialization capability into InProcessResolver's onError method. This method now checks if the reinitializeOnError option is active and, if so, calls the reinitializeChannelComponents method on the underlying SyncStreamQueueSource.
  • Unit Tests: Added comprehensive unit tests for both InProcessResolver and SyncStreamQueueSource to verify that the reinitializeOnError functionality behaves as expected, ensuring reinitialization occurs only when enabled and not when disabled or shut down.
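
The gating described in the highlights above (rebuild only when the option is enabled, and never after shutdown) might look roughly like the following. This is a sketch under stated assumptions, not the PR's code: `Source` and `Resolver` are hypothetical stand-ins for SyncStreamQueueSource and InProcessResolver, and the rebuild is replaced by a counter so the example runs without gRPC.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class ReinitGateSketch {
    /** Hypothetical stand-in for SyncStreamQueueSource; counts rebuilds instead of touching gRPC. */
    static final class Source {
        final AtomicInteger rebuilds = new AtomicInteger();
        private final AtomicBoolean shutdown = new AtomicBoolean();

        void reinitializeChannelComponents() {
            if (shutdown.get()) {
                return; // never rebuild a connection after shutdown
            }
            rebuilds.incrementAndGet();
        }

        void shutdown() {
            shutdown.set(true);
        }
    }

    /** Hypothetical stand-in for the resolver's error path. */
    static final class Resolver {
        private final Source source;
        private final boolean reinitializeOnError;

        Resolver(Source source, boolean reinitializeOnError) {
            this.source = source;
            this.reinitializeOnError = reinitializeOnError;
        }

        void onError() {
            // Only rebuild when the opt-in flag is set.
            if (reinitializeOnError) {
                source.reinitializeChannelComponents();
            }
        }
    }

    public static void main(String[] args) {
        Source source = new Source();
        new Resolver(source, false).onError(); // option off: no rebuild
        new Resolver(source, true).onError();  // option on: rebuild requested
        source.shutdown();
        new Resolver(source, true).onError();  // after shutdown: ignored
        System.out.println("rebuilds: " + source.rebuilds.get());
    }
}
```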

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a valuable feature for rebuilding the gRPC connection on error, which can significantly aid in troubleshooting network issues. The implementation is generally well-done, with corresponding unit tests. However, I've identified a potential race condition related to thread safety during connection re-initialization, a minor logic issue in the new error handler, and some improvements for the new tests to make them more robust. My detailed comments and suggestions are provided below.

Signed-off-by: Todd Baert <[email protected]>
@toddbaert
Member Author

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a valuable feature to rebuild the gRPC connection on error, enhancing the provider's resilience in unstable network environments. The implementation is well-designed, using a GrpcComponents wrapper for atomic updates and proper synchronization to ensure thread safety. The feature is configurable and disabled by default, which is a sensible choice. The unit tests are thorough in covering the new logic. I've pointed out a couple of potential improvements in the test suite to prevent a possible NullPointerException and a resource leak. Overall, this is a strong contribution that improves the robustness of the flagd provider.
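
To illustrate the wrapper idea mentioned in this review, here is a minimal sketch of swapping a channel/stub pair as one unit: the new components are created first, published through a single reference, and only then is the old pair shut down. The `Components` class and its string fields are hypothetical placeholders for the real gRPC channel and stubs; the PR's GrpcComponents may be shaped differently.

```java
import java.util.concurrent.atomic.AtomicReference;

public class GrpcComponentsSketch {
    /** Hypothetical holder: channel and stub of one generation always travel together. */
    static final class Components {
        final int generation;
        private volatile boolean closed;

        Components(int generation) {
            this.generation = generation;
        }

        String channel() { return "channel-" + generation; }
        String stub() { return "stub-" + generation; }
        void shutdown() { closed = true; }
        boolean isClosed() { return closed; }
    }

    // Readers take one snapshot of the reference, so they never see a new channel with an old stub.
    private final AtomicReference<Components> ref = new AtomicReference<>(new Components(0));

    Components current() {
        return ref.get();
    }

    // synchronized so concurrent error callbacks cannot interleave two rebuilds.
    synchronized Components reinitialize() {
        Components fresh = new Components(ref.get().generation + 1); // create new first
        Components old = ref.getAndSet(fresh);                       // publish in one step
        old.shutdown();                                              // then release the old pair
        return fresh;
    }

    public static void main(String[] args) {
        GrpcComponentsSketch sketch = new GrpcComponentsSketch();
        Components before = sketch.current();
        Components after = sketch.reinitialize();
        System.out.println(before.channel() + " closed=" + before.isClosed());
        System.out.println(after.channel() + " closed=" + after.isClosed());
    }
}
```

Creating the replacement before shutting down the old pair means a failed rebuild can leave the existing components in place rather than leaving the source with nothing.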

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Todd Baert <[email protected]>
Signed-off-by: Todd Baert <[email protected]>

try {
    // create new channel components first
    initializeChannelComponents();
Member

@guidobrei guidobrei Dec 12, 2025

I think we need to call grpcComponents.channelConnector.initialize(); after reinitializing the GrpcComponents object, like we do in the init() method.

@toddbaert
Member Author

Very good eye... but... I think I discovered something earlier about this that I lost track of and forgot to mention:

I think the initialize method in the ChannelConnector is actually very well-hidden dead code. I noticed earlier that it basically ONLY sets up a channel monitor that emits events for channel disconnections, but we will already get such events from the stream handler itself.

I think this is why neither my manual testing nor our e2e test suite had any issue without the call you suggest above. In fact, if I completely comment out all the functionality of the ChannelConnector.initialize method, our entire e2e suite runs fine, including the assertion that we get disconnect events and other stream events.

Unless I'm missing something which we have no tests for, I think this method can be deleted.

cc @aepfli do you know anything about this?
