Xds fallback #11254

Merged
merged 3 commits into from
Dec 9, 2024

Conversation

larry-safran
Contributor

@larry-safran larry-safran requested a review from ejona86 June 12, 2024 23:17
@ejona86 ejona86 requested a review from YifeiZhuang June 17, 2024 20:49
@larry-safran larry-safran marked this pull request as ready for review June 17, 2024 20:49
Member

@YifeiZhuang YifeiZhuang left a comment

This is good overall. I'm finally starting to understand the spirit of it.

@larry-safran
Contributor Author

Fallback is ready for review again.

Member

@YifeiZhuang YifeiZhuang left a comment

Sending what I have. It is mostly minor; I haven't looked deeply enough yet.

Member

@ejona86 ejona86 left a comment

Sending what I have.

@ejona86

This comment was marked as resolved.

Contributor Author

@larry-safran larry-safran left a comment

Things done in commit 003348b

Member

@ejona86 ejona86 left a comment

Sending what I have.

Member

@ejona86 ejona86 left a comment

Sending what I have.

@larry-safran larry-safran requested a review from ejona86 November 11, 2024 23:49
@larry-safran larry-safran force-pushed the xds_fallback branch 2 times, most recently from f186005 to f304264 Compare November 21, 2024 23:32
Member

@ejona86 ejona86 left a comment

Sending what I have.

subscriber.controlPlaneClient.adjustResourceSubscription(type);

CpcWithFallbackState cpcToUse = manageControlPlaneClient(subscriber);
if (cpcToUse.cpc != null) {
Member

If cpc is null here because all cpcs are failing, then the subscriber won't be notified with an error. Although, is that preexisting? If it is preexisting, we can resolve that as a separate PR.

Contributor Author

When the retry timer goes off and the CPC tries to reconnect, if it fails, then handleStreamClosed will call onError (the same as previously done). It seems like we shouldn't continue sending errors for every backoff, but that should probably be addressed in a separate PR.

Member

Yes, I had considered that, but retries can take 2 minutes. It is known to be failing at this very moment, so we shouldn't delay an error.

I think it is proper to update the error each backoff; the error can change over time.
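
To make the "update the error each backoff" point concrete, here is a minimal Java sketch of a retry helper that forwards the latest failure to a callback on every failed attempt and then hands back the delay before the next try. `BackoffErrorReporter` and its parameters are hypothetical and are not taken from this PR. The 2-minute cap echoes the remark above; the initial delay and multiplier are purely illustrative, and jitter is omitted.

```java
import java.util.function.Consumer;

/** Hypothetical helper, not the grpc-java backoff policy. */
final class BackoffErrorReporter {
  private final double multiplier;
  private final long maxDelayMs;
  private final Consumer<String> onError;
  private long nextDelayMs;

  BackoffErrorReporter(
      long initialDelayMs, double multiplier, long maxDelayMs, Consumer<String> onError) {
    this.nextDelayMs = initialDelayMs;
    this.multiplier = multiplier;
    this.maxDelayMs = maxDelayMs;
    this.onError = onError;
  }

  /**
   * Reports the error seen by the attempt that just failed, then returns how
   * long to wait before the next attempt. The error is forwarded every time
   * because it may differ from the previous one.
   */
  long attemptFailed(String error) {
    onError.accept(error);
    long delay = nextDelayMs;
    // Grow the delay for the following attempt, capped at the maximum.
    nextDelayMs = Math.min((long) (nextDelayMs * multiplier), maxDelayMs);
    return delay;
  }
}

final class BackoffErrorDemo {
  public static void main(String[] args) {
    BackoffErrorReporter reporter =
        new BackoffErrorReporter(
            1_000, 2.0, 120_000, e -> System.out.println("onError: " + e));
    System.out.println("wait " + reporter.attemptFailed("connection refused") + " ms");
    System.out.println("wait " + reporter.attemptFailed("deadline exceeded") + " ms");
  }
}
```

With these illustrative numbers the watcher sees a fresh error after each attempt, while the wait between attempts grows until it reaches the cap.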

Contributor Author

If we don't care about an extra call to onError, then the fix is trivial: add a call to subscriber.onError in the catch (IOException) block of manageControlPlaneClient.

Member

Why do that in manageControlPlaneClient(), which I've already said is broken in taking a subscriber, when we could just call onError() directly here? Nothing actually knows more about the error than "no working cpcs." I was most interested in having an actionable error message, but having a call to onError() here would be fine except...

... we'd have to be careful about it to avoid #11672. Let's not worry about it in this PR, because it is pre-existing, broken in multiple ways, and fallback is enough trouble by itself.
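
As a rough illustration of the idea debated in this thread (tell the watcher right away when no control plane client is usable, instead of staying silent until a retry fires), here is a minimal Java sketch. `ControlPlane`, `ResourceWatcher`, and `SubscriptionManager` are hypothetical stand-ins, not the classes touched by this PR, and the sketch does not address the duplicate-notification concern mentioned above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical watcher callback; stands in for the subscriber's error path. */
interface ResourceWatcher {
  void onError(String description);
}

/** Hypothetical control plane entry: a name plus whether it is usable now. */
final class ControlPlane {
  final String name;
  final boolean usable;

  ControlPlane(String name, boolean usable) {
    this.name = name;
    this.usable = usable;
  }
}

/** Hypothetical manager that picks the first usable server in priority order. */
final class SubscriptionManager {
  private final List<ControlPlane> serversByPriority;

  SubscriptionManager(List<ControlPlane> serversByPriority) {
    this.serversByPriority = new ArrayList<>(serversByPriority);
  }

  /** Returns the chosen server, or empty after notifying the watcher. */
  Optional<ControlPlane> subscribe(String resourceName, ResourceWatcher watcher) {
    for (ControlPlane server : serversByPriority) {
      if (server.usable) {
        return Optional.of(server);
      }
    }
    // Nothing is usable right now, so report it immediately rather than
    // leaving the watcher without any signal until a retry timer fires.
    watcher.onError("No working control plane servers for " + resourceName);
    return Optional.empty();
  }
}

final class ImmediateErrorDemo {
  public static void main(String[] args) {
    List<ControlPlane> allDown = new ArrayList<>();
    allDown.add(new ControlPlane("primary", false));
    allDown.add(new ControlPlane("fallback", false));
    SubscriptionManager manager = new SubscriptionManager(allDown);
    manager.subscribe("listener-a", error -> System.out.println("onError: " + error));
  }
}
```

The error text here is deliberately generic, matching the observation that nothing at this point knows more than "no working cpcs."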

@larry-safran
Contributor Author

larry-safran commented Nov 28, 2024 via email

@ejona86
Member

ejona86 commented Nov 28, 2024

> The FailingXdsTransport seems like the cleanest approach, though I think we need a FailingXdsStreamingCall to go with it.

Yeah, it implies a FailingXdsStreamingCall as well.
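
For readers unfamiliar with the pattern, a "failing" transport and its companion streaming call could look roughly like the Java sketch below. The interfaces `Transport`, `StreamingCall`, and `EventHandler` are simplified, hypothetical stand-ins and not the real grpc-java xDS transport types; only the class names echo the ones mentioned in this comment, and the bodies are an assumption about the general shape rather than the code that was merged.

```java
/** Hypothetical, simplified callback interface for stream events. */
interface EventHandler {
  void onStatusReceived(String status);
}

/** Hypothetical, simplified streaming call. */
interface StreamingCall {
  void start(EventHandler handler);

  void sendMessage(String message);
}

/** Hypothetical, simplified transport that creates streaming calls. */
interface Transport {
  StreamingCall createStreamingCall(String methodName);
}

/** A call that never opens: starting it reports the stored failure at once. */
final class FailingXdsStreamingCall implements StreamingCall {
  private final String failure;

  FailingXdsStreamingCall(String failure) {
    this.failure = failure;
  }

  @Override
  public void start(EventHandler handler) {
    handler.onStatusReceived(failure);
  }

  @Override
  public void sendMessage(String message) {
    // Dropped on purpose: there is no underlying stream to write to.
  }
}

/** A transport whose every call fails with the same status. */
final class FailingXdsTransport implements Transport {
  private final String failure;

  FailingXdsTransport(String failure) {
    this.failure = failure;
  }

  @Override
  public StreamingCall createStreamingCall(String methodName) {
    return new FailingXdsStreamingCall(failure);
  }
}

final class FailingTransportDemo {
  public static void main(String[] args) {
    Transport transport = new FailingXdsTransport("UNAVAILABLE: could not create channel");
    transport
        .createStreamingCall("hypothetical.Service/Stream")
        .start(status -> System.out.println("closed with " + status));
  }
}
```

The appeal of this shape is that callers go through the normal stream-closed path instead of special-casing a null transport.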

Member

@ejona86 ejona86 left a comment

I agree we don't need an executor in the FailingXdsTransport because the synchronization context will prevent reentrancy.

I'm still not all that comfortable with manageControlPlaneClient() being passed a Subscriber. But it looks like it will behave okay, even if it is misleading, so this can go in without that changed.
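
The reentrancy remark above relies on a run-to-completion queue: work submitted while another task is running is queued and runs only after the current task returns. The Java sketch below shows that idea in its simplest form. `SerializingContext` is a hypothetical, single-threaded illustration and not grpc-java's `SynchronizationContext`, which is also safe to call from multiple threads.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical single-threaded sketch of a run-to-completion task queue. */
final class SerializingContext {
  private final Queue<Runnable> queue = new ArrayDeque<>();
  private boolean draining;

  void execute(Runnable task) {
    queue.add(task);
    if (draining) {
      // Called from inside a running task: leave the new task queued so it
      // runs after the current one finishes, which avoids reentrancy.
      return;
    }
    draining = true;
    try {
      Runnable next;
      while ((next = queue.poll()) != null) {
        next.run();
      }
    } finally {
      draining = false;
    }
  }
}

final class SerializingContextDemo {
  public static void main(String[] args) {
    SerializingContext context = new SerializingContext();
    List<String> log = new ArrayList<>();
    context.execute(
        () -> {
          log.add("outer start");
          // A callback fired synchronously here is deferred, not run inline.
          context.execute(() -> log.add("inner"));
          log.add("outer end");
        });
    System.out.println(log); // prints [outer start, outer end, inner]
  }
}
```

Because the inner task is deferred, a transport that fails synchronously can invoke its callback without an extra executor hop, provided the callback is routed through such a context.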

@larry-safran larry-safran merged commit 210f9c0 into grpc:master Dec 9, 2024
15 of 16 checks passed
@larry-safran larry-safran deleted the xds_fallback branch December 9, 2024 23:42
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 10, 2025
3 participants