Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

udp_proxy_filter: Fix crash when accepting dynamic cds for UDP. Fixes #34195. #37151

Merged
merged 2 commits into from
Dec 13, 2024

Conversation

railwaycat
Copy link
Contributor

Commit Message: udp_proxy_filter: Fix crash when accepting dynamic cds for UDP. Fixes #34195.
Additional Description:
Please see #34195 and #26206 for details. This change fixes a usage of cluster info resource after the udp cluster be destroyed.

I've got an approval from envoy-security list for the fix.
Risk Level: Low
Testing: mentioned in both tickets
Docs Changes: N/A
Release Notes:
Platform Specific Features:
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Fixes commit #PR or SHA]
[Optional Deprecated:]
[Optional API Considerations:]

Copy link

Hi @railwaycat, welcome and thank you for your contribution.

We will try to review your Pull Request as quickly as possible.

In the meantime, please take a look at the contribution guidelines if you have not done so already.

🐱

Caused by: #37151 was opened by railwaycat.

see: more, trace.

@yanavlasov
Copy link
Contributor

@railwaycat can you add tests that validate this condition, please?

/wait-any

@soulxu
Copy link
Member

soulxu commented Nov 17, 2024

/assign @yanavlasov

@yanavlasov
Copy link
Contributor

/wait

@railwaycat
Copy link
Contributor Author

Hi @yanavlasov Sorry for the delay! I updated the UDP Proxy filter test case ClusterDynamicInfoMapUpdate to cover the case that an UDP cluster with active sessions be removed.

@yanavlasov
Copy link
Contributor

Is the issue that the cluster_ reference becomes dangling when CDS updates the cluster?

If so then this change fixes the problem only partially. There are still uses of the cluster_ reference not related to the cluster info. Lie this:

Upstream::HostConstSharedPtr host = cluster_.loadBalancer().chooseHost(&context);

These will crash as well.

Looking at the code, to support dynamic clusters UDP proxy must avoid "capturing" thread local cluster at the filter creation reference and instead lookup the thread local cluster then it is needed and deal with the possibility that it may be not present. Similar to what the router filter does. I understand it is a much bigger change, however this smaller change does not address the underlying cause and may still produce the crash.

/wait-any

@railwaycat
Copy link
Contributor Author

This PR fixes the misusing of cluster_.info() more than fix the dangling cluster_ reference.

By the description in this comment

* If information about the cluster needs to be kept, use the ThreadLocalCluster::info() method to
store the pointer to cluster information for using later is safe.

Before this PR, UDP cluster uses a stored cluster_ reference to access cluster information by info() method which is not safe and will result a crash when cluster_ no longer valid. This PR replaces it with a stored cluster information pointer and avoid using any cluster_ reference, as the comment suggests.

I agree the dangling cluster_ in UDP cluster is a bigger issue but we may fix it in another PR.

@yanavlasov yanavlasov merged commit 19dcf98 into envoyproxy:main Dec 13, 2024
25 checks passed
@aojea
Copy link

aojea commented Dec 13, 2024

@yanavlasov not familiar with envoy releases, can you tell me which releases will get this fix?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Envoy crash using CDS with dynamic configuration from filesystem
4 participants