rootful podman and 'externalTrafficPolicy: Local' won't play nicely together #184
Some additional information:
It seems this is an environmental problem. I can't say whether it is general to podman or specific to some environments; bear in mind that these setups run in the Kubernetes CI without failures (https://testgrid.k8s.io/sig-network-kind#sig-network-kind,%20loadbalancer&include-filter-by-regex=LoadBalancer). See this error: the podman DNS provider is not working properly. It must resolve the pod name, and that is what causes the startup problem.
For the external traffic policy Local case, this is indeed really weird.
I will look into the container that is created as the load balancer and paste the logs.
Fedora uses systemd-resolved as a replacement for the /etc/resolv.conf mechanism. Maybe this has an impact?
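One quick way to see whether systemd-resolved is in the lookup path is to read the 'nameserver' entries in /etc/resolv.conf: when the system resolver is pointed at systemd-resolved's local stub listener, the entry is 127.0.0.53. A minimal sketch, assuming the plain resolv.conf syntax; the helper names are hypothetical and not part of cloud-provider-kind or kind:

```python
from typing import List

# Address of systemd-resolved's local stub listener.
SYSTEMD_RESOLVED_STUB = "127.0.0.53"


def parse_nameservers(resolv_conf: str) -> List[str]:
    """Return the 'nameserver' addresses from resolv.conf text, in file order."""
    servers = []
    for raw in resolv_conf.splitlines():
        line = raw.strip()
        # Blank lines and lines starting with '#' or ';' carry no configuration.
        if not line or line.startswith(("#", ";")):
            continue
        parts = line.split()
        if parts[0] == "nameserver" and len(parts) >= 2:
            servers.append(parts[1])
    return servers


def uses_systemd_resolved_stub(resolv_conf: str) -> bool:
    """True if the system resolver is pointed at the systemd-resolved stub."""
    return SYSTEMD_RESOLVED_STUB in parse_nameservers(resolv_conf)
```

On the host, passing the contents of /etc/resolv.conf to 'uses_systemd_resolved_stub' reports whether lookups go through the stub; running the same parser on a node container's /etc/resolv.conf shows which DNS server that node actually queries.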
I think we can improve that; right now it is a very basic serial check (cloud-provider-kind/pkg/controller/controller.go, lines 121 to 171 in 0ca7080).
We can run both checks in parallel and choose the one that wins the race. I will open a feature request to fix this; otherwise this issue gets hard to follow with multiple topics.
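The "run both checks in parallel, first one wins" idea can be sketched as follows. This is an illustration only, not the cloud-provider-kind implementation: the probe names and callables are hypothetical stand-ins for the local (container IP) and external (port-mapped) endpoint checks.

```python
import concurrent.futures
from typing import Callable, Mapping, Optional


def first_successful(
    probes: Mapping[str, Callable[[], bool]], timeout: float = 10.0
) -> Optional[str]:
    """Run every probe concurrently and return the name of the first one that succeeds.

    A probe succeeds when it returns True; returning False or raising counts as a
    failure. Returns None if no probe succeeds within `timeout` seconds.
    """
    if not probes:
        return None
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=len(probes))
    futures = {pool.submit(probe): name for name, probe in probes.items()}
    try:
        for future in concurrent.futures.as_completed(futures, timeout=timeout):
            try:
                if future.result():
                    return futures[future]  # first success wins the race
            except Exception:
                continue  # a probe that raises is treated as a failed check
    except concurrent.futures.TimeoutError:
        pass  # nothing succeeded in time
    finally:
        # Return without waiting for the slower probes; probes that are already
        # running still finish in the background, queued ones are cancelled.
        pool.shutdown(wait=False, cancel_futures=True)
    return None
```

For example, 'first_successful({"local": check_local, "external": check_external})' (both checks hypothetical) returns whichever endpoint answers first, so neither the on-host nor the in-container case has to wait for the other check to time out. A thread pool keeps the sketch short; in Go, which cloud-provider-kind is written in, the same pattern would typically use goroutines and a shared context cancelled once one check wins.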
One more thing: running the cluster and checking, e.g., worker3
and doing a DNS query on the host like so
results in
But a direct query on the host (or curl in another container, e.g. 'registry.k8s.io/e2e-test-images/agnhost') does not resolve this domain name. Alternatively (as I solved it in another project), maybe 'cloud-provider-kind' should do something like this? (Besides doing it concurrently, which is always nice.)
Seems I was on the right track: see the article "The container name resolution conundrum". It describes a solution using a config and an explicit default network, so that DNS resolution works for rootful containers running in parallel. So wouldn't it be easier for kind to just establish such an explicit, named network, and for cloud-provider-kind to attach to this network as well, at least to figure out which DNS IP to use for resolution?
This is how it already works: kind creates its own network.
That looks very tightly coupled to Fedora and is not likely to work across OSs and platforms. Besides, we don't want to define different behaviors for Docker and Podman. Podman is not stable enough: it is still experimental in kind, and as a consequence here too, because of its history of compatibility issues and breaking changes, though things have improved a lot lately (https://github.com/kubernetes-sigs/kind/issues?q=is%3Aissue+is%3Aopen+label%3Aarea%2Fprovider%2Fpodman+).
Yes, I see: the 'kind' network,
gives you the IP of the control plane to connect to. Doesn't this sound like a nice workaround?
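Reading a node's address on the 'kind' network can be done from the JSON that 'docker inspect' (or 'podman inspect') prints for the node container, via the Docker-compatible 'NetworkSettings.Networks' field. A minimal sketch, assuming that JSON layout; 'node_ip_on_network' is a hypothetical helper, and the network name defaults to kind's 'kind':

```python
import json
from typing import Optional


def node_ip_on_network(inspect_json: str, network: str = "kind") -> Optional[str]:
    """Return the IPv4 address of the first inspected container attached to `network`.

    `inspect_json` is the JSON array printed by `docker inspect <container>`;
    podman's inspect output is assumed to carry the same NetworkSettings section.
    """
    for container in json.loads(inspect_json):
        networks = (container.get("NetworkSettings") or {}).get("Networks") or {}
        ip = (networks.get(network) or {}).get("IPAddress")
        if ip:  # an empty string means no IPv4 address on that network
            return ip
    return None
```

For a default cluster, the input would be the output of inspecting the 'kind-control-plane' container. Note that this still assumes the host can reach container IPs directly, which is the assumption questioned in the reply that follows.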
I honestly appreciate you taking all this time to explore these options. The thing is, we had bad experiences assuming the host can contact the containers directly; that is why, when cloud-provider-kind runs on the host, it uses the port-mapped endpoint, which is stable. The problem is that when cloud-provider-kind runs in a container, we have the inverse problem: it is slow to start until it gives up. Polling both endpoints, the local and the external one, in parallel solves the problem for both scenarios IMHO, and it depends neither on exec-ing into containers nor on the kind network (some users modify it with an environment variable).
Just for clarification: but yes, running the checks in parallel is also fine.
Another update: I installed Docker ('Docker version 27.3.1, build 2.fc41') and tried again. The problem remains.
Some other strange thing (on Docker):
Now I'm out of ideas...
What version of cloud-provider-kind are you using? It seems the latest release has problems with podman (#186 (comment)). If that is the case, I need to make a release ASAP. Can you try building from source?
It's the latest one, 0.4.0.
Regarding #186: yes, port 80 does not work for podman.
I pulled the current main branch, compiled it with the provided Dockerfile, and extracted the cloud-provider-kind binary from the container (running in Docker). First try with
with 'Local':
I would say there is no change for the 'Local' setting.
Using this new binary with podman and an ingress controller has improved things somewhat: I no longer get connection-refused errors from curl. I'm doing this:
and then
get pods (ns default):
ns ingress-nginx:
Result:
OK, two things then:
1. I need to get the fix for the slow start in and cut a new release.
2. We need to figure out what is happening with that environment: using ETP Local, find the container that emulates the load balancer and upload its logs.
Anything I can provide besides this?
That is a very long query time (4 seconds?)!
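To tell a one-off slow answer from a systematic delay, the same name can be resolved several times and each attempt timed; a similar multi-second delay on every attempt would point at the resolver waiting on something, rather than a single slow response (an assumption to check, not a diagnosis). A minimal sketch using the system resolver through 'socket.getaddrinfo'; 'time_lookups' is hypothetical, and the resolver is injectable so the function can be exercised without a network:

```python
import socket
import time
from typing import Callable, List, Tuple

Resolver = Callable[[str], object]


def _system_resolver(name: str) -> object:
    # Goes through the system resolver (nsswitch, /etc/hosts, resolv.conf).
    return socket.getaddrinfo(name, None)


def time_lookups(
    name: str, attempts: int = 3, resolver: Resolver = _system_resolver
) -> List[Tuple[float, bool]]:
    """Resolve `name` `attempts` times; return (seconds taken, succeeded) per attempt."""
    results = []
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            resolver(name)
            ok = True
        except OSError:  # socket.gaierror is a subclass of OSError
            ok = False
        results.append((time.perf_counter() - start, ok))
    return results
```

Running 'time_lookups("kind-worker3")' on the host and inside a node container would show whether the delay appears in both places or only in one.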
Using the example above with ingress, here are the logs of a freshly created podman cluster,
but most of the time it hangs until Ctrl-C is pressed.
I'm using your example from 'examples/loadbalancer_etp_local.yaml' with kind running in rootful podman. Similar to #143, I see a long startup delay when running 'cloud-provider-kind' (also rootful) with
which is a bit annoying, but it works fine after this.
But this is not the real problem:
Just trying the above example fails when trying to connect with curl:
This is independent (as suggested in another closed bug report here) of using
What is important is to change the service configuration and delete the 'externalTrafficPolicy: Local' line so it looks like this:
which results in
without any timeouts/connection resets.
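For context on why deleting that line changes the behavior: with 'externalTrafficPolicy: Cluster' (the default), every node accepts load balancer traffic and may forward it to a ready endpoint on another node, while with 'Local' a node only delivers traffic to endpoints running on itself, so the load balancer has to send traffic only to nodes that host a ready endpoint. A toy model of that rule, not cloud-provider-kind's actual code; the node and pod names are made up:

```python
from typing import Dict, List


def eligible_nodes(ready_endpoints_by_node: Dict[str, List[str]], policy: str) -> List[str]:
    """Return the nodes a load balancer can usefully send traffic to.

    ready_endpoints_by_node maps node name -> ready endpoint (pod) names on that node.
    policy is the Service's externalTrafficPolicy: "Cluster" or "Local".
    """
    nodes = sorted(ready_endpoints_by_node)
    if policy == "Cluster":
        # Any node can forward to any ready endpoint, as long as one exists somewhere.
        has_any = any(ready_endpoints_by_node.values())
        return nodes if has_any else []
    if policy == "Local":
        # A node only serves traffic if it hosts a ready endpoint itself.
        return [n for n in nodes if ready_endpoints_by_node[n]]
    raise ValueError(f"unknown externalTrafficPolicy: {policy!r}")
```

In this model, a load balancer that keeps sending 'Local' traffic to a node without a local endpoint would see timeouts or resets, which would match the symptoms reported here; whether that is actually the cause is what the load balancer container's logs should reveal.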
Two questions: