
rootful podman and 'externalTrafficPolicy: Local' won't play nicely together #184

slartibart70 opened this issue Jan 12, 2025 · 21 comments


@slartibart70

I'm using your example from examples/loadbalancer_etp_local.yaml with kind running in rootful podman.

Similar to #143, I see a long startup delay when running (also rootful) cloud-provider-kind:

I0112 23:42:27.867429  156479 app.go:46] FLAG: --enable-lb-port-mapping="false"
I0112 23:42:27.867450  156479 app.go:46] FLAG: --enable-log-dumping="false"
I0112 23:42:27.867452  156479 app.go:46] FLAG: --logs-dir=""
I0112 23:42:27.867455  156479 app.go:46] FLAG: --v="2"
enabling experimental podman provider
I0112 23:42:28.163500  156479 controller.go:174] probe HTTP address https://local-cluster-control-plane:6443
I0112 23:42:33.163722  156479 controller.go:177] Failed to connect to HTTP address https://local-cluster-control-plane:6443: Get "https://local-cluster-control-plane:6443": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
I0112 23:42:33.163778  156479 controller.go:174] probe HTTP address https://local-cluster-control-plane:6443
I0112 23:42:35.810295  156479 controller.go:177] Failed to connect to HTTP address https://local-cluster-control-plane:6443: Get "https://local-cluster-control-plane:6443": dial tcp: lookup local-cluster-control-plane on 127.0.0.53:53: server misbehaving
I0112 23:42:36.810510  156479 controller.go:174] probe HTTP address https://local-cluster-control-plane:6443
I0112 23:42:41.810857  156479 controller.go:177] Failed to connect to HTTP address https://local-cluster-control-plane:6443: Get "https://local-cluster-control-plane:6443": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
I0112 23:42:43.812765  156479 controller.go:174] probe HTTP address https://local-cluster-control-plane:6443
I0112 23:42:44.560344  156479 controller.go:177] Failed to connect to HTTP address https://local-cluster-control-plane:6443: Get "https://local-cluster-control-plane:6443": dial tcp: lookup local-cluster-control-plane on 127.0.0.53:53: server misbehaving
I0112 23:42:47.561729  156479 controller.go:174] probe HTTP address https://local-cluster-control-plane:6443
I0112 23:42:52.562623  156479 controller.go:177] Failed to connect to HTTP address https://local-cluster-control-plane:6443: Get "https://local-cluster-control-plane:6443": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

which is a bit annoying, but it works fine after this:

I0112 23:42:57.083710  156479 controller.go:174] probe HTTP address https://127.0.0.1:41299
I0112 23:42:57.087242  156479 controller.go:84] Creating new cloud provider for cluster local-cluster
I0112 23:42:57.091554  156479 controller.go:91] Starting cloud controller for cluster local-cluster

But this is not the real problem:

Just trying the above example fails when connecting with curl:

for x in {1..10}; do curl '10.89.0.8:1234'; echo; done
curl: (56) Recv failure: Connection reset by peer

curl: (56) Recv failure: Connection reset by peer

NOW: 2025-01-12 23:34:59.242585578 +0000 UTC m=+312.772381806
NOW: 2025-01-12 23:34:59.249477323 +0000 UTC m=+312.779273551
curl: (56) Recv failure: Connection reset by peer

NOW: 2025-01-12 23:35:04.264640005 +0000 UTC m=+317.794436263
NOW: 2025-01-12 23:35:04.270987827 +0000 UTC m=+317.800784075
curl: (56) Recv failure: Connection reset by peer

NOW: 2025-01-12 23:35:09.285931664 +0000 UTC m=+322.815727922

This is independent (as suggested in another closed bug report here) of using

./cloud-provider-kind -enable-lb-port-mapping  ## or
./cloud-provider-kind 

What is important is to change the service configuration and delete the externalTrafficPolicy: Local line, so it looks like this:

apiVersion: v1
kind: Service
metadata:
  name: lb-service-local
spec:
  type: LoadBalancer
  selector:
    app: MyLocalApp
  ports:
    - protocol: TCP
      port: 1234
      targetPort: 8080

which results in

for x in {1..10}; do curl '10.89.0.10:1234'; echo; done
NOW: 2025-01-12 23:54:49.434342214 +0000 UTC m=+1016.595292426
NOW: 2025-01-12 23:54:49.440171472 +0000 UTC m=+1016.601121644
NOW: 2025-01-12 23:54:49.445746617 +0000 UTC m=+1016.606696799
NOW: 2025-01-12 23:54:49.450510643 +0000 UTC m=+1016.611460825
NOW: 2025-01-12 23:54:49.454518322 +0000 UTC m=+1016.615468504
NOW: 2025-01-12 23:54:49.458547032 +0000 UTC m=+1016.619497214
NOW: 2025-01-12 23:54:49.462695198 +0000 UTC m=+1016.623645380
NOW: 2025-01-12 23:54:49.466576808 +0000 UTC m=+1016.627526990
NOW: 2025-01-12 23:54:49.47130217 +0000 UTC m=+1016.632252332
NOW: 2025-01-12 23:54:49.476165424 +0000 UTC m=+1016.637115606

without any timeouts or connection resets.

Two questions:

  • Can you improve the startup behaviour of cloud-provider-kind (e.g. reduce the timeouts)?
  • Why is externalTrafficPolicy: Local so unreliable?
@slartibart70
Author

Some additional information:

podman version
Client:       Podman Engine
Version:      5.3.1
API Version:  5.3.1
Go Version:   go1.23.3
Built:        Thu Nov 21 01:00:00 2024
OS/Arch:      linux/amd64
Operating System: Fedora Linux 41
Kernel Version: 6.12.9-200.fc41.x86_64 (64-bit)

@aojea
Contributor

aojea commented Jan 13, 2025

It seems this is some environmental problem; I can't say if it is general with podman or specific to some environments. Bear in mind these setups are running in the Kubernetes CI without failures: https://testgrid.k8s.io/sig-network-kind#sig-network-kind,%20loadbalancer&include-filter-by-regex=LoadBalancer

See this error: the podman DNS provider is not working correctly, it must resolve the container name; the startup problem is because of that.

lookup local-cluster-control-plane on 127.0.0.53:53: server misbehaving

For the external traffic policy Local case, this is indeed really weird:

for x in {1..10}; do curl '10.89.0.8:1234'; echo; done
curl: (56) Recv failure: Connection reset by peer

curl: (56) Recv failure: Connection reset by peer

NOW: 2025-01-12 23:34:59.242585578 +0000 UTC m=+312.772381806
NOW: 2025-01-12 23:34:59.249477323 +0000 UTC m=+312.779273551
curl: (56) Recv failure: Connection reset by peer

I will look into the container that is created as a loadbalancer and paste the logs.

@slartibart70
Author

Fedora uses systemd-resolved as a replacement for the /etc/resolv.conf mechanism.
See: Changes/systemd-resolved

Maybe this has an impact?

@aojea
Contributor

aojea commented Jan 13, 2025

I think we can improve that; right now it is a very basic serial check:

	// prefer internal (direct connectivity) over no-internal (commonly portmap)
	for _, internal := range []bool{true, false} {
		kconfig, err := c.kind.KubeConfig(cluster, internal)
		if err != nil {
			klog.Errorf("Failed to get kubeconfig for cluster %s: %v", cluster, err)
			continue
		}
		config, err := clientcmd.RESTConfigFromKubeConfig([]byte(kconfig))
		if err != nil {
			klog.Errorf("Failed to convert kubeconfig for cluster %s: %v", cluster, err)
			continue
		}
		// check that the apiserver is reachable before continue
		// to fail fast and avoid waiting until the client operations timeout
		var ok bool
		for i := 0; i < 5; i++ {
			select {
			case <-ctx.Done():
				return nil, ctx.Err()
			default:
			}
			if probeHTTP(httpClient, config.Host) {
				ok = true
				break
			}
			time.Sleep(time.Second * time.Duration(i))
		}
		if !ok {
			klog.Errorf("Failed to connect to apiserver %s: %v", cluster, err)
			continue
		}
		kubeClient, err := kubernetes.NewForConfig(config)
		if err != nil {
			klog.Errorf("Failed to create kubeClient for cluster %s: %v", cluster, err)
			continue
		}
		// the first cluster will give us the type of connectivity between
		// cloud-provider-kind and the clusters and load balancer containers.
		// In Linux or containerized cloud-provider-kind this will be direct.
		once.Do(func() {
			if internal {
				cpkconfig.DefaultConfig.ControlPlaneConnectivity = cpkconfig.Direct
			}
		})
		return kubeClient, err
	}
	return nil, fmt.Errorf("can not find a working kubernetes clientset")
}

We can run both checks in parallel and choose the one that wins the race. I will open a feature request to fix this; otherwise it is hard to follow this issue if we have multiple topics.
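
For illustration, a minimal sketch of that parallel probe (it reuses the probeHTTP helper from the excerpt above and assumes the standard context, fmt, net/http, sync and time imports; the function name and the channel-based race are only an illustration, not the actual implementation):

// Sketch only: probe the internal address and the portmapped address
// concurrently and return whichever answers first, instead of letting the
// internal probe time out before the other one is tried.
func firstReachableHost(ctx context.Context, httpClient *http.Client, hosts []string) (string, error) {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	winner := make(chan string, len(hosts)) // buffered so late successes never block
	var wg sync.WaitGroup
	for _, host := range hosts {
		wg.Add(1)
		go func(host string) {
			defer wg.Done()
			for i := 0; i < 5; i++ {
				select {
				case <-ctx.Done():
					return
				default:
				}
				if probeHTTP(httpClient, host) { // same helper as in the excerpt above
					winner <- host
					return
				}
				time.Sleep(time.Second * time.Duration(i))
			}
		}(host)
	}
	go func() {
		// close the channel once every prober has given up, so the select
		// below reports failure instead of blocking forever
		wg.Wait()
		close(winner)
	}()

	select {
	case host, ok := <-winner:
		if !ok {
			return "", fmt.Errorf("no apiserver endpoint became reachable")
		}
		return host, nil
	case <-ctx.Done():
		return "", ctx.Err()
	}
}

The caller would pass both candidate addresses (e.g. https://local-cluster-control-plane:6443 and the portmapped https://127.0.0.1:41299) and keep the kubeconfig whose endpoint answered first.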

@slartibart70
Author

One more thing:

Running the cluster and checking e.g. worker3's /etc/resolv.conf, I find

root@local-cluster-worker3:/# cat /etc/resolv.conf
search dns.podman
nameserver fc00:f853:ccd:e793::1
nameserver 10.89.0.1

and doing a DNS query on the host like so

dig @10.89.0.1 local-cluster-control-plane

results in

;; ANSWER SECTION:
local-cluster-control-plane. 0  IN      A       10.89.0.16
local-cluster-control-plane. 0  IN      A       10.89.0.16

But a direct query on the host (or curl in another container, e.g. 'registry.k8s.io/e2e-test-images/agnhost') does not resolve this domain name.
I think (to be verified) that those containers need to share a common network to communicate with each other?

Or (as I solved it in another project): I started a small alpine container just for querying the DNS IP (see above: 10.89.0.1) and then used this to get the proper IP addresses of the containers I want to reach by DNS name (e.g. using 'dig' or similar).

Maybe 'cloud-provider-kind' should do something like this? (Besides doing it concurrently, which is always nice.)
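
As a rough sketch of that idea (an assumption about how it could be done, not how cloud-provider-kind works today), one could even skip the throwaway container and ask podman directly for the node container's address on the kind network; the network name "kind", the inspect template and the helper name are assumptions about a default rootful setup, and the snippet assumes the context, fmt, net, os/exec and strings imports:

// Sketch (assumption, not current cloud-provider-kind behaviour): resolve a
// kind node's IP without relying on the host resolver, by asking podman for
// the container's address on the "kind" network.
func nodeIPOnKindNetwork(ctx context.Context, container string) (string, error) {
	out, err := exec.CommandContext(ctx, "podman", "inspect",
		"--format", `{{ (index .NetworkSettings.Networks "kind").IPAddress }}`,
		container).Output()
	if err != nil {
		return "", fmt.Errorf("podman inspect %s: %w", container, err)
	}
	ip := strings.TrimSpace(string(out))
	if net.ParseIP(ip) == nil {
		return "", fmt.Errorf("unexpected podman inspect output: %q", out)
	}
	return ip, nil
}

Probing https://<ip>:6443 with the result would then not depend on the host's resolver at all.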

@slartibart70
Author

Seems I was on the right track:

The container name resolution conundrum

In the article, there is a solution using a config and an explicit default network, so that DNS resolution works for rootful containers running in parallel.

So... wouldn't it be easier for kind to just establish such an explicit, named network, with cloud-provider-kind attaching to this network as well - at least to figure out the DNS IP, resolve local-cluster-control-plane in a small alpine container, and then use this IP to connect to the control-plane from the host?

@aojea
Contributor

aojea commented Jan 13, 2025

So... wouldn't it be easier for kind to just establish such an explicit, named network, with cloud-provider-kind attaching to this network as well

This is how it works already: kind creates its own network.

The container name resolution conundrum

In the article, there is a solution using a config and an explicit default network, so that DNS resolution works for rootful containers running in parallel.

That looks very tightly coupled to Fedora and is not likely to work across OSs and platforms; besides, we don't want to define different behaviors for docker and podman.

Podman is not stable enough; it is still experimental in kind, and as a consequence here as well, because of the history of compatibility issues and breaking changes, though things have improved a lot lately:

https://github.com/kubernetes-sigs/kind/issues?q=is%3Aissue+is%3Aopen+label%3Aarea%2Fprovider%2Fpodman+
kubernetes-sigs/kind#1778

@slartibart70
Author

slartibart70 commented Jan 13, 2025

This is how it works already: kind creates its own network.

Yes, I see: the 'kind' network, and a

podman run --rm -it --net kind --entrypoint dig  registry.k8s.io/e2e-test-images/agnhost:2.39 local-cluster-control-plane

gives you the IP of the control-plane to connect to.

This sounds like a nice workaround?
(and it is compatible with docker)

@aojea
Contributor

aojea commented Jan 13, 2025

This sounds like a nice workaround?
(and it is compatible with docker)

I honestly really appreciate you taking all this time exploring these options. The thing is that we had bad experiences assuming the host can contact the containers directly; that is why, when running on the host, it uses the portmap endpoint, which is stable...

The problem is that if cloud-provider-kind runs in a container we have the inverse problem: it is slow to start until it gives up... Polling both endpoints, the local and the external, in parallel solves the problem for both scenarios IMHO, and does not depend on execing or on the kind network (some users modify it with an env variable).

@slartibart70
Author

slartibart70 commented Jan 13, 2025

Just for clarification: I don't want cloud-provider-kind to run in a container, but it should start a small one at startup just to get the DNS server IP, kill the small container, and continue to poll for the control-plane.

But yes, working in parallel is also fine...

@slartibart70
Author

slartibart70 commented Jan 13, 2025

Another update: I installed docker ('Docker version 27.3.1, build 2.fc41') and tried examples/loadbalancer_etp_local.yaml again (with and without the 'Local' setting).

The problem remains:

for x in {1..20}; do curl 172.18.0.6:1234;echo; done
NOW: 2025-01-13 17:23:42.619708764 +0000 UTC m=+33.520976230
NOW: 2025-01-13 17:23:42.625958759 +0000 UTC m=+33.527226175
NOW: 2025-01-13 17:23:42.631395711 +0000 UTC m=+33.532663147
NOW: 2025-01-13 17:23:42.636923195 +0000 UTC m=+33.538190611
NOW: 2025-01-13 17:23:42.641550529 +0000 UTC m=+33.542817965
NOW: 2025-01-13 17:23:42.64692372 +0000 UTC m=+33.548191156
NOW: 2025-01-13 17:23:42.651135315 +0000 UTC m=+33.552402751
NOW: 2025-01-13 17:23:42.657681312 +0000 UTC m=+33.558948728
NOW: 2025-01-13 17:23:42.663691543 +0000 UTC m=+33.564958979
NOW: 2025-01-13 17:23:42.668528716 +0000 UTC m=+33.569796142

and with 'Local':

for x in {1..20}; do curl 172.18.0.6:1234;echo; done
curl: (56) Recv failure: Connection reset by peer

curl: (56) Recv failure: Connection reset by peer

curl: (56) Recv failure: Connection reset by peer

curl: (56) Recv failure: Connection reset by peer

NOW: 2025-01-13 17:24:19.792268609 +0000 UTC m=+70.693536066
curl: (56) Recv failure: Connection reset by peer

curl: (56) Recv failure: Connection reset by peer

curl: (56) Recv failure: Connection reset by peer

curl: (56) Recv failure: Connection reset by peer

NOW: 2025-01-13 17:24:39.832977084 +0000 UTC m=+90.734244510
curl: (56) Recv failure: Connection reset by peer

curl: (56) Recv failure: Connection reset by peer

NOW: 2025-01-13 17:24:49.855203591 +0000 UTC m=+100.756471017
curl: (56) Recv failure: Connection reset by peer

NOW: 2025-01-13 17:24:54.869019645 +0000 UTC m=+105.770287071

@slartibart70
Author

Some other strange thing (on docker): independent of the 'Local' setting, if I try this (kind-loadbalancer) deployment, it works fine:

for x in {1..30}; do curl 172.18.0.7:5678;echo; done
foo-app
bar-app
foo-app
foo-app
foo-app
bar-app
...

Now I'm out of ideas... any help is appreciated.

@aojea
Contributor

aojea commented Jan 14, 2025

What version of cloud-provider-kind are you using? It seems the latest release has problems with podman: #186 (comment)

If that is the case I need to make a release ASAP. Can you try building from source?

@slartibart70
Author

It's the latest one, 0.4.0, running on kind 0.26.0 with a control-plane node and 3 worker nodes.

@slartibart70
Author

Regarding #186: yes, port 80 does not work for podman.
Using docker and cloud-provider-kind is OK (I tried with an ingress and can confirm it's working).

@slartibart70
Author

slartibart70 commented Jan 14, 2025

I pulled the current main branch, compiled it with the provided Dockerfile, and extracted the cloud-provider-kind binary from the container (running in docker).

First try with loadbalancer_etp_local.yaml, without 'externalTrafficPolicy: Local':

for x in {1..30}; do curl 172.18.0.7:1234;echo; done
NOW: 2025-01-14 15:37:20.198382831 +0000 UTC m=+58.561236727
NOW: 2025-01-14 15:37:20.20495797 +0000 UTC m=+58.567811836
NOW: 2025-01-14 15:37:20.209132199 +0000 UTC m=+58.571986055
NOW: 2025-01-14 15:37:20.213799064 +0000 UTC m=+58.576652920
NOW: 2025-01-14 15:37:20.218413769 +0000 UTC m=+58.581267625
NOW: 2025-01-14 15:37:20.223519598 +0000 UTC m=+58.586373454
NOW: 2025-01-14 15:37:20.228220407 +0000 UTC m=+58.591074273
NOW: 2025-01-14 15:37:20.233263015 +0000 UTC m=+58.596116871
NOW: 2025-01-14 15:37:20.240116082 +0000 UTC m=+58.60296994

with 'Local':

for x in {1..30}; do curl 172.18.0.7:1234;echo; done
curl: (56) Recv failure: Connection reset by peer

curl: (56) Recv failure: Connection reset by peer

curl: (56) Recv failure: Connection reset by peer

curl: (56) Recv failure: Connection reset by peer

curl: (56) Recv failure: Connection reset by peer

NOW: 2025-01-14 15:38:20.277043755 +0000 UTC m=+118.639897611
curl: (56) Recv failure: Connection reset by peer

curl: (56) Recv failure: Connection reset by peer

curl: (56) Recv failure: Connection reset by peer

NOW: 2025-01-14 15:38:35.309241583 +0000 UTC m=+133.672095429
NOW: 2025-01-14 15:38:35.314381948 +0000 UTC m=+133.677235814

I would say no change for the 'Local' setting.

@slartibart70
Author

Using this new binary in podman with an ingress controller has improved things somewhat: I no longer get connection-refused with curl.

I'm doing this:

kubectl apply -f https://kind.sigs.k8s.io/examples/ingress/deploy-ingress-nginx.yaml
kubectl wait --namespace ingress-nginx \
  --for=condition=ready pod \
  --selector=app.kubernetes.io/component=controller \
  --timeout=90s
kubectl apply -f https://kind.sigs.k8s.io/examples/ingress/usage.yaml

and then

LOADBALANCER_IP=$(kubectl get services \
   --namespace ingress-nginx \
   ingress-nginx-controller \
   --output jsonpath='{.status.loadBalancer.ingress[0].ip}')

curl ${LOADBALANCER_IP}/foo;echo
curl ${LOADBALANCER_IP}/bar;echo

get pods (ns default):

NAME      READY   STATUS    RESTARTS   AGE     IP           NODE                    NOMINATED NODE   READINESS GATES
bar-app   1/1     Running   0          5m18s   10.244.2.4   local-cluster-worker2   <none>           <none>
foo-app   1/1     Running   0          5m18s   10.244.3.4   local-cluster-worker    <none>           <none>

ns ingress-nginx:

NAME                                       READY   STATUS      RESTARTS        AGE     IP           NODE                    NOMINATED NODE   READINESS GATES
ingress-nginx-admission-create-xvjl8       0/1     Completed   0               7m27s   10.244.3.2   local-cluster-worker    <none>           <none>
ingress-nginx-admission-patch-j2xm2        0/1     Completed   0               7m27s   10.244.2.3   local-cluster-worker2   <none>           <none>
ingress-nginx-controller-696d4c4c5-dlmsc   1/1     Running     1 (5m12s ago)   7m27s   10.244.3.3   local-cluster-worker    <none>           <none>
get service  -n ingress-nginx
NAME                                 TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
ingress-nginx-controller             LoadBalancer   10.96.174.136   10.89.0.57    80:31535/TCP,443:32224/TCP   9m25s
ingress-nginx-controller-admission   ClusterIP      10.96.27.195    <none>        443/TCP                      9m25s

Result:

curl 10.89.0.57/bar
bar-app    ## works ok

curl 10.89.0.57/foo     ## hangs indefinitely

@aojea
Contributor

aojea commented Jan 14, 2025

OK, two things then: I need to get the fix for the slow start in and cut a new release.

We need to figure out what is happening with that environment: using etp Local, find the container that emulates the loadbalancer and upload the logs.

@slartibart70
Author

Anything I can provide besides this?

kc exec pod/foo-app -- cat /etc/hosts
# Kubernetes-managed hosts file.
127.0.0.1       localhost
::1     localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
fe00::0 ip6-mcastprefix
fe00::1 ip6-allnodes
fe00::2 ip6-allrouters
10.244.3.4      foo-app

$ kc exec pod/bar-app -- cat /etc/hosts
# Kubernetes-managed hosts file.
127.0.0.1       localhost
::1     localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
fe00::0 ip6-mcastprefix
fe00::1 ip6-allnodes
fe00::2 ip6-allrouters
10.244.2.4      bar-app

kc exec pod/bar-app -- dig foo-app
; Query time: 3979 msec
;; SERVER: 10.96.0.10#53(10.96.0.10)

kc exec pod/foo-app -- dig foo-app
;; Query time: 3813 msec
;; SERVER: 10.96.0.10#53(10.96.0.10)

which is a very long query time?? (~4 seconds?)

@aojea
Contributor

aojea commented Jan 14, 2025

podman ps will give you the list of containers; there are some containers named kindccm-ASDASDASD that are the loadbalancers, we need those logs.

@slartibart70
Author

slartibart70 commented Jan 14, 2025

Using the example above with ingress, here are the logs of a freshly created podman cluster. Now, sometimes I get a response from

curl 10.89.0.62/foo
foo-app

but most of the time it hangs until Ctrl-C is pressed. This is similar with /bar.

logs.zip
