Description
When starting a ClickHouseKeeperInstallation with 3 replicas, the first keeper never becomes Ready: its readiness probe keeps failing.

Operator version: 0.26.1
Steps to Reproduce
1. Deploy a CHK with 3 keeper replicas.
2. Observe that keeper-0 starts but never becomes Ready (its readiness probe fails to connect).
3. The other keepers (keeper-1 and keeper-2) are deployed and running successfully.
4. ClickHouse itself is still accessible and the CHK resource reports as complete.
Expected Behavior
After starting the CHK, all keeper replicas should start correctly, their readiness probes should pass, and the ClickHouseInstallation should proceed normally.
Workarounds Attempted
I thought this had been fixed by this commit, but I only got it working in my environment by copying the liveness probe command into my readiness probe, as follows:
```yaml
templates:
  ...
  containers:
    - name: clickhouse-keeper
      ...
      readinessProbe:
        exec:
          command:
            - bash
            - -xc
            - |
              date && OK=$(exec 3<>/dev/tcp/127.0.0.1/2181; printf 'ruok' >&3; IFS=; tee <&3; exec 3<&-;); if [[ "${OK}" == "imok" ]]; then exit 0; else exit 1; fi
        initialDelaySeconds: 5
        periodSeconds: 3
        timeoutSeconds: 3
```
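For reference, the bash probe above performs the ZooKeeper-style `ruok`/`imok` four-letter-word handshake against the keeper's client port. A minimal Python sketch of the same check (hypothetical helper names; host and port default to 127.0.0.1:2181 as in the probe):

```python
import socket


def is_keeper_ok(reply: bytes) -> bool:
    # Keeper answers the four-letter 'ruok' command with 'imok' when healthy.
    return reply.strip() == b"imok"


def keeper_ruok(host: str = "127.0.0.1", port: int = 2181, timeout: float = 3.0) -> bool:
    """Open a TCP connection, send 'ruok', and check for the 'imok' reply."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            sock.shutdown(socket.SHUT_WR)  # done sending; wait for the reply
            return is_keeper_ok(sock.recv(4))
    except OSError:
        return False  # connection refused or timed out => not ready
```

This is just the exec-probe logic restated, not an operator API: if the keeper is up, the function returns True, otherwise False, matching the probe's exit 0 / exit 1.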
Additional Context
Error:
Events of the pod:

```
...
Normal   Started    10m                   kubelet  Container started
Warning  Unhealthy  9m27s (x12 over 10m)  kubelet  Readiness probe failed: Get "http://100.64.2.118:9182/ready": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Warning  Unhealthy  31s (x112 over 10m)   kubelet  Readiness probe failed: Get "http://100.64.2.118:9182/ready": dial tcp 100.64.2.118:9182: connect: connection refused
```
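The events show the operator-generated HTTP readiness probe on port 9182 first timing out and then being refused. To reproduce the probe by hand from inside the cluster, a small check of that endpoint (a hypothetical helper, assuming the `/ready` path and port from the events above) could look like:

```python
import urllib.error
import urllib.request


def probe_ready(url: str, timeout: float = 3.0) -> bool:
    """Return True if the readiness endpoint answers HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, TimeoutError):
        # Covers connection refused, DNS failures, and timeouts,
        # i.e. both failure modes seen in the pod events.
        return False


# e.g. probe_ready("http://100.64.2.118:9182/ready")
```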
Pod's logs:
```
...
2026.03.20 11:03:06.966067 [ 45 ] {} <Information> RaftInstance: pre-vote decision: X (deny)
2026.03.20 11:03:25.109830 [ 1 ] {} <Warning> Application: Listen [0.0.0.0]:9182 failed: Code: 159. DB::Exception: Cannot receive session id within session timeout. (TIMEOUT_EXCEEDED) (version 26.1.4.35 (official build)). If it is an IPv6 or IPv4 address and your host has disabled IPv6 or IPv4, then consider to specify not disabled IPv4 or IPv6 address to listen in <listen_host> element of configuration file. Example for disabled IPv6: <listen_host>0.0.0.0</listen_host> . Example for disabled IPv4: <listen_host>::</listen_host>
2026.03.20 11:03:25.110054 [ 1 ] {} <Warning> Application: Listen [0.0.0.0]:2181 failed: Poco::Exception. Code: 1000, e.code() = 98, Net Exception: Address already in use: 0.0.0.0:2181 (version 26.1.4.35 (official build)). If it is an IPv6 or IPv4 address and your host has disabled IPv6 or IPv4, then consider to specify not disabled IPv4 or IPv6 address to listen in <listen_host> element of configuration file. Example for disabled IPv6: <listen_host>0.0.0.0</listen_host> . Example for disabled IPv4: <listen_host>::</listen_host>
2026.03.20 11:03:25.110220 [ 1 ] {} <Warning> Application: Listen [0.0.0.0]:7000 failed: Poco::Exception. Code: 1000, e.code() = 98, Net Exception: Address already in use: 0.0.0.0:7000 (version 26.1.4.35 (official build)). If it is an IPv6 or IPv4 address and your host has disabled IPv6 or IPv4, then consider to specify not disabled IPv4 or IPv6 address to listen in <listen_host> element of configuration file. Example for disabled IPv6: <listen_host>0.0.0.0</listen_host> . Example for disabled IPv4: <listen_host>::</listen_host>
...
```
Thanks,
William