
ClickHouseKeeperInstallation: First keeper doesn't pass the readiness probe #1945

@williamquach

Description


When starting a ClickHouseKeeperInstallation (CHK) with 3 replicas, the first keeper starts but never becomes Ready: its readiness probe keeps failing.

Operator version: 0.26.1

Steps to Reproduce

1. Deploy a CHK with 3 keeper replicas.
2. Observe that keeper-0 starts but never becomes Ready (the readiness probe fails to connect).
3. Keepers 1 and 2 are deployed and run successfully.
4. ClickHouse is still accessible and the CHK resource completes.

Expected Behavior

After the CHK starts, all keeper replicas should become Ready (readiness probe passing) and the ClickHouseInstallation should proceed normally.

Workarounds Attempted

I thought this was fixed by this commit, but I only got it working in my environment by using the liveness probe command as my readiness probe, as follows:
```yaml
templates:
  ...
  containers:
    - name: clickhouse-keeper
      ...
      readinessProbe:
        exec:
          command:
            - bash
            - -xc
            - |
              date && OK=$(exec 3<>/dev/tcp/127.0.0.1/2181; printf 'ruok' >&3; IFS=; tee <&3; exec 3<&-;); if [[ "${OK}" == "imok" ]]; then exit 0; else exit 1; fi
        initialDelaySeconds: 5
        periodSeconds: 3
        timeoutSeconds: 3
```
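The probe command above is a four-letter-word health check over TCP: it sends `ruok` to the keeper's client port (2181) and expects `imok` back. A minimal sketch of the same check in Python, exercised here against a purely illustrative in-process mock keeper (a real deployment would target the actual keeper on 127.0.0.1:2181):

```python
import socket
import threading

def keeper_is_ok(host: str, port: int, timeout: float = 3.0) -> bool:
    """Send the 'ruok' four-letter word and expect 'imok' back."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            return sock.recv(4) == b"imok"
    except OSError:
        return False

def mock_keeper(server: socket.socket) -> None:
    # Illustrative stand-in for a keeper: answer one 'ruok' with 'imok'.
    conn, _ = server.accept()
    with conn:
        if conn.recv(4) == b"ruok":
            conn.sendall(b"imok")

server = socket.socket()
server.bind(("127.0.0.1", 0))  # ephemeral port for the mock
server.listen(1)
port = server.getsockname()[1]
threading.Thread(target=mock_keeper, args=(server,), daemon=True).start()

result = keeper_is_ok("127.0.0.1", port)
print(result)
```

Note the bash version exits 0/1 for the kubelet, while this sketch just returns a boolean; either way the probe succeeds only when the keeper has actually joined the quorum and can answer `ruok`.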

Additional Context

Error (events of the pod):
```
Normal   Started    10m                   kubelet  Container started
Warning  Unhealthy  9m27s (x12 over 10m)  kubelet  Readiness probe failed: Get "http://100.64.2.118:9182/ready": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Warning  Unhealthy  31s (x112 over 10m)   kubelet  Readiness probe failed: Get "http://100.64.2.118:9182/ready": dial tcp 100.64.2.118:9182: connect: connection refused
```

Pod's logs:
```
...
2026.03.20 11:03:06.966067 [ 45 ] {} <Information> RaftInstance: pre-vote decision: X (deny)
2026.03.20 11:03:25.109830 [ 1 ] {} <Warning> Application: Listen [0.0.0.0]:9182 failed: Code: 159. DB::Exception: Cannot receive session id within session timeout. (TIMEOUT_EXCEEDED) (version 26.1.4.35 (official build)). If it is an IPv6 or IPv4 address and your host has disabled IPv6 or IPv4, then consider to specify not disabled IPv4 or IPv6 address to listen in <listen_host> element of configuration file. Example for disabled IPv6: <listen_host>0.0.0.0</listen_host>. Example for disabled IPv4: <listen_host>::</listen_host>
2026.03.20 11:03:25.110054 [ 1 ] {} <Warning> Application: Listen [0.0.0.0]:2181 failed: Poco::Exception. Code: 1000, e.code() = 98, Net Exception: Address already in use: 0.0.0.0:2181 (version 26.1.4.35 (official build)). If it is an IPv6 or IPv4 address and your host has disabled IPv6 or IPv4, then consider to specify not disabled IPv4 or IPv6 address to listen in <listen_host> element of configuration file. Example for disabled IPv6: <listen_host>0.0.0.0</listen_host>. Example for disabled IPv4: <listen_host>::</listen_host>
2026.03.20 11:03:25.110220 [ 1 ] {} <Warning> Application: Listen [0.0.0.0]:7000 failed: Poco::Exception. Code: 1000, e.code() = 98, Net Exception: Address already in use: 0.0.0.0:7000 (version 26.1.4.35 (official build)). If it is an IPv6 or IPv4 address and your host has disabled IPv6 or IPv4, then consider to specify not disabled IPv4 or IPv6 address to listen in <listen_host> element of configuration file. Example for disabled IPv6: <listen_host>0.0.0.0</listen_host>. Example for disabled IPv4: <listen_host>::</listen_host>
...
```

Thanks,
William
