Cannot switch over a cluster through the API or patronictl even though there is no replication lag #1152

@corama

We encountered an issue while testing a cluster switchover: the Patroni node refused to perform it and logged the following warnings:

2025-08-15 07:03:07,502 INFO: Member patroni-5344-node-1 exceeds maximum replication lag
2025-08-15 07:03:07,502 WARNING: switchover: no healthy members found, switchover is not possible
2025-08-15 07:03:07,502 INFO: Cleaning up failover key
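
For context, the first log line suggests that the candidate is rejected by a lag comparison against maximum_lag_on_failover. The following is only a minimal sketch of that idea, not Patroni's actual code; the function name and parameters are hypothetical and chosen for illustration.

```python
def exceeds_max_lag(leader_wal_position: int,
                    member_wal_position: int,
                    maximum_lag_on_failover: int) -> bool:
    """Return True if the member is too far behind to be a switchover candidate.

    All three values are byte counts. The names are hypothetical; this is an
    illustration of the comparison, not Patroni's implementation.
    """
    # Lag is the distance between the last known leader position and the
    # position the member reports; a negative difference is treated as 0.
    lag = max(leader_wal_position - member_wal_position, 0)
    return lag > maximum_lag_on_failover


# A member 500,000 bytes behind passes a 1 MiB limit.
print(exceeds_max_lag(50_000_000, 49_500_000, 1048576))
# A member whose reported position is 0 looks lagged even with no writes.
print(exceeds_max_lag(50_000_000, 0, 1048576))
```

Under this reading, a stale or missing reported position on the replica could trigger the message even when the cluster is idle, which is one thing worth ruling out.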

There is actually no write load on the target cluster at all, because we are still in the testing phase.
Here is the Patroni config:

$ cat postgres.yml
bootstrap:
  dcs:
    loop_wait: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      archive_mode: true
      hot_standby: 'on'
      log_destination: csvlog
      log_filename: postgresql-%Y-%m-%d_%H%M%S.log
      logging_collector: 'on'
      max_locks_per_transaction: 512
      parameters:
        archive_mode: 'on'
        archive_timeout: 1800s
        autovacuum_analyze_scale_factor: 0.02
        autovacuum_max_workers: 5
        autovacuum_vacuum_scale_factor: 0.05
        checkpoint_completion_target: 0.9
        hot_standby: 'on'
        log_autovacuum_min_duration: 0
        log_checkpoints: 'on'
        log_connections: 'on'
        log_disconnections: 'on'
        log_filename: postgresql-%Y-%m-%d_%H%M%S.log
        log_lock_waits: 'on'
        log_min_duration_statement: 500
        log_statement: ddl
        log_temp_files: 0
        max_connections: 136
        max_replication_slots: 10
        max_wal_senders: 10
        tcp_keepalives_idle: 900
        tcp_keepalives_interval: 100
        track_functions: all
        wal_compression: 'on'
        wal_level: hot_standby
        wal_log_hints: 'on'
      restart_after_crash: true
      unix_socket_directories: .
      use_pg_rewind: true
      use_slots: true
      wal_level: replica
    retry_timeout: 15
    ttl: 200
  initdb:
  - encoding: UTF8
  - locale: en_US.UTF-8
  - data-checksums
......

And cluster status is:

[Image: cluster status screenshot]

We tried increasing the maximum_lag_on_failover setting, but it did not help.
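
One thing that may be worth checking here: settings under bootstrap.dcs are written to the DCS only when the cluster is first initialized, so editing them in postgres.yml afterwards does not change the running cluster. A minimal sketch for reading the value Patroni is actually using, assuming the REST API is reachable at http://localhost:8008 (an assumption, adjust to your restapi settings); the helper names are hypothetical.

```python
import json
from urllib.request import urlopen

# Patroni's default for maximum_lag_on_failover, in bytes (1 MiB).
DEFAULT_MAX_LAG = 1048576


def effective_max_lag(dynamic_config: dict) -> int:
    """Return maximum_lag_on_failover from a dynamic-config dict.

    Falls back to the default when the key is absent.
    """
    return int(dynamic_config.get("maximum_lag_on_failover", DEFAULT_MAX_LAG))


def fetch_dynamic_config(base_url: str = "http://localhost:8008") -> dict:
    """Fetch the dynamic configuration via the REST API's GET /config.

    The base URL is an assumption; use the address of any cluster member.
    """
    with urlopen(f"{base_url}/config", timeout=5) as response:
        return json.load(response)
```

Calling `effective_max_lag(fetch_dynamic_config())` on a node shows the limit in effect. If it still shows the old value, the change has to be applied to the dynamic configuration, for example with `patronictl edit-config`, rather than in the bootstrap section of the file.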

Any suggestions?
Thanks in advance.
