Fix Deadlock with transaction recovery is possible during Citus upgrades (citusdata#7875)

Currently, RecoverWorkerTransactions() creates a new connection for each worker node and then performs transaction recovery by reading and locking the pg_dist_transaction catalog table until the end of the transaction.

When RecoverTwoPhaseCommits() calls RecoverWorkerTransactions() for each worker node, the lock acquisition order between pg_dist_authinfo and pg_dist_transaction can reverse on alternate iterations. This reversal can lead to a deadlock if any concurrent process requires locks on these catalog tables, a situation that has surfaced during the Citus upgrade workflow.

To resolve this, we now pre-establish all worker node connections upfront. This change ensures that RecoverWorkerTransactions() operates with a single, consistent distributed catalog table connection, so locks on pg_dist_authinfo and pg_dist_transaction are always acquired in the same order, preventing potential deadlocks during extension updates or similar operations.
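The ordering problem described above can be illustrated with a toy model. The sketch below is not the Citus C code. It is a minimal Python simulation that assumes opening a worker connection touches pg_dist_authinfo, that recovery then locks pg_dist_transaction, and that every lock stays held until the end of the transaction. All function names are hypothetical.

```python
AUTHINFO = "pg_dist_authinfo"
TRANSACTION = "pg_dist_transaction"


def lock_order_edges(trace):
    """Return every (earlier, later) pair of distinct tables in the trace.

    Assumption: locks are held until the end of the transaction, so each
    earlier table is still locked when a later one is requested.
    """
    edges = set()
    for i, earlier in enumerate(trace):
        for later in trace[i + 1:]:
            if earlier != later:
                edges.add((earlier, later))
    return edges


def recover_connect_per_node(nodes):
    """Model of the old flow: connect and recover inside the same loop."""
    trace = []
    for _ in nodes:
        trace.append(AUTHINFO)     # opening the connection reads auth info
        trace.append(TRANSACTION)  # recovery locks the transaction catalog
    return trace


def recover_preconnected(nodes):
    """Model of the new flow: open every connection first, then recover."""
    trace = [AUTHINFO for _ in nodes]
    trace += [TRANSACTION for _ in nodes]
    return trace


def has_order_reversal(trace):
    """True if two tables are requested in both orders within one trace."""
    edges = lock_order_edges(trace)
    return any((later, earlier) in edges for (earlier, later) in edges)
```

In this model, `has_order_reversal(recover_connect_per_node(["w1", "w2"]))` is true because the second iteration requests pg_dist_authinfo while pg_dist_transaction is already held, whereas `has_order_reversal(recover_preconnected(["w1", "w2"]))` is false. A reversal within a single session is what allows a concurrent session that takes the tables in the opposite order to deadlock with it.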