Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deadlock with transaction recovery is possible during Citus upgrades #7875

Open
onurctirtir opened this issue Feb 4, 2025 · 1 comment
Open

Comments

@onurctirtir
Copy link
Member

onurctirtir commented Feb 4, 2025

This is especially possible for the upgrade paths where we are taking strong locks, as in citus--11.2-2--11.3-1.sql.

An example on how this upgrade path can cause a deadlock with transaction recovery:

  1. RecoverTwoPhaseCommits() acquires RowExclusiveLock on pg_dist_transaction via RecoverWorkerTransactions() and it doesn't release the lock until the end of the transaction as close the table with "NoLock".
  2. Upgrade script acquires AccessExclusiveLock on pg_dist_authinfo because of the following DDL:
    "ALTER TABLE pg_catalog.pg_dist_authinfo REPLICA IDENTITY USING INDEX pg_dist_authinfo_identification_index;"
  3. RecoverTwoPhaseCommits() continues to process the next worker via RecoverWorkerTransactions() and this implicitly tries to acquire AccessShareLock on pg_dist_authinfo via GetNodeConnection().
  4. Upgrade script tries to acquire AccessExclusiveLock on pg_dist_transaction because of the following DDL:
    "ALTER TABLE pg_catalog.pg_dist_transaction REPLICA IDENTITY USING INDEX pg_dist_transaction_unique_constraint;"

Now, while the process that is executing "ALTER EXTENSION citus UPDATE" is waiting to acquire AccessExclusiveLock on pg_dist_transaction while holding AccessExclusiveLock on pg_dist_authinfo; maintenance daemon is waiting to acquire AccessShareLock on pg_dist_authinfo while holding AccessExclusiveLock on pg_dist_authinfo.

As a result, the processes involved in the deadlock are cancelled by Postgres to resolve the deadlock.

Luckily, this doesn't leave the database in a bad state or such because Postgres implicitly executes the upgrade scripts within an implicit transaction block and a retry mostly helps.

But rather than retrying, until we properly fix this issue, disabling 2PC recovery during Citus upgrade seems like a more reliable workaround, as in;

ALTER SYSTEM SET citus.recover_2pc_interval TO -1;
SELECT pg_reload_conf();

ALTER EXTENSION citus UPDATE;

ALTER SYSTEM RESET citus.recover_2pc_interval;
SELECT pg_reload_conf();
@onurctirtir
Copy link
Member Author

Note to future dev:

Once we come up with a fix for that, we should also revert b6b73e2 from the branches where it presents while backporting the fix.

As of today, only main & release-13.0 has b6b73e2.

codeforall added a commit to codeforall/citus that referenced this issue Feb 24, 2025
…des (citusdata#7875)

Currently, RecoverWorkerTransactions() creates a new connection for each worker
node and then performs transaction recovery by reading and locking the
pg_dist_transaction catalog table until the end of the transaction.
When RecoverTwoPhaseCommits() calls RecoverWorkerTransactions() for each worker
node, the lock acquisition order between pg_dist_authinfo and
pg_dist_transaction can reverse on alternate iterations.
This reversal can lead to a deadlock if any concurrent process requires locks
on these catalog tables—a situation that has surfaced during the
Citus upgrade workflow.

To resolve this, we now pre-establish all worker node connections upfront.
This change ensures that RecoverWorkerTransactions() operates with a single,
consistent distributed catalog table connection, thereby always acquiring locks
on pg_dist_authinfo and pg_dist_transaction in the correct order and preventing
potential deadlocks during extension updates or similar operations.
@codeforall codeforall mentioned this issue Feb 24, 2025
codeforall added a commit to codeforall/citus that referenced this issue Feb 24, 2025
…des (citusdata#7875)

Currently, RecoverWorkerTransactions() creates a new connection for each worker
node and then performs transaction recovery by reading and locking the
pg_dist_transaction catalog table until the end of the transaction.
When RecoverTwoPhaseCommits() calls RecoverWorkerTransactions() for each worker
node, the lock acquisition order between pg_dist_authinfo and
pg_dist_transaction can reverse on alternate iterations.
This reversal can lead to a deadlock if any concurrent process requires locks
on these catalog tables—a situation that has surfaced during the
Citus upgrade workflow.

To resolve this, we now pre-establish all worker node connections upfront.
This change ensures that RecoverWorkerTransactions() operates with a single,
consistent distributed catalog table connection, thereby always acquiring locks
on pg_dist_authinfo and pg_dist_transaction in the correct order and preventing
potential deadlocks during extension updates or similar operations.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant