Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

re try CI jobs #7864

Closed
wants to merge 4 commits into from
Closed

re try CI jobs #7864

wants to merge 4 commits into from

Conversation

onurctirtir
Copy link
Member

DESCRIPTION: Fixes a bug that might cause a deadlock when upgrading Citus.

Release RowExclusiveLock on pg_dist_transaction as soon as remote xacts are recovered

As of this commit, after recovering the remote transactions, now we release the lock
on pg_dist_transaction while closing it to avoid deadlocks that might occur because
of trying to acquire a lock on pg_dist_authinfo while holding a lock on
pg_dist_transaction. Such a scenario can only cause a deadlock if another transaction
is trying to acquire a strong lock on pg_dist_transaction while holding a lock on
pg_dist_authinfo. As of today, we (implicitly) acquire a strong lock on
pg_dist_transaction only when upgrading Citus to 11.3-1 and this happens when creating
a REPLICA IDENTITY on pg_dist_transaction.

And regardless of the code-path we are in, it should be okay to release the lock there
because all we do after that point is to abort the prepared transactions that are not
part of an in-progress distributed transaction and releasing the lock before doing so
should be just fine.

This also changes the blocking behavior between citus_create_restore_point and the
transaction recovery code-path in the sense that now citus_create_restore_point doesn't
until transaction recovery completes aborting the prepared transactions that are not
part of an in-progress distributed transaction. However, this should be fine because
even before this was possible, e.g., if transaction recovery fails to open a remote
connection to a node.

Also fix a flaky citus upgrade test.

Finally, upgrade download-artifacts action to 4.1.8 andupload-artifacts action to 4.6.0
since v3.x.y is no longer supported for both actions.

…ts are recovered

As of this commit, after recovering the remote transactions, now we release the lock
on pg_dist_transaction while closing it to avoid deadlocks that might occur because
of trying to acquire a lock on pg_dist_authinfo while holding a lock on
pg_dist_transaction. Such a scenario can only cause a deadlock if another transaction
is trying to acquire a strong lock on pg_dist_transaction while holding a lock on
pg_dist_authinfo. As of today, we (implicitly) acquire a strong lock on
pg_dist_transaction only when upgrading Citus to 11.3-1 and this happens when creating
a REPLICA IDENTITY on pg_dist_transaction.

And regardless of the code-path we are in, it should be okay to release the lock there
because all we do after that point is to abort the prepared transactions that are not
part of an in-progress distributed transaction and releasing the lock before doing so
should be just fine.

This also changes the blocking behavior between citus_create_restore_point and the
transaction recovery code-path in the sense that now citus_create_restore_point doesn't
until transaction recovery completes aborting the prepared transactions that are not
part of an in-progress distributed transaction. However, this should be fine because
even before this was possible, e.g., if transaction recovery fails to open a remote
connection to a node.
@onurctirtir onurctirtir marked this pull request as ready for review January 31, 2025 12:05
Copy link

codecov bot commented Jan 31, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 85.19%. Comparing base (c55bc8c) to head (4cad81d).
Report is 6 commits behind head on release-13.0.

❌ Your project status has failed because the head coverage (85.19%) is below the target coverage (87.50%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@               Coverage Diff                @@
##           release-13.0    #7864      +/-   ##
================================================
- Coverage         89.48%   85.19%   -4.30%     
================================================
  Files               276      276              
  Lines             60063    60061       -2     
  Branches           7524     7524              
================================================
- Hits              53747    51166    -2581     
- Misses             4166     6488    +2322     
- Partials           2150     2407     +257     

@onurctirtir onurctirtir deleted the avoid-deadlock-xact-recov-1 branch January 31, 2025 12:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant