Skip to content

HIVE-28916: Fix unbalannced calls in ObjectStore rollback transaction #5780

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

wecharyu
Copy link
Contributor

What changes were proposed in this pull request?

Fix the bug in ObjectStore#openTransaction and ObjectStore#rollbackTransaction.

Why are the changes needed?

Currently rollbackTransaction() always reset openTransactionCalls to 0, but it would cause unbalanced issue in nested transaction calls, for example:

openTransaction()
  // new function1
  {
    openTransaction()
    commitTransaction() # place1
  }

  // new function2
  {
    openTransaction()
    commitTransaction()
  }
commitTransaction() # place2

If the new function1 rollbackTransaction in place1, then it would get unbalanced exception in place2.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Add a unit test:

mvn test -Dtest.groups= -Dtest=org.apache.hadoop.hive.metastore.TestObjectStore#testNestedTransaction -pl :hive-standalone-metastore-server

@dengzhhu653
Copy link
Member

Some thoughts:

  • at place1, the openTrasactionCalls should be 2, so commitTransaction is just changing the openTrasactionCalls to 1, we don't reach the end of the transaction here.
  • if we rollback at the place1, doesn't it mean we get an unexpected exception here, why do we continue to execute to place2?

@wecharyu
Copy link
Contributor Author

if we rollback at the place1, doesn't it mean we get an unexpected exception here, why do we continue to execute to place2?

@dengzhhu653 There are some cases where rollback transaction does not need fail the api calls, for example create_xxx will always check if it already exists by get_xxx that throw expected NoSuchObjectException, in this case we should continue other queries.

Copy link

@dengzhhu653
Copy link
Member

if we rollback at the place1, doesn't it mean we get an unexpected exception here, why do we continue to execute to place2?

@dengzhhu653 There are some cases where rollback transaction does not need fail the api calls, for example create_xxx will always check if it already exists by get_xxx that throw expected NoSuchObjectException, in this case we should continue other queries.

Per my understanding, this shouldn't happen, imaging we rollback the transaction at place1, then the ObjectStore openTrasactionCalls is reset to 0, if we continue to execute, then means we have one openTransaction and two commitTransaction, the commitTransaction at place2 should throw the exception:

if (openTrasactionCalls <= 0) {
      RuntimeException e = new RuntimeException("commitTransaction was called but openTransactionCalls = "
          + openTrasactionCalls + ". This probably indicates that there are unbalanced " +
          "calls to openTransaction/commitTransaction");
      LOG.error("Unbalanced calls to open/commit Transaction", e);
      throw e;
 }

the Metastore client call should get this exception, but our tests work fine.

@wecharyu
Copy link
Contributor Author

Per my understanding, this shouldn't happen, imaging we rollback the transaction at place1, then the ObjectStore
openTrasactionCalls is reset to 0, if we continue to execute, then means we have one openTransaction and two commitTransaction, the commitTransaction at place2 should throw the exception:

if (openTrasactionCalls <= 0) {
     RuntimeException e = new RuntimeException("commitTransaction was called but openTransactionCalls = "
         + openTrasactionCalls + ". This probably indicates that there are unbalanced " +
         "calls to openTransaction/commitTransaction");
     LOG.error("Unbalanced calls to open/commit Transaction", e);
     throw e;
}

the Metastore client call should get this exception, but our tests work fine.

Yes it works now because current code does not put get_xxx and create_xxx into the same transaction, it would fail if we openTransaction at the top of a create_xxx api.

Additionally, the failed tests in this PR reveal following unbalanced calls that are hidden by current implementation:

  • commitTransaction() twice for a openTransaction()
  • rollbackTransaction() even it does not call openTransaction()

@@ -1299,7 +1297,9 @@ private void create_database_core(RawStore ms, final Database db)
success = ms.commitTransaction();
} finally {
if (!success) {
ms.rollbackTransaction();
if (openTransaction) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you checking for an openTransaction flag here? I don't see the same check in other places.

if (isActiveTransaction() && transactionStatus != TXN_STATUS.ROLLBACK) {
    currentTransaction.rollback();
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In create_database_core it may throw exception before calling openTransaction(), so it may not need call rollbackTransaction() in finally block.

ms.openTransaction();
firePreEvent(new PreCreateDataConnectorEvent(connector, this));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto, firePreEvent may throw exception and not call openTransaction().

@@ -983,12 +983,11 @@ public void alter_catalog(AlterCatalogRequest rqst) throws TException {
GetCatalogResponse oldCat = null;

try {
ms.openTransaction();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to open txn before the PreAlterCatalogEvent? are those transactional as well?

@@ -191,6 +184,10 @@ public void alterTable(RawStore msdb, Warehouse wh, String catName, String dbnam
TableName.getQualified(catName, dbname, name) + " doesn't exist");
}

String expectedKey = environmentContext != null && environmentContext.getProperties() != null ?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why move non-transactional logic under the txn boundary?

@@ -561,17 +561,10 @@ public void shutdown() {
@Override
public boolean openTransaction() {
openTrasactionCalls++;
if (openTrasactionCalls == 1) {
if (currentTransaction == null || !currentTransaction.isActive()){
Copy link
Member

@deniskuzZ deniskuzZ May 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we check for !currentTransaction.isActive()? What if we called rollback on the interior txn?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To open transaction if it's not opened. It would reopen a new transaction if has already called rollback.

transactionStatus = TXN_STATUS.COMMITED;
currentTransaction.commit();
}
openTrasactionCalls--;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if commit throws RuntimeException, db txn would be closed, but openTrasactionCalls won't be 0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If commit throw RuntimeException, it should call rollback in the finally block of previous layer where the openTransactionCalls would be decreased.

@@ -662,7 +655,7 @@ public void rollbackTransaction() {
currentTransaction.rollback();
}
} finally {
openTrasactionCalls = 0;
openTrasactionCalls--;
Copy link
Member

@deniskuzZ deniskuzZ May 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after rollback openTrasactionCalls should be 0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our unit test could not pass if is 0 here, the key point seems that if we should support such usage:

openTransaction()
  // new function1
  {
    openTransaction()
    commitTransaction() # place1
  }

  // new function2
  {
    openTransaction()
    commitTransaction()
  }
commitTransaction() # place2

commitTransaction();
throw new NoSuchObjectException("partition values="
+ partVals.toString());
return new GetHelper<Partition>(catName, dbName, tblName, false, true) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why refactor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may throw exception, so we need such logic:

try {
  openTransaction();

  success = commitTransaction();
} finally {
  if (!success) rollbackTransaction();
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants