HIVE-28851: HiveIcebergMetaHook acquires an HMS lock, regardless of the config and operations #5722

Merged: 11 commits, Apr 6, 2025


owenmonn (Contributor)

What changes were proposed in this pull request?

HiveIcebergMetaHook acquires the HMS lock only when necessary, depending on the operationType and the config (engine.hive.lock-enabled).
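The decision described above can be sketched as a small predicate. A lock is taken only if two things hold: the operation needs one, and HMS locking is enabled for the table. This is a minimal sketch and not the patch's code. HmsLockPolicy, Operation and the set of lock-requiring operations are hypothetical names made up for illustration. The fallback order (table property first, then a configured default) reflects our understanding of how Iceberg resolves engine.hive.lock-enabled.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: HmsLockPolicy, Operation and LOCK_REQUIRING_OPS are
// illustrative names, not the actual HiveIcebergMetaHook code.
final class HmsLockPolicy {
  // Iceberg table property that toggles HMS (pessimistic) locking.
  static final String LOCK_ENABLED_PROP = "engine.hive.lock-enabled";

  // Illustrative operation kinds; the real hook uses its own operationType values.
  enum Operation { CREATE_TABLE, ALTER_TABLE, UPDATE_STATS, DROP_TABLE }

  // Assumption: which operations need a table lock is made up for this sketch.
  private static final Set<Operation> LOCK_REQUIRING_OPS =
      EnumSet.of(Operation.ALTER_TABLE, Operation.UPDATE_STATS);

  private HmsLockPolicy() {}

  // The table property wins; otherwise fall back to the configured default.
  static boolean hiveLockEnabled(Map<String, String> tableProps, boolean confDefault) {
    String value = tableProps.get(LOCK_ENABLED_PROP);
    return value != null ? Boolean.parseBoolean(value) : confDefault;
  }

  // Lock only if the operation needs it AND locking is enabled for the table.
  static boolean shouldAcquireLock(Operation op, Map<String, String> tableProps, boolean confDefault) {
    return LOCK_REQUIRING_OPS.contains(op) && hiveLockEnabled(tableProps, confDefault);
  }

  public static void main(String[] args) {
    // Mirrors the reproduction: tblproperties ('engine.hive.lock-enabled'='false').
    Map<String, String> props = Map.of(LOCK_ENABLED_PROP, "false");
    System.out.println(shouldAcquireLock(Operation.UPDATE_STATS, props, true)); // false
  }
}
```

Under these assumptions, a table created with 'engine.hive.lock-enabled'='false' never takes the HMS lock, whatever the operation.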

Why are the changes needed?

Stats tasks fail during INSERT queries on Iceberg tables.

How to reproduce:

set hive.support.concurrency=true;
set hive.txn.ext.locking.enabled=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.stats.autogather=true;

create table ice_t (i int) stored by iceberg tblproperties ('engine.hive.lock-enabled'='false');

insert into ice_t values (1);

StatsTask failure:

2025-03-27T12:54:00,304 INFO [Thread-108] stats.BasicStatsTask: [Warning] could not update stats.Failed with exception Unable to alter table. org.apache.iceberg.hive.LockException: Timed out after 182005 ms waiting for lock on default.ice_t

HIVE_LOCKS Table:

+----------------+---------+----------+---------------+--------------+-----+---------------------+
| HL_LOCK_EXT_ID | HL_DB   | HL_TABLE | HL_LOCK_STATE | HL_LOCK_TYPE | ... | HL_BLOCKEDBY_EXT_ID |
+----------------+---------+----------+---------------+--------------+-----+---------------------+
|            110 | default | ice_t    | a             | w            | ... |                NULL |
|            111 | default | ice_t    | w             | x            | ... |                 110 |
+----------------+---------+----------+---------------+--------------+-----+---------------------+
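The table uses the Hive metastore's one-letter codes. For lock state, a = ACQUIRED and w = WAITING. For lock type, e = EXCLUSIVE, r = SHARED_READ, w = SHARED_WRITE and x = EXCL_WRITE. Decoded, lock 111 (EXCL_WRITE, waiting) on default.ice_t is blocked by lock 110 (SHARED_WRITE, acquired) on the same table. The table does not show which query owns each lock. Still, this pattern is consistent with the stats task's alter-table lock queuing behind the lock held by the INSERT itself until the timeout fires.

The sketch below is a small, hypothetical decoder (HiveLocksDecoder is not a Hive class). It models only the columns shown above.

```java
import java.util.List;

// Sketch: decode HIVE_LOCKS rows using the Hive metastore's one-letter codes.
// HiveLocksDecoder and LockRow are hypothetical helpers, not Hive classes.
final class HiveLocksDecoder {
  static final class LockRow {
    final long extId;      // HL_LOCK_EXT_ID
    final String db;       // HL_DB
    final String table;    // HL_TABLE
    final char state;      // HL_LOCK_STATE
    final char type;       // HL_LOCK_TYPE
    final Long blockedBy;  // HL_BLOCKEDBY_EXT_ID, null when NULL

    LockRow(long extId, String db, String table, char state, char type, Long blockedBy) {
      this.extId = extId;
      this.db = db;
      this.table = table;
      this.state = state;
      this.type = type;
      this.blockedBy = blockedBy;
    }
  }

  private HiveLocksDecoder() {}

  static String decodeState(char c) {
    switch (c) {
      case 'a': return "ACQUIRED";
      case 'w': return "WAITING";
      default: return "UNKNOWN(" + c + ")";
    }
  }

  static String decodeType(char c) {
    switch (c) {
      case 'e': return "EXCLUSIVE";
      case 'r': return "SHARED_READ";
      case 'w': return "SHARED_WRITE";
      case 'x': return "EXCL_WRITE";
      default: return "UNKNOWN(" + c + ")";
    }
  }

  // Human-readable summary of one row, including what blocks it (if anything).
  static String describe(LockRow r) {
    String s = "lock " + r.extId + " on " + r.db + "." + r.table + ": "
        + decodeType(r.type) + ", " + decodeState(r.state);
    return r.blockedBy == null ? s : s + ", blocked by lock " + r.blockedBy;
  }

  public static void main(String[] args) {
    // The two rows from the HIVE_LOCKS output above.
    List<LockRow> rows = List.of(
        new LockRow(110, "default", "ice_t", 'a', 'w', null),
        new LockRow(111, "default", "ice_t", 'w', 'x', 110L));
    for (LockRow r : rows) {
      System.out.println(describe(r));
    }
  }
}
```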

Does this PR introduce any user-facing change?

Stats tasks on Iceberg tables now succeed.

Is the change a dependency upgrade?

No

How was this patch tested?

Added a unit test (testStatsWithPessimisticLockInsert in TestHiveIcebergStatistics).

Copilot AI left a comment

Pull Request Overview

This PR addresses an issue where HiveIcebergMetaHook acquires an HMS lock unnecessarily during certain operations, causing Stat tasks to fail on Iceberg tables. Key changes include:

  • Adding a new test (testStatsWithPessimisticLockInsert) to verify that lock acquisition is properly bypassed when not required.
  • Refactoring HiveIcebergMetaHook to conditionally acquire a lock by introducing a helper method (lockObject) and logic to determine lock eligibility (hiveLockEnabled).
  • Making the HiveLock interface public to support its use across packages.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

Changed files:

  • iceberg/iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStatistics.java: Adds a test to verify that statistics computation bypasses locking when appropriate.
  • iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java: Refactors lock acquisition logic to conditionally acquire locks based on operation type and configuration.
  • iceberg/iceberg-catalog/src/main/java/org/apache/iceberg/hive/HiveLock.java: Updates the interface to public to enable broader usage.

deniskuzZ (Member) left a comment

LGTM, pending tests

owenmonn (Contributor, Author) commented Apr 4, 2025

@deniskuzZ

Thanks for the review! The tests seem to be pending due to the "This workflow requires approval from a maintainer" requirement.

Could you please approve the workflow?

owenmonn (Contributor, Author) commented Apr 5, 2025

3567afc

The tests TestTxnCommands#testParallelTruncateAnalyzeStats and TestTxnCommandsWithSplitUpdateAndVectorization#testParallelTruncateAnalyzeStats were failing because the default value of hive.txn.xlock.write changed to false.

When hive.txn.xlock.write=false, the ANALYZE query throws a RuntimeException if it attempts to acquire a SHARED_READ lock after the TRUNCATE query has already acquired an EXCL_WRITE lock, contrary to the expected test results.

This commit sets hive.txn.xlock.write=true in these tests to address the issue.

sonarqubecloud bot commented Apr 5, 2025

deniskuzZ (Member) commented Apr 5, 2025


hive.txn.xlock.write doesn't change the lock type required by TRUNCATE or ANALYZE: TRUNCATE takes an EXCLUSIVE lock and ANALYZE a SHARED_READ lock.
However, it activates the ZeroWaitRead optimization, which lets the SHARED_READ request fail fast instead of waiting for the EXCLUSIVE lock to be released.
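The fail-fast behaviour described above can be illustrated with a deliberately simplified model. This is not Hive's lock manager. Only EXCLUSIVE and SHARED_READ are modelled, EXCLUSIVE is treated as conflicting with every request, and ZeroWaitReadModel is a hypothetical name. Following the discussion in this thread, zero-wait read is assumed to be on when hive.txn.xlock.write=false.

```java
// Simplified, hypothetical model (not Hive's lock manager): only EXCLUSIVE and
// SHARED_READ exist, and EXCLUSIVE conflicts with every request.
final class ZeroWaitReadModel {
  enum LockType { SHARED_READ, EXCLUSIVE }
  enum Outcome { ACQUIRED, WAITING, NOT_ACQUIRED }

  private ZeroWaitReadModel() {}

  // Assumption from this thread: hive.txn.xlock.write=false turns zero-wait read on.
  static boolean zeroWaitReadEnabled(boolean txnXlockWrite) {
    return !txnXlockWrite;
  }

  static boolean conflicts(LockType held, LockType requested) {
    return held == LockType.EXCLUSIVE || requested == LockType.EXCLUSIVE;
  }

  // held == null means no lock is currently held on the table.
  static Outcome request(LockType held, LockType requested, boolean zeroWaitRead) {
    if (held == null || !conflicts(held, requested)) {
      return Outcome.ACQUIRED;
    }
    // A conflicting SHARED_READ fails fast instead of queueing behind the lock.
    if (requested == LockType.SHARED_READ && zeroWaitRead) {
      return Outcome.NOT_ACQUIRED;
    }
    return Outcome.WAITING;
  }

  public static void main(String[] args) {
    // TRUNCATE holds EXCLUSIVE; ANALYZE then asks for SHARED_READ.
    for (boolean xlockWrite : new boolean[] {false, true}) {
      Outcome o = request(LockType.EXCLUSIVE, LockType.SHARED_READ, zeroWaitReadEnabled(xlockWrite));
      System.out.println("hive.txn.xlock.write=" + xlockWrite + " -> ANALYZE lock " + o);
    }
  }
}
```

In this model, hive.txn.xlock.write=false yields NOT_ACQUIRED, which matches the state reported in the test failure log. With true, the request waits, which is what the existing tests expect.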

owenmonn (Contributor, Author) commented Apr 5, 2025


You're right. Enabling ZeroWaitRead causes existing tests to fail. To address this, I've disabled ZeroWaitRead in the existing tests by setting hive.txn.xlock.write=true.

Test failure log:

java.lang.RuntimeException: java.lang.RuntimeException: analyze table mm_table compute statistics for columns failed: 
(responseCode = 10, 
 errorMessage = FAILED: Error in acquiring locks: Locks on the underlying objects cannot be acquired, retry after some time. 
 LockResponse(lockid:10, state:NOT_ACQUIRED, errorMessage:Unable to acquire read lock due to an existing exclusive lock 
{lockid:9,  intLockId:1, txnid:9, db:default, table:mm_table, partition:null, state:ACQUIRED, type:EXCLUSIVE}), 
 hiveErrorCode = 40000,  SQLState = 42000,  exception = Locks on the underlying objects cannot be acquired, retry after some time. 

deniskuzZ merged commit c392d38 into apache:master on Apr 6, 2025 (4 checks passed).