Skip to content

[WIP] HIVE-27473: Rewrite MetaStoreClients to be composable #5771

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ngsg
Copy link
Contributor

@ngsg ngsg commented Apr 14, 2025

Design document: Google docs

What changes were proposed in this pull request?

This patch refactors HiveMetaStoreClient and its subclasses in composable classes.

Why are the changes needed?

By making SessionHiveMetaStoreClient composable, this refactoring enables users to create and use custom Catalogs with Hive's session-level features, such as temporary tables.

Does this PR introduce any user-facing change?

No

Is the change a dependency upgrade?

No

How was this patch tested?

This patch is currently in draft state, so no tests have been added yet.

@okumin
Copy link
Contributor

okumin commented Apr 14, 2025

The content in Google Doc is perfectly aligned with my memory. I'm still reading the proposed changes. I'd appreciate it if the latest dependencies between MetaStoreClients somewhere were written somewhere

* It helps to reduce the time spent in compilation by using HS2 memory more effectively, and it allows to
* improve HMS throughput for multi-tenant workloads by reducing the number of calls it needs to serve.
*/
public class HiveMetaStoreClientWithLocalCache extends NoopHiveMetaStoreClientDelegator
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that could cause confusion as we introduce classes with exactly the same name

Copy link
Member

@deniskuzZ deniskuzZ Apr 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if we call it CachingMetaStoreClientProxy/Decorator?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was checking the same, the name should be different.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have renamed it to LocalCachingMetaStoreClientProxy.

* This class provides two features: query-level cache and Session level transaction.
* SG:FIXME, is "Session level transaction" correct explanation?
*/
public class HiveMetaStoreClientWithSessionFeature extends NoopHiveMetaStoreClientDelegator
Copy link
Member

@deniskuzZ deniskuzZ Apr 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SessionMetaStoreClientProxy/Decorator?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have renamed it to SessionMetaStoreClientProxy.

new org.apache.hadoop.hive.ql.metadata.client.HiveMetaStoreClientWithLocalCache(conf, thriftClient);
IMetaStoreClient sessionLevelClient =
new HiveMetaStoreClientWithSessionFeature(conf, clientWithLocalCache);
IMetaStoreClient clientWithTmpTable = new HiveMetaStoreClientWithTmpTable(conf, sessionLevelClient);
Copy link
Member

@deniskuzZ deniskuzZ Apr 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be great if we could extract this logic into MetaStoreClientFactory that implements IMetaStoreClientFactory so users could define constructors for custom clients

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the suggested extraction would be more appropriately addressed in HIVE-12679, which is the origin of this PR.
(BTW, I'm particularly interested in supporting custom clients that connect to external catalog such as AWS Glue or the Iceberg REST catalog, and I agree that it would be nice if we provide IMetaStoreClientFactory as an entrypoint of writing a custom client.)

import java.util.List;
import java.util.Map;

public class NoopHiveMetaStoreClientDelegator implements IMetaStoreClient {
Copy link
Member

@deniskuzZ deniskuzZ Apr 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe MetaStoreClientProxy/Decorator + should be abstract

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have renamed it to BaseMetaStoreClientProxy and made it abstract.

import static org.apache.hadoop.hive.metastore.utils.MetaStoreUtils.getPvals;
import static org.apache.hadoop.hive.metastore.utils.MetaStoreUtils.isExternalTable;

public class HiveMetaStoreClientWithTmpTable extends NoopHiveMetaStoreClientDelegator
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am not sure we need to extract this since we have just 1 internal usage of it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I integrate this layer with session feature layer. Now SessionMetaStoreClientProxy handles both temporary tables and session-level features.

@ngsg
Copy link
Contributor Author

ngsg commented Apr 15, 2025

@okumin, I have added an Appendix section to the design document, which summarizes the dependencies between MetaStoreClients as well as which classes in the PR codebase provide specific features compared to the master branch. I'm not entirely sure if this aligns with your request, so please let me know if it doesn't address what you were looking for.

@ngsg
Copy link
Contributor Author

ngsg commented Apr 15, 2025

@deniskuzZ, Thank you for looking over the PR. I think the suggested class names make sense, and I'll update the PR accordingly.

Regarding HiveMetaStoreClientWithTmpTable, my initial thought was that we should separate features as much as possible to allow custom catalog implementations to selectively include only what they need. However, I have reconsidered my thought and now believe that the features originating from SessionHiveMetaStoreClient and HiveMetaStoreClientWithLocalCache should always wrap the custom catalog. As described in the figure 2 of the design document, the next commit will include four layers (Hook, Session, LocalCache, Thrift) instead of five (Hook, TmpTable, Session, LocalCache, Thrift).

import java.util.Map;

public class NoopHiveMetaStoreClientDelegator implements IMetaStoreClient {
// effectively final
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can it be final?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it could. I changed it to final.

CacheKey cacheKey = new CacheKey(KeyType.TABLE, req);
Table r = (Table) mscLocalCache.getIfPresent(cacheKey);
if (r == null) {
r = getDelegate().getTable(req);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use this.delegate instead of getDelegate().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, since we cannot remove the public accessor getDelegate() due to HiveMetaStoreClient, it would be better to keep using getDelegate() in inherited classes as well as HiveMetaStoreClient. I'll check HiveMetaStoreClient again and remove the public accessor if we can safely decouple it from underlying client.

this.delegate = delegate;
}

public IMetaStoreClient getDelegate() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should this be available outside of this class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the review. I made it possible to set and access delegate externally because it was necessary during the early implementation. Since this PR code is directly come from the PoC stage, the code is quite rough, but the next commit will likely be an improved version that addresses your feedback.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, HiveMetaStoreClient needs the public accessor. I didn't investigate the code deeply yet, and I'll check whether we can decouple HiveMetaStoreCleint from its delegate and finally remove this public accessor.

return new GetPartitionsByNamesResult(result);
private static IMetaStoreClient createUnderlyingClient(Configuration conf, HiveMetaHookLoader hookLoader,
Boolean allowEmbedded) throws MetaException {
IMetaStoreClient thriftClient = new ThriftHiveMetaStoreClient(conf, allowEmbedded);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have ThriftHiveMetaStoreClient here it means we can not reuse the functionality of this class with other than ThriftHiveMetaStoreClient e.g. RESTClient for Iceberg REST catalog can not reuse the functionality of ql/HiveMetaStoreClientWithLocalCache. This should be an input as delegate I guess.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't consider custom catalogs in this PR as it is not a consideration of HIVE-27473. In my opinion, HIVE-12679 and HIVE-28658 could address custom catalog issues based on the separated HMSClient layers introduced in this PR.

Copy link
Contributor

@okumin okumin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the dropTable-related methods, and they should work. I also think removing the zig-zag dependencies helps our maintainability of the Hive project.
Instead, we have to maintain lots of @Override in any classes, and third-party developers also have to do it. It could be a negative point we want to resolve someday.

public void createTable(Table tbl) throws MetaException, NoSuchObjectException, TException {
CreateTableRequest request = new CreateTableRequest(tbl);
createTable(request);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may also need createTable(Table tbl, EnvironmentContext envContext)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't add createTable(Table tbl, EnvironmentContext envContext) since it is not a method of IMetaStoreClient.

It seems that only the test code uses this method, so I think we can safely skip this method by modifying the test to use createTable(CreateTableRequest request), which is a method of IMetaStoreClient and can be used to pass both Table and EnvironmentContext.

@Override
public void createTable(CreateTableRequest request)
throws MetaException, NoSuchObjectException, TException {
Table tbl = request.getTable();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be more consistent if we set the default catalog name when it's empty

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that your suggestion makes the code more consistent.

However, I have a question about the use of the default catalog and where the responsibility for setting it should lie. Currently, we use getDefaultCatalog() in many places to fill in empty catalog names. In my opinion, there should ideally be a single point of responsibility, or at least as few classes as possible, where the default catalog is defined and applied.

I also think we should minimize the behavior of proxy classes whenever possible. Rather than setting the default catalog within proxy classes, it might be better to pass null and let the underlying class, or even HMSHandler, handle it.

I'm not entirely sure this is a better approach, and I'm fine with continuing the current strategy of applying getDefaultCatalog() wherever needed. It would be helpful if you could share your opinion on this issue.

ignoreUnknownTable);
// SG:FIXME, introduce IMetaStoreClient.dropTable :: DropTableRequest -> void
// We need a new method that takes catName and checks tbl.isSetTxnId()
// Or does TxN stuff meaningful only if we are in default catalog?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to invoke the hook.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have enclosed the dropTable call with the hook. I'll check this method again as well as the noted issue when I revisit HookMetaStoreClientProxy.

@ngsg ngsg force-pushed the HIVE-27473-Rewrite-MetaStoreClients-to-be-composable branch from e620366 to 147329d Compare April 22, 2025 11:43
@ngsg
Copy link
Contributor Author

ngsg commented Apr 22, 2025

@okumin , @deniskuzZ , @zratkai, thank you for your review. I've updated the branch to address your feedback and improve code quality. The main changes are summarized as follows:

  • Renamed the new classes
  • Integrated the temporary table layer and session feature layer
  • Updated {Session/LocalCaching}MetaStoreClientProxy to handle ValidTxnWriteIdList properly
  • Modified SessionMetaStoreClientProxy to set processorCapability in advance, ensuring the cached key matches the actual request
  • Changed the access level of TempTable from package-private to public, in order to place all proxy classes under the same directory

There are still several remaining issues, including SG:FIXME markers and some unchecked areas. In the latest commit, I focused on SessionMetaStoreClientProxy and plan to move on to the other proxy classes. I’d appreciate it if you could review the changes and share any considerations I may have missed.

@okumin
Copy link
Contributor

okumin commented Apr 23, 2025

@okumin
Copy link
Contributor

okumin commented Apr 23, 2025

Can we raise the proposal to the mailing list and tie-break the best approaches at this point? People in #4444 should be very interested. Then, we can invest our resources in reviewing this pull request.

The Option 2 of my original document might make integration clean. We will add a small change to Apache Hive, and third-party vendors implement the minimal number of methods. However, this approach doesn't allow existing integrations to run as they are.

If we can make Option 1 clean and the community people agree with it, I think many people would be happy. Additionally, the HiveMetaStoreClient family with complicated dependencies will be decomposed!

@ngsg
Copy link
Contributor Author

ngsg commented Apr 24, 2025

I found Hive 4.0.0 has around 20 deprecated methods. Are we allowed to discontinue them before HIVE-27473?

I think removing the deprecated methods would be helpful for simplifying the work in HIVE-27473. However, I'm not entirely sure about the standard process for removing deprecated methods in Hive. If their removal is acceptable at this stage, I would be supportive of proceeding in that direction and able to work for it.

@ngsg
Copy link
Contributor Author

ngsg commented Apr 24, 2025

Can we raise the proposal to the mailing list and tie-break the best approaches at this point? People in #4444 should be very interested. Then, we can invest our resources in reviewing this pull request.

The Option 2 of my original document might make integration clean. We will add a small change to Apache Hive, and third-party vendors implement the minimal number of methods. However, this approach doesn't allow existing integrations to run as they are.

If we can make Option 1 clean and the community people agree with it, I think many people would be happy. Additionally, the HiveMetaStoreClient family with complicated dependencies will be decomposed!

Sure, it would be nice to make an intermediate design decision at this point.
Before raising the proposal on the mailing list, I have a small question about your comment: when you mentioned “existing integrations,” were you referring to integration tests, or to external integrations such as third-party catalogs? I’d like to make sure I fully understand the impact you're referring to, especially in terms of potential behavioral changes.

@okumin
Copy link
Contributor

okumin commented Apr 28, 2025

@ngsg

when you mentioned “existing integrations,” were you referring to integration tests, or to external integrations such as third-party catalogs?

I know Glue Data Catalog depends on the unmerged HIVE-12679, and I can infer that Databricks also uses or has used it from their release notes. Keeping the compatibility might help those users(it is also possible that they prefer a cleaner interface). I will involve them once we start the discussion.

@okumin
Copy link
Contributor

okumin commented Apr 28, 2025

I happened to find Apache Ranger also relies on IMetaStoreClient. I believe HIVE-27473 does not impact the feature, but I am leaving a comment nonetheless.

https://github.com/apache/ranger/blob/master/hive-agent/src/main/java/org/apache/ranger/authorization/hive/authorizer/RangerHiveAuthorizer.java

@ngsg
Copy link
Contributor Author

ngsg commented Apr 28, 2025

@okumin, I understand now. Thank you for the explanation.
I’ve also started a discussion on the mailing list. It would be great if you could take a look and help enrich it with your insights.
Thanks again for your support.

@deniskuzZ
Copy link
Member

i think we can drop the deprecated methods in 4.1 and right now review and mark methods that we want to discontinue

Copy link

@@ -3411,6 +3411,11 @@ public boolean listPartitionsByExpr(String catName, String db_name, String tbl_n
throw new UnsupportedOperationException();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class is being removed.
#5812

@dengzhhu653 dengzhhu653 self-requested a review May 14, 2025 04:04
@okumin
Copy link
Contributor

okumin commented May 14, 2025

My review progress is visible here.
https://gist.github.com/okumin/add5b97c3b6674820cf1aad433856e9f

@ngsg @deniskuzZ @dengzhhu653 While reviewing the pull request, I found MetaStoreFilterHook is used in 40+ places, and I guess we no longer need it. Do you think it is a reasonable option to discontinue it immediately?

Reason: MetaStoreFilterHook is declared as LimitedPrivate for Apache Sentry, which is already retired. At least, I don't see any use cases in Apache Hive(and Apache Ranger as well). It would simplify the entire dependency and reduce the risks where inconsistencies can unexpectedly cause a problem.

% git grep metastore.filter.hook       
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:    METASTORE_FILTER_HOOK("hive.metastore.filter.hook", "org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl",
data/files/rfc5424-hs2.log:<14>1 2019-03-21T07:06:19.799Z hiveserver2-99ddccbd8-49sbv hiveserver2 1 cb6c1a29-4ba7-11e9-83d0-02c17ec35ac0 [mdc@18060 class="metastore.HiveMetaStoreClient" level="INFO" thread="main"] Mestastore configuration metastore.filter.hook changed from org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook to org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl
itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java:    conf.set("hive.metastore.filter.hook", "org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl");
ql/src/test/queries/clientpositive/explainanalyze_3.q:set hive.metastore.filter.hook=org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl;
ql/src/test/queries/clientpositive/explainuser_3.q:set hive.metastore.filter.hook=org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl;
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java:    FILTER_HOOK("metastore.filter.hook", "hive.metastore.filter.hook",
%

@Override
public List<String> getTables(String dbname, String tablePattern, TableType tableType) throws MetaException {
try {
return getTables(getDefaultCatalog(conf), dbname, tablePattern, tableType);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this needs to be getDelegate().getTables(dbname, tablePattern, tableType). That's because SessionMetaStoreClientProxy has getTables(String dbname, String tablePattern, TableType tableType) but doesn't have getTables(String catName, String dbname, String tablePattern, TableType tableType).

public List<TableMeta> getTableMeta(String dbPatterns, String tablePatterns, List<String> tableTypes)
throws MetaException {
try {
return getTableMeta(getDefaultCatalog(conf), dbPatterns, tablePatterns, tableTypes);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to call getDeletage().getTableMeta(String dbPatterns, String tablePatterns, List<String> tableTypes)

@Override
public List<String> getAllTables(String dbname) throws MetaException {
try {
return getAllTables(getDefaultCatalog(conf), dbname);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, this needs to use getDelegate().getAllTables(dbname)

@okumin
Copy link
Contributor

okumin commented May 14, 2025

I suspended the review because there are some cases where SessionMetaStoreClientProxy is unlikely to be invoked. It would be great if we could discontinue MetaStoreFilterHook, or we probably have to fix them a bit.

@okumin
Copy link
Contributor

okumin commented May 14, 2025

Sorry, MetaStoreFilterHook is used as HiveMetaStoreAuthorizer or AuthorizationMetaStoreFilterHook. I was searching only for the standalone-metastore project. My bad

}

@Override
public Partition appendPartition(String catName, String dbName, String tableName, List<String> partVals)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have to override appendPartition(String dbName, String tableName, List<String> partVals). The original HiveMetaStoreClient#appendPartition(String dbName, String tableName, List<String> partVals) invokes SessionHiveMetaStoreClient#appendPartition(String dbName, String tableName, List<String> partVals).

}

@Override
public Partition appendPartition(String catName, String dbName, String tableName, String partitionName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have to override appendPartition(String dbName, String tableName, String name)

}

@Override
public int getNumPartitionsByFilter(String catName, String dbName, String tableName, String filter)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

int getNumPartitionsByFilter(String dbName, String tableName, String filter) also needs to handle tmp tables.

@dengzhhu653
Copy link
Member

Sorry, MetaStoreFilterHook is used as HiveMetaStoreAuthorizer or AuthorizationMetaStoreFilterHook. I was searching only for the standalone-metastore project. My bad

Current we need this in the client side, IMO it's better to put the filter on the ThriftHiveMetaStoreClient to enforce the check

@dengzhhu653
Copy link
Member

dengzhhu653 commented May 15, 2025

I prefer the way to enhance the client through each lawyer.
I would like to make the chain thread-safe by default(for client in each lawyer) for non-embedded Metastore, especially for the ThriftHiveMetaStoreClient, like:
https://github.com/apache/hive/blob/master/jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java#L2258-L2287


abstract public class BaseMetaStoreClientProxy implements IMetaStoreClient {

private final IMetaStoreClient delegate;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: private -> protected

public class SessionHiveMetaStoreClient extends HiveMetaStoreClientWithLocalCache implements IMetaStoreClient {
private static final Logger LOG = LoggerFactory.getLogger(SessionHiveMetaStoreClient.class);

public class SessionHiveMetaStoreClient extends BaseMetaStoreClientProxy implements IMetaStoreClient {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about make the constructor private, and create a static method newSessionHiveMetaStoreClient(Configuration conf, HiveMetaHookLoader hookLoader, Boolean allowEmbedded)

new LocalCachingMetaStoreClientProxy(conf, thriftClient);
IMetaStoreClient sessionLevelClient =
new SessionMetaStoreClientProxy(conf, clientWithLocalCache);
IMetaStoreClient clientWithHook = new HookMetaStoreClientProxy(conf, hookLoader, sessionLevelClient);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check if hookLoader is null before creating the HookMetaStoreClientProxy?

}

@Override
public void truncateTable(String dbName, String tableName,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we keep these methods in this class?


@Override
public void dropDatabase(DropDatabaseRequest req) throws TException {
if (req.isCascade()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this logic both on this class and ThriftHiveMetaStoreClient?

}

@Override
public boolean dropPartition(String catName, String dbName, String tblName, List<String> partVals,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we have to override all dropPartition variants so that all of them handle tmp tables.

}

@Override
public void alter_partition(String catName, String dbName, String tblName, Partition newPart,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we have to override all variants.

@okumin
Copy link
Contributor

okumin commented May 15, 2025

@ngsg @dengzhhu653 @deniskuzZ

I reviewed half of the methods and commented on where I found issues. My impression is that the number of methods is huge, and it is not trivial to find mistakes.

I found we typically have two types of problems.

image

To avoid them, I am wondering if we can have an abstract class that enforces two rules. This is a sample implementation.

  • Rule A: Any subclass of IMetaStoreClient can implement the most generic variant of family methods. For example, if IMetaStoreClient has getTableMeta(db, tbl, types) and getTableMeta(catalog, db, tbl, types), vendors must implement only getTableMeta(catalog, db, tbl, types)
  • Rule B: All family methods behave consistently. For example, if IMetaStoreClient has getTableMeta(db, tbl, types) and getTableMeta(catalog, db, tbl, types), BOTH handle tmp tables or NEITHER handles tmp tables

From the point of view of vendors or Hive maintainers, they don't want to implement and maintain 5 types of getTable differently. However, we can't immediately discontinue obsolete variants from IMetaStoreClient, as Apache Spark or some projects depend on IMetaStoreClient or vendors really want to implement all variants one by one(maybe, no). That's why I am considering adding the layer as a helper abstract class.

In the new foundation, only the most generic method works as a bridge, and the other variants work just as aliases, which are correct in any layer. We can asynchronously discontinue legacy variants.

image

P.S. Anyway, I'm still reviewing the current PR, as it might not be easy to reach a consensus to add one more refactoring ticket.

public List<String> getTables(String catName, String dbName, String tablePattern) throws TException {
// SG:FIXME, inefficient
List<String> tableNames = getDelegate().getTables(catName, dbName, tablePattern);
// tables is filtered by FilterHook.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use filterTableNamesIfEnabled? Probably, the answer is no.

hook.preTruncateTable(table, context, partNames);
}
// SG:FIXME
getDelegate().truncateTable(dbName, tableName, partNames, validWriteIds, writeId, deleteData);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably have to propagate catName.

}

@Override
public List<Table> getTableObjectsByName(String dbName,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is not reachable. I also wonder what values this method is adding. I feel getTableObjectsByName can be an alias of getTableObjectsByName.

@okumin
Copy link
Contributor

okumin commented May 15, 2025

I checked all methods and commented on all places where I had questions

@ngsg
Copy link
Contributor Author

ngsg commented May 21, 2025

@okumin, @dengzhhu653,

Thank you for your exhaustive reviews. Based on your comments, I summarize the remaining tasks as follows:

  1. Interface normalization issue
  2. FilterHook-related issues
  3. Thread-safety issue (particularly for non-embedded Metastore)
  4. General issues, including your review comments, failed test cases, and remaining FIXMEs

Unfortunately, I won’t be able to make significant progress on HIVE-27473 in the near future. I’ll do my best to return to it as soon as possible. Thanks again for your valuable reviews.

@ngsg
Copy link
Contributor Author

ngsg commented May 21, 2025

To avoid them, I am wondering if we can have an abstract class that enforces two rules. This is a sample implementation.

  • Rule A: Any subclass of IMetaStoreClient can implement the most generic variant of family methods. For example, if IMetaStoreClient has getTableMeta(db, tbl, types) and getTableMeta(catalog, db, tbl, types), vendors must implement only getTableMeta(catalog, db, tbl, types)
  • Rule B: All family methods behave consistently. For example, if IMetaStoreClient has getTableMeta(db, tbl, types) and getTableMeta(catalog, db, tbl, types), BOTH handle tmp tables or NEITHER handles tmp tables

I also thought of a similar approach when I was writing the PoC, and I agree that the two rules you stated must be upheld. Let me check the code again and try to implement your idea in the next commit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants