@ebyhr (Member) commented Oct 23, 2025

Description

Fix flush_metadata_cache failure when metastore impersonation is enabled

Release notes

## Hive
* TBD. ({issue}`issuenumber`)

Summary by Sourcery

Enable flush_metadata_cache to function correctly when metastore impersonation is enabled by propagating an impersonationEnabled flag through Hive connector factories and modules, and updating the cache flush logic to use session-specific caching.

New Features:

  • Introduce an impersonationEnabled flag in TestingHivePlugin, TestingHiveConnectorFactory, HiveConnectorFactory, and HiveMetastoreFactory to support metastore impersonation.
  • Propagate the impersonationEnabled parameter through HiveMetastoreModule and connector factory constructors.

Enhancements:

  • Update FlushMetadataCacheProcedure to recognize ImpersonationCachingHiveMetastoreFactory and create a session-scoped CachingHiveMetastore for flushing.
  • Default connectors and modules now explicitly specify impersonationEnabled=false when impersonation isn’t configured.

Tests:

  • Add TestThriftMetastoreImpersonation to verify flush_metadata_cache works under metastore impersonation.

@cla-bot cla-bot bot added the cla-signed label Oct 23, 2025
@sourcery-ai bot commented Oct 23, 2025

Reviewer's Guide

This PR introduces metastore impersonation support for cache flushing by plumbing a new impersonation flag through connector factories and metastore modules, enhancing FlushMetadataCacheProcedure to detect and use an ImpersonationCachingHiveMetastoreFactory, and adding an integration test to verify flush_metadata_cache under impersonation.

Sequence diagram for cache flush with metastore impersonation enabled

sequenceDiagram
    participant "FlushMetadataCacheProcedure"
    participant "HiveMetastoreFactory"
    participant "ImpersonationCachingHiveMetastoreFactory"
    participant "CachingHiveMetastore"
    participant "GlueCache"
    actor "User"

    "User"->>"FlushMetadataCacheProcedure": invoke flush_metadata_cache
    "FlushMetadataCacheProcedure"->>"HiveMetastoreFactory": check if impersonation enabled
    alt impersonation enabled
        "FlushMetadataCacheProcedure"->>"ImpersonationCachingHiveMetastoreFactory": createMetastore(identity)
        "ImpersonationCachingHiveMetastoreFactory"-->>"FlushMetadataCacheProcedure": return CachingHiveMetastore
    else impersonation not enabled
        "FlushMetadataCacheProcedure"-->>"FlushMetadataCacheProcedure": use default cachingHiveMetastore
    end
    "FlushMetadataCacheProcedure"->>"CachingHiveMetastore": flushCache/flushPartitionCache/invalidateTable
    "FlushMetadataCacheProcedure"->>"GlueCache": flushCache/invalidatePartition/invalidateTable
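The branch in the sequence diagram can be sketched in plain Java as follows. This is a minimal illustration, not the PR's code: `FlushSketch`, `Metastore`, `CachingMetastore`, `MetastoreFactory`, and `PerUserCachingFactory` are hypothetical stand-ins for the Trino types, the per-user cache map is an assumption about how the impersonation factory scopes its caches, the error message is illustrative, and the Glue cache is left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Simplified stand-ins for the types named in the diagram; none of this is actual Trino code.
final class FlushSketch
{
    private FlushSketch() {}

    interface Metastore {}

    // Stand-in for CachingHiveMetastore: counts cached entries so a flush is observable.
    static final class CachingMetastore
            implements Metastore
    {
        int cachedEntries;

        void flushCache()
        {
            cachedEntries = 0;
        }
    }

    interface MetastoreFactory
    {
        Metastore createMetastore(Optional<String> user);
    }

    // Stand-in for ImpersonationCachingHiveMetastoreFactory, ASSUMED to keep one cache per user.
    static final class PerUserCachingFactory
            implements MetastoreFactory
    {
        private final Map<String, CachingMetastore> caches = new HashMap<>();

        @Override
        public CachingMetastore createMetastore(Optional<String> user)
        {
            String name = user.orElseThrow(() -> new IllegalStateException("User is required with impersonation"));
            return caches.computeIfAbsent(name, ignored -> new CachingMetastore());
        }
    }

    // Mirrors the alt/else branch: use the session user's cache under impersonation,
    // otherwise the connector-wide cache, and fail if neither exists.
    static void flush(MetastoreFactory factory, Optional<CachingMetastore> sharedCache, String user)
    {
        Optional<CachingMetastore> target = sharedCache;
        if (factory instanceof PerUserCachingFactory) {
            target = Optional.of(((PerUserCachingFactory) factory).createMetastore(Optional.of(user)));
        }
        target.orElseThrow(() -> new IllegalStateException("Cannot flush, metastore cache is not enabled"))
                .flushCache();
    }

    public static void main(String[] args)
    {
        PerUserCachingFactory factory = new PerUserCachingFactory();
        factory.createMetastore(Optional.of("alice")).cachedEntries = 3;
        factory.createMetastore(Optional.of("bob")).cachedEntries = 5;

        flush(factory, Optional.empty(), "alice");

        System.out.println("alice: " + factory.createMetastore(Optional.of("alice")).cachedEntries);
        System.out.println("bob: " + factory.createMetastore(Optional.of("bob")).cachedEntries);
    }
}
```

In this sketch, flushing as alice resets only alice's cache, and bob's entries stay until bob calls the procedure himself. Whether the real factory isolates users this way should be checked against the Trino source.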

Class diagram for updated HiveMetastoreFactory and StaticHiveMetastoreFactory

classDiagram
    class HiveMetastoreFactory {
        +createMetastore(Optional<ConnectorIdentity>) HiveMetastore
        +hasBuiltInCaching() boolean
        +isImpersonationEnabled() boolean
        +ofInstance(HiveMetastore, boolean) HiveMetastoreFactory
    }
    class StaticHiveMetastoreFactory {
        -HiveMetastore metastore
        -boolean impersonationEnabled
        +StaticHiveMetastoreFactory(HiveMetastore, boolean)
        +isImpersonationEnabled() boolean
        +createMetastore(Optional<ConnectorIdentity>) HiveMetastore
    }
    HiveMetastoreFactory <|-- StaticHiveMetastoreFactory
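A minimal sketch of the shape this class diagram describes, assuming a static factory that simply records the flag. `FactorySketch`, `Metastore`, `MetastoreFactory`, and `StaticMetastoreFactory` are hypothetical simplifications (the identity is reduced to an `Optional<String>`), not the actual `io.trino.metastore` types.

```java
import java.util.Objects;
import java.util.Optional;

// Hypothetical, simplified mirror of the diagram; not the actual Trino interfaces.
final class FactorySketch
{
    private FactorySketch() {}

    interface Metastore {}

    interface MetastoreFactory
    {
        Metastore createMetastore(Optional<String> identity);

        boolean isImpersonationEnabled();

        // The caller now states explicitly whether impersonation is enabled.
        static MetastoreFactory ofInstance(Metastore metastore, boolean impersonationEnabled)
        {
            return new StaticMetastoreFactory(metastore, impersonationEnabled);
        }
    }

    private static final class StaticMetastoreFactory
            implements MetastoreFactory
    {
        private final Metastore metastore;
        private final boolean impersonationEnabled;

        private StaticMetastoreFactory(Metastore metastore, boolean impersonationEnabled)
        {
            this.metastore = Objects.requireNonNull(metastore, "metastore is null");
            this.impersonationEnabled = impersonationEnabled;
        }

        @Override
        public Metastore createMetastore(Optional<String> identity)
        {
            // A static factory returns the same instance regardless of identity.
            return metastore;
        }

        @Override
        public boolean isImpersonationEnabled()
        {
            return impersonationEnabled;
        }
    }

    public static void main(String[] args)
    {
        Metastore metastore = new Metastore() {};
        MetastoreFactory factory = MetastoreFactory.ofInstance(metastore, false);
        System.out.println(factory.isImpersonationEnabled());
    }
}
```

The call sites listed under File-Level Changes pass `false` for the new argument, so a sketch like this only returns `true` when a caller opts in.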

Class diagram for updated HiveMetastoreModule

classDiagram
    class HiveMetastoreModule {
        -Optional<HiveMetastore> metastore
        -boolean impersonationEnabled
        +HiveMetastoreModule(Optional<HiveMetastore>, boolean)
        +setup(Binder)
    }

Class diagram for updated FlushMetadataCacheProcedure

classDiagram
    class FlushMetadataCacheProcedure {
        -Optional<CachingHiveMetastore> cachingHiveMetastore
        -Optional<GlueCache> glueCache
        -HiveMetastoreFactory hiveMetadataFactory
        +flushMetadataCache(...)
        -doFlushMetadataCache(ConnectorSession, Optional<String>, Optional<String>, List<String>, List<String>)
    }
    class ImpersonationCachingHiveMetastoreFactory {
        +createMetastore(Optional<ConnectorIdentity>) HiveMetastore
    }
    FlushMetadataCacheProcedure --> ImpersonationCachingHiveMetastoreFactory : uses

File-Level Changes

Change: Plumb metastoreImpersonationEnabled flag through testing connector factories and runner builder
Details:
  • Add boolean flag to TestingHivePlugin constructors and pass to TestingHiveConnectorFactory
  • Extend HiveQueryRunner.Builder with setMetastoreImpersonationEnabled and propagate flag to plugin installation
  • Modify TestingHiveConnectorFactory to accept and forward the impersonation flag
Files:
  • plugin/trino-hive/src/test/java/io/trino/plugin/hive/TestingHivePlugin.java
  • plugin/trino-hive/src/test/java/io/trino/plugin/hive/HiveQueryRunner.java
  • plugin/trino-hive/src/test/java/io/trino/plugin/hive/TestingHiveConnectorFactory.java

Change: Extend HiveConnectorFactory and modules to accept impersonation flag
Details:
  • Update HiveConnectorFactory.createConnector signature to include impersonationEnabled and forward to HiveMetastoreModule
  • Modify HiveMetastoreModule constructor to take impersonationEnabled and bind HiveMetastoreFactory with flag
  • Apply false default impersonation flag in Hudi, Iceberg, and Lakehouse module installations
Files:
  • plugin/trino-hive/src/main/java/io/trino/plugin/hive/HiveConnectorFactory.java
  • plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/HiveMetastoreModule.java
  • plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiConnectorFactory.java
  • plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/catalog/file/TestingIcebergFileMetastoreCatalogModule.java
  • plugin/trino-lakehouse/src/main/java/io/trino/plugin/lakehouse/LakehouseHiveModule.java

Change: Add impersonationEnabled parameter to HiveMetastoreFactory.ofInstance
Details:
  • Change ofInstance signature to accept boolean impersonationEnabled
  • Store impersonation flag in StaticHiveMetastoreFactory and return it in isImpersonationEnabled
Files:
  • lib/trino-metastore/src/main/java/io/trino/metastore/HiveMetastoreFactory.java

Change: Update calls to HiveMetastoreFactory.ofInstance to include default impersonation flag
Details:
  • Pass false in delta-lake, hive-page-sink, and Thrift builder tests when creating the factory
Files:
  • plugin/trino-delta-lake/src/test/java/io/trino/plugin/deltalake/TestDeltaLakeSplitManager.java
  • plugin/trino-delta-lake/src/test/java/io/trino/plugin/deltalake/metastore/TestingDeltaLakeMetastoreModule.java
  • plugin/trino-hive/src/test/java/io/trino/plugin/hive/TestHivePageSink.java
  • plugin/trino-hive/src/test/java/io/trino/plugin/hive/TestingThriftHiveMetastoreBuilder.java

Change: Support impersonation in FlushMetadataCacheProcedure
Details:
  • Adjust guard to skip error when using ImpersonationCachingHiveMetastoreFactory
  • Detect ImpersonationCachingHiveMetastoreFactory and create a per-session CachingHiveMetastore
Files:
  • plugin/trino-hive/src/main/java/io/trino/plugin/hive/procedure/FlushMetadataCacheProcedure.java

Change: Add integration test for flush_metadata_cache with impersonation
Details:
  • Create TestThriftMetastoreImpersonation to verify cache flush behavior under impersonation
Files:
  • plugin/trino-hive/src/test/java/io/trino/plugin/hive/TestThriftMetastoreImpersonation.java

@github-actions bot added the hudi, iceberg, delta-lake, hive, and lakehouse labels Oct 23, 2025
@sourcery-ai sourcery-ai bot left a comment

Hey there - I've reviewed your changes and they look great!

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `plugin/trino-hive/src/test/java/io/trino/plugin/hive/TestThriftMetastoreImpersonation.java:55-56` </location>
<code_context>
+                .build();
+    }
+
+    @Test
+    void testFlushMetadataCache()
+    {
+        Session alice = Session.builder(getSession()).setIdentity(Identity.ofUser("alice")).build();
</code_context>

<issue_to_address>
**suggestion (testing):** Missing negative test for flush_metadata_cache failure scenarios.

Add a test to cover scenarios where flush_metadata_cache fails, such as when impersonation is enabled but the cache is missing or cannot be flushed, to validate error handling.
</issue_to_address>

### Comment 2
<location> `plugin/trino-hive/src/test/java/io/trino/plugin/hive/TestThriftMetastoreImpersonation.java:56-72` </location>
<code_context>
+    }
+
+    @Test
+    void testFlushMetadataCache()
+    {
+        Session alice = Session.builder(getSession()).setIdentity(Identity.ofUser("alice")).build();
+
+        try (TestTable table = newTrinoTable("test_partition", "(id int, part int) WITH (partitioned_by = ARRAY['part'])", List.of("1, 10"))) {
+            assertThat(computeScalar(alice, "SELECT count(1) FROM \"" + table.getName() + "$partitions\""))
+                    .isEqualTo(1L);
+
+            assertUpdate("INSERT INTO " + table.getName() + " VALUES (2, 20)", 1);
+            assertThat(computeScalar(alice, "SELECT count(1) FROM \"" + table.getName() + "$partitions\""))
+                    .isEqualTo(1L);
+
+            assertUpdate(alice, "CALL system.flush_metadata_cache(schema_name => CURRENT_SCHEMA, table_name => '" + table.getName() + "')");
+            assertThat(computeScalar(alice, "SELECT count(1) FROM \"" + table.getName() + "$partitions\""))
+                    .isEqualTo(2L);
+        }
+    }
+}
</code_context>

<issue_to_address>
**suggestion (testing):** Test only covers a single user scenario; consider multi-user edge cases.

Adding tests with multiple user identities will help verify that cache flushing and partition visibility work as expected for different users.

```suggestion
    void testFlushMetadataCache()
    {
        Session alice = Session.builder(getSession()).setIdentity(Identity.ofUser("alice")).build();
        Session bob = Session.builder(getSession()).setIdentity(Identity.ofUser("bob")).build();

        try (TestTable table = newTrinoTable("test_partition", "(id int, part int) WITH (partitioned_by = ARRAY['part'])", List.of("1, 10"))) {
            // Alice sees 1 partition
            assertThat(computeScalar(alice, "SELECT count(1) FROM \"" + table.getName() + "$partitions\""))
                    .isEqualTo(1L);

            // Bob sees 1 partition
            assertThat(computeScalar(bob, "SELECT count(1) FROM \"" + table.getName() + "$partitions\""))
                    .isEqualTo(1L);

            // Insert new partition as default session (no user)
            assertUpdate("INSERT INTO " + table.getName() + " VALUES (2, 20)", 1);

            // Alice and Bob still see 1 partition (cache not flushed)
            assertThat(computeScalar(alice, "SELECT count(1) FROM \"" + table.getName() + "$partitions\""))
                    .isEqualTo(1L);
            assertThat(computeScalar(bob, "SELECT count(1) FROM \"" + table.getName() + "$partitions\""))
                    .isEqualTo(1L);

            // Alice flushes cache
            assertUpdate(alice, "CALL system.flush_metadata_cache(schema_name => CURRENT_SCHEMA, table_name => '" + table.getName() + "')");
            // Alice sees 2 partitions, Bob still sees 1
            assertThat(computeScalar(alice, "SELECT count(1) FROM \"" + table.getName() + "$partitions\""))
                    .isEqualTo(2L);
            assertThat(computeScalar(bob, "SELECT count(1) FROM \"" + table.getName() + "$partitions\""))
                    .isEqualTo(1L);

            // Bob flushes cache
            assertUpdate(bob, "CALL system.flush_metadata_cache(schema_name => CURRENT_SCHEMA, table_name => '" + table.getName() + "')");
            // Now Bob sees 2 partitions
            assertThat(computeScalar(bob, "SELECT count(1) FROM \"" + table.getName() + "$partitions\""))
                    .isEqualTo(2L);
        }
    }
```
</issue_to_address>

  private void doFlushMetadataCache(ConnectorSession session, Optional<String> schemaName, Optional<String> tableName, List<String> partitionColumns, List<String> partitionValues)
  {
-     if (cachingHiveMetastore.isEmpty() && glueCache.isEmpty()) {
+     if (!(hiveMetadataFactory instanceof ImpersonationCachingHiveMetastoreFactory) && cachingHiveMetastore.isEmpty() && glueCache.isEmpty()) {
Contributor

Shouldn't the flush procedure also work for other types of caching, not only ImpersonationCaching?

Member Author

Sorry, I’m not sure which caching you’re referring to. Could you elaborate?

@ebyhr ebyhr requested review from kokosing and wendigo October 24, 2025 01:53
@ebyhr (Member, Author) commented Oct 24, 2025

CI hit #27082

@ebyhr ebyhr force-pushed the ebi/hive-flush-metadata-cache branch from d7f53d9 to 9b40b97 Compare October 25, 2025 00:39
@ebyhr (Member, Author) commented Oct 25, 2025

Rebased on master to resolve conflicts.

  private final boolean impersonationEnabled;

- private StaticHiveMetastoreFactory(HiveMetastore metastore)
+ private StaticHiveMetastoreFactory(HiveMetastore metastore, boolean impersonationEnabled)
Contributor

It's actually always false in the current code.
