Skip to content

DataFormat-aware shallow snapshot v2 support#21742

Draft
ask-kamal-nayan wants to merge 3 commits into
opensearch-project:mainfrom
ask-kamal-nayan:snapshot-v2-dataformat
Draft

DataFormat-aware shallow snapshot v2 support#21742
ask-kamal-nayan wants to merge 3 commits into
opensearch-project:mainfrom
ask-kamal-nayan:snapshot-v2-dataformat

Conversation

@ask-kamal-nayan
Copy link
Copy Markdown
Contributor

Description

Description

Wires DataFormat-Aware (DFA) indices into the existing shallow snapshot v2
path so V2 (pinned-timestamp) snapshots and restores work end-to-end for all the supported data formats.

Builds on #21311 (DFA NRT replication engine and remote-store wiring) by
making the cleanup path format-aware: when the live IndexService is gone but
the snapshotted IndexMetadata is available, cleanup now routes DFA indices
through DataFormatAwareRemoteDirectory so all per-format files are
enumerated, rather than leaking parquet/segments dirs.

What changes

  • New IndexMetadata-aware overloads on
    RemoteSegmentStoreDirectoryFactory.newDirectory(...) and
    RemoteSegmentStoreDirectory.remoteDirectoryCleanup(...). Existing
    overloads are preserved and delegate to the new variants with null.
  • Call sites in BlobStoreRepository, SnapshotsService,
    TransportCleanupRepositoryAction, IndexShard, StoreRecovery, and
    Node updated to thread IndexMetadata through.
  • New IT class DataFormatAwareRestoreShallowSnapshotV2IT (18 tests)
    covering V2 snapshot create / restore / clone / delete / multi-shard /
    concurrent / rename / pinned-timestamp cleanup / catalog-generation
    preservation on DFA indices, plus parameterized variant
    DataFormatAwareRestoreShallowSnapshotV2WithLuceneIT exercising the
    parquet + lucene-secondary code paths.

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 19, 2026

PR Reviewer Guide 🔍

(Review updated until commit 674e230)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Possible Issue

The UUID assertion logic is inverted. The code asserts that the restored index UUID must differ from the pre-snapshot source UUID (line 958), but the comment on line 949 states "restore always creates a fresh index with a NEW UUID (not the snapshot's source UUID)". However, the test testV2SnapshotCreateAndRestoreForDFAIndex at line 389 calls assertRestoredIndexMatches with requireSameUUID=true, which contradicts the assertion that UUIDs must differ. This inconsistency will cause test failures or incorrect validation depending on which path is taken.

// For rename, the new UUID is also different. So in both cases the restored UUID must
// (a) be set, (b) differ from the pre-snapshot source UUID.
String restoredUuid = client.admin()
    .indices()
    .prepareGetSettings(restoredIndexName)
    .get()
    .getSetting(restoredIndexName, IndexMetadata.SETTING_INDEX_UUID);
assertNotNull("restored index UUID must be set", restoredUuid);
assertFalse("restored index UUID must be non-empty", restoredUuid.isEmpty());
assertNotEquals("restored index always has a fresh UUID, not the source UUID", pre.indexUUID, restoredUuid);
Possible Issue

Variable localFiles is used at line 2646 but is never declared or initialized in the visible scope. The code references localFiles.removeAll(RemoteStoreRefreshListener.EXCLUDE_FILES) and uploadFiles.containsAll(localFiles), but localFiles is not defined in the isRemoteSegmentStoreInSync method shown in the diff. This will cause a compilation error unless the variable is declared elsewhere in the method (not visible in the diff context).

Possible Issue

The deprecated 6-arg remoteDirectoryCleanup overload delegates to the 7-arg variant with null IndexMetadata. The comment warns this may leak per-format files for DFA indices, but the method is still marked @PublicApi (inherited from the class). Callers using this overload on DFA indices will silently leak files during cleanup. The deprecation notice should be more prominent (e.g., logged at runtime) or the method should throw an exception for DFA indices to prevent silent data leaks.

public static void remoteDirectoryCleanup(
    RemoteSegmentStoreDirectoryFactory remoteDirectoryFactory,
    String remoteStoreRepoForIndex,
    String indexUUID,
    ShardId shardId,
    RemoteStorePathStrategy pathStrategy,
    boolean forceClean
) {
    remoteDirectoryCleanup(remoteDirectoryFactory, remoteStoreRepoForIndex, indexUUID, shardId, pathStrategy, forceClean, null);
}
Data Loss Risk

The comment at line 1715 states "V1 shallow-copy is a Lucene-only mode — DFA indices use V2 snapshots only. If a DFA index were ever associated with a V1 snapshot, per-format files (e.g., parquet/) would leak on cleanup". However, the code passes null for IndexMetadata without any runtime check to prevent V1 snapshots of DFA indices. If a DFA index is mistakenly snapshotted via V1, cleanup will silently leak files. This should either throw an exception or log a warning when indexMetadata is null and the index is DFA-enabled.

    false,
    null  // V1 shallow-copy is a Lucene-only mode — DFA indices use V2 snapshots only. If a DFA index were ever associated with
          // a V1 snapshot, per-format files (e.g., parquet/) would leak on cleanup; tracked as a known limitation.
);

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 19, 2026

PR Code Suggestions ✨

Latest suggestions up to 674e230

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Unused parameter causes incorrect validation

The requireSameUUID parameter is never used in the method body. The code
unconditionally asserts that the restored UUID differs from the pre-snapshot UUID,
ignoring the requireSameUUID flag. This contradicts the method's contract where
requireSameUUID=true should verify UUID preservation.

sandbox/plugins/composite-engine/src/internalClusterTest/java/org/opensearch/composite/DataFormatAwareRestoreShallowSnapshotV2IT.java [906-964]

 protected void assertRestoredIndexMatches(
     Client client,
     String restoredIndexName,
     IndexShard restoredShard,
     PreSnapshotState pre,
     boolean requireSameUUID
 ) throws IOException {
     ...
-    // UUID check: restore always creates a fresh index with a NEW UUID (not the snapshot's source UUID).
-    // For rename, the new UUID is also different. So in both cases the restored UUID must
-    // (a) be set, (b) differ from the pre-snapshot source UUID.
     String restoredUuid = client.admin()
         .indices()
         .prepareGetSettings(restoredIndexName)
         .get()
         .getSetting(restoredIndexName, IndexMetadata.SETTING_INDEX_UUID);
     assertNotNull("restored index UUID must be set", restoredUuid);
     assertFalse("restored index UUID must be non-empty", restoredUuid.isEmpty());
-    assertNotEquals("restored index always has a fresh UUID, not the source UUID", pre.indexUUID, restoredUuid);
+    if (requireSameUUID) {
+        assertEquals("restored index must preserve source UUID when requireSameUUID=true", pre.indexUUID, restoredUuid);
+    } else {
+        assertNotEquals("restored index always has a fresh UUID, not the source UUID", pre.indexUUID, restoredUuid);
+    }
     ...
 }
Suggestion importance[1-10]: 9

__

Why: The requireSameUUID parameter is completely ignored in the method body, causing incorrect validation logic. The method always asserts that UUIDs differ, contradicting the intended contract where requireSameUUID=true should verify UUID preservation. This is a critical correctness bug that breaks the test's validation semantics.

High
Incorrect boolean flag contradicts restore semantics

The test passes requireSameUUID=true but the comment states "restore always creates
a fresh index with a NEW UUID". This is contradictory. Based on OpenSearch restore
semantics, restored indices get new UUIDs, so the flag should be false.

sandbox/plugins/composite-engine/src/internalClusterTest/java/org/opensearch/composite/DataFormatAwareRestoreShallowSnapshotV2IT.java [389]

 public void testV2SnapshotCreateAndRestoreForDFAIndex() throws Exception {
     ...
-    assertRestoredIndexMatches(client, indexName, shardAfter, pre, /* requireSameUUID */ true);
+    assertRestoredIndexMatches(client, indexName, shardAfter, pre, /* requireSameUUID */ false);
     ...
 }
Suggestion importance[1-10]: 8

__

Why: The test passes requireSameUUID=true but the code comment and OpenSearch restore semantics indicate restored indices always get new UUIDs. This contradicts the expected behavior and could mask validation failures. The flag should be false to match the documented restore behavior.

Medium
General
Path hierarchy validation missing

The method counts files where the path contains both indexUUID and categoryDirName
as directory components. However, it doesn't verify that categoryDirName appears
after indexUUID in the path hierarchy, which could lead to false positives if these
names appear in unrelated parent directories.

sandbox/plugins/composite-engine/src/internalClusterTest/java/org/opensearch/composite/DataFormatAwareRestoreShallowSnapshotV2IT.java [1549-1580]

 private static long countFilesUnder(Path rootPath, String indexUUID, String categoryDirName) throws IOException {
     if (Files.exists(rootPath) == false) return 0;
     long[] count = { 0 };
     Files.walkFileTree(rootPath, new java.nio.file.SimpleFileVisitor<>() {
         @Override
         public java.nio.file.FileVisitResult visitFile(Path file, java.nio.file.attribute.BasicFileAttributes attrs) {
-            boolean hasUuid = false;
-            boolean hasCategory = false;
+            boolean foundUuid = false;
             for (Path part : file) {
                 String name = part.toString();
                 if (indexUUID.equals(name)) {
-                    hasUuid = true;
-                } else if (categoryDirName.equals(name)) {
-                    hasCategory = true;
+                    foundUuid = true;
+                } else if (foundUuid && categoryDirName.equals(name)) {
+                    count[0]++;
+                    return java.nio.file.FileVisitResult.CONTINUE;
                 }
-            }
-            if (hasUuid && hasCategory) {
-                count[0]++;
             }
             return java.nio.file.FileVisitResult.CONTINUE;
         }
         ...
     });
     return count[0];
 }
Suggestion importance[1-10]: 7

__

Why: The method counts files where indexUUID and categoryDirName appear anywhere in the path, without verifying categoryDirName appears after indexUUID in the hierarchy. This could lead to false positives if these names appear in unrelated parent directories, potentially causing incorrect cleanup validation results.

Medium

Previous suggestions

Suggestions up to commit ec0519d
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix UUID assertion logic inconsistency

The UUID assertion logic contradicts the requireSameUUID parameter. When
requireSameUUID is true (e.g., in testV2SnapshotCreateAndRestoreForDFAIndex), the
code should verify that the UUID matches the pre-snapshot UUID, not that it differs.
The current implementation always asserts inequality, making the requireSameUUID
parameter ineffective.

sandbox/plugins/composite-engine/src/internalClusterTest/java/org/opensearch/composite/DataFormatAwareRestoreShallowSnapshotV2IT.java [956-959]

 protected void assertRestoredIndexMatches(
     Client client,
     String restoredIndexName,
     IndexShard restoredShard,
     PreSnapshotState pre,
     boolean requireSameUUID
 ) throws IOException {
     ...
-    // UUID check: restore always creates a fresh index with a NEW UUID (not the snapshot's source UUID).
-    // For rename, the new UUID is also different. So in both cases the restored UUID must
-    // (a) be set, (b) differ from the pre-snapshot source UUID.
     String restoredUuid = client.admin()
         .indices()
         .prepareGetSettings(restoredIndexName)
         .get()
         .getSetting(restoredIndexName, IndexMetadata.SETTING_INDEX_UUID);
     assertNotNull("restored index UUID must be set", restoredUuid);
     assertFalse("restored index UUID must be non-empty", restoredUuid.isEmpty());
-    assertNotEquals("restored index always has a fresh UUID, not the source UUID", pre.indexUUID, restoredUuid);
+    if (requireSameUUID) {
+        assertEquals("restored index must preserve the source UUID when requireSameUUID=true", pre.indexUUID, restoredUuid);
+    } else {
+        assertNotEquals("restored index must have a fresh UUID when requireSameUUID=false", pre.indexUUID, restoredUuid);
+    }
     ...
 }
Suggestion importance[1-10]: 9

__

Why: The UUID assertion logic directly contradicts the requireSameUUID parameter. When requireSameUUID=true (line 389), the code should verify UUID equality, but it always asserts inequality (line 958), making the parameter meaningless. This is a critical correctness bug that causes the test to validate the wrong behavior.

High
General
Add parquet directory cleanup validation

The test only validates cleanup of segments/ and translog/ directories but claims to
verify "ALL format directories" for DFA. For DFA indices with parquet format, the
test should also verify that format-specific directories (e.g., parquet/) are
cleaned up. Without this check, the test may pass even if per-format files leak.

sandbox/plugins/composite-engine/src/internalClusterTest/java/org/opensearch/composite/DataFormatAwareRestoreShallowSnapshotV2IT.java [1537-1541]

 public void testV2DeleteSnapshotCleansUpAllFormatFilesForDFA() throws Exception {
     ...
     assertBusy(() -> {
         long segCount = countFilesUnder(remoteRepoPath, indexUUID, "segments");
         long translogCount = countFilesUnder(remoteRepoPath, indexUUID, "translog");
+        long parquetCount = countFilesUnder(remoteRepoPath, indexUUID, "parquet");
         assertEquals("segments/ subtree must be cleaned up for indexUUID=" + indexUUID, 0, segCount);
         assertEquals("translog/ subtree must be cleaned up for indexUUID=" + indexUUID, 0, translogCount);
+        assertEquals("parquet/ subtree must be cleaned up for indexUUID=" + indexUUID, 0, parquetCount);
     }, 120, TimeUnit.SECONDS);
 }
Suggestion importance[1-10]: 8

__

Why: The test claims to verify "ALL format directories" cleanup for DFA (line 1470-1475) but only checks segments/ and translog/. For DFA indices with parquet format, omitting parquet/ directory validation means the test may pass even if per-format files leak, defeating the test's stated purpose.

Medium
Add warning for deprecated cleanup method

The deprecated 6-arg overload passes null for IndexMetadata, which causes DFA
indices to leak per-format files during cleanup. The deprecation notice warns about
this but doesn't prevent usage. Consider logging a warning when this method is
called to alert operators of potential file leaks.

server/src/main/java/org/opensearch/index/store/RemoteSegmentStoreDirectory.java [1343-1352]

 @Deprecated
 public static void remoteDirectoryCleanup(
     RemoteSegmentStoreDirectoryFactory remoteDirectoryFactory,
     String remoteStoreRepoForIndex,
     String indexUUID,
     ShardId shardId,
     RemoteStorePathStrategy pathStrategy,
     boolean forceClean
 ) {
+    logger.warn("Using deprecated remoteDirectoryCleanup without IndexMetadata may leak per-format files for DFA indices. " +
+                "Use the 7-arg variant that accepts IndexMetadata. indexUUID={}, shardId={}", indexUUID, shardId);
     remoteDirectoryCleanup(remoteDirectoryFactory, remoteStoreRepoForIndex, indexUUID, shardId, pathStrategy, forceClean, null);
 }
Suggestion importance[1-10]: 5

__

Why: The deprecated 6-arg overload passes null for IndexMetadata, which the deprecation notice warns may leak per-format files for DFA indices. Adding a runtime warning would alert operators to potential issues, though this is a minor improvement since the deprecation is already documented.

Low
Suggestions up to commit ec0519d
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix UUID assertion logic inconsistency

The UUID assertion logic contradicts the requireSameUUID parameter. When
requireSameUUID is true (e.g., in testV2SnapshotCreateAndRestoreForDFAIndex), the
code should verify that the UUID matches the pre-snapshot UUID, not that it differs.
The current implementation always asserts inequality, making the requireSameUUID
parameter ineffective.

sandbox/plugins/composite-engine/src/internalClusterTest/java/org/opensearch/composite/DataFormatAwareRestoreShallowSnapshotV2IT.java [956-959]

 protected void assertRestoredIndexMatches(
     Client client,
     String restoredIndexName,
     IndexShard restoredShard,
     PreSnapshotState pre,
     boolean requireSameUUID
 ) throws IOException {
     ...
-    // UUID check: restore always creates a fresh index with a NEW UUID (not the snapshot's source UUID).
-    // For rename, the new UUID is also different. So in both cases the restored UUID must
-    // (a) be set, (b) differ from the pre-snapshot source UUID.
     String restoredUuid = client.admin()
         .indices()
         .prepareGetSettings(restoredIndexName)
         .get()
         .getSetting(restoredIndexName, IndexMetadata.SETTING_INDEX_UUID);
     assertNotNull("restored index UUID must be set", restoredUuid);
     assertFalse("restored index UUID must be non-empty", restoredUuid.isEmpty());
-    assertNotEquals("restored index always has a fresh UUID, not the source UUID", pre.indexUUID, restoredUuid);
+    if (requireSameUUID) {
+        assertEquals("restored index must preserve source UUID when requireSameUUID=true", pre.indexUUID, restoredUuid);
+    } else {
+        assertNotEquals("restored index must have a fresh UUID when requireSameUUID=false", pre.indexUUID, restoredUuid);
+    }
     ...
 }
Suggestion importance[1-10]: 9

__

Why: The requireSameUUID parameter is ignored in the UUID assertion logic. The code always asserts that the restored UUID differs from the pre-snapshot UUID, even when requireSameUUID=true is passed (e.g., in testV2SnapshotCreateAndRestoreForDFAIndex at line 389). This is a critical logic bug that makes the parameter meaningless and causes incorrect validation.

High
Verify parquet directory cleanup

The test only validates cleanup of segments/ and translog/ directories but claims to
verify "ALL format directories" for DFA. For DFA indices with parquet format, the
test should also verify that parquet/ directory files are cleaned up, otherwise
format-specific files could leak undetected.

sandbox/plugins/composite-engine/src/internalClusterTest/java/org/opensearch/composite/DataFormatAwareRestoreShallowSnapshotV2IT.java [1536-1541]

 public void testV2DeleteSnapshotCleansUpAllFormatFilesForDFA() throws Exception {
     ...
     assertBusy(() -> {
         long segCount = countFilesUnder(remoteRepoPath, indexUUID, "segments");
         long translogCount = countFilesUnder(remoteRepoPath, indexUUID, "translog");
+        long parquetCount = countFilesUnder(remoteRepoPath, indexUUID, "parquet");
         assertEquals("segments/ subtree must be cleaned up for indexUUID=" + indexUUID, 0, segCount);
         assertEquals("translog/ subtree must be cleaned up for indexUUID=" + indexUUID, 0, translogCount);
+        assertEquals("parquet/ subtree must be cleaned up for indexUUID=" + indexUUID, 0, parquetCount);
     }, 120, TimeUnit.SECONDS);
 }
Suggestion importance[1-10]: 8

__

Why: The test testV2DeleteSnapshotCleansUpAllFormatFilesForDFA claims to verify cleanup of "ALL format directories" but only checks segments/ and translog/. For DFA indices with parquet as the primary format, the test should also verify parquet/ directory cleanup to catch format-specific file leaks, which is the core purpose of this DFA-specific test.

Medium
General
Add deprecation warning for cleanup

The deprecated 6-arg overload passes null for IndexMetadata, which prevents DFA
indices from routing to DataFormatAwareRemoteDirectory during cleanup. This can
cause per-format files to leak. Consider logging a warning when this deprecated
method is called to alert operators of potential cleanup issues.

server/src/main/java/org/opensearch/index/store/RemoteSegmentStoreDirectory.java [1343-1352]

 @Deprecated
 public static void remoteDirectoryCleanup(
     RemoteSegmentStoreDirectoryFactory remoteDirectoryFactory,
     String remoteStoreRepoForIndex,
     String indexUUID,
     ShardId shardId,
     RemoteStorePathStrategy pathStrategy,
     boolean forceClean
 ) {
+    logger.warn("Using deprecated remoteDirectoryCleanup without IndexMetadata. DFA per-format files may leak for index {}", indexUUID);
     remoteDirectoryCleanup(remoteDirectoryFactory, remoteStoreRepoForIndex, indexUUID, shardId, pathStrategy, forceClean, null);
 }
Suggestion importance[1-10]: 5

__

Why: The deprecated 6-arg remoteDirectoryCleanup overload passes null for IndexMetadata, which prevents DFA indices from routing to DataFormatAwareRemoteDirectory during cleanup. Adding a warning would alert operators to potential per-format file leaks when this deprecated method is used, improving observability of a known limitation.

Low
Suggestions up to commit 97d7d61
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix unused requireSameUUID parameter logic

The UUID assertion logic contradicts the requireSameUUID parameter which is never
used. The method always asserts that the restored UUID differs from the pre-snapshot
UUID, but the parameter suggests some callers expect the UUID to be preserved. This
creates a logic inconsistency that could mask bugs in UUID handling during restore
operations.

sandbox/plugins/composite-engine/src/internalClusterTest/java/org/opensearch/composite/DataFormatAwareRestoreShallowSnapshotV2IT.java [906-964]

 protected void assertRestoredIndexMatches(
     Client client,
     String restoredIndexName,
     IndexShard restoredShard,
     PreSnapshotState pre,
     boolean requireSameUUID
 ) throws IOException {
     ...
     String restoredUuid = client.admin()
         .indices()
         .prepareGetSettings(restoredIndexName)
         .get()
         .getSetting(restoredIndexName, IndexMetadata.SETTING_INDEX_UUID);
     assertNotNull("restored index UUID must be set", restoredUuid);
     assertFalse("restored index UUID must be non-empty", restoredUuid.isEmpty());
-    assertNotEquals("restored index always has a fresh UUID, not the source UUID", pre.indexUUID, restoredUuid);
+    if (requireSameUUID) {
+        assertEquals("restored index UUID must match source UUID when requireSameUUID=true", pre.indexUUID, restoredUuid);
+    } else {
+        assertNotEquals("restored index always has a fresh UUID when requireSameUUID=false", pre.indexUUID, restoredUuid);
+    }
     ...
 }
Suggestion importance[1-10]: 9

__

Why: The suggestion identifies a critical logic bug where the requireSameUUID parameter is completely ignored, making the method always assert that UUIDs differ. This contradicts the parameter's purpose and could mask UUID-handling bugs. The fix correctly implements conditional logic based on the parameter, ensuring the method behaves as its signature promises.

High
Fix file counting path hierarchy logic

The file counting logic may produce incorrect results because it checks if indexUUID
and categoryDirName appear anywhere in the path components, not necessarily in the
correct hierarchical relationship. A file under a different index UUID's category
directory could be incorrectly counted if the target UUID appears elsewhere in the
path.

sandbox/plugins/composite-engine/src/internalClusterTest/java/org/opensearch/composite/DataFormatAwareRestoreShallowSnapshotV2IT.java [1549-1580]

 private static long countFilesUnder(Path rootPath, String indexUUID, String categoryDirName) throws IOException {
     if (Files.exists(rootPath) == false) return 0;
     long[] count = { 0 };
     Files.walkFileTree(rootPath, new java.nio.file.SimpleFileVisitor<>() {
         @Override
         public java.nio.file.FileVisitResult visitFile(Path file, java.nio.file.attribute.BasicFileAttributes attrs) {
-            boolean hasUuid = false;
-            boolean hasCategory = false;
-            for (Path part : file) {
-                String name = part.toString();
-                if (indexUUID.equals(name)) {
-                    hasUuid = true;
-                } else if (categoryDirName.equals(name)) {
-                    hasCategory = true;
+            Path parent = file.getParent();
+            if (parent != null && categoryDirName.equals(parent.getFileName().toString())) {
+                boolean hasUuid = false;
+                for (Path part : parent) {
+                    if (indexUUID.equals(part.toString())) {
+                        hasUuid = true;
+                        break;
+                    }
                 }
-            }
-            if (hasUuid && hasCategory) {
-                count[0]++;
+                if (hasUuid) {
+                    count[0]++;
+                }
             }
             return java.nio.file.FileVisitResult.CONTINUE;
         }
         ...
     });
     return count[0];
 }
Suggestion importance[1-10]: 8

__

Why: The suggestion identifies a real bug where the current logic incorrectly counts files if indexUUID and categoryDirName appear anywhere in the path, not necessarily in the correct parent-child relationship. The improved code correctly validates the hierarchical structure by checking that categoryDirName is the immediate parent and indexUUID is an ancestor, preventing false positives in file counting.

Medium
Add null-safety check for index stats

Add null-safety check before accessing getIndex(indexName) to prevent potential
NullPointerException if the index stats are unavailable. The stats API may return
null for indices that are in an intermediate state during restore or deletion.

sandbox/plugins/composite-engine/src/internalClusterTest/java/org/opensearch/composite/DataFormatAwareRestoreShallowSnapshotV2IT.java [199-213]

 private void assertDocCountInIndex(Client client, String indexName, long expectedCount) {
-    long actualCount = client.admin()
+    var indexStats = client.admin()
         .indices()
         .prepareStats(indexName)
         .clear()
         .setDocs(true)
         .get()
-        .getIndex(indexName)
-        .getPrimaries().docs.getCount();
+        .getIndex(indexName);
+    assertNotNull("index stats must be available for " + indexName, indexStats);
+    long actualCount = indexStats.getPrimaries().docs.getCount();
     assertEquals(
         "doc count mismatch for index " + indexName + " (expected " + expectedCount + ", got " + actualCount + ")",
         expectedCount,
         actualCount
     );
 }
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies a potential NullPointerException risk when getIndex(indexName) returns null during intermediate states. Adding a null check improves robustness, though the impact is moderate since test environments typically have stable index states.

Medium
General
Preserve index settings during IndexSettings construction

The method creates IndexSettings with Settings.EMPTY which may not preserve critical
node-level settings that affect directory behavior. This could cause inconsistent
behavior between the original index creation and cleanup operations, particularly
for DFA-enabled indices that rely on node settings for format routing.

server/src/main/java/org/opensearch/index/store/RemoteSegmentStoreDirectoryFactory.java [105-121]

 public Directory newDirectory(
     String repositoryName,
     String indexUUID,
     ShardId shardId,
     RemoteStorePathStrategy pathStrategy,
     IndexMetadata indexMetadata
 ) throws IOException {
-    IndexSettings indexSettings = indexMetadata != null ? new IndexSettings(indexMetadata, Settings.EMPTY) : null;
+    IndexSettings indexSettings = indexMetadata != null ? new IndexSettings(indexMetadata, indexMetadata.getSettings()) : null;
     return newDirectory(repositoryName, indexUUID, shardId, pathStrategy, null, false, false, indexSettings);
 }
Suggestion importance[1-10]: 6

__

Why: The suggestion points out that using Settings.EMPTY may lose important settings from IndexMetadata. However, the improved code indexMetadata.getSettings() is already the source of the IndexMetadata object, so this change is redundant. The original code correctly passes Settings.EMPTY as the node-level settings parameter to IndexSettings constructor, which is appropriate for this cleanup context where node settings aren't available.

Low
Suggestions up to commit 97d7d61
CategorySuggestion                                                                                                                                    Impact
General
Log warning for deprecated cleanup

The deprecated method delegates to the new 7-arg variant with a null IndexMetadata
parameter, which may cause DFA indices to fall back to plain RemoteDirectory
behavior and leak per-format files during cleanup. Consider logging a warning when
this deprecated method is called to alert operators that they should migrate to the
new API for proper DFA cleanup.

server/src/main/java/org/opensearch/index/store/RemoteSegmentStoreDirectory.java [1337-1346]

 @Deprecated
 public static void remoteDirectoryCleanup(
     RemoteSegmentStoreDirectoryFactory remoteDirectoryFactory,
     String remoteStoreRepoForIndex,
     String indexUUID,
     ShardId shardId,
     RemoteStorePathStrategy pathStrategy,
     boolean forceClean
 ) {
+    logger.warn(
+        "Using deprecated remoteDirectoryCleanup without IndexMetadata for shard [{}]. "
+        + "DFA indices may not clean up per-format files correctly. Migrate to the 7-arg variant.",
+        shardId
+    );
     remoteDirectoryCleanup(remoteDirectoryFactory, remoteStoreRepoForIndex, indexUUID, shardId, pathStrategy, forceClean, null);
 }
Suggestion importance[1-10]: 6

__

Why: Adding a warning log when the deprecated method is called would help operators identify callsites that need migration for proper DFA cleanup. However, the suggestion doesn't account for the fact that logger is not a static field in this context, so the improved code would not compile as-is.

Low
Optimize path matching performance

The file counting logic iterates through all path components for every file, which
is inefficient for deep directory trees. The path matching could be optimized by
converting the file path to a string once and checking if it contains both the UUID
and category as path segments, reducing repeated string conversions.

sandbox/plugins/composite-engine/src/internalClusterTest/java/org/opensearch/composite/DataFormatAwareRestoreShallowSnapshotV2IT.java [1549-1570]

 private static long countFilesUnder(Path rootPath, String indexUUID, String categoryDirName) throws IOException {
     if (Files.exists(rootPath) == false) return 0;
     long[] count = { 0 };
     Files.walkFileTree(rootPath, new java.nio.file.SimpleFileVisitor<>() {
         @Override
         public java.nio.file.FileVisitResult visitFile(Path file, java.nio.file.attribute.BasicFileAttributes attrs) {
-            boolean hasUuid = false;
-            boolean hasCategory = false;
-            for (Path part : file) {
-                String name = part.toString();
-                if (indexUUID.equals(name)) {
-                    hasUuid = true;
-                } else if (categoryDirName.equals(name)) {
-                    hasCategory = true;
-                }
-            }
-            if (hasUuid && hasCategory) {
+            String pathStr = file.toString();
+            String separator = file.getFileSystem().getSeparator();
+            String uuidSegment = separator + indexUUID + separator;
+            String categorySegment = separator + categoryDirName + separator;
+            if (pathStr.contains(uuidSegment) && pathStr.contains(categorySegment)) {
                 count[0]++;
             }
             return java.nio.file.FileVisitResult.CONTINUE;
         }
         ...
Suggestion importance[1-10]: 4

__

Why: The suggestion offers a minor performance optimization by reducing repeated toString() calls on path components. However, the impact is minimal for typical test scenarios with shallow directory trees, and the original code is more explicit about matching path segments (not substrings).

Low
Suggestions up to commit 97d7d61
CategorySuggestion                                                                                                                                    Impact
General
Verify block before unblocking

The test blocks snapshot finalization but doesn't verify that the block was actually
hit before unblocking. If the snapshot completes before the block is established,
the test may pass incorrectly. Add an assertion after waitForBlock to confirm the
snapshot is actually blocked before calling unblockNode.

sandbox/plugins/composite-engine/src/internalClusterTest/java/org/opensearch/composite/DataFormatAwareRestoreShallowSnapshotV2IT.java [1788-1794]

-public void testV2SnapshotFailInFinalizeDoesNotCorruptDFAState() throws Exception {
-    ...
-    // Block finalization, fire snapshot, wait for block, then unblock to fail it
-    blockClusterManagerFromFinalizingSnapshotOnIndexFile(snapshotRepoName);
-    final org.opensearch.common.action.ActionFuture<
-        org.opensearch.action.admin.cluster.snapshots.create.CreateSnapshotResponse> snapshotFuture = startFullSnapshot(
-            snapshotRepoName,
-            "snap-fail"
-        );
-    awaitNumberOfSnapshotsInProgress(1);
-    waitForBlock(clusterManagerNode, snapshotRepoName, TimeValue.timeValueSeconds(30L));
-    unblockNode(snapshotRepoName, clusterManagerNode);
+blockClusterManagerFromFinalizingSnapshotOnIndexFile(snapshotRepoName);
+final org.opensearch.common.action.ActionFuture<
+    org.opensearch.action.admin.cluster.snapshots.create.CreateSnapshotResponse> snapshotFuture = startFullSnapshot(
+        snapshotRepoName,
+        "snap-fail"
+    );
+awaitNumberOfSnapshotsInProgress(1);
+waitForBlock(clusterManagerNode, snapshotRepoName, TimeValue.timeValueSeconds(30L));
+assertTrue("snapshot must be blocked before unblocking", snapshotFuture.isDone() == false);
+unblockNode(snapshotRepoName, clusterManagerNode);
 
-    // The blocked snapshot must throw SnapshotException
-    expectThrows(org.opensearch.snapshots.SnapshotException.class, snapshotFuture::actionGet);
+expectThrows(org.opensearch.snapshots.SnapshotException.class, snapshotFuture::actionGet);
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies a potential race condition where the snapshot might complete before the block is established. Adding an assertion to verify snapshotFuture.isDone() == false after waitForBlock strengthens the test's reliability by ensuring the block was actually hit before unblocking. This is a valid improvement for test robustness.

Medium
Use AtomicLong for accumulator

The method uses a single-element array to accumulate the count inside the visitor,
which is a common workaround for effectively-final constraints. However, this
pattern is error-prone and can be replaced with AtomicLong for clearer intent and
thread-safety, even though the visitor runs single-threaded here.

sandbox/plugins/composite-engine/src/internalClusterTest/java/org/opensearch/composite/DataFormatAwareRestoreShallowSnapshotV2IT.java [1549-1579]

 private static long countFilesUnder(Path rootPath, String indexUUID, String categoryDirName) throws IOException {
     if (Files.exists(rootPath) == false) return 0;
-    long[] count = { 0 };
+    java.util.concurrent.atomic.AtomicLong count = new java.util.concurrent.atomic.AtomicLong(0);
     Files.walkFileTree(rootPath, new java.nio.file.SimpleFileVisitor<>() {
         @Override
         public java.nio.file.FileVisitResult visitFile(Path file, java.nio.file.attribute.BasicFileAttributes attrs) {
             boolean hasUuid = false;
             boolean hasCategory = false;
             for (Path part : file) {
                 String name = part.toString();
                 if (indexUUID.equals(name)) {
                     hasUuid = true;
                 } else if (categoryDirName.equals(name)) {
                     hasCategory = true;
                 }
             }
             if (hasUuid && hasCategory) {
-                count[0]++;
+                count.incrementAndGet();
             }
             return java.nio.file.FileVisitResult.CONTINUE;
         }
         ...
     });
-    return count[0];
+    return count.get();
 }
Suggestion importance[1-10]: 4

__

Why: While AtomicLong is clearer than a single-element array, this is a minor style improvement. The existing pattern is common in Java for effectively-final constraints and works correctly. The suggestion doesn't address a bug or significant maintainability issue, so the impact is low.

Low

Signed-off-by: Kamal Nayan <askkamal@amazon.com>
@ask-kamal-nayan ask-kamal-nayan force-pushed the snapshot-v2-dataformat branch from d803bb2 to 97d7d61 Compare May 19, 2026 16:52
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 97d7d61

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 97d7d61: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 97d7d61

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 97d7d61: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 97d7d61

@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit ec0519d

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for ec0519d: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit ec0519d

@github-actions
Copy link
Copy Markdown
Contributor

❕ Gradle check result for ec0519d: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 20, 2026

Codecov Report

❌ Patch coverage is 46.15385% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.40%. Comparing base (debf6ed) to head (ec0519d).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
...java/org/opensearch/index/shard/StoreRecovery.java 0.00% 2 Missing ⚠️
...earch/index/store/RemoteSegmentStoreDirectory.java 0.00% 2 Missing ⚠️
...ndex/store/RemoteSegmentStoreDirectoryFactory.java 0.00% 2 Missing ⚠️
...e/metadata/TransportRemoteStoreMetadataAction.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #21742      +/-   ##
============================================
+ Coverage     73.33%   73.40%   +0.07%     
- Complexity    74996    75047      +51     
============================================
  Files          6012     6012              
  Lines        340934   340939       +5     
  Branches      49076    49076              
============================================
+ Hits         250021   250280     +259     
+ Misses        71005    70742     -263     
- Partials      19908    19917       +9     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Kamal Nayan added 2 commits May 21, 2026 19:35
Signed-off-by: Kamal Nayan <askkamal@amazon.com>
Signed-off-by: Kamal Nayan <askkamal@amazon.com>
@ask-kamal-nayan ask-kamal-nayan force-pushed the snapshot-v2-dataformat branch from ec0519d to 674e230 Compare May 21, 2026 14:06
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 674e230

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant