Feature/data format aware read only engine by vishwasgarg18 · Pull Request #21720 · opensearch-project/OpenSearch

vishwasgarg18 · 2026-05-18T18:04:05Z

Description

This PR introduces a data-format-aware read-only engine for warm (tiered) primaries in OpenSearch, along with the tiering service infrastructure needed to safely transition indices between hot and warm tiers.

Core Engine: `DataFormatAwareReadOnlyEngine`

Introduces a purpose-built read-only engine for primary shards on warm indices (isWarmIndex()). The engine operates on a CatalogSnapshot, has no IndexWriter, uses NoOpTranslogManager, and rejects all write operations.
Reads route through TieredSubdirectoryAwareDirectory to remote (warm) storage.
DataFormatAwareIndexerFactory updated to create the read-only engine for warm primaries.
Integration tests and unit tests added for the read-only engine.

Tiering Service Improvements

TransportPrepareTieringAction: new broadcast action that flushes and syncs all primaries before tiering begins.
TransportHotToWarmTierAction: adds read-only block, prepare step, retry logic, and DFA-aware tiering flow.
TieringService: sets auto_expand_replicas during the tiering transition.
HotToWarmTieringService: sets read_only_allow_delete on cancel to protect segments.
WarmToHotTieringService: removes the read-only block when transitioning back to hot.
IndexShard: fixes isRemoteSegmentStoreInSync for DFA using CatalogSnapshot.
Added already-warm guard to prevent redundant hot-to-warm tiering requests.
Added no-op deletion policy to protect warm-tier segment files.

TieredObjectStore: Fix for Warm PPL Queries

try_head_from_registry: uses prefix matching to detect directory HEAD requests and returns NotFound (matching LocalFileSystem behavior), allowing DataFusion to fall through to list_with_cache. Removes the fragile path_str.contains('.') heuristic.

Related Issues

Resolves #

Check List

Functionality includes testing.
API changes companion pull request created, if applicable.
Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

github-actions · 2026-05-18T18:05:26Z

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit ab301fd.

Path	Line	Severity	Description
server/src/main/java/org/opensearch/storage/action/tiering/TransportPrepareTieringAction.java	185	medium	checkRequestBlock deliberately returns null, bypassing all index-level cluster block checks. The comment justifies this as intentional (the action runs after a read_only_allow_delete block is set), but unconditionally skipping index-level authorization checks in a transport action is a security control bypass that warrants review — a malicious caller with access to this internal action name could operate on blocked indices.
sandbox/plugins/native-repository-fs/src/main/resources/META-INF/services/org.opensearch.plugins.NativeRemoteObjectStoreProvider	9	low	New SPI registration adding FsNativeObjectStorePlugin as a production NativeRemoteObjectStoreProvider. This is a net-new service provider entry in main/resources (not previously in test or main scope), meaning this plugin will be auto-discovered and loaded via Java SPI in all production deployments of this module.
sandbox/plugins/native-repository-azure/src/main/resources/META-INF/services/org.opensearch.plugins.NativeRemoteObjectStoreProvider	1	low	Service provider files for Azure, GCS, and S3 NativeRemoteObjectStoreProviders are moved from internalClusterTest/resources to main/resources. This broadens these SPI registrations from test-only scope to production scope, meaning these providers will now be auto-loaded in production deployments rather than only during integration tests. Same pattern applies to the GCS and S3 files.
server/src/main/java/org/opensearch/storage/action/tiering/TransportHotToWarmTierAction.java	107	low	addReadOnlyBlockAndPrepare sets INDEX_BLOCKS_READ_ONLY_ALLOW_DELETE on arbitrary caller-supplied index names via a cluster state update task. Failure paths attempt cleanup but do so asynchronously and best-effort. If cleanup fails (logged as warn), the index is left in a read_only_allow_delete state permanently, which is a potential availability impact vector for an authenticated user with tiering privileges.

The table above displays the top 10 most important findings.

Total: 4 | Critical: 0 | High: 0 | Medium: 1 | Low: 3

Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.

⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

github-actions · 2026-05-18T19:20:15Z

PR Reviewer Guide 🔍

(Review updated until commit `5165667`)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review Possible Issue The `permanentSnapshotRef` is acquired in the constructor and released in `closeNoLock()`. However, if the constructor fails after acquiring the ref but before setting `success = true`, the ref is never released because `closeNoLock()` is not called in the finally block. This leaks the snapshot reference. The finally block only closes `readerManagersRef`, `catalogSnapshotManagerRef`, and `constructingCommitter`, but not `permanentSnapshotRef`. public DataFormatAwareReadOnlyEngine(EngineConfig engineConfig) { this.logger = Loggers.getLogger(DataFormatAwareReadOnlyEngine.class, engineConfig.getShardId()); assert engineConfig.isReadOnlyReplica() == false : "DataFormatAwareReadOnlyEngine must only be created for primary shards; shard " + engineConfig.getShardId(); this.engineConfig = engineConfig; this.shardId = engineConfig.getShardId(); this.store = engineConfig.getStore(); store.incRef(); Map<DataFormat, EngineReaderManager<?>> readerManagersRef = null; CatalogSnapshotManager catalogSnapshotManagerRef = null; Committer constructingCommitter = null; boolean success = false; try { // Create committer: isReplica=true opens the index without acquiring write.lock. // LuceneCommitterFactory detects isWarmIndex + !isReadOnlyReplica and returns // ReadOnlyCommitter, which throws on commit() — this engine never writes commits. this.committer = constructingCommitter = engineConfig.getCommitterFactory() .getCommitter(new CommitterConfig(engineConfig, () -> {}, true)); Map<String, String> userData = committer.getLastCommittedData(); // Catalog file deleter FileDeleter compositeDeleter = buildFileDeleter(); List<CatalogSnapshot> committed = committer.listCommittedSnapshots(); // Build per-format reader managers (same pattern as NRT) DataFormatRegistry registry = engineConfig.getDataFormatRegistry(); Map<String, Supplier<DataFormatDescriptor>> allDescriptors = registry.getFormatDescriptors(engineConfig.getIndexSettings()); Map<DataFormat, EngineReaderManager<?>> aggregated = new HashMap<>(); for (String formatName : allDescriptors.keySet()) { DataFormat format = registry.format(formatName); aggregated.putAll( registry.getReaderManager( new ReaderManagerConfig( Optional.of((IndexStoreProvider) dataFormat -> () -> store), format, registry, store.shardPath(), store.getDataformatAwareStoreHandles() ) ) ); } readerManagersRef = Map.copyOf(aggregated); this.readerManagers = readerManagersRef; // Register reader managers as catalog snapshot lifecycle listeners so they are notified // (via installSnapshot → afterRefresh) BEFORE latestCatalogSnapshot is swapped. This // guarantees readers are updated before the snapshot becomes externally visible. List<CatalogSnapshotLifecycleListener> snapshotListeners = new ArrayList<>(readerManagersRef.values()); catalogSnapshotManagerRef = new CatalogSnapshotManager( committed, new NoOpCatalogSnapshotDeletionPolicy(), compositeDeleter, Map.of(), snapshotListeners, store.shardPath(), committer ); this.catalogSnapshotManager = catalogSnapshotManagerRef; // Initialize sequence number tracking final SequenceNumbers.CommitInfo seqNoInfo = SequenceNumbers.loadSeqNoInfoFromLuceneCommit(userData.entrySet()); this.localCheckpointTracker = new LocalCheckpointTracker(seqNoInfo.maxSeqNo, seqNoInfo.localCheckpoint); // Initialize no-op translog String translogUUID = userData.get(Translog.TRANSLOG_UUID_KEY); this.translogManager = new NoOpTranslogManager( shardId, readLock, this::ensureOpen, new TranslogStats(0, 0, 0, 0, 0), Translog.EMPTY_TRANSLOG_SNAPSHOT, translogUUID != null ? translogUUID : "readonly-" + shardId, true ); // History UUID — fabricate if absent (same as NRT engine) for recovery protocol compliance this.historyUUID = userData.get(Engine.HISTORY_UUID_KEY); if (this.historyUUID == null) { this.historyUUID = UUIDs.randomBase64UUID(); } // Acquire permanent snapshot ref — held for engine's lifetime, released in closeNoLock. // The snapshot never changes; all queries share this single ref via the cached reader. this.permanentSnapshotRef = catalogSnapshotManagerRef.acquireSnapshot(); Map<DataFormat, Object> readers = new HashMap<>(); for (Map.Entry<DataFormat, EngineReaderManager<?>> entry : readerManagersRef.entrySet()) { Object r = entry.getValue().getReader(permanentSnapshotRef.get()); if (r != null) { readers.put(entry.getKey(), r); } } this.reader = new DataFormatAwareEngine.DataFormatAwareReader(permanentSnapshotRef, readers); success = true; logger.info("Created DataFormatAwareReadOnlyEngine"); } catch (IOException e) { throw new EngineCreationFailureException(shardId, "failed to create read-only engine", e); } finally { if (success == false) { if (readerManagersRef != null) { IOUtils.closeWhileHandlingException(readerManagersRef.values()); } IOUtils.closeWhileHandlingException(catalogSnapshotManagerRef); IOUtils.closeWhileHandlingException(constructingCommitter); if (isClosed.get() == false) { store.decRef(); } } } Race Condition In `addReadOnlyBlockAndPrepare()`, the read-only block is added via a cluster state update task. If the index is deleted between the task submission and execution, `indexMetadata` will be null at line 109, causing `execute()` to throw. However, `clusterStateProcessed()` at line 145 will still be called, attempting to run `executePrepareTiering()` on a deleted index. This can cause unexpected failures or resource leaks if prepare logic does not handle missing indices gracefully. private void addReadOnlyBlockAndPrepare( IndexTieringRequest request, ClusterState state, ActionListener<AcknowledgedResponse> listener ) { clusterService.submitStateUpdateTask( "add-read-only-block-for-tiering [" + request.getIndex() + "]", new ClusterStateUpdateTask(Priority.URGENT) { @Override public ClusterState execute(ClusterState currentState) { IndexMetadata indexMetadata = currentState.metadata().index(request.getIndex()); if (indexMetadata == null) { throw new IllegalStateException("Index [" + request.getIndex() + "] not found"); } Settings.Builder indexSettingsBuilder = Settings.builder() .put(indexMetadata.getSettings()) .put(IndexMetadata.INDEX_BLOCKS_READ_ONLY_ALLOW_DELETE_SETTING.getKey(), true); IndexMetadata.Builder indexMetadataBuilder = IndexMetadata.builder(indexMetadata) .settings(indexSettingsBuilder) .settingsVersion(1 + indexMetadata.getSettingsVersion()); Metadata.Builder metadataBuilder = Metadata.builder(currentState.metadata()).put(indexMetadataBuilder); ClusterBlocks.Builder blocks = ClusterBlocks.builder().blocks(currentState.blocks()); blocks.addIndexBlock(request.getIndex(), IndexMetadata.INDEX_READ_ONLY_ALLOW_DELETE_BLOCK); return ClusterState.builder(currentState).metadata(metadataBuilder).blocks(blocks).build(); } @Override public void clusterStateProcessed(String source, ClusterState oldState, ClusterState newState) { logger.info("Read-only block added for index [{}], proceeding with pre-tiering sync", request.getIndex()); executePrepareTiering(request, newState, listener, 1); } @Override public void onFailure(String source, Exception e) { logger.error("Failed to add read-only block for index [{}]", request.getIndex()); listener.onFailure( new IllegalStateException( "Failed to add read-only block for DFA index [" + request.getIndex() + "]. Please retry.", e ) ); } } ); } Possible Issue In `removeReadOnlyBlock()`, if the cluster state update task fails (line 271), the read-only block remains on the index. The warning at line 272 suggests manual removal, but there is no mechanism to retry or alert operators. If this happens during a tiering failure, the index is left in a stuck read-only state, requiring manual intervention. Consider adding a retry mechanism or more prominent alerting. private void removeReadOnlyBlock(String indexName) { clusterService.submitStateUpdateTask( "remove-read-only-block-for-tiering [" + indexName + "]", new ClusterStateUpdateTask(Priority.URGENT) { @Override public ClusterState execute(ClusterState currentState) { IndexMetadata indexMetadata = currentState.metadata().index(indexName); if (indexMetadata == null) { return currentState; } Settings.Builder indexSettingsBuilder = Settings.builder() .put(indexMetadata.getSettings()) .put(IndexMetadata.INDEX_BLOCKS_READ_ONLY_ALLOW_DELETE_SETTING.getKey(), false); IndexMetadata.Builder indexMetadataBuilder = IndexMetadata.builder(indexMetadata) .settings(indexSettingsBuilder) .settingsVersion(1 + indexMetadata.getSettingsVersion()); Metadata.Builder metadataBuilder = Metadata.builder(currentState.metadata()).put(indexMetadataBuilder); ClusterBlocks.Builder blocks = ClusterBlocks.builder().blocks(currentState.blocks()); blocks.removeIndexBlock(indexName, IndexMetadata.INDEX_READ_ONLY_ALLOW_DELETE_BLOCK); return ClusterState.builder(currentState).metadata(metadataBuilder).blocks(blocks).build(); } @Override public void onFailure(String source, Exception e) { logger.warn( "Failed to remove read-only block for index [{}] after tiering failure: {}. " + "Block can be removed manually via index settings.", indexName, e ); } @Override public void clusterStateProcessed(String source, ClusterState oldState, ClusterState newState) { logger.info("Read-only block removed for index [{}] after tiering failure", indexName); } } ); } Possible Issue In `syncAndFlush()`, the code flushes before refreshing (line 154). The comment at line 152 states that flushing first ensures committed segments exist before remote store sync. However, `waitForRemoteSync()` is called after `refresh()` (line 164), not after `flush()`. If `refresh()` fails between flush and remote sync, the remote store may be out of sync with the local commit, but the method does not detect this. Consider moving the remote sync check immediately after flush or adding error handling for refresh failures. private void syncAndFlush(IndexShard indexShard, ShardRouting shardRouting) throws IOException { logger.trace("Syncing and flushing shard [{}]", shardRouting.shardId()); indexShard.sync(); // Flush before refresh: committed segments_N must exist before waitForRemoteStoreSync uploads them. // Refreshing first would expose segments not yet committed — the remote store sync would miss them. indexShard.flush(new FlushRequest().force(true).waitIfOngoing(true)); indexShard.refresh("prepare_tiering"); }

github-actions · 2026-05-18T19:20:59Z

PR Code Suggestions ✨

Latest suggestions up to 5165667

Explore these optional code suggestions:

Category	Suggestion	Impact
General	Make permit acquisition timeout configurable The 30-second timeout for acquiring primary permits may be insufficient for shards with long-running queries or heavy indexing load. If the timeout is exceeded, the tiering operation fails and must be retried manually. Consider making this timeout configurable via a cluster setting to allow operators to tune it based on their workload characteristics. server/src/main/java/org/opensearch/storage/action/tiering/TransportPrepareTieringAction.java [56-138] -private static final TimeValue PERMITS_ACQUIRE_TIMEOUT = TimeValue.timeValueSeconds(30); -... -indexShard.acquireAllPrimaryOperationsPermits(permitFuture, PERMITS_ACQUIRE_TIMEOUT); +// Add a cluster setting for permit acquisition timeout +public static final Setting<TimeValue> PREPARE_TIERING_PERMITS_TIMEOUT_SETTING = Setting.timeSetting( + "cluster.tiering.prepare.permits_timeout", + TimeValue.timeValueSeconds(30), + TimeValue.timeValueSeconds(1), + Setting.Property.NodeScope, + Setting.Property.Dynamic +); Suggestion importance[1-10]: 7 __ Why: Good suggestion to make the 30-second timeout configurable. Different workloads may need different timeouts, and a fixed value could cause unnecessary failures. The suggested cluster setting approach is appropriate and would improve operational flexibility.	Medium
	Avoid blocking transport thread with synchronous validation The preflight validation runs synchronously in the transport thread before the cluster state update task. If validation is slow (e.g., checking disk space across many nodes), it blocks the transport thread. Consider moving validation into the cluster state update task itself or making it asynchronous to avoid blocking the transport layer. server/src/main/java/org/opensearch/storage/action/tiering/TransportHotToWarmTierAction.java [93-101] -try { - Index index = resolveRequestIndex(indexNameExpressionResolver, request.getIndex(), state); - hotToWarmTieringService.preflightValidate(state, index); -} catch (Exception e) { - logger.info("Preflight validation failed for DFA index [{}]: {}", request.getIndex(), e.getMessage()); - listener.onFailure(e); - return; -} +// Move validation into the cluster state update task's execute() method +// to avoid blocking the transport thread on potentially slow I/O operations +addReadOnlyBlockAndPrepare(request, state, listener); Suggestion importance[1-10]: 6 __ Why: Valid concern about blocking the transport thread with potentially slow validation. However, the validation is already duplicated inside the cluster state task (line 95 comment mentions "validation also runs inside TieringService.tier()"), so moving it entirely would require careful coordination. The suggestion has merit but needs more design consideration.	Low
	Make warm replica count configurable or dynamic The hardcoded replica count of 1 in `auto_expand_replicas` may not be appropriate for all deployments. For clusters with many warm nodes, operators may want more replicas for better query parallelism and fault tolerance. Consider making the max replica count configurable via a cluster setting or deriving it from the number of available warm nodes. server/src/main/java/org/opensearch/storage/tiering/TieringService.java [534] -indexSettingsBuilder.put(IndexMetadata.SETTING_AUTO_EXPAND_REPLICAS, "0-" + 1); +int maxReplicas = Math.min(1, clusterState.nodes().getWarmNodes().size() - 1); +indexSettingsBuilder.put(IndexMetadata.SETTING_AUTO_EXPAND_REPLICAS, "0-" + maxReplicas); Suggestion importance[1-10]: 6 __ Why: The hardcoded replica count of 1 may not suit all deployments. However, the suggestion to derive from `clusterState.nodes().getWarmNodes().size()` could be problematic if warm nodes are added/removed dynamically. A configurable setting would be safer than dynamic calculation. The suggestion has merit but the implementation needs refinement.	Low
	Clarify committer configuration for warm primary The `isReplica=true` parameter passed to `CommitterConfig` is misleading for a warm primary engine. This flag causes `LuceneCommitterFactory` to return `ReadOnlyCommitter`, which is correct, but the semantic intent is unclear. Consider adding a dedicated `isReadOnlyPrimary` flag to `CommitterConfig` to explicitly distinguish warm primaries from replicas, improving code clarity and maintainability. server/src/main/java/org/opensearch/index/engine/DataFormatAwareReadOnlyEngine.java [139-140] this.committer = constructingCommitter = engineConfig.getCommitterFactory() - .getCommitter(new CommitterConfig(engineConfig, () -> {}, true)); + .getCommitter(new CommitterConfig(engineConfig, () -> {}, false, true)); // isReplica=false, isReadOnlyPrimary=true Suggestion importance[1-10]: 5 __ Why: The suggestion correctly identifies that `isReplica=true` is semantically misleading for a warm primary. However, adding a new parameter would require changes across multiple classes and the current approach works correctly. The impact is limited to code clarity rather than correctness.	Low

Previous suggestions

Suggestions up to commit 4b088ef

Category	Suggestion	Impact
General	Avoid unbounded recursion in retry logic The retry logic uses unbounded recursion which could cause stack overflow if `MAX_PREPARE_RETRIES` is set to a large value. Consider using an iterative approach with a loop or scheduling retries via the thread pool to avoid stack depth issues. server/src/main/java/org/opensearch/storage/action/tiering/TransportHotToWarmTierAction.java [165-192] private void executePrepareTiering( IndexTieringRequest request, ClusterState state, ActionListener<AcknowledgedResponse> listener, int attempt ) { PrepareTieringRequest prepareTieringRequest = new PrepareTieringRequest(request.getIndex()); prepareTieringRequest.timeout(request.timeout()); prepareTieringAction.execute(prepareTieringRequest, new ActionListener<BroadcastResponse>() { @Override public void onResponse(BroadcastResponse broadcastResponse) { if (broadcastResponse.getFailedShards() > 0) { if (attempt < MAX_PREPARE_RETRIES) { logger.warn( "Pre-tiering sync attempt [{}/{}] had {} failed shard(s) for index [{}], retrying", attempt, MAX_PREPARE_RETRIES, broadcastResponse.getFailedShards(), request.getIndex() ); - executePrepareTiering(request, state, listener, attempt + 1); + // Schedule retry on thread pool to avoid stack overflow + threadPool.executor(ThreadPool.Names.MANAGEMENT).execute(() -> + executePrepareTiering(request, state, listener, attempt + 1) + ); return; } Suggestion importance[1-10]: 7 __ Why: Valid concern about potential stack overflow with recursive retries. However, with `MAX_PREPARE_RETRIES=3`, stack depth is minimal. The suggestion to use thread pool scheduling is a good defensive practice for maintainability if the retry count increases.	Medium
	Use async pattern for permit acquisition The `actionGet()` call blocks the current thread until permits are acquired or timeout occurs. This can block the transport thread pool. Consider using the async callback pattern with `ActionListener` to avoid blocking transport threads during permit acquisition. server/src/main/java/org/opensearch/storage/action/tiering/TransportPrepareTieringAction.java [135-143] -private Releasable acquirePrimaryPermits(IndexShard indexShard, ShardRouting shardRouting) throws IOException { - try { - PlainActionFuture<Releasable> permitFuture = new PlainActionFuture<>(); - indexShard.acquireAllPrimaryOperationsPermits(permitFuture, PERMITS_ACQUIRE_TIMEOUT); - return permitFuture.actionGet(); - } catch (Exception e) { - throw new IOException("Failed to acquire primary operation permits for shard [" + shardRouting.shardId() + "]", e); - } +private void acquirePrimaryPermitsAsync( + IndexShard indexShard, + ShardRouting shardRouting, + ActionListener<Releasable> listener +) { + indexShard.acquireAllPrimaryOperationsPermits( + ActionListener.wrap( + listener::onResponse, + e -> listener.onFailure(new IOException( + "Failed to acquire primary operation permits for shard [" + shardRouting.shardId() + "]", e + )) + ), + PERMITS_ACQUIRE_TIMEOUT + ); } Suggestion importance[1-10]: 6 __ Why: The suggestion correctly identifies that blocking `actionGet()` can tie up transport threads. However, this code runs in `ThreadPool.Names.MANAGEMENT` (as specified in the action constructor), which is designed for blocking operations. The async pattern would be better but is not critical here.	Low
	Extract hardcoded replica count constant The hardcoded replica count of `1` in `"0-" + 1` should be extracted as a named constant to improve maintainability and make the tiering replica policy explicit. This makes it easier to adjust the policy in the future. server/src/main/java/org/opensearch/storage/tiering/TieringService.java [531-534] +private static final int TIERING_MAX_REPLICAS = 1; + void updateIndexMetadataForTieringStart( final Metadata.Builder metadataBuilder, final RoutingTable.Builder routingTableBuilder, final IndexMetadata indexMetadata, final Index index ) { try { // 1. Build settings Settings.Builder indexSettingsBuilder = Settings.builder().put(indexMetadata.getSettings()).put(getTieringStartSettingsToAdd()); // 2. Handle replica updates using auto_expand_replicas - indexSettingsBuilder.put(IndexMetadata.SETTING_AUTO_EXPAND_REPLICAS, "0-" + 1); + indexSettingsBuilder.put(IndexMetadata.SETTING_AUTO_EXPAND_REPLICAS, "0-" + TIERING_MAX_REPLICAS); Suggestion importance[1-10]: 5 __ Why: Extracting the hardcoded `1` as a named constant improves maintainability. However, the value is used only once in this context and the inline expression `"0-" + 1` is clear enough. The improvement is modest.	Low
	Clarify assertion intent with comment The assertion checks `isReadOnlyReplica() == false`, but this engine is intended for warm primaries. Consider renaming the config flag or adding a comment explaining that "read-only replica" in this context means "replica that receives segments" (not warm primary). The current assertion message is correct but the flag name is confusing. server/src/main/java/org/opensearch/index/engine/DataFormatAwareReadOnlyEngine.java [119-123] public DataFormatAwareReadOnlyEngine(EngineConfig engineConfig) { this.logger = Loggers.getLogger(DataFormatAwareReadOnlyEngine.class, engineConfig.getShardId()); + // isReadOnlyReplica() == false ensures this is a primary shard (warm primaries use this engine) assert engineConfig.isReadOnlyReplica() == false : "DataFormatAwareReadOnlyEngine must only be created for primary shards; shard " + engineConfig.getShardId(); this.engineConfig = engineConfig; this.shardId = engineConfig.getShardId(); this.store = engineConfig.getStore(); Suggestion importance[1-10]: 4 __ Why: The suggestion adds a clarifying comment about the assertion's intent. While helpful for code readability, the assertion message already explains the constraint clearly. The improvement is marginal.	Low

Suggestions up to commit ff0459c

Category	Suggestion	Impact
General	Preserve all request parameters on retry The retry logic creates a new `PrepareTieringRequest` on each attempt but only sets the timeout. If the original request had other settings (e.g., `indicesOptions`), they would be lost on retry. Ensure all relevant request parameters are preserved across retries to maintain consistent behavior. server/src/main/java/org/opensearch/storage/action/tiering/TransportHotToWarmTierAction.java [176-177] private void executePrepareTiering( IndexTieringRequest request, ClusterState state, ActionListener<AcknowledgedResponse> listener, int attempt ) { PrepareTieringRequest prepareTieringRequest = new PrepareTieringRequest(request.getIndex()); prepareTieringRequest.timeout(request.timeout()); + prepareTieringRequest.indicesOptions(request.indicesOptions()); Suggestion importance[1-10]: 5 __ Why: Valid point about preserving `indicesOptions` across retries. However, `PrepareTieringRequest` extends `BroadcastRequest`, which has default `indicesOptions` that are appropriate for internal operations. The original `IndexTieringRequest` may have user-specified options that should be propagated. This is a minor correctness issue worth addressing.	Low
	Avoid blocking transport thread with validation The preflight validation runs synchronously in the transport thread before any cluster state mutation. If validation is expensive (e.g., disk space checks across many nodes), this could block the transport thread. Consider moving the validation into the cluster state update task or making it asynchronous to avoid blocking the transport layer. server/src/main/java/org/opensearch/storage/action/tiering/TransportHotToWarmTierAction.java [96-100] -try { - Index index = resolveRequestIndex(indexNameExpressionResolver, request.getIndex(), state); - hotToWarmTieringService.preflightValidate(state, index); -} catch (Exception e) { - logger.info("Preflight validation failed for DFA index [{}]: {}", request.getIndex(), e.getMessage()); - listener.onFailure(e); - return; -} +// Move validation inside the cluster state update task to avoid blocking transport thread +clusterService.submitStateUpdateTask( + "validate-and-add-read-only-block [" + request.getIndex() + "]", + new ClusterStateUpdateTask(Priority.URGENT) { + @Override + public ClusterState execute(ClusterState currentState) { + Index index = resolveRequestIndex(indexNameExpressionResolver, request.getIndex(), currentState); + hotToWarmTieringService.preflightValidate(currentState, index); + // ... rest of block addition logic + } + // ... onFailure, clusterStateProcessed + } +); Suggestion importance[1-10]: 4 __ Why: The suggestion raises a valid concern about blocking the transport thread, but the impact is moderate. The preflight validation is designed to be quick (checking cluster state for warm node capacity), not expensive I/O. Moving it into the cluster state task would complicate error handling and lose the early-rejection benefit. The current design is intentional.	Low
	Use async pattern for permit acquisition The `actionGet()` call blocks the thread until permits are acquired or timeout occurs. If multiple shards are being prepared in parallel on the same node, this could lead to thread pool exhaustion. Consider using the async callback pattern instead of blocking to allow the thread to handle other work while waiting for permits. server/src/main/java/org/opensearch/storage/action/tiering/TransportPrepareTieringAction.java [135-143] -private Releasable acquirePrimaryPermits(IndexShard indexShard, ShardRouting shardRouting) throws IOException { - try { - PlainActionFuture<Releasable> permitFuture = new PlainActionFuture<>(); - indexShard.acquireAllPrimaryOperationsPermits(permitFuture, PERMITS_ACQUIRE_TIMEOUT); - return permitFuture.actionGet(); - } catch (Exception e) { - throw new IOException("Failed to acquire primary operation permits for shard [" + shardRouting.shardId() + "]", e); - } +private void acquirePrimaryPermitsAsync(IndexShard indexShard, ShardRouting shardRouting, ActionListener<Releasable> listener) { + indexShard.acquireAllPrimaryOperationsPermits( + ActionListener.wrap( + listener::onResponse, + e -> listener.onFailure(new IOException("Failed to acquire primary operation permits for shard [" + shardRouting.shardId() + "]", e)) + ), + PERMITS_ACQUIRE_TIMEOUT + ); } Suggestion importance[1-10]: 3 __ Why: The suggestion correctly identifies a blocking call, but the context matters. This code runs in a broadcast shard operation handler, which already expects blocking I/O (flush, sync). The 30s timeout is generous and thread pool exhaustion is unlikely given the limited concurrency of tiering operations. Refactoring to async would add complexity without clear benefit.	Low
	Clarify primary shard assertion logic The assertion checks `isReadOnlyReplica() == false` but the engine is named "ReadOnly". This is confusing because the flag name suggests replica status, but the check is actually verifying primary status. The assertion message clarifies intent, but the flag name creates cognitive dissonance. Consider renaming the flag or adding a helper method like `isPrimary()` to make the intent clearer at the call site. server/src/main/java/org/opensearch/index/engine/DataFormatAwareReadOnlyEngine.java [121-122] public DataFormatAwareReadOnlyEngine(EngineConfig engineConfig) { this.logger = Loggers.getLogger(DataFormatAwareReadOnlyEngine.class, engineConfig.getShardId()); - assert engineConfig.isReadOnlyReplica() == false + assert engineConfig.isPrimary() : "DataFormatAwareReadOnlyEngine must only be created for primary shards; shard " + engineConfig.getShardId(); Suggestion importance[1-10]: 2 __ Why: The suggestion points out a naming confusion, but the assertion is correct and the message is clear. The flag `isReadOnlyReplica` is an existing `EngineConfig` field used throughout the codebase. Adding a helper method `isPrimary()` would require changes to `EngineConfig` and is out of scope for this PR. The cognitive dissonance is minor and mitigated by the assertion message.	Low

Suggestions up to commit a251956

Category	Suggestion	Impact
Possible issue	Add timeout to blocking future call The `actionGet()` call blocks indefinitely if the future never completes. Although `PERMITS_ACQUIRE_TIMEOUT` is passed to `acquireAllPrimaryOperationsPermits`, the future itself has no timeout. If the permit acquisition hangs, this thread will block forever. Use `actionGet(timeout)` to enforce a timeout on the blocking call. server/src/main/java/org/opensearch/storage/action/tiering/TransportPrepareTieringAction.java [135-143] private Releasable acquirePrimaryPermits(IndexShard indexShard, ShardRouting shardRouting) throws IOException { try { PlainActionFuture<Releasable> permitFuture = new PlainActionFuture<>(); indexShard.acquireAllPrimaryOperationsPermits(permitFuture, PERMITS_ACQUIRE_TIMEOUT); - return permitFuture.actionGet(); + return permitFuture.actionGet(PERMITS_ACQUIRE_TIMEOUT); } catch (Exception e) { throw new IOException("Failed to acquire primary operation permits for shard [" + shardRouting.shardId() + "]", e); } } Suggestion importance[1-10]: 7 __ Why: Valid concern. While `PERMITS_ACQUIRE_TIMEOUT` is passed to `acquireAllPrimaryOperationsPermits`, the `actionGet()` call on line 139 blocks indefinitely if the future never completes. Using `actionGet(PERMITS_ACQUIRE_TIMEOUT)` ensures the blocking call respects the timeout, preventing indefinite hangs.	Medium
General	Add logging for missing index The `removeReadOnlyBlock` method is called on prepare failure to clean up the read-only block. However, if the index is deleted between the failure and this cleanup attempt, `indexMetadata` will be null and the method silently returns. This is correct, but the method should log a warning to aid debugging when the index disappears unexpectedly during cleanup. server/src/main/java/org/opensearch/storage/action/tiering/TransportHotToWarmTierAction.java [250-254] private void removeReadOnlyBlock(String indexName) { clusterService.submitStateUpdateTask( "remove-read-only-block-for-tiering [" + indexName + "]", new ClusterStateUpdateTask(Priority.URGENT) { @Override public ClusterState execute(ClusterState currentState) { IndexMetadata indexMetadata = currentState.metadata().index(indexName); if (indexMetadata == null) { + logger.warn("Index [{}] not found during read-only block removal", indexName); return currentState; } Suggestion importance[1-10]: 4 __ Why: The suggestion improves observability by logging when an index is unexpectedly missing during cleanup. This aids debugging in edge cases (e.g., concurrent index deletion). However, the impact is minor since the method already handles null metadata gracefully.	Low
General	Guard against excessive recursion depth The `executePrepareTiering` method recursively retries on failure but does not guard against stack overflow if `MAX_PREPARE_RETRIES` is set to a very large value. Consider using an iterative approach or adding a safeguard to prevent excessive recursion depth. server/src/main/java/org/opensearch/storage/action/tiering/TransportHotToWarmTierAction.java [165-177] private void executePrepareTiering( IndexTieringRequest request, ClusterState state, ActionListener<AcknowledgedResponse> listener, int attempt ) { + if (attempt > MAX_PREPARE_RETRIES) { + throw new IllegalStateException("Retry limit exceeded"); + } PrepareTieringRequest prepareTieringRequest = new PrepareTieringRequest(request.getIndex()); prepareTieringRequest.timeout(request.timeout()); Suggestion importance[1-10]: 2 __ Why: The suggestion is technically correct but has minimal impact. `MAX_PREPARE_RETRIES` is hardcoded to 3 (line 48), so recursion depth is bounded and stack overflow is not a realistic concern. The suggested guard is redundant given the existing retry limit checks in lines 164 and 216.	Low

Suggestions up to commit a526ff4

Category	Suggestion	Impact
General	Release lock before notifying listeners The write lock is held during the entire replication finalization, including listener notifications. If a refresh listener performs expensive I/O or blocks, this could delay other operations waiting for the write lock. Consider releasing the lock before notifying listeners or ensuring listeners are non-blocking. server/src/main/java/org/opensearch/index/engine/DataFormatAwareReadOnlyEngine.java [324-346] public void finalizeReplication(CatalogSnapshot incoming) throws IOException { + final long maxSeqNo; + final String incomingHistoryUUID; try (ReleasableLock lock = writeLock.acquire()) { ensureOpen(); - final long maxSeqNo = Long.parseLong(incoming.getUserData().get(SequenceNumbers.MAX_SEQ_NO)); - - notifyRefreshListenersBefore(); - boolean success = false; - try { - catalogSnapshotManager.applyReplicationSnapshot(incoming); - success = true; - } finally { - notifyRefreshListenersAfter(success); + maxSeqNo = Long.parseLong(incoming.getUserData().get(SequenceNumbers.MAX_SEQ_NO)); + incomingHistoryUUID = incoming.getUserData().get(Engine.HISTORY_UUID_KEY); + + catalogSnapshotManager.applyReplicationSnapshot(incoming); + if (incomingHistoryUUID != null) { + this.historyUUID = incomingHistoryUUID; } - ... + localCheckpointTracker.fastForwardProcessedSeqNo(maxSeqNo); } + // Notify listeners outside the lock + notifyRefreshListenersBefore(); + notifyRefreshListenersAfter(true); } Suggestion importance[1-10]: 6 __ Why: Valid concern about holding the write lock during listener notifications. However, the current pattern matches `DataFormatAwareNRTReplicationEngine` for consistency. Listeners are expected to be lightweight (e.g., updating stats). Changing this would require careful analysis of listener contracts and potential race conditions.	Low
	Avoid blocking transport thread with validation The preflight validation runs synchronously in the transport thread before any cluster state mutation. If validation is expensive (e.g., disk space checks across many nodes), this could block the transport thread. Consider moving the validation into the cluster state update task or making it async to avoid blocking the transport layer. server/src/main/java/org/opensearch/storage/action/tiering/TransportHotToWarmTierAction.java [89-100] -try { - Index index = resolveRequestIndex(indexNameExpressionResolver, request.getIndex(), state); - hotToWarmTieringService.preflightValidate(state, index); -} catch (Exception e) { - logger.info("Preflight validation failed for DFA index [{}]: {}", request.getIndex(), e.getMessage()); - listener.onFailure(e); - return; -} +// Move validation into the cluster state update task or use async validation +clusterService.submitStateUpdateTask("validate-and-add-block [" + request.getIndex() + "]", + new ClusterStateUpdateTask(Priority.URGENT) { + @Override + public ClusterState execute(ClusterState currentState) { + try { + Index index = resolveRequestIndex(indexNameExpressionResolver, request.getIndex(), currentState); + hotToWarmTieringService.preflightValidate(currentState, index); + return addReadOnlyBlock(currentState, request.getIndex()); + } catch (Exception e) { + throw new IllegalStateException("Validation failed", e); + } + } + // ... onFailure, clusterStateProcessed handlers + }); Suggestion importance[1-10]: 5 __ Why: The suggestion correctly identifies that synchronous validation in the transport thread could be a bottleneck. However, the PR already includes double-validation (preflight + inside cluster state task) for TOCTOU safety. Moving validation entirely into the cluster state task would eliminate the early-rejection benefit. The current design is a deliberate trade-off.	Low
	Use async permit acquisition for concurrency The `actionGet()` call blocks the thread until permits are acquired or timeout occurs. If multiple shards are processed sequentially on the same node, this could cause head-of-line blocking. Consider using async permit acquisition with callbacks to allow concurrent processing of multiple shards on the same node. server/src/main/java/org/opensearch/storage/action/tiering/TransportPrepareTieringAction.java [135-143] -private Releasable acquirePrimaryPermits(IndexShard indexShard, ShardRouting shardRouting) throws IOException { - try { - PlainActionFuture<Releasable> permitFuture = new PlainActionFuture<>(); - indexShard.acquireAllPrimaryOperationsPermits(permitFuture, PERMITS_ACQUIRE_TIMEOUT); - return permitFuture.actionGet(); - } catch (Exception e) { - throw new IOException("Failed to acquire primary operation permits for shard [" + shardRouting.shardId() + "]", e); - } +private void acquirePrimaryPermitsAsync(IndexShard indexShard, ShardRouting shardRouting, + ActionListener<Releasable> listener) { + indexShard.acquireAllPrimaryOperationsPermits( + ActionListener.wrap( + listener::onResponse, + e -> listener.onFailure(new IOException( + "Failed to acquire primary operation permits for shard [" + shardRouting.shardId() + "]", e)) + ), + PERMITS_ACQUIRE_TIMEOUT + ); } Suggestion importance[1-10]: 4 __ Why: The suggestion identifies a potential concurrency improvement, but `TransportBroadcastByNodeAction` already processes shards in parallel across nodes. The blocking `actionGet()` only affects sequential processing within a single node's shard batch, which is acceptable for this use case. The added complexity of async callbacks may not justify the marginal gain.	Low
	Use iterative retry instead of recursion The retry logic uses recursive calls to `executePrepareTiering`, which could lead to deep call stacks if `MAX_PREPARE_RETRIES` is increased. Consider using an iterative approach or a scheduled retry mechanism to avoid potential stack overflow issues and improve debuggability. server/src/main/java/org/opensearch/storage/action/tiering/TransportHotToWarmTierAction.java [165-237] -private void executePrepareTiering( +private void executePrepareTieringWithRetry( IndexTieringRequest request, ClusterState state, - ActionListener<AcknowledgedResponse> listener, - int attempt + ActionListener<AcknowledgedResponse> listener ) { - PrepareTieringRequest prepareTieringRequest = new PrepareTieringRequest(request.getIndex()); - prepareTieringRequest.timeout(request.timeout()); + AtomicInteger attempt = new AtomicInteger(1); + ActionListener<BroadcastResponse> retryListener = new ActionListener<>() { + @Override + public void onResponse(BroadcastResponse response) { + if (response.getFailedShards() > 0 && attempt.get() < MAX_PREPARE_RETRIES) { + attempt.incrementAndGet(); + prepareTieringAction.execute(new PrepareTieringRequest(request.getIndex()), this); + } else if (response.getFailedShards() > 0) { + // final failure + } else { + // success + } + } + // ... onFailure with similar retry logic + }; + prepareTieringAction.execute(new PrepareTieringRequest(request.getIndex()), retryListener); +} - prepareTieringAction.execute(prepareTieringRequest, new ActionListener<BroadcastResponse>() { - @Override - public void onResponse(BroadcastResponse broadcastResponse) { - if (broadcastResponse.getFailedShards() > 0) { - if (attempt < MAX_PREPARE_RETRIES) { - ... - executePrepareTiering(request, state, listener, attempt + 1); - return; - } - Suggestion importance[1-10]: 3 __ Why: The suggestion raises a theoretical concern about stack depth with `MAX_PREPARE_RETRIES=3`. In practice, 3 recursive calls pose no stack overflow risk. The current recursive approach is clearer and more maintainable than the proposed iterative version with `AtomicInteger`. The improved code also has incomplete error handling logic.	Low

Suggestions up to commit 9035f42

Category	Suggestion	Impact
General	Warn when fabricating missing historyUUID Fabricating a `historyUUID` when absent may cause issues during peer recovery or replica promotion if the primary and replica have different fabricated UUIDs. Consider logging a warning when fabrication occurs, or failing engine creation if `historyUUID` is required for correctness in the warm tier recovery protocol. server/src/main/java/org/opensearch/index/engine/DataFormatAwareReadOnlyEngine.java [196-199] if (this.historyUUID == null) { + logger.warn("historyUUID missing from commit for shard [{}], fabricating new UUID. This may cause recovery issues.", shardId); this.historyUUID = UUIDs.randomBase64UUID(); } Suggestion importance[1-10]: 7 __ Why: Valid suggestion to add a warning when `historyUUID` is fabricated. This matches the pattern used in `DataFormatAwareNRTReplicationEngine` and helps with debugging recovery issues. The concern about replica promotion is legitimate, though the fabrication logic mirrors the existing NRT engine behavior.	Medium
	Add retry for read-only block removal If `removeReadOnlyBlock` fails (e.g., cluster state update rejected), the index remains in a stuck read-only state. The user must manually intervene. Consider adding a retry mechanism or a background task that periodically attempts to remove orphaned read-only blocks from failed tiering operations to improve resilience. server/src/main/java/org/opensearch/storage/action/tiering/TransportHotToWarmTierAction.java [245-286] -private void removeReadOnlyBlock(String indexName) { +// Add retry logic or background cleanup task for orphaned read-only blocks +private void removeReadOnlyBlockWithRetry(String indexName, int attempt) { + if (attempt > 3) { + logger.error("Failed to remove read-only block after 3 attempts for index [{}]. Manual intervention required.", indexName); + return; + } clusterService.submitStateUpdateTask( "remove-read-only-block-for-tiering [" + indexName + "]", new ClusterStateUpdateTask(Priority.URGENT) { @Override public void onFailure(String source, Exception e) { - logger.warn( - "Failed to remove read-only block for index [{}] after tiering failure: {}. " - + "Block can be removed manually via index settings.", - indexName, - e - ); + logger.warn("Retry {}/3: Failed to remove read-only block for index [{}]: {}", attempt, indexName, e); + removeReadOnlyBlockWithRetry(indexName, attempt + 1); } // ... } ); } Suggestion importance[1-10]: 6 __ Why: Valid concern about orphaned read-only blocks if removal fails. Adding retry logic would improve resilience. However, the suggestion's implementation is simplistic (no exponential backoff, no state tracking). A more robust solution would involve tracking failed removals and a background cleanup task, but the basic retry idea has merit.	Low
	Avoid blocking transport thread with validation The preflight validation runs synchronously on the transport thread before any cluster state mutation. If validation is expensive (e.g., disk space checks across many nodes), this could block the transport thread. Consider moving the validation into the cluster state update task or making it asynchronous to avoid blocking the transport layer. server/src/main/java/org/opensearch/storage/action/tiering/TransportHotToWarmTierAction.java [96-100] -try { - Index index = resolveRequestIndex(indexNameExpressionResolver, request.getIndex(), state); - hotToWarmTieringService.preflightValidate(state, index); -} catch (Exception e) { - logger.info("Preflight validation failed for DFA index [{}]: {}", request.getIndex(), e.getMessage()); - listener.onFailure(e); - return; -} +// Move validation into the cluster state update task to avoid blocking transport thread +clusterService.submitStateUpdateTask( + "preflight-validate-and-add-block [" + request.getIndex() + "]", + new ClusterStateUpdateTask(Priority.URGENT) { + @Override + public ClusterState execute(ClusterState currentState) { + try { + Index index = resolveRequestIndex(indexNameExpressionResolver, request.getIndex(), currentState); + hotToWarmTieringService.preflightValidate(currentState, index); + } catch (Exception e) { + throw new IllegalStateException("Preflight validation failed: " + e.getMessage(), e); + } + // Proceed to add read-only block... + } + // ... + } +); Suggestion importance[1-10]: 5 __ Why: The suggestion correctly identifies that synchronous validation on the transport thread could cause blocking. However, the current implementation already runs validation before expensive operations (block addition, prepare), which is the intended design. Moving validation into the cluster state task would complicate error handling and lose the early-rejection benefit. The concern is valid but the proposed solution may not be optimal.	Low
	Use async callback for permit acquisition Using `actionGet()` blocks the thread until permits are acquired or timeout occurs. If multiple shards are processed sequentially on the same node, this could cause cumulative blocking delays. Consider using the async callback pattern to avoid blocking the thread pool, especially if the action processes many shards concurrently. server/src/main/java/org/opensearch/storage/action/tiering/TransportPrepareTieringAction.java [136-140] -PlainActionFuture<Releasable> permitFuture = new PlainActionFuture<>(); -indexShard.acquireAllPrimaryOperationsPermits(permitFuture, PERMITS_ACQUIRE_TIMEOUT); -return permitFuture.actionGet(); +// Use async callback to avoid blocking thread pool +indexShard.acquireAllPrimaryOperationsPermits(new ActionListener<Releasable>() { + @Override + public void onResponse(Releasable permit) { + try { + syncAndFlush(indexShard, shardRouting); + waitForRemoteSync(indexShard, shardRouting); + verifyNoUncommittedOps(indexShard, shardRouting); + } catch (IOException e) { + // handle error + } finally { + permit.close(); + } + } + @Override + public void onFailure(Exception e) { + throw new IOException("Failed to acquire permits", e); + } +}, PERMITS_ACQUIRE_TIMEOUT); Suggestion importance[1-10]: 4 __ Why: The suggestion to use async callbacks is theoretically better, but the current implementation using `actionGet()` is consistent with similar operations in the codebase (e.g., flush, refresh). The blocking is intentional to ensure sequential execution of sync→flush→refresh→verify. The concern about cumulative delays is valid but the impact is limited since this is a rare administrative operation.	Low

github-actions · 2026-05-21T14:47:39Z

Persistent review updated to latest commit a251956

github-actions · 2026-05-21T14:57:24Z

❌ Gradle check result for a251956: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2026-05-21T16:37:14Z

Persistent review updated to latest commit ff0459c

github-actions · 2026-05-21T16:47:01Z

❌ Gradle check result for ff0459c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2026-05-21T17:01:58Z

Persistent review updated to latest commit 4b088ef

github-actions · 2026-05-21T17:50:22Z

❌ Gradle check result for 4b088ef: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2026-05-21T21:03:07Z

❌ Gradle check result for 4b088ef: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Introduces a purpose-built read-only engine for primaries on indices marked with isWarmIndex(). The engine operates on CatalogSnapshot, has no IndexWriter, uses NoOpTranslogManager, and rejects all write operations. Reads route through TieredSubdirectoryAwareDirectory to remote storage. Key changes: - DataFormatAwareReadOnlyEngine implementing Indexer interface - DataFormatAwareIndexerFactory updated to create read-only engine for warm primaries - Unit tests for the read-only engine - Integration test skeleton for warm engine scenarios - run.gradle updated for multi-node warm testing - Test fixes for cherry-picked NRT replication changes Signed-off-by: nishchay21 <nishcha@amazon.com>

…auto_expand_replicas, IndexShard DFA sync fix - TransportPrepareTieringAction: broadcast action for flush+sync on primaries - TransportHotToWarmTierAction: read-only block + prepare + retry + tier flow for DFA - TieringService: auto_expand_replicas during tiering - HotToWarmTieringService: read_only_allow_delete on cancel - WarmToHotTieringService: remove read-only block on warm-to-hot - IndexShard: fix isRemoteSegmentStoreInSync for DFA using CatalogSnapshot - NRT engine: null-check on commitResult (from mgodwan fix) - Unit tests for all changes Signed-off-by: nishchay21 <nishchay21@users.noreply.github.com> Signed-off-by: nishchay21 <nishcha@amazon.com>

Signed-off-by: nishchay21 <nishchay21@users.noreply.github.com> Signed-off-by: nishchay21 <nishcha@amazon.com>

Signed-off-by: nishchay21 <nishcha@amazon.com>

Return NotFound for directory HEAD requests using registry prefix matching instead of hitting remote store. This matches LocalFileSystem behavior and allows DataFusion to fall through to list_with_cache. - try_head_from_registry: prefix match detects directories, returns NotFound - Removed fragile heuristic that used path_str.contains dot - Tests updated to expect NotFound for directory HEAD - list test registers local file in registry (list returns registry only) Signed-off-by: nishchay21 <nishcha@amazon.com>

Signed-off-by: nishchay21 <nishcha@amazon.com>

Signed-off-by: Vishwas Garg <vishwasgarg14@gmail.com>

…pload to remote Signed-off-by: Vishwas Garg <vishwasgarg14@gmail.com>

Signed-off-by: Vishwas Garg <vishwasgarg14@gmail.com>

github-actions · 2026-05-22T09:06:45Z

Persistent review updated to latest commit 5165667

github-actions · 2026-05-22T10:02:52Z

✅ Gradle check result for 5165667: SUCCESS

codecov · 2026-05-22T10:03:40Z

Codecov Report

❌ Patch coverage is 14.40000% with 107 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.49%. Comparing base (afa213c) to head (5165667).
⚠️ Report is 2 commits behind head on main.

Files with missing lines	Patch %	Lines
...e/action/tiering/TransportHotToWarmTierAction.java	3.84%	100 Missing ⚠️
...in/java/org/opensearch/index/shard/IndexShard.java	44.44%	4 Missing and 1 partial ⚠️
...org/opensearch/storage/tiering/TieringService.java	60.00%	2 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##               main   #21720      +/-   ##
============================================
+ Coverage     73.46%   73.49%   +0.03%     
- Complexity    75254    75299      +45     
============================================
  Files          6023     6024       +1     
  Lines        341475   341712     +237     
  Branches      49141    49162      +21     
============================================
+ Hits         250855   251143     +288     
+ Misses        70680    70556     -124     
- Partials      19940    20013      +73

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

vishwasgarg18 requested review from a team, Bukhtawar, andrross, ashking94, cwperks, dbwiddis, gbbafna, jed326, kotwanikunal, mch2, msfroh, owaiskazi19, reta, sachinpkale, saratvemulapalli, shwetathareja and sohami as code owners May 18, 2026 18:04

vishwasgarg18 force-pushed the feature/data-format-aware-read-only-engine branch 8 times, most recently from 2d4cb3c to 0ddec48 Compare May 18, 2026 19:18

vishwasgarg18 force-pushed the feature/data-format-aware-read-only-engine branch from 0ddec48 to 7bc2965 Compare May 18, 2026 19:20

vishwasgarg18 force-pushed the feature/data-format-aware-read-only-engine branch from 7bc2965 to 890d029 Compare May 18, 2026 19:25

mgodwan reviewed May 21, 2026

View reviewed changes

Comment thread server/src/main/java/org/opensearch/index/engine/DataFormatAwareReadOnlyEngine.java

mgodwan reviewed May 21, 2026

View reviewed changes

Comment thread server/src/main/java/org/opensearch/index/shard/IndexShard.java Outdated

mgodwan reviewed May 21, 2026

View reviewed changes

Comment thread server/src/main/java/org/opensearch/storage/action/tiering/TransportPrepareTieringAction.java

mgodwan reviewed May 21, 2026

View reviewed changes

Comment thread server/src/main/java/org/opensearch/storage/action/tiering/PrepareTieringRequest.java

vishwasgarg18 force-pushed the feature/data-format-aware-read-only-engine branch from a526ff4 to a251956 Compare May 21, 2026 14:46

vishwasgarg18 force-pushed the feature/data-format-aware-read-only-engine branch from a251956 to ff0459c Compare May 21, 2026 16:35

vishwasgarg18 force-pushed the feature/data-format-aware-read-only-engine branch from ff0459c to 4b088ef Compare May 21, 2026 17:00

nishchay21 and others added 12 commits May 22, 2026 14:34

Add already-warm check to prevent redundant hot-to-warm tiering

142f7af

Signed-off-by: nishchay21 <nishchay21@users.noreply.github.com> Signed-off-by: nishchay21 <nishcha@amazon.com>

Adding no-op deletion policy

ba6d646

Signed-off-by: nishchay21 <nishcha@amazon.com>

Add DataFormatAwareReadOnlyEngine integration tests

001577f

Signed-off-by: nishchay21 <nishcha@amazon.com>

Add native repository SPI providers

80563d9

Signed-off-by: Vishwas Garg <vishwasgarg14@gmail.com>

Additional refactoring

f06701e

Signed-off-by: Vishwas Garg <vishwasgarg14@gmail.com>

Add prevalidations before intiation adding readonly block and force u…

d5e47f0

…pload to remote Signed-off-by: Vishwas Garg <vishwasgarg14@gmail.com>

Update ITs and set AUTO_EXPAND_REPLICAS default for warm

373852a

Signed-off-by: Vishwas Garg <vishwasgarg14@gmail.com>

Address review comments

f7d39c9

Signed-off-by: Vishwas Garg <vishwasgarg14@gmail.com>

Return the same reader

5165667

Signed-off-by: Vishwas Garg <vishwasgarg14@gmail.com>

vishwasgarg18 force-pushed the feature/data-format-aware-read-only-engine branch from 4b088ef to 5165667 Compare May 22, 2026 09:05

Conversation

vishwasgarg18 commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Core Engine: DataFormatAwareReadOnlyEngine

Tiering Service Improvements

TieredObjectStore: Fix for Warm PPL Queries

Related Issues

Check List

Uh oh!

github-actions Bot commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Analyzer ❗

Uh oh!

github-actions Bot commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Reviewer Guide 🔍

(Review updated until commit 5165667)

Uh oh!

github-actions Bot commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Suggestions ✨

Previous suggestions

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented May 21, 2026

Uh oh!

github-actions Bot commented May 21, 2026

Uh oh!

github-actions Bot commented May 21, 2026

Uh oh!

github-actions Bot commented May 21, 2026

Uh oh!

github-actions Bot commented May 21, 2026

Uh oh!

github-actions Bot commented May 21, 2026

Uh oh!

github-actions Bot commented May 21, 2026

Uh oh!

github-actions Bot commented May 22, 2026

Uh oh!

github-actions Bot commented May 22, 2026

Uh oh!

codecov Bot commented May 22, 2026

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

vishwasgarg18 commented May 18, 2026 •

edited

Loading

Core Engine: `DataFormatAwareReadOnlyEngine`

github-actions Bot commented May 18, 2026 •

edited

Loading

github-actions Bot commented May 18, 2026 •

edited

Loading

(Review updated until commit `5165667`)

github-actions Bot commented May 18, 2026 •

edited

Loading