RS146BIJAY (Contributor) commented Dec 2, 2025

Description

Fixes an indexing regression and several bugs in grouping criteria. To test the grouping criteria changes, I enabled grouping criteria locally and tested by setting the criteria. I will raise the changes that enable integration tests for CAS in a separate PR, as that requires substantial changes to the integration tests as well.

Related Issues

#19919

Check List

  • Functionality includes testing.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Summary by CodeRabbit

  • Bug Fixes

    • Fixed an indexing regression and issues with grouping criteria.
    • Adjusted retry and lock-acquisition behavior; lowered default retry count and increased allowed maximum for robustness.
    • Narrowed returned composite field types to the intended subtype.
    • Made grouping field mapper skip certain source derivation/validation steps.
  • Tests

    • Added tests validating retry/lock-acquisition limits and behavior.
    • Removed obsolete tests for the prior retry branch.
  • Documentation

    • Updated changelog with these fixes.


coderabbitai bot commented Dec 2, 2025

Walkthrough

Adjusts lookup-map lock retry behavior by lowering the default and increasing the maximum retry setting, propagating a max-retry parameter into CompositeIndexWriter lookup paths using bounded tryAcquire loops; removes exception-driven retry from bulk action; narrows a mapper return set; updates tests and CHANGELOG entries.

Changes

Cohort / File(s) Change Summary
Index settings
server/src/main/java/org/opensearch/index/IndexSettings.java
Changes INDEX_MAX_RETRY_ON_LOOKUP_MAP_LOCK_ACQUISITION_EXCEPTION default from 15 → 5, min remains 5, max from 100 → 500.
Composite writer updates
server/src/main/java/org/opensearch/index/engine/CompositeIndexWriter.java
Relaxes the visibility and finality of several inner types/fields; makes LiveIndexWriterDeletesMap.current mutable; replaces previous getCurrentMap() usage with bounded mapReadLock.tryAcquire() retry loops; adds maxRetryOnLookupMapAcquisitionException parameter to computeIndexWriterIfAbsentForCriteria; replaces stream-based iterations with direct map-value iteration for metrics/rollback paths.
Bulk action retry removal
server/src/main/java/org/opensearch/action/bulk/TransportShardBulkAction.java
Removes LookupMapLockAcquisitionException import/helper and the retry branch that relied on that exception.
Mapper return narrowing
server/src/main/java/org/opensearch/index/mapper/MapperService.java
getCompositeFieldTypes() now filters compositeMappedFieldTypes to only StarTreeMapper.StarTreeFieldType instances using stream+collect.
Mapper imports
server/src/main/java/org/opensearch/index/mapper/ContextAwareGroupingFieldMapper.java
Adds imports for org.apache.lucene.index.LeafReader and org.opensearch.core.xcontent.XContentBuilder; adds two no-op overrides (canDeriveSource, deriveSource) on the field type.
Tests — removals
server/src/test/java/org/opensearch/action/bulk/TransportShardBulkActionTests.java
Removes two tests and related imports that exercised LookupMapLockAcquisitionException retry scenarios.
Tests — updates/additions
server/src/test/java/org/opensearch/index/engine/CompositeIndexWriterForAppendTests.java
Updates calls to computeIndexWriterIfAbsentForCriteria to pass an extra MAX_NUMBER_OF_RETRIES argument; adds testMaxRetryCountWhenWriteLockDuringIndexing() to verify retry behavior and exception after retries; adds Mockito verification imports.
Test base constant
server/src/test/java/org/opensearch/index/engine/CriteriaBasedCompositeIndexWriterBaseTests.java
Adds public static final int MAX_NUMBER_OF_RETRIES = 20.
Changelog
CHANGELOG.md
Adds Unreleased 3.x Fixed entries referencing fixes to EngineConfig.toBuilder and indexing/grouping-criteria fixes (PR references).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Review focus:
    • CompositeIndexWriter.java: correctness of bounded tryAcquire retry loop, acquisition/release semantics, and behavior when map is closed.
    • Parameter propagation sites for maxRetryOnLookupMapAcquisitionException: ensure consistent values passed and no call-site misses.
    • IndexSettings.java: verify new default/min/max align with intended behavior and documentation.
    • TransportShardBulkAction.java and its tests: confirm removal of exception-driven retry matches expected runtime semantics and tests removed/updated accordingly.
    • New/updated tests: ensure mocked behaviors and retry-count assertions reflect the runtime retry loop and constant values.
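The bounded retry loop the review focuses on can be illustrated with a small, self-contained sketch. It is not the PR's code: `LookupMap`, `BoundedLookup.tryAcquireOpenMap`, and the use of `IllegalStateException` in place of `LookupMapLockAcquisitionException` are hypothetical, and `java.util.concurrent.locks.ReentrantReadWriteLock` stands in for the lock wrapper used by `CompositeIndexWriter`. It shows the semantics listed above: a non-blocking acquire, releasing the lock when the map turns out to be closed, and failing after a fixed number of attempts.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the lookup map guarded by a read/write lock.
class LookupMap {
    final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    volatile boolean closed; // set once a refresh has rotated and closed this map
}

class BoundedLookup {
    // Makes at most maxRetries attempts to read-lock an open map.
    // On success the read lock is still held and the caller must release it.
    static LookupMap tryAcquireOpenMap(AtomicReference<LookupMap> current, int maxRetries) {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            LookupMap map = current.get(); // re-read: a refresh may have swapped in a new map
            // tryLock() returns false immediately if another thread holds the write lock.
            if (map.lock.readLock().tryLock()) {
                if (!map.closed) {
                    return map;
                }
                map.lock.readLock().unlock(); // closed map: release before retrying
            }
        }
        // Stand-in for LookupMapLockAcquisitionException after the retries are exhausted.
        throw new IllegalStateException("lookup map not acquired after " + maxRetries + " attempts");
    }
}
```

On success the caller owns the read lock and must release it (for example in a finally block), which is the same cleanup concern as the `success == false` check discussed in the review.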

Suggested labels

v3.4.0, backport 3.4

Suggested reviewers

  • andrross
  • dbwiddis
  • msfroh
  • reta
  • mch2
  • sachinpkale
  • shwetathareja
  • ashking94
  • kotwanikunal
  • cwperks
  • jed326

Poem

🐇 I count my hops and try again,

Bounded leaps beneath the glen,
Writers wake where locks once failed,
Tests keep tally, retries unveiled,
I nibble bugs and then I'm gone.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 17.39%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Title check: ✅ Passed. The title accurately describes the main objective of the PR: fixing an indexing regression and applying bug fixes for grouping criteria, which aligns with the substantial changes across multiple files.
  • Description check: ✅ Passed. The PR description provides context about the fix and testing approach, though it omits some non-critical template sections like API changes and documentation checklist items.

coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
server/src/main/java/org/opensearch/index/engine/CompositeIndexWriter.java (2)

215-215: Reduced encapsulation of mapReadLock.

The visibility of mapReadLock has been changed from private final to package-private, allowing direct access from other classes in the same package. This field controls critical concurrency behavior, and exposing it directly increases the risk of misuse.

Consider:

  1. Keeping the field private and exposing only necessary operations through methods (e.g., tryAcquireLock()).
  2. If package-private access is required for the retry logic, add clear documentation about proper usage patterns and thread-safety requirements.
  3. Restrict access using a package-private accessor method rather than exposing the field directly.
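Options 1 and 3 could look roughly like the following sketch, in which the lock stays private and callers get only narrow, package-private operations. `GuardedLookup` and its method names are hypothetical, and `ReentrantReadWriteLock` is an assumed substitute for the actual lock type.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: the lock field is private; only narrow operations are exposed.
class GuardedLookup {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Package-private, non-blocking read-lock attempt.
    // Returns true if acquired; the caller must then call releaseReadLock().
    boolean tryAcquireReadLock() {
        return lock.readLock().tryLock();
    }

    // Releases one read hold taken by tryAcquireReadLock().
    void releaseReadLock() {
        lock.readLock().unlock();
    }

    // Lets cleanup code check ownership without touching the lock object itself.
    boolean isReadLockHeldByCurrentThread() {
        return lock.getReadHoldCount() > 0;
    }
}
```

Tests could then target these methods rather than assigning to the field directly.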

498-498: Simplify boolean comparisons.

The condition uses explicit == false and == true comparisons which are redundant in Java.

Apply this diff:

-if (success == false && current != null && current.mapReadLock.isHeldByCurrentThread() == true) {
+if (!success && current != null && current.mapReadLock.isHeldByCurrentThread()) {
     current.mapReadLock.close();
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f76826c and 1d42f98.

📒 Files selected for processing (8)
  • CHANGELOG.md (1 hunks)
  • server/src/main/java/org/opensearch/action/bulk/TransportShardBulkAction.java (1 hunks)
  • server/src/main/java/org/opensearch/index/IndexSettings.java (1 hunks)
  • server/src/main/java/org/opensearch/index/engine/CompositeIndexWriter.java (10 hunks)
  • server/src/main/java/org/opensearch/index/mapper/MapperService.java (2 hunks)
  • server/src/test/java/org/opensearch/action/bulk/TransportShardBulkActionTests.java (0 hunks)
  • server/src/test/java/org/opensearch/index/engine/CompositeIndexWriterForAppendTests.java (4 hunks)
  • server/src/test/java/org/opensearch/index/engine/CriteriaBasedCompositeIndexWriterBaseTests.java (1 hunks)
💤 Files with no reviewable changes (1)
  • server/src/test/java/org/opensearch/action/bulk/TransportShardBulkActionTests.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: gradle-check
🔇 Additional comments (13)
server/src/main/java/org/opensearch/index/mapper/MapperService.java (2)

87-87: LGTM!

Import correctly added to support the stream operations in getCompositeFieldTypes().


693-697: Verify the behavior change scope and call frequency.

The filtering to return only StarTreeMapper.StarTreeFieldType instances represents a narrowed scope from returning all composite field types. Confirm this change is intentional and whether any callers expect other CompositeMappedFieldType implementations. Additionally, verify the call frequency of this method; if invoked on hot paths, consider caching the filtered result to avoid repeated stream collection operations.

server/src/test/java/org/opensearch/index/engine/CriteriaBasedCompositeIndexWriterBaseTests.java (1)

106-106: LGTM!

The test constant is appropriately set to a lower value (20) than the production default (100) for faster test execution while still being within the valid range (5-500).

server/src/test/java/org/opensearch/index/engine/CompositeIndexWriterForAppendTests.java (5)

44-46: LGTM!

Mockito imports correctly added to support the new verification test.


71-77: LGTM!

Method call correctly updated to include MAX_NUMBER_OF_RETRIES parameter, aligning with the new bounded retry API.


141-146: LGTM!

Method call correctly updated with retry parameter.


197-202: LGTM!

Method call correctly updated with retry parameter.


208-227: Test validates bounded retry semantics correctly.

The test properly verifies:

  1. LookupMapLockAcquisitionException is thrown after exhausting retries
  2. tryAcquire() is called exactly MAX_NUMBER_OF_RETRIES times

One consideration: the mock setup directly assigns to map.current and map.current.mapReadLock which accesses package-private fields. This works for testing but creates tight coupling to internal implementation details.
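The two properties the test asserts (an exception after exhausting retries, and exactly MAX_NUMBER_OF_RETRIES calls to tryAcquire()) can also be checked without reaching into map.current.mapReadLock, for example with a small counting fake. This is a hypothetical sketch: `AlwaysBusyLock` and `RetryUnderTest` are illustrative names, and a plain `IllegalStateException` stands in for `LookupMapLockAcquisitionException`.

```java
// Hypothetical counting fake for the read lock: tryAcquire() never succeeds,
// as if a write lock were held for the whole test.
class AlwaysBusyLock {
    int attempts;

    Object tryAcquire() {
        attempts++;
        return null;
    }
}

class RetryUnderTest {
    // Simplified stand-in for the bounded loop in computeIndexWriterIfAbsentForCriteria.
    static Object acquire(AlwaysBusyLock lock, int maxRetries) {
        for (int i = 0; i < maxRetries; i++) {
            Object acquired = lock.tryAcquire();
            if (acquired != null) {
                return acquired;
            }
        }
        // Stand-in for LookupMapLockAcquisitionException.
        throw new IllegalStateException("retries exhausted");
    }
}
```

A test then asserts that acquire(lock, MAX_NUMBER_OF_RETRIES) throws and that lock.attempts equals MAX_NUMBER_OF_RETRIES afterwards.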

server/src/main/java/org/opensearch/action/bulk/TransportShardBulkAction.java (1)

724-753: Retry logic moved to lower layer - verify exception handling.

The LookupMapLockAcquisitionException retry logic has been removed from bulk action handling and moved to CompositeIndexWriter with bounded retries. This architectural approach places retry logic closer to where the exception originates.

Ensure that when LookupMapLockAcquisitionException propagates up after max retries are exhausted, it's properly handled and doesn't cause unexpected bulk operation failures.

server/src/main/java/org/opensearch/index/IndexSettings.java (1)

499-506: Significant default value change - verify upgrade impact.

The default retry count increased to 100 with a maximum of 500. Since this is a dynamic setting, existing indices will apply the new default upon upgrade. Consider whether this change should be documented in release notes for operators who have tuned their clusters based on previous defaults.

server/src/main/java/org/opensearch/index/engine/CompositeIndexWriter.java (3)

691-693: LGTM: Metrics gathering refactoring.

The refactoring from stream-based iteration to explicit for-loops improves code clarity and performance for these simple aggregation operations. The logic is correct in all cases, with proper handling of both current and old maps where necessary, and appropriate locking in ramBytesUsed().

Also applies to: 702-704, 731-742, 758-770, 796-806


210-210: Verify removal of final modifier is intentional.

The final modifier has been removed from CriteriaBasedIndexWriterLookup, CriteriaBasedWriterLock, and LiveIndexWriterDeletesMap. This allows subclassing of these internal implementation classes. Confirm whether:

  1. Subclassing is required for test mocking/stubbing.
  2. If so, consider restricting visibility to test scope or using sealed classes.
  3. If intentional for production extensibility, document extension points and invariants.

Also applies to: 301-301, 406-406


678-679: Verify retry configuration defaults and bounds.

The maxRetryOnLookupMapAcquisitionException setting controls retry behavior for lookup map acquisition. Without access to the codebase, I cannot confirm the specific default values, upper/lower bounds, or whether they align with industry best practices for lock acquisition retry mechanisms. Ensure the following:

  • Default retry count is reasonable (typically 3–10 attempts for in-thread locks per best practices)
  • Upper bounds prevent excessive retries that could cause performance degradation
  • Configuration is documented and tunable per workload expectations
  • Retry strategy includes backoff/jitter to avoid retry storms (not just blind spinning)

CHANGELOG.md
- Fix node bootstrap error when enable stream transport and remote cluster state ([#19948](https://github.com/opensearch-project/OpenSearch/pull/19948))
- Fix deletion failure/error of unused index template; case when an index template matches a data stream but has a lower priority. ([#20102](https://github.com/opensearch-project/OpenSearch/pull/20102))
- Fix toBuilder method in EngineConfig to include mergedSegmentTransferTracker([20105](https://github.com/opensearch-project/OpenSearch/pull/20105))
- Fix indexing regression and bug fixes for grouping criteria([20145](https://github.com/opensearch-project/OpenSearch/pull/20145))

⚠️ Potential issue | 🟡 Minor

Minor formatting inconsistency.

The PR link format is inconsistent with other entries in the changelog. Missing # before the PR number.

Apply this diff to fix the formatting:

-- Fix indexing regression and bug fixes for grouping criteria([20145](https://github.com/opensearch-project/OpenSearch/pull/20145))
+- Fix indexing regression and bug fixes for grouping criteria ([#20145](https://github.com/opensearch-project/OpenSearch/pull/20145))
🤖 Prompt for AI Agents
CHANGELOG.md around line 94: the link text is missing the '#' before the PR number,
which makes it inconsistent with other entries; change the link text from [20145]
to [#20145] (the URL https://github.com/opensearch-project/OpenSearch/pull/20145
stays unchanged), add a space before the opening parenthesis, and ensure the rest
of the entry formatting matches surrounding lines.

coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
server/src/main/java/org/opensearch/index/engine/CompositeIndexWriter.java (1)

475-489: Add exponential backoff to retry loop.

The retry loop performs a busy-wait by calling tryAcquire() repeatedly without any delay between attempts. This pattern wastes CPU cycles under contention when the lock is frequently unavailable.

As suggested in the previous review, add exponential backoff between retry attempts with Thread.sleep(Math.min(1L << counter, 100L)) and handle InterruptedException by restoring the interrupt flag and propagating LookupMapLockAcquisitionException.
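One caveat if that expression is used literally: Java masks the shift distance of a `long` to its low six bits, so `1L << 63` is `Long.MIN_VALUE` and `1L << 64` is `1`. With a retry limit configurable up to 500, `Math.min(1L << counter, 100L)` becomes negative at `counter == 63`, and `Thread.sleep` throws `IllegalArgumentException` for a negative timeout. A minimal sketch that caps the exponent first is shown below; `Backoff`, `delayMillis`, and `sleepBeforeRetry` are hypothetical names, and the 100 ms ceiling is taken from the suggestion.

```java
// Hypothetical helper: capped exponential backoff between tryAcquire() attempts.
class Backoff {
    static final long MAX_DELAY_MS = 100L;

    // Delay before retry number `attempt` (0-based): 1, 2, 4, ... ms, capped at MAX_DELAY_MS.
    // The shift distance is capped first because long shifts use only the low six bits
    // of the distance: 1L << 63 is negative and 1L << 64 wraps back to 1.
    static long delayMillis(int attempt) {
        return Math.min(1L << Math.min(attempt, 7), MAX_DELAY_MS);
    }

    // Sleeps for the backoff delay. Returns false if interrupted, after restoring
    // the interrupt flag so the caller can abort (e.g. by throwing).
    static boolean sleepBeforeRetry(int attempt) {
        try {
            Thread.sleep(delayMillis(attempt));
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A caller would invoke `sleepBeforeRetry(counter)` after a failed `tryAcquire()` and, if it returns `false`, throw `LookupMapLockAcquisitionException` as the review suggests.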

🧹 Nitpick comments (1)
server/src/main/java/org/opensearch/index/engine/CompositeIndexWriter.java (1)

483-483: Simplify boolean comparisons.

The expressions current.isClosed() == true (line 483) and current.mapReadLock.isHeldByCurrentThread() == true (line 498) contain redundant boolean comparisons.

Apply this diff to simplify:

-                    if (current != null && current.isClosed() == true) {
+                    if (current != null && current.isClosed()) {
-                if (success == false && current != null && current.mapReadLock.isHeldByCurrentThread() == true) {
+                if (!success && current != null && current.mapReadLock.isHeldByCurrentThread()) {

Also applies to: 498-498

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1d42f98 and 4b247a0.

📒 Files selected for processing (8)
  • CHANGELOG.md (1 hunks)
  • server/src/main/java/org/opensearch/action/bulk/TransportShardBulkAction.java (1 hunks)
  • server/src/main/java/org/opensearch/index/IndexSettings.java (1 hunks)
  • server/src/main/java/org/opensearch/index/engine/CompositeIndexWriter.java (10 hunks)
  • server/src/main/java/org/opensearch/index/mapper/MapperService.java (2 hunks)
  • server/src/test/java/org/opensearch/action/bulk/TransportShardBulkActionTests.java (0 hunks)
  • server/src/test/java/org/opensearch/index/engine/CompositeIndexWriterForAppendTests.java (4 hunks)
  • server/src/test/java/org/opensearch/index/engine/CriteriaBasedCompositeIndexWriterBaseTests.java (1 hunks)
💤 Files with no reviewable changes (1)
  • server/src/test/java/org/opensearch/action/bulk/TransportShardBulkActionTests.java
🚧 Files skipped from review as they are similar to previous changes (5)
  • server/src/main/java/org/opensearch/index/mapper/MapperService.java
  • CHANGELOG.md
  • server/src/main/java/org/opensearch/index/IndexSettings.java
  • server/src/test/java/org/opensearch/index/engine/CompositeIndexWriterForAppendTests.java
  • server/src/test/java/org/opensearch/index/engine/CriteriaBasedCompositeIndexWriterBaseTests.java
🔇 Additional comments (6)
server/src/main/java/org/opensearch/action/bulk/TransportShardBulkAction.java (1)

717-731: Behavior change: retries now only on version conflicts in onComplete

The retry path in onComplete now exclusively handles VersionConflictEngineException (via isConflictException) bounded by retryOnConflict, with the previous LookupMapLockAcquisitionException-specific retry branch removed. That seems consistent with moving lock-acquisition retry logic down into CompositeIndexWriter and keeps this layer from depending on engine-internal exception types.

Please double-check that:

  • LookupMapLockAcquisitionException is fully retried/consumed within CompositeIndexWriter (or below), and
  • it does not bubble back up to this point expecting a retry here anymore,

so we don’t inadvertently change failure behavior for callers that previously relied on the bulk action to transparently retry on transient lookup-map lock contention.
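The retry semantics described above (retry only on version conflicts, bounded by `retryOnConflict`, and let any other failure propagate on first occurrence) can be sketched as follows. This is illustrative only: `ConflictRetry` and `VersionConflictException` are hypothetical stand-ins for the bulk-action code path and `VersionConflictEngineException`.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for VersionConflictEngineException.
class VersionConflictException extends RuntimeException {
    VersionConflictException(String message) {
        super(message);
    }
}

class ConflictRetry {
    // Runs op, retrying at most retryOnConflict extra times, and only on version conflicts.
    // Any other exception (e.g. a lock-acquisition failure already retried inside the
    // engine) propagates immediately.
    static <T> T run(Supplier<T> op, int retryOnConflict) {
        int retries = 0;
        while (true) {
            try {
                return op.get();
            } catch (VersionConflictException e) {
                if (retries++ >= retryOnConflict) {
                    throw e;
                }
            }
        }
    }
}
```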

server/src/main/java/org/opensearch/index/engine/CompositeIndexWriter.java (5)

210-210: Visibility changes support the new retry mechanism.

The removal of final modifiers from inner classes and the change of mapReadLock to package-private access are necessary to support the refactored retry logic. The package-private access on mapReadLock (line 215) enables LiveIndexWriterDeletesMap.computeIndexWriterIfAbsentForCriteria to call tryAcquire() directly at line 482.

Also applies to: 215-215, 301-301, 406-406, 408-408


466-471: Proper parameter propagation for configurable retry limit.

The addition of maxRetryOnLookupMapAcquisitionException parameter enables configurable retry behavior, and the value is correctly sourced from engineConfig.getIndexSettings().

Also applies to: 675-680


691-693: Simplified iteration improves readability.

The refactoring from stream-based iteration to direct iteration over criteriaBasedIndexWriterMap.values() is cleaner and avoids unnecessary intermediate operations.

Also applies to: 702-704


731-750: Proper tragic exception checking across all writers.

The iteration through both current and old writer maps to detect tragic exceptions is thorough and correctly checks if writers are closed before accessing their tragic exception state.


753-774: Correct synchronization and state checks.

The ramBytesUsed() method properly acquires write locks before iterating, and the rollback() method correctly checks if writers are open before attempting rollback operations.

Also applies to: 794-811

github-actions bot commented Dec 2, 2025

❌ Gradle check result for 4b247a0: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
server/src/test/java/org/opensearch/index/engine/CompositeIndexWriterForAppendTests.java (1)

44-46: Retry wiring in tests looks correct; consider simplifying the mocking for readability.

  • Updating all computeIndexWriterIfAbsentForCriteria invocations to pass MAX_NUMBER_OF_RETRIES keeps the tests consistent with the new API and the retry semantics; this looks correct.
  • testMaxRetryCountWhenWriteLockDuringIndexing correctly verifies that tryAcquire() is invoked exactly MAX_NUMBER_OF_RETRIES times when the lock is never obtained, and the LookupMapLockAcquisitionException is thrown as expected.

As a minor test ergonomics tweak, you could stub and verify directly on writerLock instead of going through map.current.mapReadLock in the when(...) and verify(...) calls. That would make the test a bit less coupled to the internal layout of LiveIndexWriterDeletesMap and CriteriaBasedIndexWriterLookup while preserving the behavior being asserted.

Also applies to: 72-77, 141-146, 197-202, 208-227

server/src/main/java/org/opensearch/index/engine/CompositeIndexWriter.java (1)

210-215: Visibility and mutability changes for nested types are acceptable but could use an explicit “for testing” annotation.

Making CriteriaBasedIndexWriterLookup and CriteriaBasedWriterLock more visible, and relaxing mapReadLock and LiveIndexWriterDeletesMap.current from final, is understandable to support the new tests that need to mock and override these internals.

To keep the public surface area tidy and signal intent, consider adding an explicit @opensearch.internal (or similar) Javadoc tag or comment on these nested types/fields indicating that they are exposed primarily for testing. That helps discourage external production code from depending on them and makes future refactors easier.

Also applies to: 301-301, 406-412

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4b247a0 and dbcae67.

📒 Files selected for processing (8)
  • CHANGELOG.md (1 hunks)
  • server/src/main/java/org/opensearch/action/bulk/TransportShardBulkAction.java (1 hunks)
  • server/src/main/java/org/opensearch/index/IndexSettings.java (1 hunks)
  • server/src/main/java/org/opensearch/index/engine/CompositeIndexWriter.java (10 hunks)
  • server/src/main/java/org/opensearch/index/mapper/MapperService.java (2 hunks)
  • server/src/test/java/org/opensearch/action/bulk/TransportShardBulkActionTests.java (0 hunks)
  • server/src/test/java/org/opensearch/index/engine/CompositeIndexWriterForAppendTests.java (4 hunks)
  • server/src/test/java/org/opensearch/index/engine/CriteriaBasedCompositeIndexWriterBaseTests.java (1 hunks)
💤 Files with no reviewable changes (1)
  • server/src/test/java/org/opensearch/action/bulk/TransportShardBulkActionTests.java
🚧 Files skipped from review as they are similar to previous changes (4)
  • server/src/main/java/org/opensearch/action/bulk/TransportShardBulkAction.java
  • server/src/main/java/org/opensearch/index/IndexSettings.java
  • CHANGELOG.md
  • server/src/test/java/org/opensearch/index/engine/CriteriaBasedCompositeIndexWriterBaseTests.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: gradle-check
  • GitHub Check: precommit (21, windows-2025, true)
  • GitHub Check: precommit (25, ubuntu-24.04-arm)
  • GitHub Check: precommit (21, ubuntu-latest)
  • GitHub Check: precommit (25, macos-15)
  • GitHub Check: precommit (25, macos-15-intel)
  • GitHub Check: precommit (25, windows-latest)
  • GitHub Check: precommit (21, ubuntu-24.04-arm)
  • GitHub Check: precommit (21, macos-15)
  • GitHub Check: precommit (25, ubuntu-latest)
  • GitHub Check: precommit (21, windows-latest)
  • GitHub Check: precommit (21, macos-15-intel)
  • GitHub Check: assemble (25, windows-latest)
  • GitHub Check: Analyze (java)
  • GitHub Check: assemble (21, windows-latest)
  • GitHub Check: assemble (25, ubuntu-24.04-arm)
  • GitHub Check: assemble (21, ubuntu-24.04-arm)
  • GitHub Check: assemble (21, ubuntu-latest)
  • GitHub Check: assemble (25, ubuntu-latest)
  • GitHub Check: detect-breaking-change
🔇 Additional comments (3)
server/src/main/java/org/opensearch/index/mapper/MapperService.java (2)

87-87: LGTM!

The import is necessary for the Collectors.toSet() operation used in the getCompositeFieldTypes() method below.


694-696: Verify initialization safety, caller expectations, and method naming alignment.

The method now filters to return only StarTreeMapper.StarTreeFieldType instances from compositeMappedFieldTypes:

  1. Potential NPE risk: Verify that getCompositeFieldTypes() is never called before internalMerge() initializes compositeMappedFieldTypes at line 552. If called during early initialization phases, .stream() could fail on a null reference.

  2. Semantic narrowing: Confirm whether the method name getCompositeFieldTypes() still accurately reflects its behavior. If other composite field type implementations exist or may be added, consider renaming to getStarTreeFieldTypes() or updating documentation to clarify the filtering behavior.

  3. Performance: If getCompositeFieldTypes() is called frequently in hot paths, consider caching the filtered result to avoid recreating the set on each invocation.
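A null-safe variant of the narrowed accessor, with an explicit name, might look like the sketch below. `CompositeType`, `StarTreeType`, `OtherCompositeType`, and `CompositeTypes.starTreeOnly` are hypothetical stand-ins for `CompositeMappedFieldType`, `StarTreeMapper.StarTreeFieldType`, and the real method; returning an empty set for a `null` input is an assumption that addresses point 1, and `Collectors.toUnmodifiableSet()` keeps callers from mutating the result, which matters if it is later cached as point 3 suggests.

```java
import java.util.Collections;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical stand-ins for the composite field type hierarchy.
interface CompositeType {}
final class StarTreeType implements CompositeType {}
final class OtherCompositeType implements CompositeType {}

class CompositeTypes {
    // Keeps only star-tree types. A null input (e.g. before the first mapping merge)
    // yields an empty set instead of a NullPointerException.
    static Set<CompositeType> starTreeOnly(Set<CompositeType> all) {
        if (all == null) {
            return Collections.emptySet();
        }
        return all.stream()
            .filter(t -> t instanceof StarTreeType)
            .collect(Collectors.toUnmodifiableSet());
    }
}
```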

server/src/main/java/org/opensearch/index/engine/CompositeIndexWriter.java (1)

691-705: Iteration refactors over current/old writer maps look correct and improve clarity.

  • getFlushingBytes() and getPendingNumDocs() now iterate directly over liveIndexWriterDeletesMap.current.criteriaBasedIndexWriterMap.values(), summing per-child metrics before adding the accumulating writer’s values. This preserves behavior and is straightforward.
  • getTragicException() now checks both current and old child writers for a tragic exception before falling back to the accumulating writer, which ensures group-specific failures are surfaced.
  • ramBytesUsed() and rollback() explicitly iterate over both current and old writers, and the use of mapWriteLock.acquire() around the ramBytesUsed() traversals is appropriate for a consistent snapshot.

Overall, these loops are clear and consistent with the data structures being used; no issues from a correctness or concurrency standpoint.

Also applies to: 731-742, 757-772, 796-805
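The current-then-old traversal described for getTragicException() can be sketched as follows. `ChildWriter`, `TragicCheck`, and the plain `Map<String, ChildWriter>` layout are hypothetical simplifications of the child writer maps, and the order of checks (current map, then old map, then the accumulating writer) follows the description above rather than the actual source.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a child IndexWriter's state.
class ChildWriter {
    final boolean open;
    final Throwable tragedy;

    ChildWriter(boolean open, Throwable tragedy) {
        this.open = open;
        this.tragedy = tragedy;
    }
}

class TragicCheck {
    // Scans the current map, then the old map, returning the first tragic exception
    // from a writer that is no longer open; otherwise falls back to the accumulating writer.
    static Throwable firstTragic(Map<String, ChildWriter> current, Map<String, ChildWriter> old,
                                 ChildWriter accumulating) {
        for (Map<String, ChildWriter> map : List.of(current, old)) {
            for (ChildWriter w : map.values()) {
                if (!w.open && w.tragedy != null) {
                    return w.tragedy;
                }
            }
        }
        return accumulating.tragedy;
    }
}
```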

github-actions bot commented Dec 3, 2025

❌ Gradle check result for dbcae67: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
server/src/main/java/org/opensearch/index/mapper/MapperService.java (2)

694-696: Consider renaming method to reflect StarTree-specific filtering.

The method name getCompositeFieldTypes() suggests it returns all composite field types, but it now filters to return only StarTreeMapper.StarTreeFieldType instances. This could mislead callers who expect all composite types.

Consider renaming to getStarTreeFieldTypes() or adding a clarifying comment:

+    /**
+     * Returns only StarTree composite field types.
+     * For all composite field types, use getCompositeFieldTypesFromMapper().
+     */
     public Set<CompositeMappedFieldType> getCompositeFieldTypes() {
         return compositeMappedFieldTypes.stream()
             .filter(compositeMappedFieldType -> compositeMappedFieldType instanceof StarTreeMapper.StarTreeFieldType)
             .collect(Collectors.toSet());
     }

694-696: Consider caching the filtered result to avoid repeated stream operations.

The method creates a new stream, filters, and collects to a Set on every invocation. Since compositeMappedFieldTypes only changes during merge operations (line 552), the filtered result could be cached in a separate volatile field and updated alongside compositeMappedFieldTypes.

Example optimization:

 private volatile Set<CompositeMappedFieldType> compositeMappedFieldTypes;
+private volatile Set<CompositeMappedFieldType> starTreeFieldTypes;
 
 // In internalMerge() after line 552:
 this.compositeMappedFieldTypes = getCompositeFieldTypesFromMapper();
+this.starTreeFieldTypes = compositeMappedFieldTypes.stream()
+    .filter(type -> type instanceof StarTreeMapper.StarTreeFieldType)
+    .collect(Collectors.toSet());
 
 public Set<CompositeMappedFieldType> getCompositeFieldTypes() {
-    return compositeMappedFieldTypes.stream()
-        .filter(compositeMappedFieldType -> compositeMappedFieldType instanceof StarTreeMapper.StarTreeFieldType)
-        .collect(Collectors.toSet());
+    return starTreeFieldTypes;
 }
server/src/main/java/org/opensearch/index/engine/CompositeIndexWriter.java (1)

475-489: Consider adding exponential backoff between retry attempts.

The retry loop repeatedly calls tryAcquire() without delay, creating a busy-wait pattern. With a default max retry of 100 (configurable up to 500), this can waste significant CPU cycles under contention.

A past review comment suggested adding exponential backoff, but the current code doesn't implement it. Consider adding a small delay (e.g., Thread.sleep(Math.min(1L << Math.min(counter, 7), 100L)), capping the shift because 1L << counter is negative once counter reaches 63) between attempts, with appropriate InterruptedException handling.

 int counter = 0;
 while ((current == null || current.isClosed()) && counter < maxRetryOnLookupMapAcquisitionException) {
     // This function acquires a first read lock on a map which does not have any write lock present. Current keeps
     // on getting rotated during refresh, so there will be one current on which read lock can be obtained.
     // Validate that no write lock is applied on the map and the map is not closed. Idea here is write lock was
     // never applied on this map as write lock gets only during closing time. We are doing this instead of acquire,
     // because acquire can also apply a read lock in case refresh completed and map is closed.
     current = this.current.mapReadLock.tryAcquire();
     if (current != null && current.isClosed() == true) {
         current.mapReadLock.close();
         current = null;
     }

+    if (current == null && counter < maxRetryOnLookupMapAcquisitionException - 1) {
+        try {
+            Thread.sleep(Math.min(1L << Math.min(counter, 7), 100L));
+        } catch (InterruptedException e) {
+            Thread.currentThread().interrupt();
+            throw new LookupMapLockAcquisitionException(shardId, "Interrupted during retry", e);
+        }
+    }
     ++counter;
 }
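An uncapped shift such as `1L << counter` needs care once `counter` can reach 500. Java masks a `long` shift distance to its low six bits, so `1L << 63` is `Long.MIN_VALUE`, and `Thread.sleep` rejects negative values with `IllegalArgumentException`. Below is a minimal sketch of an overflow-safe capped backoff. The `Backoff` helper and its `retry` method are hypothetical and not part of the PR.

```java
import java.util.function.BooleanSupplier;

final class Backoff {
    private Backoff() {}

    // Returns min(2^attempt, capMillis). Shift distances >= 62 are clamped to the cap,
    // because 1L << 63 is negative and larger distances wrap modulo 64.
    static long delayMillis(int attempt, long capMillis) {
        if (attempt < 0 || capMillis < 0) {
            throw new IllegalArgumentException("attempt and capMillis must be non-negative");
        }
        if (attempt >= Long.SIZE - 2) {
            return capMillis;
        }
        return Math.min(1L << attempt, capMillis);
    }

    // Calls tryOnce up to maxRetries times, sleeping between failed attempts but not after
    // the last one. Returns true as soon as an attempt succeeds.
    static boolean retry(BooleanSupplier tryOnce, int maxRetries, long capMillis) {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            if (tryOnce.getAsBoolean()) {
                return true;
            }
            if (attempt < maxRetries - 1) {
                try {
                    Thread.sleep(delayMillis(attempt, capMillis));
                } catch (InterruptedException e) {
                    // Preserve the interrupt status and stop retrying.
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }
}
```

Clamping the shift distance keeps the delay sequence at 1, 2, 4, ... ms up to the cap, whatever the configured retry count.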
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between dbcae67 and 7ee65b1.

📒 Files selected for processing (8)
  • CHANGELOG.md (1 hunks)
  • server/src/main/java/org/opensearch/action/bulk/TransportShardBulkAction.java (1 hunks)
  • server/src/main/java/org/opensearch/index/IndexSettings.java (1 hunks)
  • server/src/main/java/org/opensearch/index/engine/CompositeIndexWriter.java (10 hunks)
  • server/src/main/java/org/opensearch/index/mapper/MapperService.java (2 hunks)
  • server/src/test/java/org/opensearch/action/bulk/TransportShardBulkActionTests.java (0 hunks)
  • server/src/test/java/org/opensearch/index/engine/CompositeIndexWriterForAppendTests.java (4 hunks)
  • server/src/test/java/org/opensearch/index/engine/CriteriaBasedCompositeIndexWriterBaseTests.java (1 hunks)
💤 Files with no reviewable changes (1)
  • server/src/test/java/org/opensearch/action/bulk/TransportShardBulkActionTests.java
🚧 Files skipped from review as they are similar to previous changes (2)
  • CHANGELOG.md
  • server/src/test/java/org/opensearch/index/engine/CriteriaBasedCompositeIndexWriterBaseTests.java
🔇 Additional comments (9)
server/src/main/java/org/opensearch/index/IndexSettings.java (1)

499-506: LGTM! Increased retry limits for lock acquisition.

The changes to INDEX_MAX_RETRY_ON_LOOKUP_MAP_LOCK_ACQUISITION_EXCEPTION appropriately raise the default from 15 to 100 and the maximum from 100 to 500, providing better tolerance for lock contention scenarios during index writes with context-aware grouping criteria.

server/src/main/java/org/opensearch/index/mapper/MapperService.java (2)

87-87: LGTM!

The Collectors import is necessary for the stream operations added in getCompositeFieldTypes().


694-696: Verify that filtering to StarTree types only doesn't break existing functionality.

This change narrows the return value to only StarTreeMapper.StarTreeFieldType instances. Ensure that:

  1. Other composite field type implementations (if any) are correctly handled elsewhere
  2. The lookup sets (fieldsPartOfCompositeMappings, nestedFieldsPartOfCompositeMappings) built from all compositeMappedFieldTypes remain sufficient for other composite types
  3. External callers of getCompositeFieldTypes() and isCompositeIndexPresent() are not expecting all composite field types

To verify: Search the codebase for other CompositeMappedFieldType implementations, all call sites of getCompositeFieldTypes() and isCompositeIndexPresent(), and confirm how the filtered results are used downstream.

server/src/main/java/org/opensearch/action/bulk/TransportShardBulkAction.java (1)

730-730: LGTM: Retry logic correctly moved to lower layer.

The removal of the exception-driven retry loop simplifies the bulk action layer. Retry handling for lookup map lock acquisition is now managed within CompositeIndexWriter.computeIndexWriterIfAbsentForCriteria (with bounded tryAcquire loops and a configurable max retry parameter), which provides better encapsulation.

server/src/main/java/org/opensearch/index/engine/CompositeIndexWriter.java (3)

497-502: LGTM: Finally block correctly handles lock cleanup.

The finally block now uses assert for the isHeldByCurrentThread() check, which addresses the previous concern about UnsupportedOperationException when assertions are disabled. In production, the lock is unconditionally closed on error; in tests/debug builds, ownership is validated.


692-693: LGTM: Improved encapsulation in iteration patterns.

The refactoring to iterate directly over DisposableIndexWriter values (instead of extracting and operating on IndexWriter instances) improves encapsulation and reduces coupling. The consistent pattern across getFlushingBytes(), getPendingNumDocs(), getTragicException(), ramBytesUsed(), and rollback() enhances maintainability.

Also applies to: 703-704, 732-743, 759-772, 797-806


210-210: Verify that visibility relaxations don't expose internal state inappropriately.

The removal of final modifiers from CriteriaBasedIndexWriterLookup, CriteriaBasedWriterLock, and LiveIndexWriterDeletesMap, along with widening mapReadLock to package-private and making current non-final, enables mocking in tests.

While these changes support testing, ensure they don't inadvertently expose internal state or allow unintended subclassing or mutation by other classes in the package. Specifically, verify that:

  • mapReadLock (now package-private) is not accessed outside of test and core composite writer code
  • The inner classes cannot be extended by unintended classes in the same package
  • current (now non-final) is not reassigned outside controlled contexts
server/src/test/java/org/opensearch/index/engine/CompositeIndexWriterForAppendTests.java (2)

72-77: LGTM: Test calls updated consistently with new signature.

The calls to computeIndexWriterIfAbsentForCriteria correctly pass the new MAX_NUMBER_OF_RETRIES parameter, maintaining existing test coverage while adapting to the updated signature.

Also applies to: 141-146, 197-202


44-46: LGTM: New test provides good coverage of retry exhaustion.

The new testMaxRetryCountWhenWriteLockDuringIndexing test effectively validates the bounded retry behavior by:

  • Mocking lock acquisition to always fail (return null)
  • Asserting that LookupMapLockAcquisitionException is thrown after exhausting retries
  • Verifying via Mockito that tryAcquire() is called exactly MAX_NUMBER_OF_RETRIES times

This complements the existing integration test at line 179 and provides focused unit-level coverage of the retry mechanism.

Also applies to: 208-227
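The same exhaustion check can also be expressed without Mockito by counting calls on a stub. Below is a minimal sketch. It assumes a hypothetical `ReadLockSource` interface in place of `mapReadLock` and a hypothetical `acquireWithRetries` method standing in for the bounded loop in `computeIndexWriterIfAbsentForCriteria`.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for mapReadLock: tryAcquire() returns a handle, or null on failure.
interface ReadLockSource {
    Object tryAcquire();
}

// Hypothetical stand-in for LookupMapLockAcquisitionException.
final class LockAcquisitionFailedException extends RuntimeException {
    LockAcquisitionFailedException(String message) {
        super(message);
    }
}

final class BoundedAcquirer {
    private BoundedAcquirer() {}

    // Tries at most maxRetries times, then fails with an exception.
    static Object acquireWithRetries(ReadLockSource source, int maxRetries) {
        Object handle = null;
        int counter = 0;
        while (handle == null && counter < maxRetries) {
            handle = source.tryAcquire();
            ++counter;
        }
        if (handle == null) {
            throw new LockAcquisitionFailedException("failed after " + counter + " attempts");
        }
        return handle;
    }
}

// Stub that always fails and records how often it was called.
final class AlwaysFailingSource implements ReadLockSource {
    final AtomicInteger calls = new AtomicInteger();

    @Override
    public Object tryAcquire() {
        calls.incrementAndGet();
        return null;
    }
}
```

Asserting on `calls.get()` after the exception plays the role of Mockito's `verify(..., times(n))` in the real test.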

@github-actions
Contributor

github-actions bot commented Dec 3, 2025

❌ Gradle check result for 7ee65b1: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Contributor

github-actions bot commented Dec 3, 2025

✅ Gradle check result for 7ee65b1: SUCCESS

@codecov

codecov bot commented Dec 3, 2025

Codecov Report

❌ Patch coverage is 50.00000% with 20 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.20%. Comparing base (d47931e) to head (11f7776).
⚠️ Report is 3 commits behind head on main.

Files with missing lines                                Patch %   Lines
.../opensearch/index/engine/CompositeIndexWriter.java    48.57%   9 Missing and 9 partials ⚠️
.../index/mapper/ContextAwareGroupingFieldMapper.java     0.00%   2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #20145      +/-   ##
============================================
- Coverage     73.30%   73.20%   -0.10%     
+ Complexity    71732    71706      -26     
============================================
  Files          5793     5793              
  Lines        328056   328047       -9     
  Branches      47245    47243       -2     
============================================
- Hits         240476   240162     -314     
- Misses        68264    68596     +332     
+ Partials      19316    19289      -27     

Comment on lines +483 to +486
if (current != null && current.isClosed() == true) {
    current.mapReadLock.close();
    current = null;
}
Contributor

@Bukhtawar Bukhtawar Dec 4, 2025

Should we move this logic centrally into close()? Also, I didn't quite understand why the close operation performed as part of write-lock acquisition doesn't already handle this.

Contributor Author

This handles the scenario where tryAcquire() succeeded in obtaining the lock on the current writer, but the map itself had rotated and the writer was closed. In that case, we release the lock on the old writer and retry to obtain the lock on the new current. This ensures the lock on the old writer is correctly released before we try to acquire the lock on the new writer.
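The rotation case described above can be modeled in a few lines: the lock is obtained, but the lookup it belongs to has already been rotated out and closed. The sketch below assumes a hypothetical `Generation` class holding a `ReentrantReadWriteLock` and a `closed` flag, plus an `AtomicReference` standing in for `current`. It is not the PR's implementation.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class Generation {
    final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    volatile boolean closed;

    // Closing takes the write lock, so it waits until no reader holds this generation.
    void close() {
        lock.writeLock().lock();
        try {
            closed = true;
        } finally {
            lock.writeLock().unlock();
        }
    }
}

final class RotatingLookup {
    final AtomicReference<Generation> current = new AtomicReference<>(new Generation());

    // Returns a generation whose read lock is held by the caller, or null after maxRetries.
    Generation acquireCurrent(int maxRetries) {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            Generation g = current.get();
            if (!g.lock.readLock().tryLock()) {
                continue; // a writer holds the lock (rotation in progress); retry
            }
            if (g.closed) {
                // Lock obtained, but this generation was already rotated out and closed:
                // release it before retrying on the new current.
                g.lock.readLock().unlock();
                continue;
            }
            return g;
        }
        return null;
    }

    // Swaps in a fresh generation and closes the old one.
    void rotate() {
        Generation old = current.getAndSet(new Generation());
        old.close();
    }
}
```

Releasing the read lock before retrying matters because `close()` needs the write lock. A leaked read lock on a closed generation would block any later attempt to close it.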

Comment on lines +803 to +805
for (DisposableIndexWriter disposableIndexWriter : liveIndexWriterDeletesMap.old.criteriaBasedIndexWriterMap.values()) {
    if (disposableIndexWriter.getIndexWriter().isOpen() == true) {
        disposableIndexWriter.getIndexWriter().rollback();
Contributor

This might not be thread-safe: for instance, could the index writer be closed while we are doing a rollback?
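The concern is a check-then-act race. `isOpen()` and `rollback()` are two separate calls, so the writer's state can change between them. One common way to close that gap is to perform the check and the action under the same lock that `close()` uses. The sketch below shows this with a hypothetical `GuardedWriter`. It is not Lucene's `IndexWriter`, and it does not model that class's own internal synchronization.

```java
import java.util.concurrent.locks.ReentrantLock;

final class GuardedWriter {
    private final ReentrantLock stateLock = new ReentrantLock();
    private boolean open = true;
    private int rollbacks;

    // Check and act under one lock, so close() cannot interleave between them.
    boolean rollbackIfOpen() {
        stateLock.lock();
        try {
            if (!open) {
                return false;
            }
            rollbacks++;
            open = false; // in this sketch a rollback also closes the writer
            return true;
        } finally {
            stateLock.unlock();
        }
    }

    void close() {
        stateLock.lock();
        try {
            open = false;
        } finally {
            stateLock.unlock();
        }
    }

    int rollbackCount() {
        stateLock.lock();
        try {
            return rollbacks;
        } finally {
            stateLock.unlock();
        }
    }
}
```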

@github-actions
Contributor

github-actions bot commented Dec 8, 2025

❌ Gradle check result for 4f48472: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
server/src/main/java/org/opensearch/index/engine/CompositeIndexWriter.java (1)

462-465: Potential criteria lookup bug across current/old maps when resolving writers by UID

In LiveIndexWriterDeletesMap:

String getCriteriaForDoc(BytesRef key) {
    return current.getCriteriaForDoc(key);
}

and in CompositeIndexWriter.getIndexWriterForIdFromLookup:

String criteria = getCriteriaForDoc(uid);
...
DisposableIndexWriter disposableIndexWriter = indexWriterLookup.getIndexWriterForCriteria(criteria);

getIndexWriterForIdFromLookup is used for both:

  • getIndexWriterForIdFromCurrent (passing liveIndexWriterDeletesMap.current), and
  • getIndexWriterForIdFromOld (passing liveIndexWriterDeletesMap.old).

But getCriteriaForDoc(uid) always queries the current lookup, never the indexWriterLookup passed in. Combined with putCriteriaForDoc only updating the current lookup, this means:

  • After a refresh, documents whose criteria were recorded in the map that is now old will still have their criteria stored in old.criteria, not in the new current map.
  • getIndexWriterForIdFromOld acquires the old map’s read lock but looks up the criteria only in current, so it will often fail to find the criteria for pre‑refresh documents and return null, skipping the intended partial delete on the old child writer.

A minimal local fix is to have the map consult both current and old when resolving criteria:

-        String getCriteriaForDoc(BytesRef key) {
-            return current.getCriteriaForDoc(key);
-        }
+        String getCriteriaForDoc(BytesRef key) {
+            // Prefer the current map, but fall back to old for documents that were
+            // written before the last refresh and still live in the old map.
+            String criteria = current.getCriteriaForDoc(key);
+            if (criteria == null) {
+                criteria = old.getCriteriaForDoc(key);
+            }
+            return criteria;
+        }

This keeps the getIndexWriterForIdFromLookup API untouched while ensuring deletes/update clean‑up can still locate writers in both generations of the lookup map.

Also applies to: 630-647, 655-657
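The fallback can be illustrated in isolation. Below is a minimal sketch that assumes hypothetical `Lookup` and `GenerationalMap` classes, with plain `String` keys in place of `BytesRef`. None of the locking in `CompositeIndexWriter` is modeled.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for CriteriaBasedIndexWriterLookup: doc id -> grouping criteria.
final class Lookup {
    private final Map<String, String> criteriaByDoc = new HashMap<>();

    void putCriteriaForDoc(String docId, String criteria) {
        criteriaByDoc.put(docId, criteria);
    }

    String getCriteriaForDoc(String docId) {
        return criteriaByDoc.get(docId);
    }
}

// Hypothetical stand-in for LiveIndexWriterDeletesMap with current and old generations.
final class GenerationalMap {
    Lookup current = new Lookup();
    Lookup old = new Lookup();

    // Writes only ever go to the current generation.
    void putCriteriaForDoc(String docId, String criteria) {
        current.putCriteriaForDoc(docId, criteria);
    }

    // On refresh the current generation becomes old and a fresh one replaces it.
    void rotate() {
        old = current;
        current = new Lookup();
    }

    // The behavior described above: only the current generation is consulted.
    String currentOnly(String docId) {
        return current.getCriteriaForDoc(docId);
    }

    // Suggested behavior: prefer current, fall back to old for pre-refresh documents.
    String withFallback(String docId) {
        String criteria = current.getCriteriaForDoc(docId);
        return criteria != null ? criteria : old.getCriteriaForDoc(docId);
    }
}
```

After one rotation, the current-only lookup misses a document written before the refresh, while the fallback still resolves it.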

🧹 Nitpick comments (4)
server/src/main/java/org/opensearch/index/engine/CompositeIndexWriter.java (4)

210-215: Visibility & mutability changes for lookup/lock classes – consider tightening API surface

CriteriaBasedIndexWriterLookup is now public, LiveIndexWriterDeletesMap is non‑final, and mapReadLock is package‑visible and no longer final. This all widens the surface for external or test code to interact with internal locking structures.

If the only consumer is tests, consider:

  • Making CriteriaBasedIndexWriterLookup package‑private instead of public.
  • Keeping mapReadLock private final and exposing it via a package‑private accessor (possibly annotated @VisibleForTesting).
  • Keeping LiveIndexWriterDeletesMap final unless subclassing is required.

This preserves invariants while still allowing test access.

Also applies to: 301-305, 406-409
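Below is a minimal sketch of the suggested shape, using a hypothetical `WriterLookup` class. The lock stays `private final`, and tests reach it through a package-private constructor and accessor instead of assigning the field directly. `ReentrantReadWriteLock` stands in for the PR's lock type.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class WriterLookup {
    // Cannot be reassigned, and is not visible outside this class.
    private final ReentrantReadWriteLock mapLock;

    WriterLookup() {
        this(new ReentrantReadWriteLock());
    }

    // Package-private: lets a test inject a lock it controls instead of mutating a field.
    WriterLookup(ReentrantReadWriteLock mapLock) {
        this.mapLock = mapLock;
    }

    // Package-private accessor, intended for tests only.
    ReentrantReadWriteLock mapLock() {
        return mapLock;
    }
}
```

With this shape, a test that currently assigns `map.current.mapReadLock` directly would instead pass its mock through the constructor.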


466-471: Bounded lock‑acquisition retry logic looks sound; document lock‑lifecycle expectations for callers

The new computeIndexWriterIfAbsentForCriteria(..., ShardId shardId, int maxRetryOnLookupMapAcquisitionException) plus the while loop using mapReadLock.tryAcquire() correctly:

  • Bounds retries by maxRetryOnLookupMapAcquisitionException.
  • Avoids using a CriteriaBasedIndexWriterLookup that has been marked closed (and closes its read lock before retrying).
  • Ensures that on failure (success == false) any acquired read lock is released, while on success the read lock remains held for the caller to release later.

Two follow‑ups to keep this robust:

  • Call‑site discipline: every successful call must be paired with a later current.mapReadLock.close() (as done by the try‑with‑resources around getLookupMap().getMapReadLock() in the indexing paths). It’s worth adding a short comment to this method indicating that the read lock remains held on the success path and must be closed by the caller.
  • Config validation: ensure IndexSettings.getMaxRetryOnLookupMapAcquisitionException() never returns a negative value; a value of 0 is handled (immediate LookupMapLockAcquisitionException), but negative values would skip the loop and behave the same as 0, which may surprise operators.

Overall, the retry/error path and assertion‑only use of isHeldByCurrentThread() avoid the earlier UnsupportedOperationException risk.

Also applies to: 475-489, 498-503, 671-681
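The call-site discipline and the validation point can be sketched together. This assumes a hypothetical `HeldReadLock` wrapper (standing in for `ReleasableLock`) and a hypothetical `validateMaxRetries` check. The PR summary states the setting keeps a minimum of 5, so the explicit check here only illustrates the invariant.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical AutoCloseable wrapper: holding an instance means the read lock is held.
final class HeldReadLock implements AutoCloseable {
    private final ReentrantReadWriteLock.ReadLock readLock;

    private HeldReadLock(ReentrantReadWriteLock.ReadLock readLock) {
        this.readLock = readLock;
    }

    // On success, returns with the read lock still held; the caller must close() it,
    // ideally via try-with-resources.
    static HeldReadLock tryAcquire(ReentrantReadWriteLock lock, int maxRetries) {
        validateMaxRetries(maxRetries);
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            if (lock.readLock().tryLock()) {
                return new HeldReadLock(lock.readLock());
            }
        }
        throw new IllegalStateException("could not acquire read lock after " + maxRetries + " attempts");
    }

    // Rejects negative values explicitly instead of letting them behave like 0.
    static void validateMaxRetries(int maxRetries) {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0, got " + maxRetries);
        }
    }

    @Override
    public void close() {
        readLock.unlock();
    }
}
```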


692-694: Flushing/pending stats only account for current child writers; consider including old for completeness

getFlushingBytes() and getPendingNumDocs() now iterate only over:

liveIndexWriterDeletesMap.current.criteriaBasedIndexWriterMap.values()

and ignore liveIndexWriterDeletesMap.old. This is consistent with the existing TODO in getPendingNumDocs, but note that:

  • ramBytesUsed() and getTragicException() do consider both current and old maps.
  • During refresh, old writers can still hold in‑flight bytes/docs that won’t be reflected in these metrics.

If you want these stats to reflect all outstanding work (not just the newest generation), consider also iterating over liveIndexWriterDeletesMap.old.criteriaBasedIndexWriterMap.values() here, or at least documenting that these methods intentionally report only the current generation.

Also applies to: 703-705
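If the stats should cover both generations, the change amounts to concatenating the two value collections. Below is a minimal sketch with a hypothetical `ChildWriter` record in place of `DisposableIndexWriter`. No locking is shown.

```java
import java.util.Collection;
import java.util.stream.Stream;

// Hypothetical stand-in for a child writer's counters.
record ChildWriter(long flushingBytes, long pendingDocs) {}

final class WriterStats {
    private WriterStats() {}

    // Sums over current and old generations instead of current only.
    static long flushingBytes(Collection<ChildWriter> current, Collection<ChildWriter> old) {
        return Stream.concat(current.stream(), old.stream()).mapToLong(ChildWriter::flushingBytes).sum();
    }

    static long pendingDocs(Collection<ChildWriter> current, Collection<ChildWriter> old) {
        return Stream.concat(current.stream(), old.stream()).mapToLong(ChildWriter::pendingDocs).sum();
    }
}
```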


754-775: ramBytesUsed locking strategy is correct but could share code for current/old loops

ramBytesUsed() now:

  • Acquires mapWriteLock on current and old in two separate try (ReleasableLock ...) blocks.
  • Iterates criteriaBasedIndexWriterMap.values() and sums ramBytesUsed() only for open writers.

This is a solid pattern: taking the write lock avoids concurrent rotation/mutation while measuring, and filtering by isOpen() avoids touching already‑closed writers.

To reduce duplication and risk of future drift between the two blocks, consider extracting a small helper that sums RAM usage for a single CriteriaBasedIndexWriterLookup, and call it for both current and old. Behaviour would remain unchanged.
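The extraction could look like the sketch below. It assumes a hypothetical `LookupGeneration` with a `ReentrantReadWriteLock` and a map of hypothetical `Child` writers exposing `isOpen()` and `ramBytesUsed()`. The real code uses `ReleasableLock` and `DisposableIndexWriter`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a child IndexWriter.
record Child(boolean isOpen, long ramBytesUsed) {}

final class LookupGeneration {
    final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    final Map<String, Child> writers = new ConcurrentHashMap<>();
}

final class RamAccounting {
    private RamAccounting() {}

    // One helper for both generations: hold the generation's write lock (excluding
    // read-lock holders) while measuring, and skip writers that are already closed.
    static long ramBytesUsed(LookupGeneration generation) {
        generation.lock.writeLock().lock();
        try {
            long sum = 0;
            for (Child child : generation.writers.values()) {
                if (child.isOpen()) {
                    sum += child.ramBytesUsed();
                }
            }
            return sum;
        } finally {
            generation.lock.writeLock().unlock();
        }
    }

    static long totalRamBytesUsed(LookupGeneration current, LookupGeneration old) {
        return ramBytesUsed(current) + ramBytesUsed(old);
    }
}
```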

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 7ee65b1 and 4f48472.

📒 Files selected for processing (9)
  • CHANGELOG.md (1 hunks)
  • server/src/main/java/org/opensearch/action/bulk/TransportShardBulkAction.java (1 hunks)
  • server/src/main/java/org/opensearch/index/IndexSettings.java (1 hunks)
  • server/src/main/java/org/opensearch/index/engine/CompositeIndexWriter.java (10 hunks)
  • server/src/main/java/org/opensearch/index/mapper/ContextAwareGroupingFieldMapper.java (2 hunks)
  • server/src/main/java/org/opensearch/index/mapper/MapperService.java (2 hunks)
  • server/src/test/java/org/opensearch/action/bulk/TransportShardBulkActionTests.java (0 hunks)
  • server/src/test/java/org/opensearch/index/engine/CompositeIndexWriterForAppendTests.java (4 hunks)
  • server/src/test/java/org/opensearch/index/engine/CriteriaBasedCompositeIndexWriterBaseTests.java (1 hunks)
💤 Files with no reviewable changes (1)
  • server/src/test/java/org/opensearch/action/bulk/TransportShardBulkActionTests.java
🚧 Files skipped from review as they are similar to previous changes (2)
  • CHANGELOG.md
  • server/src/main/java/org/opensearch/action/bulk/TransportShardBulkAction.java
🧰 Additional context used
🧬 Code graph analysis (1)
server/src/main/java/org/opensearch/index/mapper/MapperService.java (1)
server/src/main/java/org/opensearch/search/aggregations/metrics/TopHitsAggregator.java (1)
  • Collectors (85-95)
🔇 Additional comments (9)
server/src/main/java/org/opensearch/index/mapper/ContextAwareGroupingFieldMapper.java (1)

186-199: LGTM - appropriate no-op overrides for context-aware field mapper.

The empty implementations for canDeriveSource() and deriveSource() correctly exempt this synthetic field from source derivation since it's not part of the ingested document. The Javadoc clearly explains the rationale.

server/src/main/java/org/opensearch/index/IndexSettings.java (1)

499-506: Significant increase in retry defaults - ensure monitoring is in place.

The default retry count increased from 15 to 100, and the maximum from 100 to 500. This addresses the indexing regression by allowing more retries during lock acquisition contention. However, under heavy contention, this could increase indexing latency.

Consider adding metrics/logging around retry counts so operators can monitor lock contention and tune this setting appropriately.
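One lightweight way to surface contention is to record how many attempts each acquisition needed and how many gave up. The sketch below does this with a hypothetical `RetryStats` class built on `java.util.concurrent.atomic.LongAdder`. OpenSearch's own metrics APIs are not used here.

```java
import java.util.concurrent.atomic.LongAdder;

final class RetryStats {
    private final LongAdder acquisitions = new LongAdder();
    private final LongAdder totalAttempts = new LongAdder();
    private final LongAdder exhausted = new LongAdder();

    // Call once per acquisition with the number of tryAcquire() attempts it took.
    void record(int attempts, boolean succeeded) {
        acquisitions.increment();
        totalAttempts.add(attempts);
        if (!succeeded) {
            exhausted.increment();
        }
    }

    long exhaustedCount() {
        return exhausted.sum();
    }

    // Average attempts per acquisition; values well above 1 suggest lock contention.
    double meanAttempts() {
        long n = acquisitions.sum();
        return n == 0 ? 0.0 : (double) totalAttempts.sum() / n;
    }
}
```

Operators could compare the mean against the configured retry limit to judge whether the limit needs tuning.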

server/src/test/java/org/opensearch/index/engine/CriteriaBasedCompositeIndexWriterBaseTests.java (1)

106-106: LGTM - appropriate test constant for retry behavior verification.

Using a smaller value (20) than the production default (100) is a reasonable choice to keep test execution times manageable while still validating retry behavior.

server/src/test/java/org/opensearch/index/engine/CompositeIndexWriterForAppendTests.java (3)

44-46: LGTM - necessary imports for retry verification tests.


71-77: LGTM - updated to pass retry parameter.

The call sites are correctly updated to pass MAX_NUMBER_OF_RETRIES to match the new method signature.


208-227: Good test for validating bounded retry behavior.

The test correctly verifies that:

  1. LookupMapLockAcquisitionException is thrown after retries are exhausted
  2. tryAcquire() is called exactly MAX_NUMBER_OF_RETRIES times

Note: The direct field assignment pattern (map.current = mock(...), map.current.mapReadLock = writerLock) relies on these fields having package-private visibility, which aligns with the visibility changes made in CompositeIndexWriter.

server/src/main/java/org/opensearch/index/mapper/MapperService.java (1)

693-697: Verify the behavioral change of excluding non-StarTreeFieldType composite mappings.

This change narrows getCompositeFieldTypes() to return only StarTreeFieldType instances, excluding any other CompositeMappedFieldType implementations (e.g., ContextAwareGroupingFieldMapper). Ensure this is the intended behavior and that no caller of this method relies on receiving other composite field types.

Additionally, consider caching the filtered result instead of streaming on each call if this method is invoked frequently.
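A minimal sketch of the caching suggestion, assuming the set of composite types is immutable and rebuilt whenever mappings change. `CompositeType`, `StarTreeType` and `OtherCompositeType` are hypothetical stand-ins for `CompositeMappedFieldType`, `StarTreeMapper.StarTreeFieldType` and other composite types; this is not the actual MapperService code. The filtered view is computed once in the constructor and exposed under a name that states the StarTree-only contract.

```java
import java.util.Set;
import java.util.stream.Collectors;

public class CompositeTypeFilterSketch {
    // Hypothetical stand-ins for CompositeMappedFieldType and two of its implementations.
    interface CompositeType {
        String name();
    }

    record StarTreeType(String name) implements CompositeType {}

    record OtherCompositeType(String name) implements CompositeType {}

    /** Immutable holder rebuilt on each mapping update, so the filtered view is computed once. */
    static final class CompositeTypes {
        private final Set<CompositeType> all;
        private final Set<CompositeType> starTreeOnly;

        CompositeTypes(Set<? extends CompositeType> all) {
            this.all = Set.copyOf(all);
            // Filter once here instead of streaming on every getter call.
            this.starTreeOnly = this.all.stream()
                .filter(type -> type instanceof StarTreeType)
                .collect(Collectors.toUnmodifiableSet());
        }

        /** Returns only StarTree composite types; the name states the narrowed contract. */
        Set<CompositeType> getStarTreeFieldTypes() {
            return starTreeOnly;
        }

        Set<CompositeType> getAllCompositeFieldTypes() {
            return all;
        }
    }

    public static void main(String[] args) {
        CompositeTypes types = new CompositeTypes(Set.of(new StarTreeType("star_tree"), new OtherCompositeType("grouping")));
        System.out.println("StarTree-only view: " + types.getStarTreeFieldTypes());
    }
}
```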

server/src/main/java/org/opensearch/index/engine/CompositeIndexWriter.java (2)

732-736: Including both current and old child writers in getTragicException is a good improvement

The updated getTragicException() now scans both current and old criteriaBasedIndexWriterMap.values() before falling back to the accumulating writer. This avoids dropping tragic exceptions that occur in writers that have just rotated to the old map but haven’t yet been merged/closed.

The additional isOpen() == false guard also prevents spurious reads while writers are still healthy. This change looks correct and improves observability of failures.

Also applies to: 739-743
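A simplified sketch of the scan order described above: closed child writers in the current map, then in the old map, then the accumulating writer. `Writer` is a hypothetical minimal interface (Lucene's `IndexWriter` does expose `isOpen()` and `getTragicException()`), and plain `Map`s stand in for `criteriaBasedIndexWriterMap`; no locking is modelled.

```java
import java.util.List;
import java.util.Map;

public class TragicExceptionSketch {
    // Hypothetical minimal view of a child writer; method names mirror Lucene's IndexWriter.
    interface Writer {
        boolean isOpen();

        Throwable getTragicException();
    }

    static final class FakeWriter implements Writer {
        private final boolean open;
        private final Throwable tragic;

        FakeWriter(boolean open, Throwable tragic) {
            this.open = open;
            this.tragic = tragic;
        }

        @Override public boolean isOpen() { return open; }

        @Override public Throwable getTragicException() { return tragic; }
    }

    /** Scans closed writers in the current map, then the old map, then falls back to the accumulating writer. */
    static Throwable findTragicException(Map<String, Writer> current, Map<String, Writer> old, Writer accumulating) {
        for (Map<String, Writer> map : List.of(current, old)) {
            for (Writer writer : map.values()) {
                // Only closed writers are inspected, mirroring the isOpen() == false guard.
                if (writer.isOpen() == false && writer.getTragicException() != null) {
                    return writer.getTragicException();
                }
            }
        }
        return accumulating.getTragicException();
    }

    public static void main(String[] args) {
        Writer healthy = new FakeWriter(true, null);
        Writer rotatedAndFailed = new FakeWriter(false, new RuntimeException("disk full"));
        Throwable t = findTragicException(Map.of("a", healthy), Map.of("b", rotatedAndFailed), new FakeWriter(true, null));
        System.out.println(t.getMessage()); // disk full
    }
}
```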


795-805: Rollback over both current and old child writers looks consistent with live‑map semantics

The rollback() implementation now explicitly iterates over:

liveIndexWriterDeletesMap.current.criteriaBasedIndexWriterMap.values()
liveIndexWriterDeletesMap.old.criteriaBasedIndexWriterMap.values()

and calls rollback() on open child writers before rolling back the accumulating writer and marking the composite as closed. This matches the two‑generation LiveIndexWriterDeletesMap model and ensures no child writer is left unrolled back.

A race is still possible if other code closes a child writer concurrently, but given that rollback() runs on the error path and is guarded by isOpen(), this is acceptable and not a regression from prior behavior.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
CHANGELOG.md (1)

99-104: Deduplicate entry and normalize PR link formatting.

There is now a duplicate EngineConfig.toBuilder entry (20105), the two newly added PR links (20105, 20145) lack the leading #, and the toBuilder entries are missing a space before the opening (. To keep the changelog clean and consistent with surrounding entries, drop the duplicate line and fix the link formatting:

- - Fix toBuilder method in EngineConfig to include mergedSegmentTransferTracker([#20105](https://github.com/opensearch-project/OpenSearch/pull/20105))
- - Fix toBuilder method in EngineConfig to include mergedSegmentTransferTracker([20105](https://github.com/opensearch-project/OpenSearch/pull/20105))
- - Fix indexing regression and bug fixes for grouping criteria. ([20145](https://github.com/opensearch-project/OpenSearch/pull/20145))
+ - Fix toBuilder method in EngineConfig to include mergedSegmentTransferTracker ([#20105](https://github.com/opensearch-project/OpenSearch/pull/20105))
+ - Fix indexing regression and bug fixes for grouping criteria ([#20145](https://github.com/opensearch-project/OpenSearch/pull/20145))
🧹 Nitpick comments (2)
server/src/main/java/org/opensearch/index/mapper/MapperService.java (1)

87-88: Clarify that getCompositeFieldTypes now returns only StarTree-based composites.

The method now filters compositeMappedFieldTypes down to StarTreeMapper.StarTreeFieldType instances before returning. That’s fine if only StarTree-based composites are ever expected here, but the method name reads like it returns all composite field types.

Consider either:

  • Updating the Javadoc to state it only returns StarTree composite field types, or
  • Introducing a more specific helper (e.g., getStarTreeFieldTypes) and delegating from existing callers,

to avoid surprises if other composite index types are added later.

Also applies to: 693-697

server/src/main/java/org/opensearch/index/engine/CompositeIndexWriter.java (1)

210-215: Limit exposure of internal lookup/lock types if only used for testing.

CriteriaBasedIndexWriterLookup and CriteriaBasedWriterLock are now public/non-final, and mapReadLock and LiveIndexWriterDeletesMap.current are mutable, primarily to support tests that override these internals.

If these types and fields are not meant for general external use, consider:

  • Keeping them package-private instead of public, and/or
  • Adding a brief comment (e.g., “visible for testing”) near the loosened visibility/mutability,

to signal that they’re internal implementation details and avoid accidental production coupling.

Also applies to: 301-381, 406-409

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4f48472 and 1c15758.

📒 Files selected for processing (9)
  • CHANGELOG.md (1 hunks)
  • server/src/main/java/org/opensearch/action/bulk/TransportShardBulkAction.java (1 hunks)
  • server/src/main/java/org/opensearch/index/IndexSettings.java (1 hunks)
  • server/src/main/java/org/opensearch/index/engine/CompositeIndexWriter.java (10 hunks)
  • server/src/main/java/org/opensearch/index/mapper/ContextAwareGroupingFieldMapper.java (1 hunks)
  • server/src/main/java/org/opensearch/index/mapper/MapperService.java (2 hunks)
  • server/src/test/java/org/opensearch/action/bulk/TransportShardBulkActionTests.java (0 hunks)
  • server/src/test/java/org/opensearch/index/engine/CompositeIndexWriterForAppendTests.java (4 hunks)
  • server/src/test/java/org/opensearch/index/engine/CriteriaBasedCompositeIndexWriterBaseTests.java (1 hunks)
💤 Files with no reviewable changes (1)
  • server/src/test/java/org/opensearch/action/bulk/TransportShardBulkActionTests.java
🚧 Files skipped from review as they are similar to previous changes (2)
  • server/src/test/java/org/opensearch/index/engine/CriteriaBasedCompositeIndexWriterBaseTests.java
  • server/src/main/java/org/opensearch/index/mapper/ContextAwareGroupingFieldMapper.java
🔇 Additional comments (4)
server/src/main/java/org/opensearch/index/IndexSettings.java (1)

499-506: Retry-setting bound changes look consistent with new lock-acquisition retry logic.

Raising the default/max for index.context_aware.max_retry_on_lookup_map_acquisition_exception (100 / 500) is consistent with delegating lock-acquisition retries into CompositeIndexWriter. No structural issues seen with the updated bounds or flags.

server/src/main/java/org/opensearch/action/bulk/TransportShardBulkAction.java (1)

717-733: Update retry behavior now only handles version conflicts.

The onComplete path for updates now retries only on VersionConflictEngineException; special retry handling for lookup-map lock acquisition failures has been removed, which matches delegating that retry logic down into the index writer layer.

No issues from a bulk action perspective.
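A minimal sketch of the narrowed retry policy, assuming hypothetical `VersionConflictException` and `LockAcquisitionException` classes in place of `VersionConflictEngineException` and `LookupMapLockAcquisitionException`, and a hypothetical `retryOnConflict` budget. Only version conflicts are retried at this layer; any other failure propagates on the first attempt, on the assumption that lock-acquisition retries now happen inside the index writer.

```java
import java.util.function.Supplier;

public class ConflictOnlyRetrySketch {
    // Hypothetical stand-in for VersionConflictEngineException.
    static class VersionConflictException extends RuntimeException {
        VersionConflictException(String message) { super(message); }
    }

    // Hypothetical stand-in for LookupMapLockAcquisitionException.
    static class LockAcquisitionException extends RuntimeException {
        LockAcquisitionException(String message) { super(message); }
    }

    /**
     * Runs the update, retrying up to retryOnConflict extra times on version conflicts only.
     * Every other exception, including lock-acquisition failures, propagates immediately.
     */
    static <T> T executeWithConflictRetry(Supplier<T> update, int retryOnConflict) {
        int retries = 0;
        while (true) {
            try {
                return update.get();
            } catch (VersionConflictException e) {
                if (retries >= retryOnConflict) {
                    throw e; // Retry budget exhausted: surface the conflict.
                }
                retries++;
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = executeWithConflictRetry(() -> {
            if (++calls[0] < 3) {
                throw new VersionConflictException("conflict on attempt " + calls[0]);
            }
            return "updated";
        }, 2);
        System.out.println(result + " after " + calls[0] + " attempts"); // updated after 3 attempts
    }
}
```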

server/src/test/java/org/opensearch/index/engine/CompositeIndexWriterForAppendTests.java (1)

44-47: Tests correctly cover new lookup-map retry semantics.

The updates to pass MAX_NUMBER_OF_RETRIES into computeIndexWriterIfAbsentForCriteria, and the new testMaxRetryCountWhenWriteLockDuringIndexing that mocks map.current/writerLock and asserts tryAcquire() is called exactly MAX_NUMBER_OF_RETRIES, provide good coverage of the bounded-retry and failure behavior.

The direct field manipulation of LiveIndexWriterDeletesMap.current and mapReadLock does couple the test to internals, but it’s reasonable here to validate the retry loop.

Also applies to: 72-78, 141-147, 195-203, 208-227

server/src/main/java/org/opensearch/index/engine/CompositeIndexWriter.java (1)

466-503: Bounded lock-acquisition retry and settings wiring look correct.

The new computeIndexWriterIfAbsentForCriteria in LiveIndexWriterDeletesMap:

  • Uses a bounded loop over tryAcquire() on current.mapReadLock, checking current == null || current.isClosed() and retrying up to maxRetryOnLookupMapAcquisitionException.
  • Throws LookupMapLockAcquisitionException with the shard id once the retry budget is exhausted.
  • Ensures any acquired read lock is released on failure in the finally block.

CompositeIndexWriter.computeIndexWriterIfAbsentForCriteria correctly threads through:

  • The current LiveIndexWriterDeletesMap instance, and
  • engineConfig.getIndexSettings().getMaxRetryOnLookupMapAcquisitionException(),

so behavior is fully governed by the new index setting.

No correctness issues stand out in the new retry logic; just be aware that increasing the max to higher values will proportionally extend worst-case wait time before surfacing LookupMapLockAcquisitionException.

Also applies to: 671-681

@github-actions
Contributor

github-actions bot commented Dec 8, 2025

❌ Gradle check result for 1c15758: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
server/src/main/java/org/opensearch/index/engine/CompositeIndexWriter.java (1)

210-241: Lock leak in retry loop can block refresh and cause deadlocks

The new LiveIndexWriterDeletesMap.computeIndexWriterIfAbsentForCriteria implementation acquires a read lock via this.current.mapReadLock.tryAcquire() but never releases it on the success path:

  • On each successful tryAcquire(), the underlying ReentrantReadWriteLock read lock is taken.
  • success is set to true after creating/looking up the writer, so the finally block does not close current.mapReadLock.
  • As a result, at least one read lock is left held on the old map’s mapLock.

During beforeRefresh() you then do:

liveIndexWriterDeletesMap = liveIndexWriterDeletesMap.buildTransitionMap();
try (Releasable ignore = liveIndexWriterDeletesMap.old.mapWriteLock.acquire(), ...) {
    ...
}

old.mapWriteLock.acquire() needs all read locks on the old map to be released. Since the compute method never closes the read lock on success, this write-lock acquisition can block indefinitely, stalling refresh and rotation (and any logic that depends on it).

The new tests exercise the failure path (write lock held / tryAcquire() returning null) but do not cover the successful acquisition path or verify that the read lock is released. This bug is therefore not caught by the current tests.
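A JDK-only illustration of the general mechanism, using `java.util.concurrent.locks.ReentrantReadWriteLock` directly instead of the PR's `CriteriaBasedWriterLock` wrapper: a read lock that is never released prevents any other thread from taking the write lock. The refresh side is modelled with a timed `tryLock` so the demo terminates; a blocking `acquire()`, as used in `beforeRefresh()`, would wait instead. `writerCanAcquire` is a hypothetical helper.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LeakedReadLockSketch {
    /**
     * Simulates the refresh thread: tries to take the write lock from a separate thread,
     * waiting at most timeoutMillis, and reports whether it succeeded.
     */
    static boolean writerCanAcquire(ReentrantReadWriteLock lock, long timeoutMillis) {
        boolean[] acquired = new boolean[1];
        Thread refresher = new Thread(() -> {
            try {
                acquired[0] = lock.writeLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
                if (acquired[0]) {
                    lock.writeLock().unlock();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        refresher.start();
        try {
            refresher.join(); // join() also makes acquired[0] visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for the refresher", e);
        }
        return acquired[0];
    }

    public static void main(String[] args) {
        ReentrantReadWriteLock mapLock = new ReentrantReadWriteLock();

        // Indexing thread's success path: the read lock is taken and not released.
        mapLock.readLock().lock();
        System.out.println("write lock while read lock is held: " + writerCanAcquire(mapLock, 100)); // false

        // Once the read lock is released, the refresh-side write lock succeeds.
        mapLock.readLock().unlock();
        System.out.println("write lock after release: " + writerCanAcquire(mapLock, 100)); // true
    }
}
```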

You can fix this by tracking the successfully acquired CriteriaBasedWriterLock separately and always closing it in finally, regardless of success/failure, while still preserving the bounded retry semantics. For example:

@@
-        DisposableIndexWriter computeIndexWriterIfAbsentForCriteria(
-            String criteria,
-            CheckedBiFunction<String, CriteriaBasedIndexWriterLookup, DisposableIndexWriter, IOException> indexWriterSupplier,
-            ShardId shardId,
-            int maxRetryOnLookupMapAcquisitionException
-        ) {
-            boolean success = false;
-            CriteriaBasedIndexWriterLookup current = null;
-            try {
-                int counter = 0;
-                while ((current == null || current.isClosed()) && counter < maxRetryOnLookupMapAcquisitionException) {
-                    // This function acquires a first read lock on a map which does not have any write lock present. Current keeps
-                    // on getting rotated during refresh, so there will be one current on which read lock can be obtained.
-                    // Validate that no write lock is applied on the map and the map is not closed. Idea here is write lock was
-                    // never applied on this map as write lock gets only during closing time. We are doing this instead of acquire,
-                    // because acquire can also apply a read lock in case refresh completed and map is closed.
-                    current = this.current.mapReadLock.tryAcquire();
-                    if (current != null && current.isClosed() == true) {
-                        current.mapReadLock.close();
-                        current = null;
-                    }
-
-                    ++counter;
-                }
-
-                if (current == null || current.isClosed()) {
-                    throw new LookupMapLockAcquisitionException(shardId, "Unable to obtain lock on the current Lookup map", null);
-                }
-                DisposableIndexWriter writer = current.computeIndexWriterIfAbsentForCriteria(criteria, indexWriterSupplier);
-                success = true;
-                return writer;
-            } finally {
-                if (success == false && current != null) {
-                    assert current.mapReadLock.isHeldByCurrentThread() == true;
-                    current.mapReadLock.close();
-                }
-            }
-        }
+        DisposableIndexWriter computeIndexWriterIfAbsentForCriteria(
+            String criteria,
+            CheckedBiFunction<String, CriteriaBasedIndexWriterLookup, DisposableIndexWriter, IOException> indexWriterSupplier,
+            ShardId shardId,
+            int maxRetryOnLookupMapAcquisitionException
+        ) {
+            CriteriaBasedIndexWriterLookup current = null;
+            CriteriaBasedWriterLock acquiredLock = null;
+            try {
+                int counter = 0;
+                while ((current == null || current.isClosed()) && counter < maxRetryOnLookupMapAcquisitionException) {
+                    // This function acquires a read lock on the current map. Current keeps getting rotated during refresh,
+                    // so there will eventually be a map on which a read lock can be obtained.
+                    final CriteriaBasedWriterLock candidateLock = this.current.mapReadLock;
+                    final CriteriaBasedIndexWriterLookup lookup = candidateLock.tryAcquire();
+                    if (lookup == null) {
+                        // Could not obtain the read lock due to a concurrent writer; retry.
+                        ++counter;
+                        continue;
+                    }
+                    if (lookup.isClosed()) {
+                        // Map was already closed; release the lock and retry.
+                        candidateLock.close();
+                        ++counter;
+                        continue;
+                    }
+                    current = lookup;
+                    acquiredLock = candidateLock;
+                    break;
+                }
+
+                if (current == null || current.isClosed()) {
+                    throw new LookupMapLockAcquisitionException(shardId, "Unable to obtain lock on the current Lookup map", null);
+                }
+
+                return current.computeIndexWriterIfAbsentForCriteria(criteria, indexWriterSupplier);
+            } finally {
+                if (acquiredLock != null) {
+                    acquiredLock.close();
+                }
+            }
+        }

This preserves the bounded retry behavior and ensures every successful tryAcquire() is matched with a corresponding close(), so beforeRefresh() can still obtain the write lock on the old map.

You may also want to add a test that covers the successful acquisition path and asserts that mapReadLock.close() is called exactly once per successful tryAcquire() to guard against regressions.

Also applies to: 301-381, 406-417, 466-503
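A minimal sketch of the suggested success-path test, with the retry loop and lock simplified so it runs without Mockito or OpenSearch classes. `MapLock`, `CountingLock`, `LockUnavailableException` and `withBoundedRetry` are hypothetical; the loop follows the shape of the suggested fix (bounded `tryAcquire()` attempts, release in `finally`) but omits the `isClosed()` re-check for brevity. The counting double lets a test assert both how many attempts were made and that `close()` ran exactly once after a successful acquisition.

```java
import java.util.function.Function;

public class BoundedLockRetrySketch {
    /** Hypothetical stand-in for CriteriaBasedWriterLock: tryAcquire() returns null when unavailable. */
    interface MapLock<L> {
        L tryAcquire();

        void close();
    }

    static class LockUnavailableException extends RuntimeException {
        LockUnavailableException(String message) { super(message); }
    }

    /** Retries tryAcquire() up to maxRetries times, runs the action under the lock, and always releases it. */
    static <L, R> R withBoundedRetry(MapLock<L> lock, int maxRetries, Function<L, R> action) {
        L acquired = null;
        for (int attempt = 0; attempt < maxRetries && acquired == null; attempt++) {
            acquired = lock.tryAcquire();
        }
        if (acquired == null) {
            throw new LockUnavailableException("Unable to obtain lock after " + maxRetries + " attempts");
        }
        try {
            return action.apply(acquired);
        } finally {
            // Released on success and on failure, so a later write-lock acquisition is not blocked.
            lock.close();
        }
    }

    /** Test double: fails the first failuresBeforeSuccess attempts and counts tryAcquire()/close() calls. */
    static final class CountingLock implements MapLock<String> {
        final int failuresBeforeSuccess;
        int tryAcquireCalls;
        int closeCalls;

        CountingLock(int failuresBeforeSuccess) { this.failuresBeforeSuccess = failuresBeforeSuccess; }

        @Override public String tryAcquire() {
            tryAcquireCalls++;
            return tryAcquireCalls > failuresBeforeSuccess ? "lookup" : null;
        }

        @Override public void close() { closeCalls++; }
    }

    public static void main(String[] args) {
        CountingLock lock = new CountingLock(2);
        String result = withBoundedRetry(lock, 5, lookup -> lookup + "-writer");
        // Two failed attempts, one success, and exactly one release.
        System.out.println(result + " tryAcquire=" + lock.tryAcquireCalls + " close=" + lock.closeCalls);
    }
}
```

Constructing `CountingLock` with a very large `failuresBeforeSuccess` reproduces the exhaustion case already covered by testMaxRetryCountWhenWriteLockDuringIndexing.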

🧹 Nitpick comments (2)
server/src/test/java/org/opensearch/index/engine/CompositeIndexWriterForAppendTests.java (1)

208-227: New max-retry test is precise and robust

testMaxRetryCountWhenWriteLockDuringIndexing cleanly validates that:

  • tryAcquire() is attempted exactly MAX_NUMBER_OF_RETRIES times, and
  • a LookupMapLockAcquisitionException is thrown when the read lock cannot be acquired.

Using a mocked CriteriaBasedIndexWriterLookup and CriteriaBasedWriterLock to control tryAcquire() is a good way to isolate this behavior.

If you want even stronger coverage, you could add a complementary test where tryAcquire() succeeds after N failed attempts and assert that tryAcquire() is not called more than necessary.

server/src/main/java/org/opensearch/index/engine/CompositeIndexWriter.java (1)

692-705: Iteration refactors over writer maps look correct

The refactors to iterate directly over criteriaBasedIndexWriterMap.values() for:

  • getFlushingBytes
  • getPendingNumDocs
  • getTragicException (both current and old maps)
  • ramBytesUsed (with write locks held)
  • rollback (for both current and old maps)

are straightforward and maintain the original intent:

  • Metrics and state are still aggregated across all relevant child writers.
  • ramBytesUsed and rollback correctly guard map traversal with the write lock.
  • The checks on isOpen() before calling ramBytesUsed(), getTragicException(), or rollback() remain intact.

No functional regressions spotted here.

If desired, you could factor the duplicated “iterate over writer map and apply X” pattern into a small helper to reduce repetition, but it’s not necessary for correctness.

Also applies to: 732-736, 739-743, 759-762, 767-770, 797-806
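A small sketch of the optional helper, assuming a generic writer type `W` and the hypothetical names `forEachOpen` and `sumOverOpen`; the collections passed in would be the `values()` of the current and old writer maps, and any locking stays with the caller.

```java
import java.util.Collection;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;
import java.util.function.ToLongFunction;

public class WriterMapHelperSketch {
    /** Applies action to every open writer across the given groups (e.g. current and old map values). */
    @SafeVarargs
    static <W> void forEachOpen(Predicate<W> isOpen, Consumer<W> action, Collection<W>... writerGroups) {
        for (Collection<W> group : writerGroups) {
            for (W writer : group) {
                if (isOpen.test(writer)) {
                    action.accept(writer);
                }
            }
        }
    }

    /** Sums a metric (e.g. RAM bytes) across open writers in all groups. */
    @SafeVarargs
    static <W> long sumOverOpen(Predicate<W> isOpen, ToLongFunction<W> metric, Collection<W>... writerGroups) {
        long[] total = {0};
        forEachOpen(isOpen, w -> total[0] += metric.applyAsLong(w), writerGroups);
        return total[0];
    }

    // Hypothetical writer used only for the demo.
    record FakeWriter(boolean open, long ramBytes) {}

    public static void main(String[] args) {
        List<FakeWriter> current = List.of(new FakeWriter(true, 100), new FakeWriter(false, 999));
        List<FakeWriter> old = List.of(new FakeWriter(true, 50));
        long ram = sumOverOpen(FakeWriter::open, FakeWriter::ramBytes, current, old);
        System.out.println("ramBytesUsed=" + ram); // ramBytesUsed=150
    }
}
```

Lucene's `IndexWriter.rollback()` is declared to throw `IOException`, so reusing this helper for `rollback` would need a functional interface that permits checked exceptions.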

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1c15758 and 11f7776.

📒 Files selected for processing (9)
  • CHANGELOG.md (1 hunks)
  • server/src/main/java/org/opensearch/action/bulk/TransportShardBulkAction.java (1 hunks)
  • server/src/main/java/org/opensearch/index/IndexSettings.java (1 hunks)
  • server/src/main/java/org/opensearch/index/engine/CompositeIndexWriter.java (10 hunks)
  • server/src/main/java/org/opensearch/index/mapper/ContextAwareGroupingFieldMapper.java (2 hunks)
  • server/src/main/java/org/opensearch/index/mapper/MapperService.java (2 hunks)
  • server/src/test/java/org/opensearch/action/bulk/TransportShardBulkActionTests.java (0 hunks)
  • server/src/test/java/org/opensearch/index/engine/CompositeIndexWriterForAppendTests.java (4 hunks)
  • server/src/test/java/org/opensearch/index/engine/CriteriaBasedCompositeIndexWriterBaseTests.java (1 hunks)
💤 Files with no reviewable changes (1)
  • server/src/test/java/org/opensearch/action/bulk/TransportShardBulkActionTests.java
🚧 Files skipped from review as they are similar to previous changes (3)
  • server/src/main/java/org/opensearch/action/bulk/TransportShardBulkAction.java
  • server/src/main/java/org/opensearch/index/mapper/MapperService.java
  • CHANGELOG.md
🔇 Additional comments (6)
server/src/main/java/org/opensearch/index/mapper/ContextAwareGroupingFieldMapper.java (2)

11-12: New imports are correctly scoped to source-derivation override

LeafReader and XContentBuilder are only used in the new deriveSource override, so adding these imports is appropriate and keeps dependencies minimal.


186-199: No-op overrides appropriately exclude CAS mapper from source derivation

Given parseCreateField already prevents this field from being ingested, overriding canDeriveSource and deriveSource as no-ops cleanly keeps the Context Aware Segment mapper out of generic source validation/generation, matching the Javadoc intent.

server/src/test/java/org/opensearch/index/engine/CriteriaBasedCompositeIndexWriterBaseTests.java (1)

106-106: Shared retry constant for tests looks good

Exposing MAX_NUMBER_OF_RETRIES as a test base constant is reasonable and keeps retry-related tests consistent; no issues from a correctness standpoint.

server/src/main/java/org/opensearch/index/IndexSettings.java (1)

499-506: Retry setting bounds change is reasonable, but behavior depends on caller

The new INDEX_MAX_RETRY_ON_LOOKUP_MAP_LOCK_ACQUISITION_EXCEPTION range (min 5, max 500, default 100) is fine and keeps retries bounded. The practical impact now fully depends on how getMaxRetryOnLookupMapAcquisitionException() is honored in CompositeIndexWriter’s retry loop, which is reviewed separately.
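A plain-Java sketch of the validation implied by the bounds quoted in this comment (min 5, max 500, default 100). The real setting is declared through OpenSearch's `Setting` infrastructure; `parseMaxRetry` and the constants here are hypothetical and only illustrate defaulting an unset value and rejecting an out-of-range one.

```java
public class BoundedIntSettingSketch {
    // Bounds quoted in the review comment; hypothetical constants, not read from IndexSettings.
    static final int MIN_RETRIES = 5;
    static final int MAX_RETRIES = 500;
    static final int DEFAULT_RETRIES = 100;

    /** Returns the default when unset, otherwise the parsed value if it lies within [MIN, MAX]. */
    static int parseMaxRetry(String raw) {
        if (raw == null) {
            return DEFAULT_RETRIES;
        }
        int value = Integer.parseInt(raw.trim());
        if (value < MIN_RETRIES || value > MAX_RETRIES) {
            throw new IllegalArgumentException(
                "value " + value + " is outside [" + MIN_RETRIES + ", " + MAX_RETRIES + "]");
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(parseMaxRetry(null));  // 100
        System.out.println(parseMaxRetry("250")); // 250
        try {
            parseMaxRetry("1000");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());   // value 1000 is outside [5, 500]
        }
    }
}
```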

server/src/test/java/org/opensearch/index/engine/CompositeIndexWriterForAppendTests.java (1)

44-47: Test updates correctly track new retry-aware API

The added Mockito imports and the updated calls to computeIndexWriterIfAbsentForCriteria(..., new ShardId(...), MAX_NUMBER_OF_RETRIES) keep the tests in sync with the new method signature and with the shared retry constant from the base class. The wiring looks consistent across all affected tests.

Also applies to: 72-77, 141-146, 197-203

server/src/main/java/org/opensearch/index/engine/CompositeIndexWriter.java (1)

671-681: Index settings are correctly wired into retry behavior

The wrapper computeIndexWriterIfAbsentForCriteria now forwards both the shard ID and engineConfig.getIndexSettings().getMaxRetryOnLookupMapAcquisitionException() into the LiveIndexWriterDeletesMap retry loop. This cleanly centralizes retry configuration in IndexSettings and avoids hard‑coding retry counts in production code.

@github-actions
Contributor

github-actions bot commented Dec 8, 2025

✅ Gradle check result for 11f7776: SUCCESS
