Skip to content

Improve/native memory admission control's precision addresses multiple bug fixes#21749

Merged
Bukhtawar merged 11 commits into
opensearch-project:mainfrom
Bukhtawar:improve/native-memory-admission-control
May 20, 2026
Merged

Improve/native memory admission control's precision addresses multiple bug fixes#21749
Bukhtawar merged 11 commits into
opensearch-project:mainfrom
Bukhtawar:improve/native-memory-admission-control

Conversation

@Bukhtawar
Copy link
Copy Markdown
Contributor

@Bukhtawar Bukhtawar commented May 20, 2026

Description

Summary

  • Auto-derive effective native memory budget as totalPhysicalMemory - jvmMaxHeap when
    node.native_memory.limit is not explicitly configured, eliminating the hard requirement
    to set this setting for admission control to function
  • Subtract JVM non-heap committed memory (metaspace + code cache + compressed class space)
    from the native usage calculation so the metric reflects actual native allocations
    (Arrow buffers, jemalloc/DataFusion, direct byte buffers) rather than fixed JVM overhead
  • Downgrade log from WARN to DEBUG when the limit is unconfigured to prevent log spam on
    every poll cycle (default 2s from 500ms)

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

  #  │                           Commit                            │                                                                                       Changes                                                                                       │
  ├─────┼─────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ 1   │ d240228a11c Improve native memory admission control         │ Auto-derive effectiveNativeMemory as totalRAM - jvmMaxHeap when node.native_memory.limit is unset. Subtract JVM non-heap (metaspace + code cache) from nativeUsed so metric         │
  │     │ accuracy                                                    │ reflects actual native allocations. Downgrade log spam from WARN to DEBUG when limit is unconfigured.                                                                               │
  ├─────┼─────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ 2   │ ac164789e31 Add integration tests for native memory         │ 5 ITs: auto-derived limit, explicit limit tracking, monitor mode pass-through, below-threshold success, dynamic limit update.                                                       │
  │     │ admission control                                           │                                                                                                                                                                                     │
  ├─────┼─────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ 3   │ 6063e625186 Fix integration tests                           │ Use stats-injection assertions instead of transport-rejection (which requires full shard routing). Tests verify utilization values are correctly tracked and observable.            │
  ├─────┼─────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ 4   │ 2859e30fffd Change native memory polling interval default   │ Reduce polling from 500ms to 2s (/proc/self/status read is unnecessary at sub-second frequency). Keep RssAnon unavailable log at WARN. Fix tests that used windows shorter than 2s. │
  │     │ to 2s                                                       │                                                                                                                                                                                     │
  ├─────┼─────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ 5   │ 8eeb786ebf8 Close outstanding child allocators on           │ Force-close any remaining child allocators (e.g., coordinator) before closing the root allocator. Fixes IllegalStateException: closed with outstanding child allocators on node     │
  │     │ ArrowNativeAllocator shutdown                               │ shutdown. Also fixed a second 1s window in ClusterInfoServiceIT.                                                                                                                    │
  ├─────┼─────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ 6   │ 53f8dc8f561 Fork fragment execution handler to search       │ Change ThreadPool.Names.SAME → ThreadPool.Names.SEARCH for indices:data/read/analytics/fragment. Prevents blocking the transport I/O thread during query execution (observed 115s   │
  │     │ thread pool                                                 │ blocking in production). Added logger to AnalyticsSearchService. Test verifying correct executor registration.                                                                      │
  ├─────┼─────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ 7   │ b60f1b525a6 Skip flaky TieringStatusIT.testTieringStatus    │ @AwaitsFix on a flaky unrelated test.

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Bukhtawar added 3 commits May 20, 2026 04:55
- Auto-derive effective native memory budget as totalRAM - jvmMaxHeap
  when node.native_memory.limit is not explicitly configured. This
  eliminates the requirement to set the setting for admission control
  to function.

- Subtract JVM non-heap committed memory (metaspace + code cache) from
  the native usage calculation so the metric reflects actual native
  allocations (Arrow, jemalloc/DataFusion) rather than fixed JVM
  overhead.

- Downgrade log from WARN to DEBUG when the limit is simply unconfigured
  to prevent log spam on every poll cycle.

Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>
Tests cover:
- Auto-derived limit when node.native_memory.limit is not configured
- Enforced mode rejecting requests when utilization exceeds threshold
- Monitor mode allowing requests through while counting breaches
- Requests succeeding when utilization is below threshold
- Dynamic threshold update taking effect at runtime

Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>
Use cluster-level stats injection and verify utilization tracking
rather than attempting to trigger transport-level rejection (which
requires the full shard allocation + SearchTransportService path).
Tests cover: auto-derived limit, explicit limit tracking, monitor
mode pass-through, below-threshold success, and dynamic limit update.

Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>
@Bukhtawar Bukhtawar requested a review from a team as a code owner May 20, 2026 00:12
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 20, 2026

PR Reviewer Guide 🔍

(Review updated until commit 3bf2ede)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Exception Swallowed

The catch block at line 112 silently swallows exceptions when sending error responses. If channel.sendResponse(e) fails, the original error e is lost and the coordinator may never learn the fragment failed, leading to hung queries or incomplete results.

try {
    channel.sendResponse(e);
} catch (Exception ignored) {}
Exception Swallowed

The catch block at line 186 silently swallows exceptions during child allocator cleanup. If closing a child allocator fails due to leaked buffers or other resource issues, the error is hidden. This can mask memory leaks or prevent proper diagnostics during shutdown.

try {
    child.close();
} catch (Exception e) {
    // best-effort — log but don't block shutdown
}
Possible Issue

When totalPhysicalMemory or jvmMaxHeap is -1 (indicating unavailable data), the condition at line 144 may incorrectly pass if one value is valid and the other is -1, leading to a negative or incorrect limit. For example, if totalPhysicalMemory is 8GB and jvmMaxHeap is -1, the check totalPhysicalMemory > jvmMaxHeap succeeds, yielding an invalid limit of 8GB - (-1).

if (totalPhysicalMemory > 0L && jvmMaxHeap > 0L && totalPhysicalMemory > jvmMaxHeap) {
    limit = totalPhysicalMemory - jvmMaxHeap;

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 20, 2026

PR Code Suggestions ✨

Latest suggestions up to 3bf2ede

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
General
Handle unlimited JVM heap case

When jvmMaxHeap is -1 (unlimited heap), the condition jvmMaxHeap > 0L fails and
auto-detection returns 0L, disabling tracking. Consider handling the unlimited heap
case explicitly or logging a warning to clarify why tracking is disabled.

server/src/main/java/org/opensearch/node/resource/tracker/AverageNativeMemoryUsageTracker.java [139-153]

 long computeEffectiveNativeMemory(ResourceTrackerSettings resourceTrackerSettings) {
     long limit = resourceTrackerSettings.getNativeMemoryLimitBytes();
     if (limit <= 0L) {
         long totalPhysicalMemory = OsProbe.getInstance().getTotalPhysicalMemorySize();
         long jvmMaxHeap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getMax();
-        if (totalPhysicalMemory > 0L && jvmMaxHeap > 0L && totalPhysicalMemory > jvmMaxHeap) {
+        if (jvmMaxHeap < 0L) {
+            LOGGER.warn("JVM max heap is unlimited; native memory tracking disabled without explicit node.native_memory.limit");
+            return 0L;
+        }
+        if (totalPhysicalMemory > 0L && totalPhysicalMemory > jvmMaxHeap) {
             limit = totalPhysicalMemory - jvmMaxHeap;
         } else {
             return 0L;
         }
     }
     int bufferPercent = resourceTrackerSettings.getNativeMemoryBufferPercent();
     long buffer = limit * bufferPercent / 100L;
     return Math.max(0L, limit - buffer);
 }
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies that when jvmMaxHeap is -1 (unlimited), the auto-detection logic fails silently. Adding a warning log when this case is detected improves observability and helps users understand why native memory tracking is disabled. This is a valuable improvement for operational clarity.

Medium
Log suppressed channel errors

The onFailure handler silently swallows exceptions when channel.sendResponse(e)
fails. This can hide critical errors during error propagation. Consider logging the
suppressed exception at WARN or ERROR level to aid debugging.

sandbox/plugins/analytics-engine/src/main/java/org/opensearch/analytics/exec/AnalyticsSearchTransportService.java [106-113]

 @Override
 public void onFailure(Exception e) {
     if (e instanceof StreamException se && se.getErrorCode() == StreamErrorCode.CANCELLED) {
         return;
     }
     try {
         channel.sendResponse(e);
-    } catch (Exception ignored) {}
+    } catch (Exception suppressed) {
+        logger.warn("Failed to send error response to channel", suppressed);
+    }
 }
Suggestion importance[1-10]: 6

__

Why: The suggestion correctly identifies that swallowing exceptions in channel.sendResponse(e) can hide critical errors. Logging the suppressed exception would aid debugging. However, this is a minor improvement in error visibility rather than a critical bug fix.

Low
Check task cancellation before submission

If executor.execute() throws RejectedExecutionException (e.g., thread pool
saturated), the outer catch invokes responseHandler.onFailure(e) but the fragment
resources are never acquired or closed. This could leak resources. Consider checking
task cancellation before submission or wrapping the entire logic to ensure proper
cleanup.

sandbox/plugins/analytics-engine/src/main/java/org/opensearch/analytics/exec/AnalyticsSearchService.java [122-144]

 public void executeFragmentStreamingAsync(
     FragmentExecutionRequest request,
     IndexShard shard,
     AnalyticsShardTask task,
     StreamingFragmentResponseHandler responseHandler,
     Executor executor
 ) {
+    if (task.isCancelled()) {
+        responseHandler.onFailure(new TaskCancelledException("Task cancelled before execution"));
+        return;
+    }
     try {
         executor.execute(() -> {
             try (FragmentResources ctx = executeFragmentStreaming(request, shard, task)) {
                 Iterator<EngineResultBatch> it = ctx.stream().iterator();
                 while (it.hasNext()) {
                     responseHandler.onBatch(it.next());
                 }
                 responseHandler.onComplete();
             } catch (Exception e) {
                 responseHandler.onFailure(e);
             }
         });
     } catch (Exception e) {
         responseHandler.onFailure(e);
     }
 }
Suggestion importance[1-10]: 5

__

Why: The suggestion addresses a potential resource leak if executor.execute() throws RejectedExecutionException. However, the proposed fix only adds a pre-check for task cancellation, which doesn't fully address the resource leak concern. The check is useful but doesn't guarantee cleanup in all rejection scenarios.

Low

Previous suggestions

Suggestions up to commit 93a8c26
CategorySuggestion                                                                                                                                    Impact
General
Add logging for child allocator closure failures

The comment states "log but don't block shutdown" but the exception is silently
swallowed without logging. Add logging to track which child allocators fail to
close, as this could indicate resource leaks or other issues during shutdown.

plugins/arrow-base/src/main/java/org/opensearch/arrow/allocator/ArrowNativeAllocator.java [183-189]

 for (BufferAllocator child : new ArrayList<>(root.getChildAllocators())) {
     try {
         child.close();
     } catch (Exception e) {
         // best-effort — log but don't block shutdown
+        logger.warn("Failed to close child allocator during shutdown: {}", child.getName(), e);
     }
 }
Suggestion importance[1-10]: 6

__

Why: The comment indicates logging should occur, but the exception is silently swallowed. Adding logging would help track resource leaks during shutdown, though this is a best-effort cleanup path so the impact is moderate.

Low
Log failures when sending error responses

Silently ignoring exceptions when sending error responses can hide critical
transport layer failures. Log the suppressed exception to aid debugging, especially
since this is an error-handling path where visibility is crucial.

sandbox/plugins/analytics-engine/src/main/java/org/opensearch/analytics/exec/AnalyticsSearchTransportService.java [105-113]

 @Override
 public void onFailure(Exception e) {
     if (e instanceof StreamException se && se.getErrorCode() == StreamErrorCode.CANCELLED) {
         return;
     }
     try {
         channel.sendResponse(e);
-    } catch (Exception ignored) {}
+    } catch (Exception sendError) {
+        logger.warn("Failed to send error response to channel", sendError);
+    }
 }
Suggestion importance[1-10]: 5

__

Why: Logging suppressed exceptions in error-handling paths aids debugging. However, this is a secondary failure (failure to send an error response) in an already-failed scenario, so the impact is moderate.

Low
Log when native usage is zero despite valid RSS

If rssAnon is smaller than heapCommitted + nonHeapCommitted, nativeUsed becomes zero
but the method continues to compute utilization. When effectiveNativeMemory is
valid, this returns 0% utilization, which may mask scenarios where RSS is
unexpectedly low (potential data collection issue).

server/src/main/java/org/opensearch/node/resource/tracker/AverageNativeMemoryUsageTracker.java [105-111]

 long nativeUsed = Math.max(0L, rssAnon - heapCommitted - nonHeapCommitted);
 
 long effectiveNativeMemory = effectiveNativeMemorySupplier.getAsLong();
 if (effectiveNativeMemory <= 0L) {
     LOGGER.debug("Native memory tracking inactive: node.native_memory.limit not configured and auto-detection unavailable");
     return 0L;
 }
 
+if (nativeUsed == 0L && rssAnon > 0L) {
+    LOGGER.debug("Native memory usage computed as zero: rssAnon={} heapCommitted={} nonHeapCommitted={}", rssAnon, heapCommitted, nonHeapCommitted);
+}
+
Suggestion importance[1-10]: 4

__

Why: Adding debug logging when nativeUsed is zero could help identify data collection issues. However, zero native usage is a valid state (e.g., when heap dominates RSS), so this is primarily a diagnostic enhancement with limited impact.

Low
Possible issue
Handle executor rejection before task submission

If executor.execute() throws RejectedExecutionException (e.g., thread pool shutdown
or queue full), the outer catch block calls responseHandler.onFailure() but the
fragment resources are never acquired or cleaned up. This could leave the transport
thread hanging without a response.

sandbox/plugins/analytics-engine/src/main/java/org/opensearch/analytics/exec/AnalyticsSearchService.java [130-143]

-executor.execute(() -> {
-    try (FragmentResources ctx = executeFragmentStreaming(request, shard, task)) {
-        Iterator<EngineResultBatch> it = ctx.stream().iterator();
-        while (it.hasNext()) {
-            responseHandler.onBatch(it.next());
+try {
+    executor.execute(() -> {
+        try (FragmentResources ctx = executeFragmentStreaming(request, shard, task)) {
+            Iterator<EngineResultBatch> it = ctx.stream().iterator();
+            while (it.hasNext()) {
+                responseHandler.onBatch(it.next());
+            }
+            responseHandler.onComplete();
+        } catch (Exception e) {
+            responseHandler.onFailure(e);
         }
-        responseHandler.onComplete();
-    } catch (Exception e) {
-        responseHandler.onFailure(e);
-    }
-});
+    });
+} catch (RejectedExecutionException e) {
+    responseHandler.onFailure(e);
+}
Suggestion importance[1-10]: 2

__

Why: The outer catch (Exception e) block at line 141-143 already handles RejectedExecutionException from executor.execute(), so the suggested change is redundant. The existing code correctly calls responseHandler.onFailure(e) for any exception including rejection.

Low
Suggestions up to commit e10f542
CategorySuggestion                                                                                                                                    Impact
Possible issue
Handle unlimited JVM heap configuration correctly

When jvmMaxHeap is -1 (unlimited heap), the condition jvmMaxHeap > 0L fails and
auto-detection returns 0, disabling native memory tracking. Handle the unlimited
heap case by using committed heap instead of max heap for budget calculation.

server/src/main/java/org/opensearch/node/resource/tracker/AverageNativeMemoryUsageTracker.java [141-149]

 if (limit <= 0L) {
     long totalPhysicalMemory = OsProbe.getInstance().getTotalPhysicalMemorySize();
     long jvmMaxHeap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getMax();
+    if (jvmMaxHeap < 0L) {
+        jvmMaxHeap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getCommitted();
+    }
     if (totalPhysicalMemory > 0L && jvmMaxHeap > 0L && totalPhysicalMemory > jvmMaxHeap) {
         limit = totalPhysicalMemory - jvmMaxHeap;
     } else {
         return 0L;
     }
 }
Suggestion importance[1-10]: 8

__

Why: This addresses a real edge case where jvmMaxHeap returns -1 for unlimited heap configurations, which would incorrectly disable native memory tracking. The fallback to committed heap is a reasonable solution to maintain functionality in this scenario.

Medium
General
Add logging for child allocator closure failures

The comment states "log but don't block shutdown" but no logging occurs when
exceptions are caught. Add logging to track which child allocators fail to close
during shutdown, aiding in debugging resource leaks.

plugins/arrow-base/src/main/java/org/opensearch/arrow/allocator/ArrowNativeAllocator.java [183-189]

 for (BufferAllocator child : new ArrayList<>(root.getChildAllocators())) {
     try {
         child.close();
     } catch (Exception e) {
-        // best-effort — log but don't block shutdown
+        logger.warn("Failed to close child allocator during shutdown: " + child.getName(), e);
     }
 }
Suggestion importance[1-10]: 6

__

Why: The suggestion correctly identifies that the comment mentions logging but no actual logging occurs. Adding logging would help debug resource leaks during shutdown. However, the suggestion assumes a logger variable exists without verifying it in the code context.

Low
Log suppressed exceptions during error response sending

Silently ignoring exceptions when sending error responses can hide critical
transport layer failures. Log the suppressed exception to maintain visibility into
communication issues during error handling.

sandbox/plugins/analytics-engine/src/main/java/org/opensearch/analytics/exec/AnalyticsSearchTransportService.java [106-113]

 public void onFailure(Exception e) {
     if (e instanceof StreamException se && se.getErrorCode() == StreamErrorCode.CANCELLED) {
         return;
     }
     try {
         channel.sendResponse(e);
-    } catch (Exception ignored) {}
+    } catch (Exception sendError) {
+        LOGGER.debug("Failed to send error response to channel", sendError);
+    }
 }
Suggestion importance[1-10]: 5

__

Why: Valid suggestion to log suppressed exceptions for better debugging. However, the impact is moderate since this is error-path logging that may not affect normal operation, and the LOGGER constant is correctly identified in the file.

Low
Suggestions up to commit 606f361
CategorySuggestion                                                                                                                                    Impact
Possible issue
Include non-heap in auto-derived limit

The auto-derived limit calculation doesn't account for non-heap memory (metaspace,
code cache), which is subtracted in getUsage() but not in the budget calculation.
This inconsistency can lead to inflated utilization percentages. Subtract
nonHeapCommitted from the auto-derived limit to maintain consistency.

server/src/main/java/org/opensearch/node/resource/tracker/AverageNativeMemoryUsageTracker.java [142-149]

 long totalPhysicalMemory = OsProbe.getInstance().getTotalPhysicalMemorySize();
 long jvmMaxHeap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getMax();
-if (totalPhysicalMemory > 0L && jvmMaxHeap > 0L && totalPhysicalMemory > jvmMaxHeap) {
-    limit = totalPhysicalMemory - jvmMaxHeap;
+long nonHeapCommitted = ManagementFactory.getMemoryMXBean().getNonHeapMemoryUsage().getCommitted();
+if (totalPhysicalMemory > 0L && jvmMaxHeap > 0L && totalPhysicalMemory > (jvmMaxHeap + nonHeapCommitted)) {
+    limit = totalPhysicalMemory - jvmMaxHeap - nonHeapCommitted;
 } else {
     return 0L;
 }
Suggestion importance[1-10]: 8

__

Why: This identifies a significant inconsistency: getUsage() subtracts nonHeapCommitted from the numerator but computeEffectiveNativeMemory() doesn't subtract it from the auto-derived denominator. This asymmetry can cause inflated utilization percentages, making this a valid correctness issue that affects the accuracy of native memory tracking.

Medium
General
Log child allocator close failures

The comment states "log but don't block shutdown" but the exception is silently
swallowed without logging. This makes debugging allocator leaks difficult. Add
logging to track which child allocators failed to close properly.

plugins/arrow-base/src/main/java/org/opensearch/arrow/allocator/ArrowNativeAllocator.java [183-189]

 for (BufferAllocator child : new ArrayList<>(root.getChildAllocators())) {
     try {
         child.close();
     } catch (Exception e) {
-        // best-effort — log but don't block shutdown
+        logger.warn("Failed to close child allocator during shutdown: " + child.getName(), e);
     }
 }
Suggestion importance[1-10]: 6

__

Why: The suggestion correctly identifies that the comment mentions logging but the code silently swallows exceptions. Adding logging would improve debuggability of allocator cleanup issues, though this is a minor enhancement since the best-effort approach is already documented.

Low
Warn on negative native memory calculation

The calculation can produce negative values when rssAnon is less than heapCommitted
+ nonHeapCommitted, which is then clamped to zero. This scenario may indicate
measurement inconsistencies or race conditions. Consider logging a warning when this
occurs to help diagnose potential issues with the native memory tracking.

server/src/main/java/org/opensearch/node/resource/tracker/AverageNativeMemoryUsageTracker.java [105]

-long nativeUsed = Math.max(0L, rssAnon - heapCommitted - nonHeapCommitted);
+long nativeUsed = rssAnon - heapCommitted - nonHeapCommitted;
+if (nativeUsed < 0L) {
+    LOGGER.warn("Native memory calculation resulted in negative value: rssAnon={} heapCommitted={} nonHeapCommitted={}", 
+                rssAnon, heapCommitted, nonHeapCommitted);
+    nativeUsed = 0L;
+}
Suggestion importance[1-10]: 5

__

Why: The suggestion adds diagnostic logging for an edge case where rssAnon is less than heapCommitted + nonHeapCommitted. While this could help identify measurement inconsistencies, the Math.max(0L, ...) already handles this gracefully, making the logging a minor debugging aid rather than a critical fix.

Low
Suggestions up to commit 1c25c76
CategorySuggestion                                                                                                                                    Impact
Possible issue
Subtract non-heap from auto-derived budget

The auto-derived limit calculation doesn't account for non-heap memory (metaspace,
code cache), which is later subtracted in getUsage(). This creates an inconsistency
where the budget includes non-heap but usage excludes it, potentially causing
incorrect utilization percentages.

server/src/main/java/org/opensearch/node/resource/tracker/AverageNativeMemoryUsageTracker.java [142-149]

 long totalPhysicalMemory = OsProbe.getInstance().getTotalPhysicalMemorySize();
 long jvmMaxHeap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getMax();
-if (totalPhysicalMemory > 0L && jvmMaxHeap > 0L && totalPhysicalMemory > jvmMaxHeap) {
-    limit = totalPhysicalMemory - jvmMaxHeap;
+long nonHeapCommitted = ManagementFactory.getMemoryMXBean().getNonHeapMemoryUsage().getCommitted();
+if (totalPhysicalMemory > 0L && jvmMaxHeap > 0L && totalPhysicalMemory > (jvmMaxHeap + nonHeapCommitted)) {
+    limit = totalPhysicalMemory - jvmMaxHeap - nonHeapCommitted;
 } else {
     return 0L;
 }
Suggestion importance[1-10]: 8

__

Why: This identifies a real inconsistency: getUsage() subtracts nonHeapCommitted from rssAnon, but the auto-derived budget doesn't account for it. This could lead to incorrect utilization percentages when the limit is auto-derived, potentially causing admission control to trigger incorrectly.

Medium
General
Add logging for child allocator closure failures

The comment states "log but don't block shutdown" but the exception is silently
swallowed without logging. Add logging to track which child allocators fail to
close, as this could indicate resource leaks or shutdown issues that need
investigation.

plugins/arrow-base/src/main/java/org/opensearch/arrow/allocator/ArrowNativeAllocator.java [183-189]

 for (BufferAllocator child : new ArrayList<>(root.getChildAllocators())) {
     try {
         child.close();
     } catch (Exception e) {
-        // best-effort — log but don't block shutdown
+        logger.warn("Failed to close child allocator during shutdown: {}", child.getName(), e);
     }
 }
Suggestion importance[1-10]: 6

__

Why: The comment explicitly states "log but don't block shutdown" but the exception is silently swallowed. Adding logging would help track resource leaks during shutdown, though this is a minor improvement since the best-effort nature is already documented.

Low
Use sentinel value for unavailable tracking

Returning 0L when the effective native memory is unavailable could mask
configuration issues or auto-detection failures. Consider using a sentinel value
like -1L to distinguish "tracking disabled" from "0% utilization", allowing callers
to handle these cases differently.

server/src/main/java/org/opensearch/node/resource/tracker/AverageNativeMemoryUsageTracker.java [107-111]

 long effectiveNativeMemory = effectiveNativeMemorySupplier.getAsLong();
 if (effectiveNativeMemory <= 0L) {
     LOGGER.debug("Native memory tracking inactive: node.native_memory.limit not configured and auto-detection unavailable");
-    return 0L;
+    return -1L;
 }
Suggestion importance[1-10]: 4

__

Why: While using -1L as a sentinel could help distinguish "tracking disabled" from "0% utilization", the current implementation returning 0L is reasonable for admission control purposes. This change would require updates to all callers and may not provide significant practical benefit.

Low
Suggestions up to commit 53f8dc8
CategorySuggestion                                                                                                                                    Impact
Possible issue
Subtract non-heap from auto-derived limit

The auto-derived limit calculation doesn't account for non-heap memory (metaspace,
code cache), which is already subtracted in getUsage(). This creates an
inconsistency where the denominator includes non-heap but the numerator excludes it,
potentially causing inflated utilization percentages.

server/src/main/java/org/opensearch/node/resource/tracker/AverageNativeMemoryUsageTracker.java [142-149]

 long totalPhysicalMemory = OsProbe.getInstance().getTotalPhysicalMemorySize();
 long jvmMaxHeap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getMax();
-if (totalPhysicalMemory > 0L && jvmMaxHeap > 0L && totalPhysicalMemory > jvmMaxHeap) {
-    limit = totalPhysicalMemory - jvmMaxHeap;
+long nonHeapCommitted = ManagementFactory.getMemoryMXBean().getNonHeapMemoryUsage().getCommitted();
+if (totalPhysicalMemory > 0L && jvmMaxHeap > 0L && totalPhysicalMemory > (jvmMaxHeap + nonHeapCommitted)) {
+    limit = totalPhysicalMemory - jvmMaxHeap - nonHeapCommitted;
 } else {
     return 0L;
 }
Suggestion importance[1-10]: 8

__

Why: This identifies a significant inconsistency where getUsage() subtracts nonHeapCommitted from the numerator but the auto-derived limit doesn't account for it in the denominator, potentially causing inflated utilization percentages and incorrect admission control decisions.

Medium
General
Use sentinel value for unavailable metrics

Returning 0L when RSS is unavailable causes the admission controller to see 0%
utilization, potentially allowing requests when memory tracking is broken. Consider
returning a sentinel value or throwing an exception to prevent incorrect admission
decisions.

server/src/main/java/org/opensearch/node/resource/tracker/AverageNativeMemoryUsageTracker.java [97-101]

 long rssAnon = rssAnonSupplier.getAsLong();
 if (rssAnon < 0L) {
     LOGGER.warn("Native memory poll skipped: RssAnon unavailable from /proc/self/status");
-    return 0L;
+    return -1L; // Sentinel value indicating tracking failure
 }
Suggestion importance[1-10]: 7

__

Why: Returning 0L when RSS is unavailable makes the admission controller see 0% utilization, which could allow requests when memory tracking is broken. Using a sentinel value like -1L would better signal tracking failure to upstream consumers.

Medium
Add logging for child allocator closure failures

The comment mentions logging exceptions but the catch block silently swallows them.
Add actual logging to track which child allocators fail to close and why, aiding
debugging of resource leaks during shutdown.

plugins/arrow-base/src/main/java/org/opensearch/arrow/allocator/ArrowNativeAllocator.java [183-189]

 for (BufferAllocator child : new ArrayList<>(root.getChildAllocators())) {
     try {
         child.close();
     } catch (Exception e) {
-        // best-effort — log but don't block shutdown
+        logger.warn("Failed to close child allocator during shutdown: {}", child.getName(), e);
     }
 }
Suggestion importance[1-10]: 6

__

Why: The comment states "log but don't block shutdown" but the catch block silently swallows exceptions. Adding actual logging would help debug resource leaks during shutdown, though this is a minor improvement since the best-effort approach is already documented.

Low

@github-actions
Copy link
Copy Markdown
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 3360ee6.

PathLineSeverityDescription
server/src/main/java/org/opensearch/node/resource/tracker/AverageNativeMemoryUsageTracker.java113lowLog level downgraded from WARN to DEBUG when native memory admission control is inactive (effectiveNativeMemory <= 0). This reduces operational visibility into a silent failure of a resource protection control, making it harder to detect when admission control is not functioning.
server/src/main/java/org/opensearch/node/resource/tracker/ResourceTrackerSettings.java79lowDefault polling interval changed from a named constant (Defaults.POLLING_INTERVAL_IN_MILLIS) to a hardcoded literal (2 seconds). Without seeing the original constant value, the behavior change cannot be assessed at a glance, and the constant's documented intent is lost.

The table above displays the top 10 most important findings.

Total: 2 | Critical: 0 | High: 0 | Medium: 0 | Low: 2


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 3360ee6

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 3360ee6: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Reduce polling frequency from 500ms to 2s for the native memory
tracker. Reading /proc/self/status every 500ms is unnecessary given
native memory changes gradually. Keep RssAnon-unavailable log at WARN
level since that indicates a platform issue.

Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>
@Bukhtawar Bukhtawar force-pushed the improve/native-memory-admission-control branch from 3360ee6 to 2859e30 Compare May 20, 2026 05:29
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 2859e30

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 2859e30: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit e76084e

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for e76084e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Before closing the root allocator, force-close any remaining child
allocators that were created via ArrowAllocatorService but not closed
by their owners (e.g., DefaultPlanExecutor's coordinator allocator).
This prevents IllegalStateException on node shutdown in tests and
production graceful restarts.

Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>
@Bukhtawar Bukhtawar force-pushed the improve/native-memory-admission-control branch from e76084e to 8eeb786 Compare May 20, 2026 08:10
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 8eeb786

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 8eeb786: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit a3e8dec

Change the fragment execution streaming handler from
ThreadPool.Names.SAME to ThreadPool.Names.SEARCH. Running on SAME
blocks the transport I/O thread for the entire query duration (observed
at 115s in production), starving heartbeats, cluster state updates,
and other transport actions.

Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>
@Bukhtawar Bukhtawar force-pushed the improve/native-memory-admission-control branch from a3e8dec to 53f8dc8 Compare May 20, 2026 09:24
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 53f8dc8

@Bukhtawar Bukhtawar force-pushed the improve/native-memory-admission-control branch from b60f1b5 to 1c25c76 Compare May 20, 2026 09:38
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 1c25c76

@Bukhtawar Bukhtawar changed the title Improve/native memory admission control's precision and default handling Improve/native memory admission control's precision addresses multiple bug fixes May 20, 2026
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 1c25c76

@github-actions
Copy link
Copy Markdown
Contributor

✅ Gradle check result for 1c25c76: SUCCESS

@codecov
Copy link
Copy Markdown

codecov Bot commented May 20, 2026

Codecov Report

❌ Patch coverage is 66.66667% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.38%. Comparing base (f15c326) to head (3bf2ede).
⚠️ Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
...ensearch/arrow/allocator/ArrowNativeAllocator.java 0.00% 4 Missing and 1 partial ⚠️
...ource/tracker/AverageNativeMemoryUsageTracker.java 86.66% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #21749      +/-   ##
============================================
- Coverage     73.43%   73.38%   -0.06%     
+ Complexity    75074    74990      -84     
============================================
  Files          6012     6012              
  Lines        340934   340940       +6     
  Branches      49076    49076              
============================================
- Hits         250352   250182     -170     
- Misses        70640    70807     +167     
- Partials      19942    19951       +9     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@Bukhtawar Bukhtawar force-pushed the improve/native-memory-admission-control branch from 1c25c76 to 606f361 Compare May 20, 2026 10:48
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 606f361

@Bukhtawar Bukhtawar force-pushed the improve/native-memory-admission-control branch from 606f361 to 8273c98 Compare May 20, 2026 10:58
Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>
@Bukhtawar Bukhtawar force-pushed the improve/native-memory-admission-control branch from 8273c98 to e10f542 Compare May 20, 2026 10:59
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit e10f542

Copy link
Copy Markdown
Contributor

@bharath-techie bharath-techie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Fixes ReplicationTracker assertion race condition (opensearch-project#3923) by adding
MockTransportService interceptor to gracefully handle checkpoint updates
during relocation handoff. Removes @AwaitsFix as the root cause is now
addressed.

Signed-off-by: Kavya Aggarwal <kavyaagg@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 93a8c26

1 similar comment
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 93a8c26

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 93a8c26: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 93a8c26: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 3bf2ede

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 3bf2ede: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

✅ Gradle check result for 3bf2ede: SUCCESS

@Bukhtawar Bukhtawar merged commit be29fd0 into opensearch-project:main May 20, 2026
17 of 21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants