Skip to content

Native allocator: dynamic settings, query/datafusion pools, plugin nodeStats#21732

Merged
Bukhtawar merged 23 commits into
opensearch-project:mainfrom
gaurav-amz:unified-allocator-followup
May 22, 2026
Merged

Native allocator: dynamic settings, query/datafusion pools, plugin nodeStats#21732
Bukhtawar merged 23 commits into
opensearch-project:mainfrom
gaurav-amz:unified-allocator-followup

Conversation

@gaurav-amz
Copy link
Copy Markdown
Contributor

@gaurav-amz gaurav-amz commented May 19, 2026

Description

Builds on #21703 to add the query, datafusion with allocator, dynamic-tuning, stats, and pool-wiring pieces.

Architecture: three native-memory trackers, three knobs

Each tracker owns the bytes it can actually see, with a process-level cap above all of them.

            ┌──────────────────────────────────────────────────────────┐
            │  Process-level cap (admission control)                   │
            │                                                          │
            │  node.native_memory.limit                                │
            │    Defaults to 79% × (RAM − JVM heap), cgroup-aware.     │
            │                                                          │
            │  ┌────────────────────────┐  ┌────────────────────────┐  │
            │  │ Arrow tracker          │  │ DataFusion tracker     │  │
            │  │ (ArrowNativeAllocator. │  │ (Rust MemoryPool)      │  │
            │  │  root)                 │  │                        │  │
            │  │                        │  │ Tracks:                │  │
            │  │ Tracks:                │  │   HashAggregate        │  │
            │  │   POOL_FLIGHT (5% NM)  │  │   Sort buffers         │  │
            │  │   POOL_INGEST (8% NM)  │  │   TopK heap            │  │
            │  │   POOL_QUERY  (5% NM)  │  │   Spill staging        │  │
            │  │                        │  │                        │  │
            │  │ Bounded by:            │  │ Bounded by:            │  │
            │  │  native.allocator      │  │  datafusion.memory_    │  │
            │  │  .root.limit           │  │  pool_limit_bytes      │  │
            │  │  (20% × NM by default) │  │  (75% × NM by default) │  │
            │  └────────────────────────┘  └────────────────────────┘  │
            │                                                          │
            └──────────────────────────────────────────────────────────┘

Three layers, three responsibilities. Each is necessary; none can be replaced by another.

  • Arrow tracker (ArrowNativeAllocator) accounts for every Arrow BufferAllocator allocation in the JVM. Partitioned into FLIGHT / INGEST / QUERY pools so one plugin can't starve another. All Arrow buffers descend from the same RootAllocator, preserving Arrow's same-root invariant for cross-plugin zero-copy handoff.
  • DataFusion tracker (Rust MemoryPool) accounts for DataFusion's own working memory and triggers spill / fail-fast when a query exceeds budget. Lives entirely on the Rust side; updates flow in via FFM through NativeBridge.setMemoryPoolLimit.
  • Admission control caps the OS process. node.native_memory.limit is the operator-declared off-heap budget AC throttles against; framework-derived defaults (root, pools, DataFusion pool) all scale from this single number.

POOL_DATAFUSION is intentionally not present. DataFusion's working memory is allocated by Rust operators directly and reported only to DataFusion's own MemoryPool — it never flows through Arrow's BufferAllocator API. Adding a Java-side pool that pretended to track it would have required either a per-allocation FFM round-trip (performance disaster) or a config-only mirror (a setting that returns HTTP 200 and silently does nothing). The Rust-side datafusion.memory_pool_limit_bytes is the honest knob for that layer.

Worked example: 64 GB / 16 GB-heap node

With operator-declared -Xmx16g on a 64 GB host (bare metal or cgroup-limited container), defaults compose to:

64 GB RAM
  │ ├── JVM Heap (16 GB / 25%)
  │ │   ├── Indexing pressure, Writer buffer
  │ │   ├── Transport Netty
  │ │   └── etc.
  │ │
  │ └── Off-Heap (48 GB / 75%)
  │     │
  │     ├── node.native_memory.limit (37.92 GB / 79% of off-heap)
  │     │   │
  │     │   ├── Java Arrow Root (7.58 GB / 20% of NM)
  │     │   │   ├── POOL_FLIGHT  (1.90 GB / 5% of NM)
  │     │   │   │   └── server, client — flight transport batches
  │     │   │   ├── POOL_INGEST  (3.03 GB / 8% of NM)
  │     │   │   │   └── ArrowBufferPool / parquet VSR allocators
  │     │   │   └── POOL_QUERY   (1.90 GB / 5% of NM)
  │     │   │       └── analytics-search-service → per-query children
  │     │   │
  │     │   └── DataFusion Rust pool (28.44 GB / 75% of NM)
  │     │       └── datafusion.memory_pool_limit_bytes
  │     │           ├── hash tables, sort buffers, aggs, intermediate batches
  │     │           └── triggers spill on exhaustion
  │     │
  │     └── Unmanaged (~10.08 GB / 21% of off-heap)
  │         ├── Lucene mmap
  │         ├── OS Page Cache
  │         └── Sidecars / non-Arrow native consumers
  │
  │ Independent (disk):
  │   datafusion.spill_memory_limit_bytes (32 GB / 50% of physical RAM)

As a flat table for quick reference:

RAM                                              64 GB
JVM heap (operator-configured -Xmx)              16 GB
Off-heap (RAM − heap)                            48 GB
node.native_memory.limit  (79% of off-heap)      37.92 GB    ← AC throttle threshold
  native.allocator.root.limit  (20% of NM)        7.58 GB    ← Arrow framework cap
    pool.flight.max  (5% of NM)                   1.90 GB    ← Flight transport pool
    pool.ingest.max  (8% of NM)                   3.03 GB    ← Parquet VSR pool
    pool.query.max   (5% of NM)                   1.90 GB    ← analytics-engine pool
  datafusion.memory_pool_limit  (75% of NM)      28.44 GB    ← Rust runtime pool (sibling)
Unmanaged (Lucene mmap, OS page cache, etc.)     10.08 GB

Independent budget (disk staging, not memory):

datafusion.spill_memory_limit_bytes (50% of physical RAM)    32 GB

How a query flows through these layers

Take a concrete example: a user issues a PPL query that goes through analytics-engine and dispatches to DataFusion.

  1. Coordinator receives the request.
     → Admission control checks node.native_memory.limit budget.
     → If OK, proceeds; otherwise rejects with 429.

  2. AnalyticsSearchService creates a per-fragment Arrow allocator:
        allocator = getPoolAllocator(POOL_QUERY).newChildAllocator("frag-N", 0, Long.MAX_VALUE)
     → Future BufferAllocator.buffer(...) calls on this allocator
       increment POOL_QUERY's counter and the root counter via Arrow's
       parent-cap chain at allocateBytes.
     → Bounded by native.allocator.pool.query.max.

  3. AnalyticsSearchService dispatches to DataFusion via NativeBridge.
     → NativeBridge.executeQueryAsync(...) marshals plan bytes via FFM.

  4. DataFusion runs the query in Rust:
     → HashAggregate builds a hash table.
     → reservation.try_grow(50MB) → DataFusion MemoryPool counter += 50MB.
     → Bounded by datafusion.memory_pool_limit_bytes.
     → If exceeded, HashAggregate spills to disk
       (which itself is bounded by datafusion.spill_memory_limit_bytes).

  5. DataFusion produces a result batch in Rust.
     → Java imports it via Arrow C Data Interface.
     → The import allocates Java ArrowBufs under the per-fragment allocator
       from step 2. POOL_QUERY counter += result_size.

  6. Result returns to coordinator.
     → Per-fragment allocator closes; POOL_QUERY counter decrements.
     → DataFusion query completes; MemoryPool counter decrements.

Each layer accounts for what it owns. No double-counting (the result bytes are counted in DataFusion's pool only while they exist as Rust buffers; after Java imports them, ownership transfers and the Java side counts them; when Java closes the import, the bytes are freed). Each operator-tunable knob bounds a real, observable thing.

Operator surface

After this PR, an operator inspecting a node running the query above sees:

{
  "native_allocator": {
    "root": {"allocated": "150MB", "limit": "7.58GB"},
    "pools": {
      "flight":  {"allocated": "20MB",  "limit": "1.90GB"},
      "ingest":  {"allocated": "30MB",  "limit": "3.03GB"},
      "query":   {"allocated": "100MB", "limit": "1.90GB"}
    }
  },
  "datafusion": {
    "memory_pool": {"usage": "2.4GB", "limit": "28.44GB"},
    "spill":       {"usage": "0",     "limit": "32GB"}
  }
}

Three numbers, three sources, three knobs. The operator looks at each in isolation when tuning that layer:

  • "DataFusion queries are OOMing" → bump datafusion.memory_pool_limit_bytes.
  • "Flight RPC is starving Parquet ingest" → bump parquet.native.pool.flight.max or lower parquet.native.pool.ingest.max.
  • "Whole node is using too much memory" → lower node.native_memory.limit.

All pool min/max settings are Setting.Property.Dynamic. Pool limit changes propagate to consumer-side child allocators automatically via Arrow's parent-cap check at allocateBytes — child allocators are created with Long.MAX_VALUE and inherit the live parent cap on every allocation, so dynamic resizes reach in-flight workloads without restart and without an explicit notification SPI. The grouped validator rejects cross-setting violations (sum of pool mins > root, per-pool min > max) at PUT time with HTTP 400 rather than at the next allocation.

Behavior change to call out

Admission control is now active by default. node.native_memory.limit defaults to 79% × (RAM − JVM heap) instead of 0 (unconfigured). On upgrade, AC will start tracking native memory utilization and the framework's pool caps will derive sensible values from the operator's declared off-heap budget without any explicit configuration.

Operators who want pre-existing opt-out behavior (AC unconfigured, all framework caps unbounded) can set:

node.native_memory.limit: 0b

This restores the prior "explicit-opt-in" semantics — useful for nodes where Lucene mmap or non-Arrow native consumers dominate and the operator does not want admission control throttling search/index traffic.

Changes after initial review

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@gaurav-amz gaurav-amz requested a review from a team as a code owner May 19, 2026 08:45
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 19, 2026

PR Reviewer Guide 🔍

(Review updated until commit dcddc3c)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Possible Race Condition

The rebalancerEnabled flag is set in setRebalanceInterval and read in getOrCreatePool without synchronization. If setRebalanceInterval is called concurrently with getOrCreatePool, a pool could be created with the wrong initial limit. For example, if rebalancerEnabled is read as false while setRebalanceInterval(60) is executing, the pool starts at max when it should start at min. This can cause pools to over-allocate before the first rebalance tick.

    rebalancerEnabled = intervalSeconds > 0;
    if (rebalancerEnabled) {
        rebalanceTask = rebalancer.scheduleAtFixedRate(this::rebalance, intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
    }
}

@Override
public PoolHandle getOrCreatePool(String poolName, long limit) {
    return getOrCreatePool(poolName, limit, limit);
}

/**
 * Creates or returns a pool with min/max limits.
 *
 * @param poolName logical pool name
 * @param min guaranteed minimum bytes (always available)
 * @param max maximum bytes the pool can burst to
 * @return the pool handle
 */
public PoolHandle getOrCreatePool(String poolName, long min, long max) {
    poolMins.putIfAbsent(poolName, min);
    poolMaxes.putIfAbsent(poolName, max);
    return pools.computeIfAbsent(poolName, name -> {
        // Pick an initial limit that's safe for both rebalancer-on and rebalancer-off
        // deployments. When rebalancing is enabled, start at min (the original PR's
        // "guarantee + burst" semantics): the next rebalance tick will distribute
        // headroom up to each pool's max. When rebalancing is disabled (the default),
        // pools with min=0 would otherwise reject every allocation until a tick that
        // never comes — start at max so consumers can allocate immediately.
        long initial = rebalancerEnabled ? min : max;
        BufferAllocator child = root.newChildAllocator(name, 0, initial);
Overflow Risk

In rebalance, bonusPerPool is computed as headroom / poolCount. If poolCount is zero (all pools removed between the check and the loop), the division is safe but the subsequent loop is a no-op. However, if headroom is Long.MAX_VALUE and poolCount is 1, bonusPerPool equals Long.MAX_VALUE, and min + bonusPerPool at line 256 can overflow. The code does not guard against this scenario, which can occur if the root limit is Long.MAX_VALUE and a single pool has min = 1.

int poolCount = pools.size();
long bonusPerPool = poolCount > 0 ? headroom / poolCount : 0;

for (Map.Entry<String, ArrowPoolHandle> entry : pools.entrySet()) {
    String name = entry.getKey();
    BufferAllocator alloc = entry.getValue().allocator;
    long min = poolMins.getOrDefault(name, 0L);
    long max = poolMaxes.getOrDefault(name, Long.MAX_VALUE);
    long currentAllocation = alloc.getAllocatedMemory();

    long effectiveLimit = min + bonusPerPool;
Possible Issue

updateSpillMemoryLimit checks isSpillLimitDynamic() and logs a warning if false, but the cluster-settings update is accepted unconditionally. If the native library does not support runtime updates, the new value is stored in cluster state but never applied to the running service. On the next node restart, the service reads the new value from settings and applies it, but until then the node runs with the old limit. This can cause confusion when operators PUT a new limit, see HTTP 200, but observe no change in behavior until restart.

void updateSpillMemoryLimit(long newLimitBytes) {
    DataFusionService service = dataFusionService;
    if (service == null) {
        logger.debug("DataFusion service not yet initialized; ignoring spill limit update to {}B", newLimitBytes);
        return;
    }
    if (!service.isSpillLimitDynamic()) {
        logger.warn(
            "Updated DataFusion spill memory limit to {}B at the cluster level; the loaded native library does not "
                + "support runtime spill resize, so the new value will only take effect after a node restart",
            newLimitBytes
        );
        return;
    }
    try {
        service.setSpillMemoryLimit(newLimitBytes);
        logger.info("Updated DataFusion spill memory limit to {}B", newLimitBytes);
    } catch (IllegalStateException e) {
        logger.warn("Ignoring spill memory limit update to {}B; service is not running", newLimitBytes);
    } catch (UnsupportedOperationException e) {
        // isSpillLimitDynamic() guard above should make this unreachable, but defend
        // against a race between probe and call.
        logger.warn("Ignoring spill memory limit update to {}B; native runtime does not support live updates", newLimitBytes);
    }

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 19, 2026

PR Code Suggestions ✨

Latest suggestions up to dcddc3c

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
General
Skip unnecessary reads when registry is null

When registry == null, the loop continues reading name and payload for all remaining
entries but discards them. This wastes I/O and CPU. After detecting registry ==
null, break out of the loop immediately to avoid unnecessary reads.

server/src/main/java/org/opensearch/action/admin/cluster/node/stats/NodeStats.java [297-334]

 private static Map<String, PluginNodeStats> readPluginStats(StreamInput in) throws IOException {
     int size = in.readVInt();
     if (size == 0) {
         return Collections.emptyMap();
     }
     NamedWriteableRegistry registry = in.namedWriteableRegistry();
+    if (registry == null) {
+        for (int i = 0; i < size; i++) {
+            in.readString();
+            in.readBytesReference();
+        }
+        return Collections.emptyMap();
+    }
     Map<String, PluginNodeStats> result = new HashMap<>(size);
     for (int i = 0; i < size; i++) {
         String name = in.readString();
         BytesReference payload = in.readBytesReference();
-        if (registry == null) {
-            continue;
-        }
         try (
             StreamInput rawIn = payload.streamInput();
             NamedWriteableAwareStreamInput payloadIn = new NamedWriteableAwareStreamInput(rawIn, registry)
         ) {
             payloadIn.setVersion(in.getVersion());
             PluginNodeStats stats = payloadIn.readNamedWriteable(PluginNodeStats.class);
             result.put(name, stats);
         } catch (IOException | IllegalArgumentException e) {
-            // Receiver doesn't have the plugin's NamedWriteable registered (typical
-            // during rolling upgrades or in a non-uniform plugin install). Drop the
-            // entry; the rest of NodeStats remains decodable.
         }
     }
     return result;
 }
Suggestion importance[1-10]: 6

__

Why: The suggestion correctly identifies that when registry == null, the loop continues reading and discarding all entries. Breaking early after detecting registry == null avoids unnecessary I/O. However, the improved code must still consume the remaining entries from the stream to maintain wire-format alignment—the suggestion's loop does this correctly. The optimization is valid and improves efficiency in the defensive registry == null path.

Low
Distribute headroom remainder to avoid truncation

The division headroom / poolCount can lose precision when headroom is not evenly
divisible by poolCount. Consider using a remainder distribution strategy to allocate
leftover bytes to pools, ensuring the full headroom is utilized rather than
truncated.

plugins/arrow-base/src/main/java/org/opensearch/arrow/allocator/ArrowNativeAllocator.java [247-248]

 long bonusPerPool = poolCount > 0 ? headroom / poolCount : 0;
+long remainder = poolCount > 0 ? headroom % poolCount : 0;
+// Distribute remainder to first `remainder` pools
Suggestion importance[1-10]: 4

__

Why: The suggestion correctly identifies precision loss from integer division, but the impact is minor—leftover bytes are typically small relative to pool sizes. The added complexity of remainder distribution may not justify the marginal improvement in headroom utilization.

Low
Validate duplicate plugin registration

The method modifies a LinkedHashSet after construction, which could break plugin
ordering guarantees if mockPlugins contains duplicates or conflicts with
nodePlugins(). Consider validating that mockPlugins doesn't contain duplicates of
plugins already in nodePlugins() to maintain deterministic ordering.

test/framework/src/main/java/org/opensearch/test/InternalTestCluster.java [522-524]

 Set<Class<? extends Plugin>> plugins = new LinkedHashSet<>(nodeConfigurationSource.nodePlugins());
-plugins.addAll(mockPlugins);
+for (Class<? extends Plugin> mockPlugin : mockPlugins) {
+    if (!plugins.add(mockPlugin)) {
+        logger.warn("Mock plugin {} already present in nodePlugins(), skipping duplicate", mockPlugin.getName());
+    }
+}
 return plugins;
Suggestion importance[1-10]: 4

__

Why: The suggestion addresses a potential issue with duplicate plugins in mockPlugins, but the concern is somewhat theoretical. The LinkedHashSet naturally handles duplicates by not adding them twice, maintaining the first occurrence. Adding explicit validation with logging could help with debugging but doesn't fix a critical bug. The impact is moderate as it mainly improves observability.

Low

Previous suggestions

Suggestions up to commit dcddc3c
CategorySuggestion                                                                                                                                    Impact
General
Log when min exceeds max

When newMin exceeds max, the method silently caps at max without logging or warning.
Operators may expect the min to be honored and not realize it's being capped,
leading to confusion during troubleshooting.

plugins/arrow-base/src/main/java/org/opensearch/arrow/allocator/ArrowNativeAllocator.java [160-172]

 public void setPoolMin(String poolName, long newMin) {
     ArrowPoolHandle handle = pools.get(poolName);
     if (handle == null) {
         throw new IllegalStateException("Pool '" + poolName + "' does not exist");
     }
     poolMins.put(poolName, newMin);
     long max = poolMaxes.getOrDefault(poolName, Long.MAX_VALUE);
     long current = handle.allocator.getLimit();
     long target = Math.min(newMin, max);
+    if (newMin > max) {
+        logger.warn("Pool '{}' min ({}) exceeds max ({}); capping at max", poolName, newMin, max);
+    }
     if (target > current) {
         handle.allocator.setLimit(target);
     }
 }
Suggestion importance[1-10]: 6

__

Why: Adding a warning when newMin exceeds max improves operator visibility into silent capping behavior. The suggestion is correct and helpful for troubleshooting, though the scenario is already caught by the grouped validator at cluster-settings update time, so the log would only fire if the validator is bypassed or misconfigured. Moderate impact.

Low
Rate-limit spill-limit warning logs

The method logs a warning every time a spill-limit update is attempted when the
native library doesn't support dynamic resize. In a cluster with frequent settings
updates, this can spam logs. Consider rate-limiting or logging only on the first
occurrence.

sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/DataFusionPlugin.java [420-444]

+private volatile boolean spillLimitWarningLogged = false;
+
 void updateSpillMemoryLimit(long newLimitBytes) {
     DataFusionService service = dataFusionService;
     if (service == null) {
         logger.debug("DataFusion service not yet initialized; ignoring spill limit update to {}B", newLimitBytes);
         return;
     }
     if (!service.isSpillLimitDynamic()) {
-        logger.warn(
-            "Updated DataFusion spill memory limit to {}B at the cluster level; the loaded native library does not "
-                + "support runtime spill resize, so the new value will only take effect after a node restart",
-            newLimitBytes
-        );
+        if (!spillLimitWarningLogged) {
+            logger.warn(
+                "Updated DataFusion spill memory limit to {}B at the cluster level; the loaded native library does not "
+                    + "support runtime spill resize, so the new value will only take effect after a node restart",
+                newLimitBytes
+            );
+            spillLimitWarningLogged = true;
+        }
         return;
     }
     ...
 }
Suggestion importance[1-10]: 5

__

Why: The suggestion correctly identifies that repeated warnings can spam logs when the native library doesn't support dynamic spill resize. Rate-limiting via a boolean flag is a simple, effective mitigation. However, the scenario is expected to be rare (only during rolling upgrades or when an operator repeatedly updates the setting), so the impact is moderate.

Low
Log when registry is null

When registry == null, the method silently skips all entries without logging. This
defensive path is rarely hit in production, but when it does occur, operators have
no visibility into why plugin stats are missing. Add a debug log to aid
troubleshooting.

server/src/main/java/org/opensearch/action/admin/cluster/node/stats/NodeStats.java [310-334]

 private static Map<String, PluginNodeStats> readPluginStats(StreamInput in) throws IOException {
     int size = in.readVInt();
     if (size == 0) {
         return Collections.emptyMap();
     }
     NamedWriteableRegistry registry = in.namedWriteableRegistry();
+    if (registry == null) {
+        logger.debug("NamedWriteableRegistry not attached to stream; dropping {} plugin stats entries", size);
+        for (int i = 0; i < size; i++) {
+            in.readString();
+            in.readBytesReference();
+        }
+        return Collections.emptyMap();
+    }
     Map<String, PluginNodeStats> result = new HashMap<>(size);
     for (int i = 0; i < size; i++) {
         String name = in.readString();
         BytesReference payload = in.readBytesReference();
-        if (registry == null) {
-            continue;
-        }
         try (
             StreamInput rawIn = payload.streamInput();
             NamedWriteableAwareStreamInput payloadIn = new NamedWriteableAwareStreamInput(rawIn, registry)
         ) {
             payloadIn.setVersion(in.getVersion());
             PluginNodeStats stats = payloadIn.readNamedWriteable(PluginNodeStats.class);
             result.put(name, stats);
         } catch (IOException | IllegalArgumentException e) {
-            // Receiver doesn't have the plugin's NamedWriteable registered (typical
-            // during rolling upgrades or in a non-uniform plugin install). Drop the
-            // entry; the rest of NodeStats remains decodable.
+            // Receiver doesn't have the plugin's NamedWriteable registered
         }
     }
     return result;
 }
Suggestion importance[1-10]: 5

__

Why: Adding a debug log when registry == null improves troubleshooting visibility for a defensive path that should rarely be hit in production. The suggestion is correct and helpful, though the scenario is uncommon (production paths always attach a registry), so the impact is moderate. The improved code also correctly consumes the skipped entries to avoid stream corruption.

Low
Avoid headroom truncation in rebalance

The division headroom / poolCount can lose precision when distributing small
headroom across multiple pools. Consider using a remainder-aware distribution to
ensure all available headroom is allocated rather than truncated.

plugins/arrow-base/src/main/java/org/opensearch/arrow/allocator/ArrowNativeAllocator.java [247]

 long bonusPerPool = poolCount > 0 ? headroom / poolCount : 0;
+long remainder = poolCount > 0 ? headroom % poolCount : 0;
+// Distribute remainder to first N pools to avoid losing bytes to truncation
Suggestion importance[1-10]: 4

__

Why: The suggestion correctly identifies that integer division can lose precision when distributing headroom. However, the impact is minor—typically a few bytes lost to truncation—and the added complexity of remainder distribution may not be worth it for the rebalancer's coarse-grained allocation model. The suggestion is valid but offers marginal improvement.

Low
Preserve plugin order when adding mockPlugins

The mockPlugins collection is added to a LinkedHashSet using addAll(), which may not
preserve insertion order if mockPlugins itself is unordered. This could reintroduce
non-determinism that the LinkedHashSet was meant to prevent. Consider converting
mockPlugins to a LinkedHashSet during initialization or ensuring it maintains order.

test/framework/src/main/java/org/opensearch/test/InternalTestCluster.java [522-523]

 Set<Class<? extends Plugin>> plugins = new LinkedHashSet<>(nodeConfigurationSource.nodePlugins());
-plugins.addAll(mockPlugins);
+// Preserve order when adding mockPlugins
+mockPlugins.forEach(plugins::add);
Suggestion importance[1-10]: 3

__

Why: The suggestion raises a valid concern about potential order preservation, but addAll() on a LinkedHashSet does preserve the iteration order of the source collection. The improved_code using forEach is functionally equivalent and doesn't provide meaningful improvement. The concern is somewhat theoretical unless mockPlugins is proven to be unordered.

Low
Suggestions up to commit 1bd0be2
CategorySuggestion                                                                                                                                    Impact
Possible issue
Validate sum of pool maxes

Add validation to ensure the sum of pool maxes does not exceed the root limit. The
current validator only checks that sum of mins ≤ root, but operators can configure
pool maxes that collectively exceed the root cap, leading to runtime allocation
failures when pools attempt to burst simultaneously.

plugins/arrow-base/src/main/java/org/opensearch/arrow/allocator/ArrowBasePlugin.java [322-334]

 static void validateUpdate(Settings settings) {
     long rootLimit = ROOT_LIMIT_SETTING.get(settings);
     long flightMin = FLIGHT_MIN_SETTING.get(settings);
     long flightMax = FLIGHT_MAX_SETTING.get(settings);
     long ingestMin = INGEST_MIN_SETTING.get(settings);
     long ingestMax = INGEST_MAX_SETTING.get(settings);
     long queryMin = QUERY_MIN_SETTING.get(settings);
     long queryMax = QUERY_MAX_SETTING.get(settings);
     validateMinMax(NativeAllocatorPoolConfig.POOL_FLIGHT, flightMin, flightMax);
     validateMinMax(NativeAllocatorPoolConfig.POOL_INGEST, ingestMin, ingestMax);
     validateMinMax(NativeAllocatorPoolConfig.POOL_QUERY, queryMin, queryMax);
     validateMinSum(rootLimit, flightMin, ingestMin, queryMin);
+    validateMaxSum(rootLimit, flightMax, ingestMax, queryMax);
 }
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies a gap in validation: the current code only checks that sum of mins ≤ root, but does not validate that sum of maxes ≤ root. This could lead to runtime allocation failures when multiple pools attempt to burst simultaneously. However, the PR's design intentionally allows pool maxes to sum above root (see lines 99-102 comments about "18% of native_memory.limit fits within root.limit (20%) by default, leaving 2 pp headroom"), so this is more of an enhancement than a critical bug.

Medium
General
Validate pool min against root

Verify that newMin does not exceed the root limit before updating. A dynamic PUT
that sets a pool min above the root cap will be accepted by the validator (which
only checks sum of mins), but the live limit raise will fail silently or cause
unexpected behavior when the allocator attempts to exceed the root.

plugins/arrow-base/src/main/java/org/opensearch/arrow/allocator/ArrowNativeAllocator.java [160-172]

 public void setPoolMin(String poolName, long newMin) {
     ArrowPoolHandle handle = pools.get(poolName);
     if (handle == null) {
         throw new IllegalStateException("Pool '" + poolName + "' does not exist");
+    }
+    if (newMin > root.getLimit()) {
+        throw new IllegalArgumentException("Pool min (" + newMin + ") exceeds root limit (" + root.getLimit() + ")");
     }
     poolMins.put(poolName, newMin);
     long max = poolMaxes.getOrDefault(poolName, Long.MAX_VALUE);
     long current = handle.allocator.getLimit();
     long target = Math.min(newMin, max);
     if (target > current) {
         handle.allocator.setLimit(target);
     }
 }
Suggestion importance[1-10]: 6

__

Why: The suggestion adds a defensive check that newMin does not exceed the root limit. While the grouped validator in validateUpdate already checks sum of mins ≤ root, this per-pool check would catch the edge case where a single pool's min exceeds root. However, the existing validator should already prevent this scenario at the cluster-settings level, making this a defensive enhancement rather than a critical fix.

Low
Log dropped plugin stats entries

Log the dropped plugin-stats entries at debug level so operators can diagnose
missing stats during rolling upgrades. The current silent drop makes it difficult to
distinguish between "plugin not installed" and "deserialization failed for another
reason."

server/src/main/java/org/opensearch/action/admin/cluster/node/stats/NodeStats.java [296-334]

 private static Map<String, PluginNodeStats> readPluginStats(StreamInput in) throws IOException {
     int size = in.readVInt();
     if (size == 0) {
         return Collections.emptyMap();
     }
     NamedWriteableRegistry registry = in.namedWriteableRegistry();
     Map<String, PluginNodeStats> result = new HashMap<>(size);
     for (int i = 0; i < size; i++) {
         String name = in.readString();
         BytesReference payload = in.readBytesReference();
         if (registry == null) {
             continue;
         }
         try (
             StreamInput rawIn = payload.streamInput();
             NamedWriteableAwareStreamInput payloadIn = new NamedWriteableAwareStreamInput(rawIn, registry)
         ) {
             payloadIn.setVersion(in.getVersion());
             PluginNodeStats stats = payloadIn.readNamedWriteable(PluginNodeStats.class);
             result.put(name, stats);
         } catch (IOException | IllegalArgumentException e) {
-            // Receiver doesn't have the plugin's NamedWriteable registered (typical
-            // during rolling upgrades or in a non-uniform plugin install). Drop the
-            // entry; the rest of NodeStats remains decodable.
+            logger.debug("Dropped plugin stats entry '{}': {}", name, e.getMessage());
         }
     }
     return result;
 }
Suggestion importance[1-10]: 6

__

Why: The suggestion adds debug logging for dropped plugin-stats entries, which would help operators diagnose missing stats during rolling upgrades. This is a reasonable observability improvement, though the current silent drop is intentional (per the comment at line 329-331) to avoid log spam during normal rolling upgrades. The improvement is useful but not critical.

Low
Validate spill limit is non-negative

Validate that newLimitBytes is non-negative before attempting the native call. The
setting parser rejects negative values at parse time, but defensive validation here
prevents potential native-side crashes if the value is corrupted in transit or the
parser is bypassed.

sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/DataFusionPlugin.java [420-444]

 void updateSpillMemoryLimit(long newLimitBytes) {
+    if (newLimitBytes < 0) {
+        logger.warn("Ignoring invalid spill memory limit update: {}B (must be >= 0)", newLimitBytes);
+        return;
+    }
     DataFusionService service = dataFusionService;
     if (service == null) {
         logger.debug("DataFusion service not yet initialized; ignoring spill limit update to {}B", newLimitBytes);
         return;
     }
-    if (!service.isSpillLimitDynamic()) {
-        logger.warn(
-            "Updated DataFusion spill memory limit to {}B at the cluster level; the loaded native library does not "
-                + "support runtime spill resize, so the new value will only take effect after a node restart",
-            newLimitBytes
-        );
-        return;
-    }
-    try {
-        service.setSpillMemoryLimit(newLimitBytes);
-        logger.info("Updated DataFusion spill memory limit to {}B", newLimitBytes);
-    } catch (IllegalStateException e) {
-        logger.warn("Ignoring spill memory limit update to {}B; service is not running", newLimitBytes);
-    } catch (UnsupportedOperationException e) {
-        logger.warn("Ignoring spill memory limit update to {}B; native runtime does not support live updates", newLimitBytes);
-    }
+    ...
 }
Suggestion importance[1-10]: 5

__

Why: The suggestion adds defensive validation that newLimitBytes is non-negative before the native call. The setting parser already rejects negative values (line 159), so this is redundant defensive code. It's a minor improvement for robustness but not addressing a real gap in the current implementation.

Low
Preserve plugin order when adding mockPlugins

The mockPlugins collection is added to a LinkedHashSet using addAll(), which may not
preserve insertion order if mockPlugins itself is unordered. This could reintroduce
non-determinism that the LinkedHashSet was meant to prevent. Consider converting
mockPlugins to a LinkedHashSet during initialization or ensuring it maintains order.

test/framework/src/main/java/org/opensearch/test/InternalTestCluster.java [522-523]

 Set<Class<? extends Plugin>> plugins = new LinkedHashSet<>(nodeConfigurationSource.nodePlugins());
-plugins.addAll(mockPlugins);
+// Preserve order when adding mockPlugins
+mockPlugins.forEach(plugins::add);
Suggestion importance[1-10]: 3

__

Why: The suggestion raises a valid concern about potential order preservation, but the impact is limited. The improved_code using forEach(plugins::add) is functionally equivalent to addAll() for a LinkedHashSet. The real issue would only occur if mockPlugins contains duplicates or if its iteration order matters, which is not clearly demonstrated in the PR context.

Low
Suggestions up to commit b033e5c
CategorySuggestion                                                                                                                                    Impact
General
Log skipped plugin stats entries

The catch block silently drops unknown plugin stats entries without logging. During
a rolling upgrade or plugin version mismatch, operators have no visibility into
which stats were skipped, making it hard to diagnose incomplete metrics.

server/src/main/java/org/opensearch/action/admin/cluster/node/stats/NodeStats.java [320-333]

 try (
     StreamInput rawIn = payload.streamInput();
     NamedWriteableAwareStreamInput payloadIn = new NamedWriteableAwareStreamInput(rawIn, registry)
 ) {
     payloadIn.setVersion(in.getVersion());
     PluginNodeStats stats = payloadIn.readNamedWriteable(PluginNodeStats.class);
     result.put(name, stats);
 } catch (IOException | IllegalArgumentException e) {
     // Receiver doesn't have the plugin's NamedWriteable registered (typical
     // during rolling upgrades or in a non-uniform plugin install). Drop the
     // entry; the rest of NodeStats remains decodable.
+    logger.debug("Skipping unknown plugin stats entry '{}' during deserialization: {}", name, e.getMessage());
 }
Suggestion importance[1-10]: 7

__

Why: Good observability improvement. During rolling upgrades or plugin mismatches, operators would benefit from knowing which stats entries were skipped. The debug-level logging is appropriate since this is expected behavior in mixed-version clusters, not an error condition.

Medium
Enforce runtime spill limit capability check

The method doesn't check isSpillLimitDynamic() before calling setSpillLimit().
According to the javadoc, this should throw UnsupportedOperationException when the
feature is unavailable. Add a guard to enforce this contract.

sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/DataFusionService.java [157-159]

 public void setSpillMemoryLimit(long newLimitBytes) {
+    if (!isSpillLimitDynamic()) {
+        throw new UnsupportedOperationException("Spill limit cannot be changed at runtime");
+    }
     NativeBridge.setSpillLimit(getNativeRuntime().get(), newLimitBytes);
 }
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies that setSpillMemoryLimit() should validate isSpillLimitDynamic() before proceeding, as documented in the javadoc. This prevents runtime errors and enforces the contract, making it a valuable improvement for robustness.

Medium
Include pool context in overflow error

The overflow check uses Math.addExact but the error message does not indicate which
pool caused the overflow. When an operator hits this with multiple pools configured,
they cannot identify the problematic configuration without inspecting all pool mins.

plugins/arrow-base/src/main/java/org/opensearch/arrow/allocator/ArrowBasePlugin.java [377-381]

 try {
     sum = Math.addExact(sum, min);
 } catch (ArithmeticException overflow) {
-    throw new IllegalArgumentException("Sum of pool minimums overflows.", overflow);
+    throw new IllegalArgumentException(
+        "Sum of pool minimums overflows at pool index " + i + " (min=" + min + ", sum so far=" + sum + ")",
+        overflow
+    );
 }
Suggestion importance[1-10]: 6

__

Why: Valid improvement to error diagnostics. The current error message doesn't indicate which pool caused the overflow, making troubleshooting harder. However, the mins array doesn't have pool names at this point in the code (it's a varargs of longs), so the suggestion would need adjustment to reference pool names from the calling context.

Low
Clarify spill-limit warning for operators

The method returns early when isSpillLimitDynamic() is false, logging a warning but
accepting the cluster-state update. If the native library never gains the symbol,
operators will see repeated warnings on every cluster-settings PUT without
actionable feedback that the setting is effectively node-scope-only until restart.

sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/DataFusionPlugin.java [420-432]

 void updateSpillMemoryLimit(long newLimitBytes) {
     DataFusionService service = dataFusionService;
     if (service == null) {
         logger.debug("DataFusion service not yet initialized; ignoring spill limit update to {}B", newLimitBytes);
         return;
     }
     if (!service.isSpillLimitDynamic()) {
         logger.warn(
             "Updated DataFusion spill memory limit to {}B at the cluster level; the loaded native library does not "
-                + "support runtime spill resize, so the new value will only take effect after a node restart",
+                + "support runtime spill resize (missing df_set_spill_limit symbol), so the new value will only take effect after a node restart. "
+                + "To avoid this warning, configure the setting in opensearch.yml instead of via _cluster/settings.",
             newLimitBytes
         );
         return;
     }
Suggestion importance[1-10]: 5

__

Why: The suggestion improves operator guidance by explicitly mentioning the missing symbol and recommending opensearch.yml configuration. This is helpful for reducing confusion, though the existing warning already conveys the core issue (runtime resize not supported, restart required).

Low
Document headroom division truncation behavior

The division headroom / poolCount can lose precision when headroom is not evenly
divisible by poolCount. Consider using Math.floorDiv or documenting the intentional
truncation behavior, as the remainder is effectively discarded rather than
distributed.

plugins/arrow-base/src/main/java/org/opensearch/arrow/allocator/ArrowNativeAllocator.java [246]

+// Integer division intentionally truncates remainder; undistributed headroom
+// stays in the root allocator's unallocated space.
 long bonusPerPool = poolCount > 0 ? headroom / poolCount : 0;
Suggestion importance[1-10]: 4

__

Why: The suggestion correctly identifies that integer division truncates the remainder, but this is standard Java behavior and the code is already clear. Adding a comment would be helpful for maintainability, though the impact is minor since the behavior is mathematically correct and the remainder staying in the root is acceptable.

Low
Preserve plugin order for mockPlugins

The mockPlugins collection is added to plugins using addAll(), which may not
preserve insertion order if mockPlugins is a HashSet. This could reintroduce
non-deterministic plugin ordering. Ensure mockPlugins is also order-preserving or
convert it before adding.

test/framework/src/main/java/org/opensearch/test/InternalTestCluster.java [522-523]

 Set<Class<? extends Plugin>> plugins = new LinkedHashSet<>(nodeConfigurationSource.nodePlugins());
-plugins.addAll(mockPlugins);
+plugins.addAll(new LinkedHashSet<>(mockPlugins));
Suggestion importance[1-10]: 4

__

Why: The suggestion correctly identifies that mockPlugins could be a HashSet, which might reintroduce non-deterministic ordering. However, the impact is limited since mockPlugins is typically controlled by the test framework and may already be order-preserving. The fix is defensive but not critical.

Low
Suggestions up to commit a9ff751
CategorySuggestion                                                                                                                                    Impact
General
Add diagnostic context to overflow error

The overflow check uses Math.addExact which throws ArithmeticException on overflow,
but the error message doesn't indicate which pool caused the overflow or what the
current sum was. For large deployments with many pools, include diagnostic context
(current sum, failing min value) to help operators identify the misconfiguration.

plugins/arrow-base/src/main/java/org/opensearch/arrow/allocator/ArrowBasePlugin.java [377-381]

 try {
     sum = Math.addExact(sum, min);
 } catch (ArithmeticException overflow) {
-    throw new IllegalArgumentException("Sum of pool minimums overflows.", overflow);
+    throw new IllegalArgumentException(
+        "Sum of pool minimums overflows. Current sum: " + sum + ", attempted to add: " + min,
+        overflow
+    );
 }
Suggestion importance[1-10]: 6

__

Why: Adding the current sum and failing min value to the error message would significantly improve debuggability for operators trying to diagnose pool configuration issues. This is a practical enhancement that helps identify which pool configuration caused the overflow.

Low
Clarify warning for missing dynamic spill support

The method returns early after logging a warning when the native library doesn't
support dynamic spill resize, but the cluster-state update has already been
accepted. This could confuse operators who see HTTP 200 but no immediate effect.
Consider documenting this behavior in the setting's Javadoc or emitting a more
prominent warning at startup if the symbol is missing.

sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/DataFusionPlugin.java [426-432]

 if (!service.isSpillLimitDynamic()) {
     logger.warn(
         "Updated DataFusion spill memory limit to {}B at the cluster level; the loaded native library does not "
-            + "support runtime spill resize, so the new value will only take effect after a node restart",
+            + "support runtime spill resize (df_set_spill_limit symbol absent), so the new value will only take effect after a node restart. "
+            + "Upgrade to a newer native library for live updates.",
         newLimitBytes
     );
     return;
 }
Suggestion importance[1-10]: 5

__

Why: The suggestion improves the warning message by explicitly mentioning the missing df_set_spill_limit symbol and suggesting an upgrade path. This enhances operator experience by making the limitation clearer, though the existing warning already conveys the essential information.

Low
Document integer division truncation behavior

The division headroom / poolCount can lose precision when headroom is not evenly
divisible by poolCount. Consider using a more precise distribution algorithm or
documenting the intentional truncation behavior, as small pools may systematically
receive less than their fair share over many rebalance cycles.

plugins/arrow-base/src/main/java/org/opensearch/arrow/allocator/ArrowNativeAllocator.java [246]

+// Integer division intentionally truncates remainder; undistributed bytes stay in root headroom
 long bonusPerPool = poolCount > 0 ? headroom / poolCount : 0;
Suggestion importance[1-10]: 4

__

Why: The suggestion correctly identifies that integer division truncates the remainder, but the impact is minimal since the undistributed bytes remain in the root headroom and will be distributed in the next rebalance cycle. The comment would improve code clarity but doesn't address a functional issue.

Low
Return immutable empty list

The nodeStats() method returns a mutable list that could be modified by callers,
potentially causing unexpected behavior. Return an immutable empty list using
List.of() instead of Collections.emptyList() to prevent accidental modifications and
align with modern Java best practices.

server/src/main/java/org/opensearch/plugins/Plugin.java [266-268]

 public List<PluginNodeStats> nodeStats() {
-    return Collections.emptyList();
+    return List.of();
 }
Suggestion importance[1-10]: 4

__

Why: While List.of() is more modern and returns an immutable list, Collections.emptyList() is also immutable and widely used. The suggestion is valid but offers minimal practical benefit since both return immutable empty lists. The impact is low as this is a default implementation that plugins can override.

Low
Preserve deterministic plugin ordering

The method modifies a LinkedHashSet after construction, which could lead to
unexpected plugin ordering if mockPlugins contains duplicates or conflicts with
nodePlugins(). Consider using Stream.concat() to preserve insertion order
deterministically and avoid potential ordering issues.

test/framework/src/main/java/org/opensearch/test/InternalTestCluster.java [522-524]

-Set<Class<? extends Plugin>> plugins = new LinkedHashSet<>(nodeConfigurationSource.nodePlugins());
-plugins.addAll(mockPlugins);
-return plugins;
+return Stream.concat(
+    nodeConfigurationSource.nodePlugins().stream(),
+    mockPlugins.stream()
+).distinct().collect(Collectors.toCollection(LinkedHashSet::new));
Suggestion importance[1-10]: 3

__

Why: The suggestion adds .distinct() which could alter behavior if duplicate plugins are intentionally allowed. The current code using LinkedHashSet already handles duplicates and preserves order. The Stream.concat() approach is more complex without clear benefits, and the concern about "unexpected plugin ordering" is already addressed by the PR's switch to LinkedHashSet.

Low
Suggestions up to commit a9ff751
CategorySuggestion                                                                                                                                    Impact
General
Validate individual pool mins early

The validator checks rootLimit == Long.MAX_VALUE to skip validation when the root is
unbounded. However, if any individual pool min is Long.MAX_VALUE, the Math.addExact
call will throw ArithmeticException even when the root is bounded. Consider adding
an early check for individual mins being Long.MAX_VALUE to provide a clearer error
message before attempting the sum.

plugins/arrow-base/src/main/java/org/opensearch/arrow/allocator/ArrowBasePlugin.java [374-384]

 private static void validateMinSum(long rootLimit, long... mins) {
     if (rootLimit == Long.MAX_VALUE) {
         return;
+    }
+    for (long min : mins) {
+        if (min == Long.MAX_VALUE) {
+            throw new IllegalArgumentException("Pool minimum cannot be Long.MAX_VALUE when root limit is bounded");
+        }
     }
     long sum = 0;
     for (long min : mins) {
         try {
             sum = Math.addExact(sum, min);
         } catch (ArithmeticException overflow) {
             throw new IllegalArgumentException("Sum of pool minimums overflows.", overflow);
         }
     }
     if (sum > rootLimit) {
         throw new IllegalArgumentException(
             "Sum of pool minimums (" + sum + ") exceeds root limit (" + rootLimit + ")"
         );
     }
 }
Suggestion importance[1-10]: 6

__

Why: Adding an early check for individual pool mins being Long.MAX_VALUE provides a clearer error message before attempting the sum, improving error handling and user experience. This is a moderate improvement in code quality and maintainability.

Low
Log when min update deferred

The method updates poolMins unconditionally but only raises the live limit when
target > current. If newMin is below the current limit, the recorded min diverges
from the live limit until the next rebalance tick. Consider logging a debug message
when the min is updated but the live limit is not raised, so operators can observe
that the change is deferred to the rebalancer.

plugins/arrow-base/src/main/java/org/opensearch/arrow/allocator/ArrowNativeAllocator.java [160-172]

 public void setPoolMin(String poolName, long newMin) {
     ArrowPoolHandle handle = pools.get(poolName);
     if (handle == null) {
         throw new IllegalStateException("Pool '" + poolName + "' does not exist");
     }
     poolMins.put(poolName, newMin);
     long max = poolMaxes.getOrDefault(poolName, Long.MAX_VALUE);
     long current = handle.allocator.getLimit();
     long target = Math.min(newMin, max);
     if (target > current) {
         handle.allocator.setLimit(target);
+    } else {
+        logger.debug("Pool '{}' min updated to {} but live limit {} not raised; rebalancer will apply on next tick", poolName, newMin, current);
     }
 }
Suggestion importance[1-10]: 5

__

Why: Adding a debug log when the min is updated but the live limit is not raised improves observability for operators, helping them understand that the change is deferred to the rebalancer. This is a useful enhancement for debugging and monitoring, though not critical.

Low
Optimize null registry path

When registry == null, the method continues the loop but never populates result, so
it will return an empty map after reading all entries. The loop should break early
after detecting registry == null to avoid unnecessary reads of the remaining
entries. Alternatively, log a debug message once when the registry is absent so
operators can diagnose why plugin stats are missing.

server/src/main/java/org/opensearch/action/admin/cluster/node/stats/NodeStats.java [296-334]

 private static Map<String, PluginNodeStats> readPluginStats(StreamInput in) throws IOException {
     int size = in.readVInt();
     if (size == 0) {
         return Collections.emptyMap();
     }
     NamedWriteableRegistry registry = in.namedWriteableRegistry();
+    if (registry == null) {
+        for (int i = 0; i < size; i++) {
+            in.readString();
+            in.readBytesReference();
+        }
+        return Collections.emptyMap();
+    }
     Map<String, PluginNodeStats> result = new HashMap<>(size);
     for (int i = 0; i < size; i++) {
         String name = in.readString();
         BytesReference payload = in.readBytesReference();
-        if (registry == null) {
-            continue;
-        }
         try (
             StreamInput rawIn = payload.streamInput();
             NamedWriteableAwareStreamInput payloadIn = new NamedWriteableAwareStreamInput(rawIn, registry)
         ) {
             payloadIn.setVersion(in.getVersion());
             PluginNodeStats stats = payloadIn.readNamedWriteable(PluginNodeStats.class);
             result.put(name, stats);
         } catch (IOException | IllegalArgumentException e) {
-            // Receiver doesn't have the plugin's NamedWriteable registered (typical
-            // during rolling upgrades or in a non-uniform plugin install). Drop the
-            // entry; the rest of NodeStats remains decodable.
+            // Drop unknown entries
         }
     }
     return result;
 }
Suggestion importance[1-10]: 5

__

Why: Breaking early when registry == null avoids unnecessary reads and improves efficiency. This is a minor optimization that enhances code performance and readability, though the impact is limited to edge cases.

Low
Eliminate redundant exception catch

The method catches UnsupportedOperationException after checking
isSpillLimitDynamic(), which should make this branch unreachable. The comment
acknowledges this is defensive against a race. However, if the race is truly
possible, the isSpillLimitDynamic() check should be moved inside the try block or
synchronized to avoid the time-of-check-time-of-use gap. Otherwise, remove the
redundant catch block.

sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/DataFusionPlugin.java [420-444]

 void updateSpillMemoryLimit(long newLimitBytes) {
     DataFusionService service = dataFusionService;
     if (service == null) {
         logger.debug("DataFusion service not yet initialized; ignoring spill limit update to {}B", newLimitBytes);
         return;
     }
-    if (!service.isSpillLimitDynamic()) {
-        logger.warn(
-            "Updated DataFusion spill memory limit to {}B at the cluster level; the loaded native library does not "
-                + "support runtime spill resize, so the new value will only take effect after a node restart",
-            newLimitBytes
-        );
-        return;
-    }
     try {
+        if (!service.isSpillLimitDynamic()) {
+            logger.warn(
+                "Updated DataFusion spill memory limit to {}B at the cluster level; the loaded native library does not "
+                    + "support runtime spill resize, so the new value will only take effect after a node restart",
+                newLimitBytes
+            );
+            return;
+        }
         service.setSpillMemoryLimit(newLimitBytes);
         logger.info("Updated DataFusion spill memory limit to {}B", newLimitBytes);
     } catch (IllegalStateException e) {
         logger.warn("Ignoring spill memory limit update to {}B; service is not running", newLimitBytes);
-    } catch (UnsupportedOperationException e) {
-        logger.warn("Ignoring spill memory limit update to {}B; native runtime does not support live updates", newLimitBytes);
     }
 }
Suggestion importance[1-10]: 4

__

Why: The suggestion to move the isSpillLimitDynamic() check inside the try block or remove the redundant catch block is valid for code clarity. However, the current defensive approach is acceptable, and the change is minor in impact.

Low
Preserve plugin initialization order

The mockPlugins collection is added using addAll() which may not preserve insertion
order if mockPlugins is not order-preserving. Since plugin initialization order is
now critical (as noted in the comment), ensure mockPlugins is also order-preserving
or document the ordering contract.

test/framework/src/main/java/org/opensearch/test/InternalTestCluster.java [522-523]

 Set<Class<? extends Plugin>> plugins = new LinkedHashSet<>(nodeConfigurationSource.nodePlugins());
-plugins.addAll(mockPlugins);
+// Preserve order when adding mockPlugins to maintain deterministic plugin initialization
+mockPlugins.forEach(plugins::add);
Suggestion importance[1-10]: 3

__

Why: While the concern about order preservation is valid, addAll() on a LinkedHashSet already preserves insertion order from the source collection. The suggested change using forEach is functionally equivalent but less idiomatic. The suggestion addresses a non-issue since addAll() maintains order when the target is a LinkedHashSet.

Low

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 9188043: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@gaurav-amz gaurav-amz force-pushed the unified-allocator-followup branch from 9188043 to d419b2a Compare May 19, 2026 09:10
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 19, 2026

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit d646dd4.

PathLineSeverityDescription
plugins/arrow-flight-rpc/build.gradle25highNew dependency added: `compileOnly project(':libs:opensearch-arrow-spi')`. Per mandatory flagging rule, all dependency additions must be flagged regardless of apparent legitimacy. This is an internal monorepo project reference rather than an external registry artifact.
sandbox/plugins/analytics-backend-datafusion/build.gradle51highMultiple new dependencies added: `compileOnly project(':libs:opensearch-arrow-spi')`, `compileOnly project(':plugins:arrow-base')`, and `testImplementation project(':plugins:arrow-base')`. Per mandatory flagging rule, all dependency additions must be flagged. These are internal monorepo project references.
sandbox/plugins/analytics-engine/build.gradle66highNew dependency added (`compileOnly project(':libs:opensearch-arrow-spi')`) and dependency scope changed from `testRuntimeOnly` to `testImplementation` for `project(':plugins:arrow-base')`. Per mandatory flagging rule, all dependency changes must be flagged.
plugins/arrow-base/src/main/java/org/opensearch/arrow/allocator/ArrowNativeAllocator.java86low`ensureForTesting()` is a `public static` method in production code that forcibly closes any existing static `INSTANCE` singleton and replaces it unconditionally. Documented as test-only but accessible to any caller at runtime, including plugin code; misuse could disrupt the node-wide allocator lifecycle outside of tests.

The table above displays the top 10 most important findings.

Total: 4 | Critical: 0 | High: 3 | Medium: 0 | Low: 1


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@gaurav-amz gaurav-amz force-pushed the unified-allocator-followup branch from d419b2a to 10f3617 Compare May 19, 2026 10:00
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 498b502

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 498b502: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Copy Markdown
Contributor

@Bukhtawar Bukhtawar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might need to reject the request, throw OpenSearchRejectedException rather than oom-ing. Also lets try to see how we can wire up circuit-breaker for this. Maybe circuit-breaker stats is something we leverage for tracking memory used

@bowenlan-amzn
Copy link
Copy Markdown
Member

This PR seems to be the good place to get some alignment on the memory assignment across different components in the system.
Let's take 64G RAM as example.

64 GB RAM
├── JVM Heap (16 GB / 25%)
│   ├── Indexing pressure, Writer buffer
│   ├── Transport Netty
│   └── etc. ...
|
└── Off-Heap (48 GB / 75%)
    │
    ├── node.native_memory.limit (40 G)
    │   │
    │   ├── Java Arrow Root (12G / 30%)
    │   │   ├── POOL_FLIGHT  (2G / 5%)
    │   │   │   └── flight/server, flight/client — batch ser/deser
    │   │   ├── POOL_INGEST  (4G / 10%)
    │   │   │   └── ArrowBufferPool children (per-VSR, pool_limit/divisor)
    │   │   └── POOL_QUERY   (2G / 5%)
    │   │       └── analytics-search-service → per-query children (256MB each)
    │   │
    │   └── Rust Memory Pool (28G / 70%)
    │       └── DataFusion Rust pool (datafusion.memory_pool_limit_bytes)
    │           ├── hash tables, sort buffers, aggs, intermediate batches
    │           └── spill staging (datafusion.spill_memory_limit_bytes)
    │
    └── Rest (no knobs, at least 8G)
        ├── Lucene mmap (~2 GB)
        └── Page Cache (whatever's left)

I think the Java Arrow memory are mostly used for intermediate data transfer, while Rust memory are used for query execution. That's one reason to provide more to Rust side.

@Bukhtawar @alchemist51 @mch2 @expani

@alchemist51 alchemist51 added the skip-diff-analyzer Maintainer to skip code-diff-analyzer check, after reviewing issues in AI analysis. label May 19, 2026
@alchemist51 alchemist51 reopened this May 19, 2026
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit d646dd4

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for d646dd4: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@gaurav-amz gaurav-amz force-pushed the unified-allocator-followup branch from d646dd4 to 53c0e36 Compare May 19, 2026 20:10
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 53c0e36

@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit e1f3c5b

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for e1f3c5b: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@gaurav-amz gaurav-amz force-pushed the unified-allocator-followup branch from e1f3c5b to 041dc0e Compare May 19, 2026 20:26
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 041dc0e

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 041dc0e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Copy Markdown
Member

@bowenlan-amzn bowenlan-amzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overral looks good! Thanks for refactoring the allocator related changes in arrow-base.
Left some comments that should be easy to accommodate. Approving now.

Comment thread server/src/main/java/org/opensearch/plugins/Plugin.java
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 40688ad

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 40688ad: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 66f047b: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 3a0373c

@gaurav-amz gaurav-amz force-pushed the unified-allocator-followup branch from 3a0373c to 6645315 Compare May 21, 2026 19:33
DefaultPlanExecutor created its coordinator allocator from POOL_QUERY in
its ctor, but as a Guice-bound singleton it has no close hook. On node
shutdown ArrowBasePlugin closes the root allocator chain, which walks
descendants and finds POOL_QUERY → coordinator still attached. The pool
allocator's close-with-outstanding-children check then throws:

  java.lang.IllegalStateException: Allocator[ROOT] closed with outstanding child allocators.
  Allocator(ROOT) → Allocator(query) → Allocator(coordinator)

This was visible as 14 failures across CoordinatorTransportStressIT (5),
MemoryGuardIT (8), and WindowSqlIT.classMethod (1) — all in
:sandbox:qa:analytics-engine-coordinator:internalClusterTest. The
CoordinatorTransportStressIT subset reproduced on pre-merge HEAD too,
confirming the leak pre-dates the latest origin/main merge.

Fix: introduce a tiny CoordinatorAllocatorHandle (AutoCloseable) and
move the allocator's creation into AnalyticsPlugin.createComponents.
The plugin now owns the allocator's lifetime and closes it from
Plugin.close() — the same shape AnalyticsSearchService already follows
for its own POOL_QUERY child. DefaultPlanExecutor consumes the handle
via Guice injection and stops worrying about lifecycle.

Plugins are closed in reverse iteration order in Node.close()
(server/src/main/.../Node.java:2228), so AnalyticsPlugin.close() runs
before ArrowBasePlugin.close() — the coordinator child is released
before the pool/root teardown begins.

Verified: :sandbox:qa:analytics-engine-coordinator:internalClusterTest
goes from 14 failures → 0 failures (82 tests, 61 skipped — pre-existing
@AwaitsFix/@ignore).

Signed-off-by: Gaurav Singh <snghsvn@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 6645315

@gaurav-amz gaurav-amz force-pushed the unified-allocator-followup branch from 6645315 to a9ff751 Compare May 21, 2026 19:35
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit a9ff751

@gaurav-amz
Copy link
Copy Markdown
Contributor Author

Closing and reopening to trigger the build again.

@gaurav-amz gaurav-amz closed this May 21, 2026
@gaurav-amz gaurav-amz reopened this May 21, 2026
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit a9ff751

@gaurav-amz gaurav-amz closed this May 21, 2026
@gaurav-amz gaurav-amz reopened this May 21, 2026
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit a9ff751

@github-actions
Copy link
Copy Markdown
Contributor

✅ Gradle check result for a9ff751: SUCCESS

VSRRotationBenchmark references ArrowNativeAllocator (the new unified
allocator API), but the benchmarks subproject only declared a transitive
api dep on parquet-data-format. parquet-data-format declares arrow-base
as compileOnly (not exported), so benchmarks couldn't see it at compile.

CI failure:
  VSRRotationBenchmark.java:82: error: package org.opensearch.arrow.allocator does not exist
      private org.opensearch.arrow.allocator.ArrowNativeAllocator nativeAllocator;

Mirror parquet-data-format's pattern: declare arrow-base as compileOnly.
Runtime continues to work via the plugin's classloader.

Signed-off-by: Gaurav Singh <snghsvn@amazon.com>
Signed-off-by: Gaurav Singh <snghsvn@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit b033e5c

@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 1bd0be2

@github-actions
Copy link
Copy Markdown
Contributor

✅ Gradle check result for 1bd0be2: SUCCESS

@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit dcddc3c

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for dcddc3c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@gaurav-amz gaurav-amz closed this May 22, 2026
@gaurav-amz gaurav-amz reopened this May 22, 2026
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit dcddc3c

@github-actions
Copy link
Copy Markdown
Contributor

✅ Gradle check result for dcddc3c: SUCCESS

@gaurav-amz
Copy link
Copy Markdown
Contributor Author

Follow up PR changes.

  1. Native allocator stats under native memory.
  2. Refactoring of native allocator stats.

@Bukhtawar Bukhtawar merged commit 609ac5c into opensearch-project:main May 22, 2026
43 of 45 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

skip-diff-analyzer Maintainer to skip code-diff-analyzer check, after reviewing issues in AI analysis.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants