Skip to content

Fixed empty index search failure#21754

Draft
vinaykpud wants to merge 1 commit into
opensearch-project:mainfrom
vinaykpud:feature/empty_index
Draft

Fixed empty index search failure#21754
vinaykpud wants to merge 1 commit into
opensearch-project:mainfrom
vinaykpud:feature/empty_index

Conversation

@vinaykpud
Copy link
Copy Markdown
Contributor

@vinaykpud vinaykpud commented May 20, 2026

Description

Fixes Schema error: No field named … when querying an index whose shard has zero parquet files. With no on-disk data, the Rust side's infer_schema cannot produce a schema, so any column reference in the substrait plan fails to bind.

Fix: when a shard is empty, send a MapperService-derived Arrow schema across the FFM boundary and register a zero-batch MemTable instead of a ListingTable. The plan validates and the query returns 0 rows with the correct columns. Non-empty shards are unaffected and stay on the existing infer_schema path.

Changes

  • MapperServiceArrowSchema — Java helper that maps OpenSearch field types (keyword/text → Utf8, long → Int64, double → Float64, date → Timestamp, etc.) to an Arrow Schema.
  • ShardScanInstructionHandler — encodes the schema as Arrow IPC bytes and passes them through; sends an empty buffer when no mapping exists.
  • NativeBridge#createSessionContext — extends the FFM signature with (schema_ipc_ptr, schema_ipc_len); legacy callers pass (NULL, 0).
  • Rust df_create_session_context / create_session_context — when the shard's object_metas is empty and IPC bytes are supplied, decode via StreamReader and register a zero-batch MemTable; otherwise unchanged.
  • Indexed path passes None (always has on-disk files).

Tests

  • Rust unit test covering encode → decode → empty MemTableSELECT returns 0 rows.
  • Updated existing Java tests for the new bridge signature.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 20, 2026

PR Reviewer Guide 🔍

(Review updated until commit 9bdf702)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Resource Leak

If iterator() is called but the returned iterator is never consumed (no call to next()), the VectorSchemaRoot is never created and close() becomes a no-op. However, if next() is called once, lastBatch holds a VectorSchemaRoot that the caller is expected to close. The comment on line 86-87 states the caller takes ownership, but close() on line 56-58 still attempts to close lastBatch. If the caller already closed it, line 94 silently ignores the exception. This creates ambiguity: either the stream owns the batch (and should always close it) or the caller owns it (and the stream should never close it). The current hybrid approach risks double-close attempts or, if the caller forgets to close, a leak when the stream is closed before the batch is consumed.

@Override
public void close() {
    if (iteratorInstance != null) {
        iteratorInstance.closeLastBatch();
    }
}

private static final class EmptyIterator implements Iterator<EngineResultBatch> {
    private final Schema schema;
    private final BufferAllocator allocator;
    private boolean batchEmitted;
    private VectorSchemaRoot lastBatch;

    EmptyIterator(Schema schema, BufferAllocator allocator) {
        this.schema = schema;
        this.allocator = allocator;
    }

    @Override
    public boolean hasNext() {
        return batchEmitted == false;
    }

    @Override
    public EngineResultBatch next() {
        if (hasNext() == false) {
            throw new NoSuchElementException();
        }
        VectorSchemaRoot root = VectorSchemaRoot.create(schema, allocator);
        root.setRowCount(0);
        lastBatch = root;
        batchEmitted = true;
        // Caller takes ownership of the VSR's lifecycle (Flight transport closes after wire write,
        // row-path collector closes after reading). Same contract as DatafusionResultStream.next().
        return new EmptyArrowBatch(root);
    }

    void closeLastBatch() {
        if (lastBatch != null) {
            try {
                lastBatch.close();
            } catch (Exception ignored) {
                // best-effort; the consumer may have already transferred ownership.
            }
            lastBatch = null;
        }
Possible NPE

resolved.plan.getOutputFields() is called without null-checking resolved.plan first. If resolved.plan is null, this line throws a NullPointerException. The subsequent check outputFields != null only guards against a null return from getOutputFields(), not a null plan object.

List<FieldSpec> outputFields = resolved.plan.getOutputFields();

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 20, 2026

PR Code Suggestions ✨

Latest suggestions up to 9bdf702

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Release reader before early return

The gatedReader is acquired but never released when taking the empty-shard
short-circuit path. This creates a resource leak because FragmentResources expects
the caller to manage the lifecycle, but the early return bypasses normal cleanup.
Wrap the reader release in a try-catch or ensure it's closed before returning.

sandbox/plugins/analytics-engine/src/main/java/org/opensearch/analytics/exec/AnalyticsSearchService.java [168-172]

 if (outputFields != null && gatedReader.get().catalogSnapshot().getSegments().isEmpty()) {
     Schema arrowSchema = RelDataTypeArrowSchema.toArrow(outputFields);
     EngineResultStream emptyStream = new EmptyEngineResultStream(arrowSchema, allocator);
-    return new FragmentResources(gatedReader, null, emptyStream, null);
+    gatedReader.close();
+    return new FragmentResources(null, null, emptyStream, null);
 }
Suggestion importance[1-10]: 9

__

Why: This identifies a critical resource leak where gatedReader is acquired but never released when taking the empty-shard short-circuit path. The FragmentResources constructor receives the reader, but if it's closed before passing, the first parameter should be null to avoid double-close issues.

High
General
Prevent reuse after consumption

Multiple calls to iterator() after close() will return a closed iterator instance,
potentially causing unexpected behavior. After close() is called, subsequent
iterator() calls should either throw an exception or create a fresh iterator to
maintain consistent state.

sandbox/plugins/analytics-engine/src/main/java/org/opensearch/analytics/exec/EmptyEngineResultStream.java [47-52]

 @Override
 public Iterator<EngineResultBatch> iterator() {
     if (iteratorInstance == null) {
         iteratorInstance = new EmptyIterator(schema, allocator);
+    } else if (iteratorInstance.batchEmitted) {
+        throw new IllegalStateException("Iterator already consumed or stream closed");
     }
     return iteratorInstance;
 }
Suggestion importance[1-10]: 5

__

Why: The suggestion correctly identifies that calling iterator() after consumption could return a stale iterator. However, the EngineResultStream contract typically expects single-use iterators, and the test at line 54 explicitly validates iterator caching. The suggested check would break this contract.

Low
Use standard boolean negation

Use the standard boolean comparison !batchEmitted instead of batchEmitted == false
for better readability and consistency with Java conventions.

sandbox/plugins/analytics-engine/src/main/java/org/opensearch/analytics/exec/EmptyEngineResultStream.java [73-75]

 @Override
 public boolean hasNext() {
-    return batchEmitted == false;
+    return !batchEmitted;
 }
Suggestion importance[1-10]: 2

__

Why: While !batchEmitted is more idiomatic than batchEmitted == false, this is a minor style preference with negligible impact on code quality or functionality. The existing code is clear and correct.

Low

Previous suggestions

Suggestions up to commit 0266f70
CategorySuggestion                                                                                                                                    Impact
Possible issue
Handle resource cleanup on error

The gatedReader is accessed without proper resource management in the early-return
path. If an exception occurs during RelDataTypeArrowSchema.toArrow() or
EmptyEngineResultStream construction, the gatedReader will leak because it's
returned in FragmentResources but never closed on error. Wrap the empty-shard logic
in a try-catch block that closes gatedReader on failure.

sandbox/plugins/analytics-engine/src/main/java/org/opensearch/analytics/exec/AnalyticsSearchService.java [167-172]

 List<FieldSpec> outputFields = resolved.plan.getOutputFields();
 if (outputFields != null && gatedReader.get().catalogSnapshot().getSegments().isEmpty()) {
-    Schema arrowSchema = RelDataTypeArrowSchema.toArrow(outputFields);
-    EngineResultStream emptyStream = new EmptyEngineResultStream(arrowSchema, allocator);
-    return new FragmentResources(gatedReader, null, emptyStream, null);
+    try {
+        Schema arrowSchema = RelDataTypeArrowSchema.toArrow(outputFields);
+        EngineResultStream emptyStream = new EmptyEngineResultStream(arrowSchema, allocator);
+        return new FragmentResources(gatedReader, null, emptyStream, null);
+    } catch (Exception e) {
+        gatedReader.close();
+        throw e;
+    }
 }
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies a resource leak risk. If RelDataTypeArrowSchema.toArrow() or EmptyEngineResultStream construction throws an exception, the gatedReader will not be closed, causing a resource leak. The improved code properly wraps the logic in a try-catch block to ensure cleanup.

Medium
General
Ensure thread-safe iterator initialization

The iterator() method is not thread-safe. If multiple threads call iterator()
concurrently, they could create multiple EmptyIterator instances, violating the
contract that requires a single shared iterator. This could lead to resource leaks
or inconsistent state. Synchronize access or use atomic operations to ensure
thread-safe lazy initialization.

sandbox/plugins/analytics-engine/src/main/java/org/opensearch/analytics/exec/EmptyEngineResultStream.java [47-52]

 @Override
-public Iterator<EngineResultBatch> iterator() {
+public synchronized Iterator<EngineResultBatch> iterator() {
     if (iteratorInstance == null) {
         iteratorInstance = new EmptyIterator(schema, allocator);
     }
     return iteratorInstance;
 }
Suggestion importance[1-10]: 6

__

Why: While the suggestion identifies a potential thread-safety issue, the severity depends on the actual usage context. The EngineResultStream contract typically assumes single-threaded access per stream instance. However, adding synchronization is a reasonable defensive measure to prevent potential race conditions during lazy initialization.

Low
Ensure thread-safe batch emission flag

The hasNext() method is not thread-safe and could be called concurrently with
next(). If two threads call hasNext() simultaneously when batchEmitted is false,
both could see true and proceed to call next(), causing the second thread to throw
NoSuchElementException. Mark batchEmitted as volatile or synchronize access to
ensure visibility across threads.

sandbox/plugins/analytics-engine/src/main/java/org/opensearch/analytics/exec/EmptyEngineResultStream.java [73-75]

+private volatile boolean batchEmitted;
+
 @Override
 public boolean hasNext() {
     return batchEmitted == false;
 }
Suggestion importance[1-10]: 5

__

Why: The suggestion raises a valid thread-safety concern about the batchEmitted flag. However, iterator instances are typically not shared across threads in the EngineResultStream contract. Making batchEmitted volatile is a low-cost defensive measure, though the actual risk depends on usage patterns not visible in the PR.

Low
Suggestions up to commit b1041be
CategorySuggestion                                                                                                                                    Impact
Possible issue
Validate length cast to prevent overflow

Casting schema_ipc_len (i64) to usize without validation can cause truncation or
overflow on 32-bit platforms or with negative values. Validate that schema_ipc_len
is non-negative and fits in usize before the cast to prevent undefined behavior or
panics.

sandbox/plugins/analytics-backend-datafusion/rust/src/ffm.rs [531-535]

 let schema_ipc_bytes: Option<Vec<u8>> = if schema_ipc_len > 0 && !schema_ipc_ptr.is_null() {
-    Some(std::slice::from_raw_parts(schema_ipc_ptr, schema_ipc_len as usize).to_vec())
+    let len = usize::try_from(schema_ipc_len)
+        .map_err(|_| "schema_ipc_len out of range for usize")?;
+    Some(std::slice::from_raw_parts(schema_ipc_ptr, len).to_vec())
 } else {
     None
 };
Suggestion importance[1-10]: 7

__

Why: This addresses a real safety concern where casting i64 to usize without validation could cause truncation on 32-bit platforms or with negative values. Using try_from prevents potential undefined behavior in FFI boundaries, making this a valuable safety improvement.

Medium
General
Add null-safety for schema conversion

The code assumes ArrowSchemaIpc.toBytes never returns null, but if it throws an
exception or returns null, arrowSchemaIpc could be null when passed to
NativeBridge.createSessionContext. Add null-safety by wrapping the conversion in a
try-catch or adding an explicit null check after the call.

sandbox/plugins/analytics-backend-datafusion/src/main/java/org/opensearch/be/datafusion/ShardScanInstructionHandler.java [63-64]

 Schema arrowSchema = MapperServiceArrowSchema.fromMapperService(context.getMapperService());
-byte[] arrowSchemaIpc = arrowSchema.getFields().isEmpty() ? new byte[0] : ArrowSchemaIpc.toBytes(arrowSchema);
+byte[] arrowSchemaIpc = new byte[0];
+if (!arrowSchema.getFields().isEmpty()) {
+    byte[] converted = ArrowSchemaIpc.toBytes(arrowSchema);
+    arrowSchemaIpc = (converted != null) ? converted : new byte[0];
+}
Suggestion importance[1-10]: 5

__

Why: The suggestion adds defensive null-checking for ArrowSchemaIpc.toBytes, which could improve robustness. However, without evidence that toBytes can return null or throw exceptions in practice, this is a minor defensive improvement rather than fixing a critical bug.

Low
Simplify unwrap for clearer invariant

The condition checks schema_ipc_bytes is Some and non-empty, but the .expect()
message could be clearer. If the logic changes, the expect could panic unexpectedly.
Replace .expect() with .unwrap() or use pattern matching to make the invariant
explicit and avoid potential panic confusion.

sandbox/plugins/analytics-backend-datafusion/rust/src/session_context.rs [151-154]

 if shard_view.object_metas.is_empty() && schema_ipc_bytes.as_ref().map(|b| !b.is_empty()).unwrap_or(false) {
     ...
-    let bytes = schema_ipc_bytes.as_ref().expect("checked non-empty above");
+    let bytes = schema_ipc_bytes.as_ref().unwrap();
Suggestion importance[1-10]: 2

__

Why: The suggestion replaces .expect() with .unwrap(), which provides less debugging information. The existing .expect() with a descriptive message is actually better practice for maintainability and debugging, making this suggestion counterproductive.

Low

@github-actions
Copy link
Copy Markdown
Contributor

✅ Gradle check result for b1041be: SUCCESS

@codecov
Copy link
Copy Markdown

codecov Bot commented May 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.38%. Comparing base (2696b95) to head (9bdf702).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #21754      +/-   ##
============================================
- Coverage     73.48%   73.38%   -0.10%     
+ Complexity    75084    75064      -20     
============================================
  Files          6016     6016              
  Lines        341072   341072              
  Branches      49091    49091              
============================================
- Hits         250626   250286     -340     
- Misses        70494    70836     +342     
+ Partials      19952    19950       -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@vinaykpud vinaykpud force-pushed the feature/empty_index branch from 5477f32 to 0266f70 Compare May 21, 2026 00:59
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 5477f32

@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 0266f70

@vinaykpud vinaykpud force-pushed the feature/empty_index branch from 0266f70 to 61de9fb Compare May 21, 2026 01:02
Signed-off-by: Vinay Krishna Pudyodu <vinkrish.neo@gmail.com>
@vinaykpud vinaykpud force-pushed the feature/empty_index branch from 61de9fb to 9bdf702 Compare May 21, 2026 01:03
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 9bdf702

@github-actions
Copy link
Copy Markdown
Contributor

✅ Gradle check result for 9bdf702: SUCCESS

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant