Copilot AI commented Oct 26, 2025

OR filters with multiple indexed fields return the same document multiple times when it matches multiple clauses through different indexes. For example:

NitriteCollection items = db.getCollection("items");
items.createIndex(IndexOptions.indexOptions(IndexType.NON_UNIQUE), "field_a");
items.createIndex(IndexOptions.indexOptions(IndexType.NON_UNIQUE), "field_b");

Document doc = Document.createDocument();
doc.put("field_a", "A");
doc.put("field_b", "B");
items.insert(doc);

Filter orFilter = Filter.or(
    FluentFilter.where("field_a").eq("A"),
    FluentFilter.where("field_b").eq("B")
);

DocumentCursor cursor = items.find(orFilter);
// Returns 2 results instead of 1 - same document appears twice

Changes

  • ReadOperations.java: Apply DistinctStream unconditionally for OR filters with subplans to deduplicate by NitriteId
  • IssueTest.java: Add test case validating single document matching multiple OR clauses returns once
  • CollectionFindByCompoundIndexTest.java: Update tests expecting duplicate behavior to reflect correct OR semantics

The fix ensures OR filters implement proper set union semantics where each document appears at most once regardless of how many conditions it satisfies.
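To illustrate the set-union behaviour described above, here is a minimal, self-contained sketch. It is not the code from ReadOperations.java or Nitrite's DistinctStream: `OrUnionSketch`, `distinctUnion`, and `collect` are hypothetical names, and plain `Long` values stand in for NitriteId. It assumes each OR clause yields its own list of matching ids from its index, and it shows one way to concatenate those lists lazily while skipping ids that were already emitted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;

public class OrUnionSketch {

    /**
     * Lazily walks the per-clause results in order and emits each id once.
     * Assumption: ids are non-null Long values standing in for NitriteId.
     */
    public static Iterator<Long> distinctUnion(List<List<Long>> subPlanResults) {
        return new Iterator<Long>() {
            private final Set<Long> seen = new HashSet<>();
            private final Iterator<List<Long>> plans = subPlanResults.iterator();
            private Iterator<Long> current = Collections.emptyIterator();
            private Long pending; // next id to hand out, or null if not yet found

            @Override
            public boolean hasNext() {
                while (pending == null) {
                    if (current.hasNext()) {
                        Long id = current.next();
                        // Set.add returns false for an id that was already emitted.
                        if (seen.add(id)) {
                            pending = id;
                        }
                    } else if (plans.hasNext()) {
                        current = plans.next().iterator();
                    } else {
                        return false;
                    }
                }
                return true;
            }

            @Override
            public Long next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                Long result = pending;
                pending = null;
                return result;
            }
        };
    }

    /** Drains an iterator into a list so the result can be inspected. */
    public static List<Long> collect(Iterator<Long> it) {
        List<Long> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        // The single document (id 1) is found through both indexes.
        List<Long> viaFieldA = List.of(1L);
        List<Long> viaFieldB = List.of(1L);

        List<Long> merged = collect(distinctUnion(List.of(viaFieldA, viaFieldB)));
        System.out.println(merged.size()); // prints 1
    }
}
```

Without the `seen` check, the same input would produce two entries, which matches the duplicate behaviour reported in the issue.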

Original prompt

This section details the original issue you should resolve

<issue_title>Multiple indexes make records appear multiple times in result</issue_title>
<issue_description>Using 'or' with filters for fields in different indexes will cause the matching document to be produced once for each index.

NitriteCollection items = db.getCollection("items");
items.createIndex(IndexOptions.indexOptions(IndexType.NON_UNIQUE), "field_a");
items.createIndex(IndexOptions.indexOptions(IndexType.NON_UNIQUE), "field_b");
  	
Document doc = Document.createDocument();
doc.put("field_a", "A");
doc.put("field_b", "B");
items.insert(doc);

Filter aFilter = FluentFilter.where("field_a").eq("A");
Filter bFilter = FluentFilter.where("field_b").eq("B");
  	
Filter orFilter = Filter.or(aFilter, bFilter);

DocumentCursor cursor = items.find(orFilter);
Iterator<Document> docIter = cursor.iterator();
  	
List<Long> matches = new ArrayList<>();
while (docIter.hasNext()) {
    Document match = docIter.next();
    long id = match.getId().getIdValue();
    matches.add(id);
}
assertEquals("Single document must yield single match", 1, matches.size());</issue_description>

Comments on the Issue (you are @copilot in this section)




Copilot AI changed the title from "[WIP] Fix multiple indexes causing duplicate records in result set" to "Fix OR filters returning duplicate documents when using multiple indexes" on Oct 26, 2025
Copilot AI requested a review from anidotnet October 26, 2025 12:25
Changed the expected cursor size from 5 to 3 in CollectionFindByCompoundIndexTest for both mvstore and rocksdb adapters to reflect the correct number of results returned by the query.
@anidotnet anidotnet marked this pull request as ready for review October 27, 2025 06:19
Copilot AI review requested due to automatic review settings October 27, 2025 06:19
@anidotnet anidotnet merged commit 3616f9f into main Oct 27, 2025
15 checks passed

Copilot AI left a comment

Pull Request Overview

This PR fixes a bug where OR filters with multiple indexed fields incorrectly return duplicate documents when a single document matches multiple OR clauses through different indexes. The fix ensures proper set union semantics by always deduplicating results for OR filters.

  • Applied DistinctStream unconditionally for OR filters with subplans in ReadOperations.java
  • Added regression test in IssueTest.java to validate the fix
  • Updated existing tests that incorrectly expected duplicate behavior

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| nitrite/src/main/java/org/dizitart/no2/collection/operation/ReadOperations.java | Removed conditional check and always apply DistinctStream for OR filters to prevent duplicates |
| nitrite/src/test/java/org/dizitart/no2/integration/collection/IssueTest.java | Added test case reproducing the reported issue with OR filters across multiple indexes |
| nitrite/src/test/java/org/dizitart/no2/integration/collection/CollectionFindByCompoundIndexTest.java | Updated expected result counts from 5 to 3 and 3 to 2 to reflect correct deduplication behavior |
| nitrite-rocksdb-adapter/src/test/java/org/dizitart/no2/integration/collection/CollectionFindByCompoundIndexTest.java | Updated expected result count from 5 to 3 for OR filter test |
| nitrite-mvstore-adapter/src/test/java/org/dizitart/no2/integration/collection/CollectionFindByCompoundIndexTest.java | Updated expected result count from 5 to 3 for OR filter test |

Comment on lines +110 to 112
assertEquals(1, cursor.toList().stream().filter(d ->
d.get("firstName", String.class).equals("fn3")
&& d.get("lastName", String.class).equals("ln2")).count());
Copilot AI Oct 27, 2025

Calling cursor.toList() twice (lines 106 and 110) iterates through the cursor twice and creates two list copies. Cache the list in a local variable and reuse it for both filter operations: this avoids the repeated work and keeps the test from depending on the cursor being safely re-iterable.
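A minimal sketch of the suggested change, kept independent of Nitrite so it runs on its own. `FakeCursor` is a hypothetical stand-in for the cursor that only counts how often `toList()` is called, and the sample documents (including the `fn1`/`ln1` pair used for the first check) are assumptions made for illustration, not the data in the actual test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class CachedListSketch {

    /** Hypothetical stand-in for a cursor; counts how often it is materialized. */
    public static final class FakeCursor {
        private final List<Map<String, String>> rows;
        public int materializations = 0;

        public FakeCursor(List<Map<String, String>> rows) {
            this.rows = rows;
        }

        /** Each call walks the rows again and builds a fresh copy. */
        public List<Map<String, String>> toList() {
            materializations++;
            return new ArrayList<>(rows);
        }
    }

    /** Counts documents whose firstName and lastName both match. */
    public static long countMatches(List<Map<String, String>> docs,
                                    String firstName, String lastName) {
        return docs.stream()
            .filter(d -> firstName.equals(d.get("firstName"))
                && lastName.equals(d.get("lastName")))
            .count();
    }

    /** Illustrative data only; not taken from the real test fixture. */
    public static FakeCursor sampleCursor() {
        return new FakeCursor(List.of(
            Map.of("firstName", "fn1", "lastName", "ln1"),
            Map.of("firstName", "fn3", "lastName", "ln2"),
            Map.of("firstName", "fn2", "lastName", "ln2")));
    }

    public static void main(String[] args) {
        FakeCursor cursor = sampleCursor();

        // Materialize once, then reuse the list for every assertion.
        List<Map<String, String>> docs = cursor.toList();

        System.out.println(countMatches(docs, "fn1", "ln1")); // prints 1
        System.out.println(countMatches(docs, "fn3", "ln2")); // prints 1
        System.out.println(cursor.materializations);          // prints 1
    }
}
```

In the real test the same pattern would amount to assigning `cursor.toList()` to a local variable before the first assertion and using that variable in both `stream().filter(...)` calls.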
