[LuceneOnFaiss - Part1] Added building blocks for memory optimized search. #2581
Merged: 0ctopus13prime merged 1 commit into opensearch-project:lucene-on-faiss from 0ctopus13prime:lucene-on-faiss-part1 on Mar 7, 2025.
src/main/java/org/opensearch/knn/memoryoptsearch/VectorSearcher.java (new file, 54 additions, 0 deletions)
```java
/*
 * Copyright OpenSearch Contributors
 * SPDX-License-Identifier: Apache-2.0
 */

package org.opensearch.knn.memoryoptsearch;

import org.apache.lucene.search.KnnCollector;
import org.apache.lucene.util.Bits;

import java.io.Closeable;
import java.io.IOException;

/**
 * This searcher performs vector search on a non-Lucene index, for example a FAISS index.
 * Its two search APIs are compatible with Lucene: both take a {@link KnnCollector} and a {@link Bits}.
 * An implementation must collect the top vectors most similar to the given query. If it internally computes
 * distances between vectors, it must convert them into similarity scores before collecting the results.
 */
public interface VectorSearcher extends Closeable {
    /**
     * Return the k nearest neighbor documents as determined by comparison of their vector values for
     * this field, to the given vector, by the field's similarity function. The score of each document
     * is derived from the vector similarity in a way that ensures scores are positive and that a
     * larger score corresponds to a higher ranking.
     *
     * <p>The search is allowed to be approximate, meaning the results are not guaranteed to be the
     * true k closest neighbors. For large values of k (for example when k is close to the total
     * number of documents), the search may also retrieve fewer than k documents.
     *
     * @param target the vector-valued float vector query
     * @param knnCollector a KnnResults collector and relevant settings for gathering vector results
     * @param acceptDocs {@link Bits} that represents the allowed documents to match, or {@code null}
     *     if they are all allowed to match.
     */
    void search(float[] target, KnnCollector knnCollector, Bits acceptDocs) throws IOException;

    /**
     * Return the k nearest neighbor documents as determined by comparison of their vector values for
     * this field, to the given vector, by the field's similarity function. The score of each document
     * is derived from the vector similarity in a way that ensures scores are positive and that a
     * larger score corresponds to a higher ranking.
     *
     * <p>The search is allowed to be approximate, meaning the results are not guaranteed to be the
     * true k closest neighbors. For large values of k (for example when k is close to the total
     * number of documents), the search may also retrieve fewer than k documents.
     *
     * @param target the vector-valued byte vector query
     * @param knnCollector a KnnResults collector and relevant settings for gathering vector results
     * @param acceptDocs {@link Bits} that represents the allowed documents to match, or {@code null}
     *     if they are all allowed to match.
     */
    void search(byte[] target, KnnCollector knnCollector, Bits acceptDocs) throws IOException;
}
```
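The class-level Javadoc asks implementations that compute distances internally to convert them into similarity scores. A minimal sketch of one such conversion, assuming the index reports squared L2 distances (as FAISS does for its L2 metric): map a distance d to 1 / (1 + d), which is always positive and grows as d shrinks. This is the same mapping Lucene's `VectorSimilarityFunction.EUCLIDEAN` applies; the `ScoreConversion` class is hypothetical and not part of this PR.

```java
// Hypothetical helper, not part of this PR: turns squared L2 distances into
// positive, larger-is-better scores, as the VectorSearcher contract requires.
final class ScoreConversion {
    private ScoreConversion() {}

    /** Squared L2 distance between two vectors of equal length. */
    static float squaredL2(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch: " + a.length + " vs " + b.length);
        }
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    /** Maps a squared distance in [0, inf) to a score in (0, 1]; a smaller distance yields a larger score. */
    static float l2DistanceToScore(float squaredDistance) {
        return 1f / (1f + squaredDistance);
    }
}
```

Other metrics, such as inner product, can produce negative raw values and would need a different mapping to keep scores positive.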
src/main/java/org/opensearch/knn/memoryoptsearch/VectorSearcherFactory.java (new file, 26 additions, 0 deletions)
```java
/*
 * Copyright OpenSearch Contributors
 * SPDX-License-Identifier: Apache-2.0
 */

package org.opensearch.knn.memoryoptsearch;

import org.apache.lucene.store.Directory;

import java.io.IOException;

/**
 * Factory to create a {@link VectorSearcher}.
 * Implementations receive a {@link Directory} and a file name, which they can use to open an input stream on the index file.
 */
public interface VectorSearcherFactory {
    /**
     * Creates a non-null {@link VectorSearcher} that reads the given file from a Lucene {@link Directory}.
     *
     * @param directory Lucene's Directory.
     * @param fileName Logical file name to load.
     * @return a non-null {@link VectorSearcher}
     * @throws IOException if an I/O error occurs while opening the file
     */
    VectorSearcher createVectorSearcher(Directory directory, String fileName) throws IOException;
}
```
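The factory must return a non-null searcher, and because `VectorSearcher` extends `Closeable`, whoever obtains a searcher is responsible for closing it. A minimal caller-side sketch of that lifecycle follows. To keep it compilable on its own it uses hypothetical stand-in types with no Lucene dependency: `Searcher`, `SearcherFactory`, `SearcherLifecycle`, and the `dimension()` method are illustrative only.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Objects;

// Hypothetical stand-ins mirroring the shape of VectorSearcher / VectorSearcherFactory.
interface Searcher extends Closeable {
    int dimension(); // illustrative method, not part of the real interface
}

interface SearcherFactory {
    Searcher create(String fileName) throws IOException;
}

final class SearcherLifecycle {
    private SearcherLifecycle() {}

    /** Opens a searcher, enforces the non-null contract, uses it, and always closes it. */
    static int readDimension(SearcherFactory factory, String fileName) throws IOException {
        try (Searcher searcher = Objects.requireNonNull(
                factory.create(fileName), "factory must return a non-null searcher")) {
            return searcher.dimension();
        } // close() runs here, releasing whatever input the searcher owns
    }
}
```

Checking for null in the try-with-resources header means a factory that breaks the contract fails immediately with a descriptive `NullPointerException`, rather than at the first search call.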
src/main/java/org/opensearch/knn/memoryoptsearch/faiss/FaissMemoryOptimizedSearcher.java (new file, 41 additions, 0 deletions)
```java
/*
 * Copyright OpenSearch Contributors
 * SPDX-License-Identifier: Apache-2.0
 */

package org.opensearch.knn.memoryoptsearch.faiss;

import org.apache.lucene.search.KnnCollector;
import org.apache.lucene.store.IndexInput;
import org.apache.lucene.util.Bits;
import org.opensearch.knn.memoryoptsearch.VectorSearcher;

import java.io.IOException;

/**
 * This searcher reads a FAISS index file directly through the provided {@link IndexInput} and performs vector search on it.
 */
public class FaissMemoryOptimizedSearcher implements VectorSearcher {
    private final IndexInput indexInput;

    public FaissMemoryOptimizedSearcher(IndexInput indexInput) {
        this.indexInput = indexInput;
    }

    @Override
    public void search(float[] target, KnnCollector knnCollector, Bits acceptDocs) throws IOException {
        // TODO(KDY) : This will be covered in subsequent parts.
        throw new UnsupportedOperationException("Not implemented yet");
    }

    @Override
    public void search(byte[] target, KnnCollector knnCollector, Bits acceptDocs) throws IOException {
        // TODO(KDY) : This will be covered in subsequent parts.
        throw new UnsupportedOperationException("Not implemented yet");
    }

    @Override
    public void close() throws IOException {
        indexInput.close();
    }
}
```
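Both `search` methods are still `TODO`s in this part. To illustrate the contract they must honor (top-k results ranked by a positive, larger-is-better score; a `null` filter means every document is allowed; fewer than k hits are possible), here is a brute-force sketch. It is a plain linear scan, not whatever graph search the subsequent parts implement. `IntPredicate` stands in for Lucene's `Bits`, a returned list stands in for `KnnCollector`, and `BruteForceTopK` and `Hit` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.IntPredicate;

// Hypothetical brute-force illustration of the search(target, collector, acceptDocs) contract.
final class BruteForceTopK {
    private BruteForceTopK() {}

    record Hit(int docId, float score) {}

    static List<Hit> search(float[][] vectors, float[] target, int k, IntPredicate acceptDocs) {
        List<Hit> result = new ArrayList<>();
        if (k <= 0) {
            return result;
        }
        // Min-heap on score: the root is the weakest hit among the current top-k.
        PriorityQueue<Hit> heap = new PriorityQueue<>((a, b) -> Float.compare(a.score(), b.score()));
        for (int doc = 0; doc < vectors.length; doc++) {
            if (acceptDocs != null && !acceptDocs.test(doc)) {
                continue; // filtered out: never scored or collected
            }
            float dist = 0f;
            for (int i = 0; i < target.length; i++) {
                float diff = vectors[doc][i] - target[i];
                dist += diff * diff;
            }
            float score = 1f / (1f + dist); // squared L2 distance -> positive, larger-is-better score
            if (heap.size() < k) {
                heap.offer(new Hit(doc, score));
            } else if (score > heap.peek().score()) {
                heap.poll(); // evict the weakest hit
                heap.offer(new Hit(doc, score));
            }
        }
        result.addAll(heap);
        result.sort((a, b) -> Float.compare(b.score(), a.score())); // best hit first
        return result;
    }
}
```

With k larger than the number of accepted documents, the sketch simply returns every accepted document, matching the Javadoc's note that fewer than k results may come back.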
...in/java/org/opensearch/knn/memoryoptsearch/faiss/FaissMemoryOptimizedSearcherFactory.java (new file, 32 additions, 0 deletions)
```java
/*
 * Copyright OpenSearch Contributors
 * SPDX-License-Identifier: Apache-2.0
 */

package org.opensearch.knn.memoryoptsearch.faiss;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.IndexInput;
import org.apache.lucene.store.ReadAdvice;
import org.opensearch.knn.memoryoptsearch.VectorSearcher;
import org.opensearch.knn.memoryoptsearch.VectorSearcherFactory;

import java.io.IOException;

/**
 * This factory returns a {@link VectorSearcher} that performs vector search directly on a FAISS index.
 * Note that we pass {@code RANDOM} as the read advice to prevent the underlying storage from performing read-ahead. Since vector
 * search naturally accesses vectors at random locations, read-ahead does not improve performance. Passing {@code RANDOM}
 * explicitly indicates that this searcher will access vectors randomly.
 */
public class FaissMemoryOptimizedSearcherFactory implements VectorSearcherFactory {
    @Override
    public VectorSearcher createVectorSearcher(final Directory directory, final String fileName) throws IOException {
        final IndexInput indexInput = directory.openInput(
            fileName,
            new IOContext(IOContext.Context.DEFAULT, null, null, ReadAdvice.RANDOM)
        );
        return new FaissMemoryOptimizedSearcher(indexInput);
    }
}
```
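The Javadoc's argument for `ReadAdvice.RANDOM` is that vector search jumps between unrelated vector offsets. A plain-JDK sketch of that access pattern, using positional reads via `FileChannel.read(ByteBuffer, long)` rather than Lucene's `IndexInput`, and assuming a hypothetical layout of little-endian float32 vectors stored back to back (not necessarily FAISS's on-disk format). `RandomVectorReader` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;

// Illustration only: fetch vector #vectorId from a file of fixed-size float32 vectors.
final class RandomVectorReader {
    private RandomVectorReader() {}

    static float[] readVector(FileChannel channel, int vectorId, int dimension) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(dimension * Float.BYTES).order(ByteOrder.LITTLE_ENDIAN);
        long offset = (long) vectorId * dimension * Float.BYTES; // jump straight to the vector
        while (buf.hasRemaining()) {
            // Positional read: does not move the channel's own position.
            int n = channel.read(buf, offset + buf.position());
            if (n < 0) {
                throw new IOException("unexpected EOF while reading vector " + vectorId);
            }
        }
        buf.flip();
        float[] out = new float[dimension];
        buf.asFloatBuffer().get(out); // the view inherits the little-endian order
        return out;
    }
}
```

During a search, consecutive lookups like these land at scattered offsets, so bytes prefetched after one vector are unlikely to belong to the next one requested.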
Why not log4j?
Slf4j is a logging facade that is not tied to a specific logging framework such as Log4j or Logback. If the underlying logging framework is upgraded or replaced, this code won't need any changes.
Let's just use what we already use in the plugin, to avoid conflicts in the future.
Why would it conflict? OpenSearch core uses Slf4j with Log4j.
By conflict, I mainly mean that someone adding a logger will be confused about whether to use Slf4j or Log4j. It's about consistency in the code; I have no other specific reason.
Sounds good, will update in the next rev.
Before I raise a new PR, could you share your thoughts on the code?
If you leave comments, I will factor them into the next PR.
I think we should start using Slf4j and clean up the code by replacing Log4j with Slf4j. There are a couple of benefits: 1. We align with core and Lucene; Lucene also uses Slf4j. 2. If core ever replaces the underlying framework with another one in the future, we get that for free.
In that case it should be part of a separate GH issue, not in the scope of this PR.
Will update the annotation in the next rev! :)