@DNimmala5 DNimmala5 commented Jun 26, 2025

Description

This code adds a Flat Index Builder, which is the first part of a larger effort to optimize remote index building in OpenSearch k-NN.

As part of my intern project, this component builds a FAISS IndexFlat locally in k-NN right after the vector data has been sent to the Remote Vector Index Builder (RVIB). A JNI function is used to pass all the vector data from the JVM to native memory, where the flat index is created. A pointer to the native index is returned to Java so it can later be used when reconstructing the full index.

This is a foundational step in reducing redundant network transfer between k-NN and RVIB. In the optimized flow, the vector data doesn't need to be sent back, which can cut transfer size by up to 70%.

This pull request introduces the ability to build a flat FAISS index (IndexFlat) directly from a set of vectors, along with comprehensive tests and integration into the OpenSearch k-NN framework. The key changes are a new method for index creation, updated JNI bindings, and integration into the Java layer for OpenSearch.

This is not a complete implementation: two more components still need to be added to complete the vector transfer size reduction. This draft PR of my PoC is being raised for review and automated testing.

Core Functionality: Building Flat FAISS Indexes

  • IndexService Implementation: Added a method buildFlatIndexFromVectors in IndexService to create flat FAISS indexes (IndexFlatL2 or IndexFlatIP) from vectors. This includes input validation and memory management.
  • JNI Bindings: Added JNI bindings in FaissService and JNIService to expose the buildFlatIndexFromVectors functionality to the Java layer.

Testing Enhancements

  • Unit Tests: Added unit tests for buildFlatIndexFromVectors in faiss_index_service_test.cpp, covering successful index creation, input validation, and vector preservation.
  • JNI Tests: Added tests in faiss_wrapper_unit_test.cpp to validate JNI integration, including error handling for null vectors and invalid dimensions.

Integration with OpenSearch

  • Remote Index Build Strategy: Integrated buildFlatIndexFromVectors into RemoteIndexBuildStrategy for building flat indexes during remote index creation. This includes vector extraction and metric type determination.

Codebase Updates

  • Header and Source Updates: Updated headers (faiss_index_service.h, faiss_wrapper.h) and included necessary FAISS dependencies (IndexFlat) in source files.
  • Documentation: Added detailed comments and JavaDoc for new methods to ensure clarity and maintainability.

Related Issues

Linked k-NN feature

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@DNimmala5
Author

Sorry for the excessive commits. I should have made a new clean branch with one commit. Will fix for next time.

Contributor

@finnroblin finnroblin left a comment

Left a few small comments. Thanks Dev!

}

String metricType = "L2";
Object spaceType = indexInfo.getParameters().get("space_type");
Contributor

Does this work for cosine similarity? I'm not sure whether the space type is changed to IP in that case.
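
One way to make the cosine case explicit is to resolve the metric in a single helper rather than falling through to the "L2" default. The sketch below is only an illustration, not the plugin's code: `FlatIndexMetric`, `resolve`, and `requiresNormalization` are hypothetical names, the strings `l2`, `innerproduct`, and `cosinesimil` are taken to be the space type values found in the mapping, and mapping cosine to IP assumes the vectors are L2-normalized before they are added to the flat index. Whether the remote build path already does that normalization is the open question here.

```java
import java.util.Locale;

// Hypothetical helper; the names are illustrative and not part of the k-NN codebase.
final class FlatIndexMetric {
    private FlatIndexMetric() {}

    /** Returns "L2" or "IP" for the given space_type parameter value. */
    static String resolve(Object spaceType) {
        if (spaceType == null) {
            return "L2"; // same default as the snippet under review
        }
        String value = spaceType.toString().toLowerCase(Locale.ROOT);
        switch (value) {
            case "l2":
                return "L2";
            case "innerproduct":
                return "IP";
            case "cosinesimil":
                // Assumption: cosine is served as inner product over normalized vectors.
                return "IP";
            default:
                // Fail loudly instead of silently building an L2 index.
                throw new IllegalArgumentException("Unsupported space type for flat index: " + value);
        }
    }

    /** True when the caller would have to L2-normalize vectors before adding them. */
    static boolean requiresNormalization(Object spaceType) {
        return spaceType != null && "cosinesimil".equalsIgnoreCase(spaceType.toString());
    }

    public static void main(String[] args) {
        System.out.println(resolve("cosinesimil") + " " + requiresNormalization("cosinesimil"));
    }
}
```

Throwing on unknown space types is a design choice in this sketch: it surfaces an unsupported metric at build time instead of producing a flat index with the wrong distance function.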

jfloat* vectors = env->GetFloatArrayElements(vectorsJ, nullptr);
std::vector<float> cppVectors(vectors, vectors + totalLength);

faiss::MetricType metric = (strcmp(metricTypeC, "IP") == 0) ? faiss::METRIC_INNER_PRODUCT : faiss::METRIC_L2;
Contributor

Can we use a helper in jni_util for this?

throw new IllegalStateException("No vectors to index");
}

// First vector, need to access first before getting values needed for loop
Contributor

Can we combine into one loop?
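
A single pass could read the dimension from the first vector inside the loop and validate and copy every vector as it goes, which removes the special case before the loop. The following is a rough sketch under assumptions: `VectorFlattener` and `flatten` are hypothetical names, and a plain `List<float[]>` stands in for whatever vector iterator the strategy actually uses.

```java
import java.util.List;

// Hypothetical helper; a List<float[]> stands in for the real vector source.
final class VectorFlattener {
    private VectorFlattener() {}

    /** Copies all vectors into one contiguous array, checking dimensions in the same loop. */
    static float[] flatten(List<float[]> vectors) {
        if (vectors == null || vectors.isEmpty()) {
            throw new IllegalStateException("No vectors to index");
        }
        float[] flat = null;
        int dim = -1;
        int offset = 0;
        for (float[] vector : vectors) {
            if (vector == null) {
                throw new IllegalArgumentException("Null vector at position " + (offset / Math.max(dim, 1)));
            }
            if (flat == null) {
                // First iteration: learn the dimension and allocate the output once.
                dim = vector.length;
                if (dim == 0) {
                    throw new IllegalArgumentException("Vector dimension must be positive");
                }
                flat = new float[Math.multiplyExact(dim, vectors.size())];
            } else if (vector.length != dim) {
                throw new IllegalArgumentException(
                    "Expected dimension " + dim + " but got " + vector.length);
            }
            System.arraycopy(vector, 0, flat, offset, dim);
            offset += dim;
        }
        return flat;
    }

    public static void main(String[] args) {
        float[] flat = flatten(List.of(new float[] {1f, 2f}, new float[] {3f, 4f}));
        System.out.println(flat.length);
    }
}
```

`Math.multiplyExact` is used so that an oversized segment fails with an `ArithmeticException` instead of overflowing the array length silently.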

index = new faiss::IndexFlatL2(dim);
}

index->add(numVectors, vectors.data());
Contributor

Is it possible to call add_with_ids?

Member

+1 - let's try to use existing mechanisms. You might need the id mapping during reconstruction.

}

String metricType = "L2";
Object spaceType = indexInfo.getParameters().get("space_type");
Member

Let's move this logic to IndexUtil or reuse a similar method if present.

@jmazanec15
Member

@kotwanikunal
