[BLOG] Concurrent vector graph construction in OpenSearch (via jVector) #3864

@sam-herman

Describe the blog post

How we built a unique capability in the OpenSearch vector indexing engine that uses the full power of concurrency for graph index construction.
The latest version of the jVector plugin for OpenSearch introduces concurrent graph index construction.
It uses a lock-free, non-blocking mechanism that scales almost linearly when constructing a vector graph index. Until now, the concurrency of graph construction was capped by the number of Lucene segments being created concurrently within each shard. As a result, we paid a significant penalty: ingestion had to be split into multiple small batches, each built by a single thread. This not only slows ingestion but also produces smaller batches that later force more costly merges.

In this blog post, we will describe the concurrent, lock-free architecture of jVector and how we use it in OpenSearch to accelerate vector ingestion in the KNN engine.
We will try to answer important questions, such as:

  1. How is concurrent graph building even possible?
  2. How is it done in jVector?
  3. Are there any implications for graph accuracy?
  4. What additional resource management considerations are important when dealing with concurrent construction?
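
As a rough illustration of the lock-free idea behind the first question, here is a minimal Java sketch. It is not jVector's actual implementation: the class `LockFreeGraphSketch` and its methods are hypothetical names made up for this example, and it only shows how many threads can add edges to a shared adjacency structure with compare-and-set (CAS) retries instead of locks. It leaves out the nearest-neighbor search and pruning that a real vector graph builder performs.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicReferenceArray;
import java.util.stream.IntStream;

// Hypothetical sketch, NOT the jVector API: a graph whose adjacency lists
// are updated by many threads without any locks.
public class LockFreeGraphSketch {
    // neighbors.get(i) holds an immutable snapshot of node i's adjacency list.
    private final AtomicReferenceArray<int[]> neighbors;

    public LockFreeGraphSketch(int nodeCount) {
        neighbors = new AtomicReferenceArray<>(nodeCount);
        for (int i = 0; i < nodeCount; i++) {
            neighbors.set(i, new int[0]);
        }
    }

    // Copy the current snapshot, append the new neighbor, then publish it
    // with compareAndSet. If another thread won the race, retry on the
    // fresh snapshot. No thread ever blocks waiting for another.
    public void addEdge(int from, int to) {
        while (true) {
            int[] current = neighbors.get(from);
            for (int n : current) {
                if (n == to) {
                    return; // edge already present
                }
            }
            int[] updated = Arrays.copyOf(current, current.length + 1);
            updated[current.length] = to;
            if (neighbors.compareAndSet(from, current, updated)) {
                return;
            }
        }
    }

    public int[] neighborsOf(int node) {
        return neighbors.get(node);
    }

    // Builds a bidirectional ring from many threads at once. The ring stands
    // in for the real neighbor selection, which this sketch does not model.
    public static LockFreeGraphSketch buildRing(int nodeCount) {
        LockFreeGraphSketch graph = new LockFreeGraphSketch(nodeCount);
        IntStream.range(0, nodeCount).parallel().forEach(i -> {
            int next = (i + 1) % nodeCount;
            graph.addEdge(i, next);
            graph.addEdge(next, i);
        });
        return graph;
    }

    public static void main(String[] args) {
        LockFreeGraphSketch graph = buildRing(1000);
        System.out.println("neighbors of node 0: "
                + Arrays.toString(graph.neighborsOf(0)));
    }
}
```

The copy-on-write snapshots keep readers safe without synchronization, at the cost of one array copy per insertion. That trade-off is an assumption of this sketch and may differ from what jVector does.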

Expected Title

Concurrent vector graph construction in OpenSearch (via jVector)

Authors Name

Samuel Herman

Authors Email

[email protected]

Target Draft Date

07/18/2025

Blog Post Category

technical

Target Publication Date

No response

Additional Info

No response