Description
The parameter perThreadHardLimitMB must be less than 2GB (2048MB), which means a single indexing thread cannot buffer and flush a segment of 2GB or more.
Refer: https://lucene.apache.org/core/9_9_0/core/org/apache/lucene/index/IndexWriterConfig.html#setRAMPerThreadHardLimitMB(int)
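To make the current constraint concrete, here is a minimal sketch of the range check described in the linked Javadoc. The class and method names (`PerThreadLimitSketch`, `checkPerThreadHardLimitMB`) are hypothetical and only mirror the documented behavior; this is not Lucene's own code.

```java
public final class PerThreadLimitSketch {
    // Upper bound as documented for setRAMPerThreadHardLimitMB: the value must be below 2048 MB.
    static final int MAX_RAM_PER_THREAD_LIMIT_MB = 2048;

    /** Hypothetical helper mirroring the documented range check; not Lucene's implementation. */
    static int checkPerThreadHardLimitMB(int perThreadHardLimitMB) {
        if (perThreadHardLimitMB <= 0 || perThreadHardLimitMB >= MAX_RAM_PER_THREAD_LIMIT_MB) {
            throw new IllegalArgumentException(
                "perThreadHardLimitMB must be greater than 0 and less than "
                    + MAX_RAM_PER_THREAD_LIMIT_MB + " MB, got " + perThreadHardLimitMB);
        }
        return perThreadHardLimitMB;
    }

    public static void main(String[] args) {
        // 2047 MB is the largest value that passes the check.
        System.out.println(checkPerThreadHardLimitMB(2047));
        try {
            // Anything at or above 2048 MB is rejected today.
            checkPerThreadHardLimitMB(4096);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Under this check, 2047 is the largest accepted value, which is the ceiling this issue proposes to lift.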
This issue proposes to make this parameter configurable above the 2GB limit, so that each thread can write a bigger segment.
When indexing high-dimensional vector data, each segment has its own HNSW graph, so more segments mean more graphs to search per shard and more graph rebuild work during merges. With this change, a single indexing thread can flush fewer, larger segments, which is generally more resource-efficient for vector-heavy workloads.
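As a rough illustration of the "fewer, larger segments" argument, the sketch below counts how many segments one thread would flush for a given amount of buffered data. It assumes, as a simplification, that every flush happens exactly at the per-thread hard limit; in practice flushes are usually triggered earlier by other settings, and on-disk segment size differs from in-memory size. The class name, the 64GB workload, and the 8192MB limit are all hypothetical.

```java
public final class FlushCountSketch {
    /**
     * Number of flushed segments if each flush happens exactly at the per-thread limit
     * (a simplifying assumption, not how flushing is actually scheduled).
     */
    static long flushCount(long totalBufferedMB, long perThreadLimitMB) {
        // Ceiling division: a partially filled final buffer still produces one segment.
        return (totalBufferedMB + perThreadLimitMB - 1) / perThreadLimitMB;
    }

    public static void main(String[] args) {
        long totalMB = 64L * 1024; // hypothetical 64 GB of buffered vector data on one thread
        System.out.println(flushCount(totalMB, 2047)); // today's maximum limit: 33 segments
        System.out.println(flushCount(totalMB, 8192)); // hypothetical raised limit: 8 segments
    }
}
```

Under these assumptions, raising the limit from 2047MB to 8192MB cuts the number of flushed segments, and therefore HNSW graphs to search and later merge, from 33 to 8.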