Description
Is your feature request related to a problem? Please describe
For e-commerce search on Amazon, we have a custom search engine built on Lucene that leverages Lucene's powerful segment-based replication. This has proven to be an excellent design choice for the high-QPS search use case of our service. Documents are indexed once and simply replicated across multiple searchers (replicas), so we can physically isolate indexing and merging from search, support quick failovers, perform point-in-time restores, etc. Basically, we get all the benefits that OpenSearch also gets with segment replication.
For our replication setup, our indexers publish periodic replication checkpoints to S3 every N seconds. These checkpoints contain the new segments created since the last checkpoint, published in a single commit. Replicas periodically fetch the new checkpoints from S3 and refresh their searchers (using Lucene's SearcherManager).
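The replica side of this setup can be modeled roughly as follows. This is a simplified sketch, not our production code: `Segment`, `Checkpoint` and `FullRefreshModel` are hypothetical names, and "bytes of newly opened segments per refresh" is used only as a rough proxy for the pages a replica has to fault in when it switches to a new searcher.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of a replica that refreshes straight onto each
// published checkpoint. Not Lucene or OpenSearch code.
final class FullRefreshModel {

    /** A segment, identified by its (unique) name, with its on-disk size. */
    record Segment(String name, long sizeBytes) {}

    /** A published checkpoint: the full segment list of its single commit. */
    record Checkpoint(long generation, List<Segment> segments) {}

    /**
     * Applies each checkpoint in order and returns, per refresh, the total size
     * of segments the replica had not opened before.
     */
    static List<Long> bytesOpenedPerRefresh(List<Checkpoint> checkpoints) {
        Set<String> seen = new HashSet<>();
        List<Long> perRefresh = new ArrayList<>();
        for (Checkpoint cp : checkpoints) {
            long newBytes = 0;
            for (Segment s : cp.segments()) {
                // Set.add returns true only for segments not seen before.
                if (seen.add(s.name())) {
                    newBytes += s.sizeBytes();
                }
            }
            perRefresh.add(newBytes);
        }
        return perRefresh;
    }
}
```

In this model, a checkpoint that accumulated several intervals' worth of segments shows up as one large entry in the returned list, which is exactly the spike the replica has to absorb in a single refresh.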
Like any distributed system, replication is prone to a number of failure modes, from network issues to misbehaving nodes. One such issue we've observed over the past few months happens when we end up with very large checkpoints. These arise if a network glitch causes us to accumulate segments for more than N seconds before publishing a checkpoint, or if a burst in indexing traffic suddenly indexes a lot of documents within the checkpoint window. When replicas refresh on these large checkpoints, they see a big surge in page faults, which causes thrashing for in-flight hot queries and degrades search performance.
To add resiliency against such incidents, we recently introduced "Adaptive Refresh" for searcher managers in Lucene. Instead of refreshing on the entire checkpoint in one fell swoop, this change lets searchers step through intermediate "safe to refresh" commit points and absorb a large checkpoint without excessive page faults.
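A rough sketch of the stepping idea, assuming the checkpoint carries several intermediate commits (oldest first) and that the replica caps the new segment bytes it opens per refresh. The byte budget, the selection rule and all names here are hypothetical; this illustrates the concept and is not how Lucene's adaptive refresh is actually implemented.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: step through safe commit points under a byte budget.
final class SteppedRefreshModel {

    record Segment(String name, long sizeBytes) {}

    /** One safe-to-refresh commit point: the full segment list at that commit. */
    record Commit(long generation, List<Segment> segments) {}

    /** Total size of segments in {@code commit} that are not currently open. */
    static long newBytes(Set<String> openSegments, Commit commit) {
        long total = 0;
        for (Segment s : commit.segments()) {
            if (!openSegments.contains(s.name())) {
                total += s.sizeBytes();
            }
        }
        return total;
    }

    /**
     * Refreshes repeatedly until the newest commit is open. Each refresh picks
     * the newest commit whose new bytes fit {@code budgetBytes}; if none fits,
     * it still advances by one commit so the replica always makes progress.
     * Returns the generation opened by each refresh.
     */
    static List<Long> refreshSteps(List<Commit> commits, long budgetBytes) {
        Set<String> openSegments = new HashSet<>(); // segments of the open commit
        List<Long> steps = new ArrayList<>();
        int current = -1; // index of the open commit; -1 means nothing open yet
        while (current < commits.size() - 1) {
            int chosen = current + 1; // minimum progress: the next commit
            for (int i = current + 2; i < commits.size(); i++) {
                if (newBytes(openSegments, commits.get(i)) <= budgetBytes) {
                    chosen = i; // a newer commit that still fits the budget
                }
            }
            Commit next = commits.get(chosen);
            openSegments = new HashSet<>();
            for (Segment s : next.segments()) {
                openSegments.add(s.name());
            }
            steps.add(next.generation());
            current = chosen;
        }
        return steps;
    }
}
```

With a budget of `Long.MAX_VALUE` this degenerates to today's behavior of refreshing straight onto the newest commit in one step; smaller budgets spread the same segments over several refreshes.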
It seems to me that segment-replicated clusters in OpenSearch could also be made more resilient by integrating with Lucene's Adaptive Refresh. This RFC explores the path forward for adding this support to OpenSearch.
Links to Lucene Issue and PR:
- NRT replication should make it possible/easy to use bite-sized commits apache/lucene#14219
- Support adaptive refresh in Searcher Managers. apache/lucene#14443
Describe the solution you'd like
We've already merged changes in Lucene to support adaptive refresh in searcher managers (see apache/lucene#14443). These changes let us define a RefreshCommitSupplier that selects the best "safe" commit for searchers to refresh on. We would define this supplier in OpenSearch and use it within the searcher managers.
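The shape of such an integration could look like the following sketch: the reader manager asks a pluggable chooser which safe commit to open on each refresh, and falls back to the latest commit when the chooser returns null. `CommitChooser`, `ReplicaReaderManager` and `ONE_STEP` are hypothetical names; this only mirrors the idea behind Lucene's RefreshCommitSupplier and does not reproduce its actual signature or OpenSearch's reader manager.

```java
import java.util.List;

// Hypothetical plug-in point, loosely modeled on the idea of a refresh commit
// supplier. NOT the real Lucene RefreshCommitSupplier interface.
interface CommitChooser {
    /**
     * Returns the commit generation to refresh onto, given the generation that
     * is currently open and the safe generations available (ascending), or
     * null to fall back to the latest available generation.
     */
    Long choose(long currentGeneration, List<Long> availableGenerations);
}

// Hypothetical stand-in for a reader manager that consults the chooser.
final class ReplicaReaderManager {
    private final CommitChooser chooser; // null means "always refresh to latest"
    private long currentGeneration;

    ReplicaReaderManager(long initialGeneration, CommitChooser chooser) {
        this.currentGeneration = initialGeneration;
        this.chooser = chooser;
    }

    long currentGeneration() {
        return currentGeneration;
    }

    /** One refresh cycle; returns true if a newer commit was opened. */
    boolean maybeRefresh(List<Long> availableGenerations) {
        if (availableGenerations.isEmpty()) {
            return false;
        }
        long latest = availableGenerations.get(availableGenerations.size() - 1);
        Long picked = chooser == null ? null : chooser.choose(currentGeneration, availableGenerations);
        long target = picked == null ? latest : picked;
        if (target <= currentGeneration) {
            return false; // already caught up; never move backwards
        }
        // A real manager would open a reader on the chosen commit here.
        currentGeneration = target;
        return true;
    }

    /** Example policy: advance to the next available commit only. */
    static final CommitChooser ONE_STEP = (current, available) -> {
        for (long gen : available) {
            if (gen > current) {
                return gen;
            }
        }
        return null;
    };
}
```

Keeping the selection policy behind an interface means the default (no chooser) preserves current refresh-to-latest behavior, while a budget-aware policy can be swapped in without changing the refresh loop itself.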
I believe we would need to modify OpenSearchReaderManager to start supporting adaptive refresh. We might need to add similar support to OpenSearch from scratch, since it is a final class? Also, segment replication in OpenSearch likely has its own set of nuances that we need to handle. I would like to hear from the community on whether this change makes sense, and which OpenSearch segment-replication-specific details need attention.
Related component
Indexing:Replication
Describe alternatives you've considered
No response
Additional context
No response