-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Bump floor segment size to 16MB. #14189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This bumps the floor segment size from 2MB (`TieredMergePolicy`) / 1.6MB (`LogByteSizeMergePolicy`) to 16MB in Lucene 11. My motivation is that such small segment sizes don't make index structures actually helpful vs. linear scans, so we should avoid them. Furthermore, there has been progress on merging rules for segments below the floor size, in particular merge policies no longer perform quadratic merging (apache#900) so this change will not make indexing/merging absurdly slow if an application flushes tiny segments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could make this change in 10.x too? It is not really a bwc break, improving merge selection...
Thanks for the feedback, I was hesitating. Let's pull this in 10.2 then. |
For reference, this is roughly a 10x increase of the floor segment size, so given that |
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution! |
That's a good point @jpountz. Reading it initially, I thought reduction will be much more, but this does make sense. Gets rid of the 2MB segments in the bottom most tier! |
Any reason this PR is not merged and got marked as stale? |
None other than me being a bit anxious about side-effects, e.g. this floor segment size also affects the behavior of merge-on-full-flush. But it shouldn't be a big deal. I'll merge shortly! |
This bumps the floor segment size from 2MB (`TieredMergePolicy`) / 1.6MB (`LogByteSizeMergePolicy`) to 16MB. My motivation is that such small segment sizes don't make index structures actually helpful vs. linear scans, so we should avoid them. Furthermore, there has been progress on merging rules for segments below the floor size, in particular merge policies no longer perform quadratic merging (#900) so this change will not make indexing/merging absurdly slow if an application flushes tiny segments.
This bumps the floor segment size from 2MB (`TieredMergePolicy`) / 1.6MB (`LogByteSizeMergePolicy`) to 16MB. My motivation is that such small segment sizes don't make index structures actually helpful vs. linear scans, so we should avoid them. Furthermore, there has been progress on merging rules for segments below the floor size, in particular merge policies no longer perform quadratic merging (apache#900) so this change will not make indexing/merging absurdly slow if an application flushes tiny segments.
This bumps the floor segment size from 2MB (
TieredMergePolicy
) / 1.6MB (LogByteSizeMergePolicy
) to 16MB.My motivation is that such small segment sizes don't make index structures actually helpful vs. linear scans, so we should avoid them. Furthermore, there has been progress on merging rules for segments below the floor size, in particular merge policies no longer perform quadratic merging (#900) so this change will not make indexing/merging absurdly slow if an application flushes tiny segments.