Contributor

@jpountz jpountz commented Feb 1, 2025

This bumps the floor segment size from 2MB (TieredMergePolicy) / 1.6MB (LogByteSizeMergePolicy) to 16MB.

My motivation is that with such small segments, index structures are not actually helpful compared to linear scans, so we should avoid them. Furthermore, there has been progress on merging rules for segments below the floor size: in particular, merge policies no longer perform quadratic merging (#900), so this change will not make indexing/merging absurdly slow if an application flushes tiny segments.

@jpountz jpountz added this to the 11.0.0 milestone Feb 1, 2025
Member

@mikemccand mikemccand left a comment

Maybe we could make this change in 10.x too? It is not really a bwc break, improving merge selection...

Contributor Author

jpountz commented Feb 3, 2025

Thanks for the feedback, I was hesitating. Let's pull this in 10.2 then.

Contributor Author

jpountz commented Feb 3, 2025

For reference, this is roughly a 10x increase of the floor segment size, so given that TieredMergePolicy defaults to 10 segments per tier, indexes should have about 10 fewer segments after this change.
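
To make the arithmetic above concrete, here is a minimal sketch of a tier budget model. It is a simplified illustration and not Lucene's actual merge selection code: the class name `FloorSegmentModel`, the method `allowedSegments`, the 100,000 MB index size, and the 5,120 MB maximum merged segment size are all assumptions chosen for the example. Each tier holds up to `segsPerTier` segments, and each tier's segment size is `segsPerTier` times the previous one, starting at the floor.

```java
public class FloorSegmentModel {

  /**
   * Simplified model of how many segments a tiered policy tolerates.
   * All sizes are in MB. This is an illustration, not Lucene's implementation.
   */
  static long allowedSegments(long totalMB, long floorMB, long maxSegmentMB, int segsPerTier) {
    long levelSize = floorMB;   // segment size of the current tier
    long bytesLeft = totalMB;   // index size not yet assigned to a tier
    long allowed = 0;
    while (true) {
      double segCountLevel = bytesLeft / (double) levelSize;
      // Last tier: either it is not full, or segments cannot grow any further.
      if (segCountLevel < segsPerTier || levelSize == maxSegmentMB) {
        allowed += (long) Math.ceil(segCountLevel);
        break;
      }
      // A full tier contributes segsPerTier segments, then we move one tier up.
      allowed += segsPerTier;
      bytesLeft -= (long) segsPerTier * levelSize;
      levelSize = Math.min(maxSegmentMB, levelSize * segsPerTier);
    }
    return allowed;
  }

  public static void main(String[] args) {
    long totalMB = 100_000;      // hypothetical index size
    long maxSegmentMB = 5_120;   // assumed maximum merged segment size
    int segsPerTier = 10;
    for (long floorMB : new long[] {2, 16, 20}) {
      System.out.println("floor=" + floorMB + "MB -> allowed segments="
          + allowedSegments(totalMB, floorMB, maxSegmentMB, segsPerTier));
    }
  }
}
```

Under these assumptions the model allows 56 segments with a 2MB floor, 47 with a 16MB floor, and 46 with a 20MB floor. An exact 10x increase removes one full tier of 10 segments, while the actual 2MB to 16MB change is 8x, which is why the saving in this model is 9 rather than 10.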

Contributor

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Feb 18, 2025
@jainankitk
Contributor

> For reference, this is roughly a 10x increase of the floor segment size, so given that TieredMergePolicy defaults to 10 segments per tier, indexes should have about 10 fewer segments after this change.

That's a good point @jpountz. Reading it initially, I thought the reduction would be much larger, but this does make sense. It gets rid of the 2MB segments in the bottommost tier!

@jainankitk
Contributor

Is there any reason this PR has not been merged and got marked as stale?

Contributor Author

jpountz commented Feb 26, 2025

None other than me being a bit anxious about side-effects, e.g. this floor segment size also affects the behavior of merge-on-full-flush. But it shouldn't be a big deal. I'll merge shortly!

@github-actions github-actions bot removed the Stale label Feb 27, 2025
@jpountz jpountz merged commit f28a8ae into apache:main Feb 27, 2025
6 checks passed
@jpountz jpountz deleted the bump_floor_seg_size branch February 27, 2025 20:28
jpountz added a commit that referenced this pull request Feb 27, 2025
hanbj pushed a commit to hanbj/lucene that referenced this pull request Mar 12, 2025
@jpountz jpountz modified the milestones: 11.0.0, 10.2.0 Jun 20, 2025