Skip to content

fix(compaction): skip collect rowids for unindexed fragment or stable row ids#6495

Open
zhangyue19921010 wants to merge 5 commits intolance-format:mainfrom
zhangyue19921010:compaction_skip_collect
Open

fix(compaction): skip collect rowids for unindexed fragment or stable row ids#6495
zhangyue19921010 wants to merge 5 commits intolance-format:mainfrom
zhangyue19921010:compaction_skip_collect

Conversation

@zhangyue19921010
Copy link
Copy Markdown
Contributor

@zhangyue19921010 zhangyue19921010 commented Apr 13, 2026

In compaction, collecting and transferring fragment rowids / row_addrs is expensive.

If the fragments in a compaction task do not belong to any index, then this data does not require immediate remapping

therefore does not require collecting the old rowids

@github-actions github-actions bot added bug Something isn't working java labels Apr 13, 2026
zhangyue19921010 added 2 commits April 13, 2026 23:33
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 13, 2026

Codecov Report

❌ Patch coverage is 96.21212% with 5 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
rust/lance/src/dataset/optimize.rs 96.21% 3 Missing and 2 partials ⚠️

📢 Thoughts on this report? Let us know!

Comment on lines +1135 to +1137
if has_indexed_fragments {
return Ok(true);
}
Copy link
Copy Markdown
Contributor

@pratik0316 pratik0316 Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe I am missing some context, just curious why do we always need to capture the row_addr column in case the fragments have an index?

Copy link
Copy Markdown
Contributor Author

@zhangyue19921010 zhangyue19921010 Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indexed fragments need row addr capture because compaction changes the physical addresses that existing indexes point to(unstable row IDs).To keep the index valid, compaction needs a mapping() from old addresses to new addresses. The captured _rowaddr column gives the old addresses of the rows that survived the rewrite.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clear! thanks for the explanation @zhangyue19921010

@zhangyue19921010 zhangyue19921010 changed the title fix(compaction): skip collect rowids for unindexed fragment fix(compaction): skip collect rowids for unindexed fragment or stable row ids Apr 14, 2026
@zhangyue19921010
Copy link
Copy Markdown
Contributor Author

Hi @Xuanwo, would you mind taking a look? I’d really appreciate your help. Thanks in advance!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working java

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants