Skip to content

perf: fix O(N²) version column scan with deletion vectors#6375

Merged
BubbleCal merged 4 commits intolance-format:mainfrom
pengw0048:fix/version-column-perf-with-deletions
Apr 2, 2026
Merged

perf: fix O(N²) version column scan with deletion vectors#6375
BubbleCal merged 4 commits intolance-format:mainfrom
pengw0048:fix/version-column-perf-with-deletions

Conversation

@pengw0048
Copy link
Copy Markdown
Contributor

Problem

Scanning _row_created_at_version or _row_last_updated_at_version is extremely slow on fragments with deletion vectors — 53 seconds for 1M rows (vs 0.03s for a regular data column on the same table).

This makes the delta() API (get_inserted_rows() / get_updated_rows()) unusable on any table that has had deletions.

Root Cause

In apply_row_id_and_deletes() (lance-table/src/utils/stream.rs), the version column is built by:

sequence.versions()
    .skip(r.start as usize)
    .take((r.end - r.start) as usize)

versions() creates a fresh iterator from the start of the RLE sequence. For each range in the selection, skip(r.start) walks through all prior elements — O(rows) per range. With deletion vectors creating many small ranges, this becomes O(rows × ranges).

Fix

Replace with version_values_for_selection() that:

  • Fast-paths single-run fragments (common case: all rows same version) with vec![version; count] — O(1)
  • Binary search over precomputed run offsets for O(log(runs)) per position in the multi-run case

Benchmark

1M rows, 33% deleted, _row_created_at_version scan:

Before After
Time 53s 0.02s
Complexity O(rows × ranges) O(rows × log(runs))

Minimal Reproduction

import lance, pyarrow as pa, numpy as np, time

uri = "/tmp/test_version_col_perf.lance"
lance.write_dataset(
    pa.table({"val": np.random.randint(0, 1000, 1_000_000)}),
    uri, mode="overwrite", enable_stable_row_ids=True,
)
ds = lance.dataset(uri)
ds.delete("val < 333")
ds = lance.dataset(uri)

t0 = time.time()
ds.scanner(columns=["_row_created_at_version"]).to_table()
print(f"_row_created_at_version with deletions: {time.time()-t0:.2f}s")

t0 = time.time()
ds.scanner(columns=["val"]).to_table()
print(f"normal column with deletions: {time.time()-t0:.2f}s")

When scanning `_row_created_at_version` or `_row_last_updated_at_version`
on fragments with deletion vectors, the previous implementation called
`sequence.versions().skip(r.start).take(...)` for each range in the
selection. Since `versions()` creates a fresh iterator from the start,
`skip(r.start)` walks through all prior elements — O(rows) per range,
leading to O(rows × ranges) total work.

Replace with `version_values_for_selection()` that:
- Fast-paths single-run fragments (common case) with `vec![version; count]`
- Uses binary search over precomputed run offsets for O(log(runs)) per
  position in the multi-run case

Benchmark (1M rows, 33% deleted):
- Before: 53s
- After: 0.02s
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 1, 2026

Codecov Report

❌ Patch coverage is 96.17834% with 6 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
rust/lance-table/src/utils/stream.rs 96.17% 6 Missing ⚠️

📢 Thoughts on this report? Let us know!

@pengw0048 pengw0048 force-pushed the fix/version-column-perf-with-deletions branch from 7181475 to 27c9ce3 Compare April 2, 2026 04:40
@pengw0048 pengw0048 marked this pull request as ready for review April 2, 2026 04:41
Copy link
Copy Markdown
Contributor

@BubbleCal BubbleCal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the great improvements!

for pos in r.start..r.end {
let pos = pos as usize;
if pos >= total_len {
versions.push(1); // fallback
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we will get an error at https://github.com/lance-format/lance/pull/6375/changes#diff-e3ed3f2ff36ff45b28ec19c75b4b9fadcfd857e2ced681ceb22b07fdc4e705b4R360 if we don't have enough rows, but this would hide the error

continue;
}
// Binary search for the run containing this position
let run_idx = match run_offsets.binary_search(&pos) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need one more test to cover this path

…n test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@pengw0048 pengw0048 requested a review from BubbleCal April 2, 2026 13:43
Copy link
Copy Markdown
Contributor

@BubbleCal BubbleCal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to go once we have all greens

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@pengw0048
Copy link
Copy Markdown
Contributor Author

@BubbleCal could you approve the workflows again? Fixed the formatting and the windows test failure seemed unrelated.

@BubbleCal BubbleCal merged commit 8bd9b90 into lance-format:main Apr 2, 2026
28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants