Skip to content

fix: return error instead of OOM on corrupt binary page offsets#6392

Open
westonpace wants to merge 2 commits intolance-format:mainfrom
westonpace:fix/binary-page-corruption-oom
Open

fix: return error instead of OOM on corrupt binary page offsets#6392
westonpace wants to merge 2 commits intolance-format:mainfrom
westonpace:fix/binary-page-corruption-oom

Conversation

@westonpace
Copy link
Copy Markdown
Member

Summary

  • When reading a corrupt lance file where binary page offsets are non-monotonic (e.g. due to multipart upload part reordering), the subtraction val - prev in IndicesNormalizer::extend would underflow, producing a huge value used as a byte-range length, triggering an OOM allocation that kills the process.
  • Use checked_sub to detect the underflow and return a descriptive error message instead, indicating the file data has been corrupted.

Context

This was discovered investigating a real-world corruption where two adjacent 5MB multipart upload parts were swapped when writing to an S3 Express directory bucket. The swapped parts caused the binary (string) column's cumulative offset indices to be non-monotonic at the part boundary, leading to the underflow.

Test plan

  • Verified the fix compiles cleanly (cargo check -p lance-encoding)
  • Verified against a real corrupt lance file: returns a clear error instead of OOM
  • Existing tests pass

🤖 Generated with Claude Code

When reading a corrupt lance file where binary page offsets are
non-monotonic (e.g. due to multipart upload part reordering), the
subtraction `val - prev` would underflow, producing a huge value used
as a byte-range length and triggering an OOM allocation.

Use checked_sub to detect the underflow and return a descriptive error
instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions github-actions bot added the bug Something isn't working label Apr 2, 2026
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 2, 2026

Codecov Report

❌ Patch coverage is 41.66667% with 7 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...encoding/src/previous/encodings/physical/binary.rs 41.66% 6 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant