fix: branch_identifier unstable for legacy branches #6390

Open
majin1102 wants to merge 1 commit into main from codex/stable-synthetic-branch-identifier
@majin1102 (Contributor) commented Apr 2, 2026

Problem

Legacy branches, i.e. branches whose BranchContents were written without a persisted branch_identifier, currently deserialize through BranchIdentifier::none(). That fallback generates a fresh random UUID on each read, so the same unchanged branch can surface a different branch_identifier across repeated loads.

This makes branch identity unstable in both Python and Java for legacy datasets. On the Python side, branches.list() / branches_ordered() expose branch_identifier directly, so callers that diff, cache, or snapshot branch metadata can observe false changes even when the branch itself has not changed. On the Java side, the same legacy branch can also appear with a different identifier across refreshes, which makes equality-style comparisons unstable as well.
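
To make the failure mode concrete, here is a minimal sketch of how a random fallback shows up to a caller that reads the same metadata twice and diffs the result. `StoredBranch`, `load_identifier_with_random_fallback`, and `appears_changed` are hypothetical names used only for illustration, and a `u64` drawn from the standard library's `RandomState` stands in for the random UUID that `BranchIdentifier::none()` produces.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

/// Hypothetical stand-in for persisted branch metadata. `identifier` is
/// `None` for a legacy branch written before identifiers were persisted.
struct StoredBranch {
    identifier: Option<u64>,
}

/// Models the old fallback: a missing identifier is replaced by a freshly
/// generated value on every read. Each `RandomState::new()` uses different
/// keys, so the fallback value differs between calls with overwhelming
/// probability.
fn load_identifier_with_random_fallback(stored: &StoredBranch) -> u64 {
    stored
        .identifier
        .unwrap_or_else(|| RandomState::new().build_hasher().finish())
}

/// Mimics a caller that reads the same unchanged branch twice and compares
/// the two identifiers, as a diff, cache, or snapshot would.
fn appears_changed(stored: &StoredBranch) -> bool {
    load_identifier_with_random_fallback(stored) != load_identifier_with_random_fallback(stored)
}
```

With this model, a branch with a persisted identifier never appears changed, while a legacy branch (`identifier: None`) appears changed on essentially every pair of reads, even though nothing on disk moved.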

Summary

  • stabilize fallback branch identifiers for legacy branch metadata by replacing the missing-identifier sentinel with a deterministic synthetic UUID during branch metadata reads
  • keep the fallback logic localized to Rust branch metadata loading so Python and Java continue to return stable branch_identifier values without API shape changes
  • add a lightweight Rust regression test that exercises BranchContents::from_path on in-memory branch metadata and verifies stable repeated reads plus distinct identifiers for different branch names
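
The idea behind the first and third bullets can be sketched as follows. This is not the code in this PR: it assumes the synthetic identifier is derived from the branch name alone, and it uses a hand-rolled FNV-1a hash so the example needs no external crates. The output is only UUID-shaped (no version or variant bits are set); a real implementation would more likely use a name-based UUID. `fnv1a64`, `synthetic_branch_identifier`, and `resolve_identifier` are hypothetical names.

```rust
/// 64-bit FNV-1a over `bytes`, starting from `seed`.
fn fnv1a64(seed: u64, bytes: &[u8]) -> u64 {
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    let mut hash = seed;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(PRIME);
    }
    hash
}

/// Hypothetical fallback: derive a UUID-shaped string from the branch name.
/// The same name always yields the same string. Assumption: the name is the
/// only input; the actual change may mix in other stable data.
fn synthetic_branch_identifier(branch_name: &str) -> String {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    let hi = fnv1a64(OFFSET_BASIS, branch_name.as_bytes());
    // Hash again from a seed derived from the first half to fill 128 bits.
    let lo = fnv1a64(hi.rotate_left(32), branch_name.as_bytes());
    let hex = format!("{:032x}", (u128::from(hi) << 64) | u128::from(lo));
    format!(
        "{}-{}-{}-{}-{}",
        &hex[0..8],
        &hex[8..12],
        &hex[12..16],
        &hex[16..20],
        &hex[20..32]
    )
}

/// Prefer the persisted identifier; fall back to the deterministic one
/// only when the metadata carries none (the legacy case).
fn resolve_identifier(branch_name: &str, persisted: Option<&str>) -> String {
    match persisted {
        Some(id) => id.to_string(),
        None => synthetic_branch_identifier(branch_name),
    }
}
```

The two properties the regression test checks map directly onto this sketch: calling `resolve_identifier("legacy-a", None)` twice returns equal strings, and `"legacy-a"` and `"legacy-b"` resolve to different strings.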

github-actions bot added the bug and python labels Apr 2, 2026
majin1102 changed the title from "fix: stabilize synthetic branch identifier fallback" to "fix: branch_identifier unstable for legacy branches" Apr 2, 2026
codecov bot commented Apr 2, 2026

Codecov Report

❌ Patch coverage is 97.18310% with 2 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
rust/lance/src/dataset/refs.rs | 97.18% | 0 Missing and 2 partials ⚠️

majin1102 marked this pull request as ready for review April 2, 2026 16:59
majin1102 added the java label Apr 3, 2026
@majin1102 (Contributor, Author) commented

@codex review

chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4095bcaff0

Labels: bug, java, python
