This sounds reasonable to me. Tbh clock skew could be a problem here, but I don't think we have a better option, and a locally computed timestamp is likely sufficient for 99.9% of use cases. Since this is a spec change, per the community guidelines I think we need to raise a dedicated voting thread.
-
+1 for adding them. For branches, if we are going to rely on them more heavily, I'm a bit more worried that this could lead to transient issues if branch dataset creation succeeds but the branch content is lost. If we instead treat the atomic creation of the branch dataset as the source of truth and drop the separate branch content, we avoid this issue and can store metadata directly inside the branch dataset itself. The downside is that listing branches becomes more complicated.
My personal preference is that we just use a string-to-string map, and you can build custom structures on top of it if necessary. If you are storing a lot of information, it's likely better to store it in a separate file anyway and just keep a pointer to that file in the metadata.
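A minimal sketch of the string-to-string approach, assuming branch/tag metadata is a flat `dict[str, str]`. The helper names, the `MAX_INLINE_BYTES` threshold, and the `<key>_uri` pointer convention are hypothetical, not part of Lance: small structured values are stored inline as JSON strings, and large ones are spilled to a separate file with only a pointer kept in the metadata.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical threshold: values larger than this are stored out of line.
MAX_INLINE_BYTES = 1024


def validate_metadata(metadata: dict) -> dict[str, str]:
    """Reject anything that is not a flat str -> str map."""
    for key, value in metadata.items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError(f"metadata must be str -> str, got {key!r}: {value!r}")
    return dict(metadata)


def put_entry(metadata: dict[str, str], key: str, payload: dict, spill_dir: Path) -> None:
    """Store small payloads inline as JSON; spill large ones to a file and keep a pointer."""
    encoded = json.dumps(payload, sort_keys=True)
    if len(encoded.encode("utf-8")) <= MAX_INLINE_BYTES:
        metadata[key] = encoded
    else:
        path = spill_dir / f"{key}.json"
        path.write_text(encoded, encoding="utf-8")
        # Only the pointer lives in the branch/tag metadata (hypothetical key convention).
        metadata[f"{key}_uri"] = path.resolve().as_uri()


if __name__ == "__main__":
    meta = validate_metadata({"owner": "agent-1"})
    with tempfile.TemporaryDirectory() as tmp:
        put_entry(meta, "summary", {"turns": 3}, Path(tmp))
        put_entry(meta, "transcript", {"text": "x" * 5000}, Path(tmp))
        print(sorted(meta))
```

Keeping the map flat means every binding (Rust, Java, Python) can expose it without agreeing on a richer schema; any structure lives in the encoded value or the referenced file.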
-
Assuming we already have a metadata map in …
Below is a possible reasoning process I worked through with ChatGPT. This could be a dedicated discussion and separate work after we introduce this metadata field. But this might be a long shot (and I have to admit it seems heavy).
Our main goal should be to make …
Today, branch lifecycle changes can touch both:
That creates a few failure modes:
Proposed approach
We should treat …
with values such as:
State transitions
Create
Only branches in …
Delete
A branch in …
Retry / conflict handling
If an operation sees an existing branch ref in …
For …
This gives us a clean rule:
The same general model applies to …
Why this helps
This solves the main lifecycle consistency problems:
Most importantly, this lets us treat …
CAS requirement
This design depends on CAS for correctness. We need CAS when updating an existing branch ref; otherwise a stale writer can overwrite a newer lifecycle transition. In particular, CAS is needed for:
Without CAS, …
Summary
The proposal is to use …
This gives us a small but explicit control-plane model for create/delete, and a clear path to making …
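The concrete state names and retry rules from this proposal did not survive in this copy, so the following is only a hedged sketch of the general idea. It assumes hypothetical states `creating`, `active`, and `deleting`, assumes readers list only `active` branches, and uses an in-memory `RefStore` (with a lock standing in for real CAS on ref storage) plus hypothetical `write_content` / `remove_content` hooks. It shows the two properties the comment relies on: a half-created branch is never listed, and a writer holding a stale view of a ref cannot overwrite a newer transition.

```python
import threading
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical lifecycle states for illustration only.
CREATING, ACTIVE, DELETING = "creating", "active", "deleting"


@dataclass(frozen=True)
class RefEntry:
    state: str
    version: int  # bumped on every successful write; acts as the CAS token


class RefStore:
    """In-memory stand-in for a branch ref store with compare-and-swap semantics."""

    def __init__(self) -> None:
        self._refs: Dict[str, RefEntry] = {}
        self._lock = threading.Lock()  # makes each operation atomic in this sketch

    def get(self, name: str) -> Optional[RefEntry]:
        with self._lock:
            return self._refs.get(name)

    def put_if_absent(self, name: str, state: str) -> Optional[RefEntry]:
        with self._lock:
            if name in self._refs:
                return None
            entry = RefEntry(state, 1)
            self._refs[name] = entry
            return entry

    def compare_and_swap(self, name: str, expected: RefEntry, new_state: str) -> Optional[RefEntry]:
        with self._lock:
            if self._refs.get(name) != expected:
                return None  # stale writer: someone else moved the ref first
            entry = RefEntry(new_state, expected.version + 1)
            self._refs[name] = entry
            return entry

    def delete_if(self, name: str, expected: RefEntry) -> bool:
        with self._lock:
            if self._refs.get(name) != expected:
                return False
            del self._refs[name]
            return True

    def visible_branches(self) -> List[str]:
        # Readers only list branches whose lifecycle has completed.
        with self._lock:
            return sorted(n for n, e in self._refs.items() if e.state == ACTIVE)


def create_branch(store: RefStore, name: str, write_content: Callable[[str], None]) -> bool:
    pending = store.put_if_absent(name, CREATING)
    if pending is None:
        return False  # ref already exists; the retry/conflict policy is out of scope here
    write_content(name)  # hypothetical hook that writes the branch dataset/contents
    return store.compare_and_swap(name, pending, ACTIVE) is not None


def delete_branch(store: RefStore, name: str, remove_content: Callable[[str], None]) -> bool:
    current = store.get(name)
    if current is None or current.state != ACTIVE:
        return False
    deleting = store.compare_and_swap(name, current, DELETING)
    if deleting is None:
        return False
    remove_content(name)  # hypothetical hook that removes the branch dataset/contents
    return store.delete_if(name, deleting)
```

If a process crashes between `put_if_absent` and the final CAS, the ref is left in `creating` and stays invisible to readers; a cleanup or retry path can later pick it up by CAS-ing from that exact entry.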
-
Use Cases
Tags as memory checkpoints
We use tags as checkpoints for memory state.
That lets us:
checkout and restore
Branches for session and agent isolation
We use branches to isolate memory across:
This model is also promising for collaborative and multi-agent workflows.
Gaps We Ran Into
1. Tags do not expose timestamps
Tags currently do not include creation or update timestamps.
For checkpoint / backup workflows, timestamps are important because they let users understand when a backup was created without maintaining a separate mapping outside Lance.
Right now, we have to store that information elsewhere.
I proposed a change for this here: #6331
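A minimal sketch of what a timestamped tag could look like, assuming timestamps are computed locally by the writer (the option judged sufficient for most cases in the reply on this thread, despite clock skew). `TagRecord`, its field names, and the clamping rule in `update_tag` are illustrative assumptions, not the schema proposed in #6331.

```python
import time
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TagRecord:
    """Hypothetical tag shape with timestamps; field names are illustrative."""
    version: int
    created_at_ms: int  # writer's wall-clock time, ms since the Unix epoch (UTC)
    updated_at_ms: int


def now_ms() -> int:
    # Locally computed timestamp: subject to clock skew between writers.
    return time.time_ns() // 1_000_000


def create_tag(version: int, clock=now_ms) -> TagRecord:
    ts = clock()
    return TagRecord(version=version, created_at_ms=ts, updated_at_ms=ts)


def update_tag(tag: TagRecord, new_version: int, clock=now_ms) -> TagRecord:
    # Keep created_at; clamp updated_at so it never moves backwards for this tag,
    # even if the updating writer's clock lags behind the previous writer's.
    ts = max(clock(), tag.updated_at_ms)
    return replace(tag, version=new_version, updated_at_ms=ts)
```

The clamp only keeps `updated_at_ms` monotonic for a single tag; it does not reconcile clocks across machines, so strict ordering between checkpoints would still be better derived from dataset versions than from timestamps.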
2. Branches and tags do not have a metadata field
In our case, we need to associate a branch with higher-level application concepts such as:
Today, that binding has to live outside Lance, which means the system ends up with two sources of truth:
A metadata field on branches and tags, even a simple custom string field, would make these references much easier to use in real applications.
I opened an issue for this here: #6337
I also explored an implementation in this PR: #6364
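To illustrate what moving this binding into Lance buys, here is a hedged sketch: assuming each branch carries a string-to-string metadata map, an application can answer "which branches belong to session s1?" from the references themselves instead of consulting a second mapping store. `BranchRef`, `find_branches`, and the `session_id` / `agent_id` keys are hypothetical names, not the API from #6364.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BranchRef:
    """Hypothetical in-memory view of a branch with a str -> str metadata map."""
    name: str
    metadata: Dict[str, str] = field(default_factory=dict)


def find_branches(branches: List[BranchRef], **criteria: str) -> List[str]:
    """Return names of branches whose metadata matches every key/value in criteria."""
    return [
        b.name
        for b in branches
        if all(b.metadata.get(k) == v for k, v in criteria.items())
    ]


branches = [
    BranchRef("s1-a", {"session_id": "s1", "agent_id": "planner"}),
    BranchRef("s1-b", {"session_id": "s1", "agent_id": "coder"}),
    BranchRef("s2-a", {"session_id": "s2", "agent_id": "planner"}),
]
print(find_branches(branches, session_id="s1"))  # ['s1-a', 's1-b']
```

Because the binding is stored on the branch, deleting the branch also deletes the binding, which removes the "two sources of truth" drift described above.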
3. branch_identifier is not exposed in Java and Python bindings
On the Rust side, BranchContents includes a branch_identifier field, which is useful for understanding branch relationships. That information matters if we want to build product-level branch tooling, for example:
At the moment, this field is not exposed in the Java or Python bindings. That was an omission on my side when I originally added it.
I fixed that in this PR: #6360
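The exact contents of branch_identifier are not described in this thread, so as a hedged illustration assume the bindings let us recover each branch's parent branch name. The `parents` map, the branch names, and both helpers are hypothetical. With that assumption, tooling such as showing a branch's ancestry or its direct children becomes a short walk over the map.

```python
from typing import Dict, List, Optional


def lineage(parents: Dict[str, Optional[str]], branch: str) -> List[str]:
    """Return [branch, parent, grandparent, ...] by following a hypothetical parent map."""
    chain: List[str] = []
    current: Optional[str] = branch
    while current is not None:
        if current in chain:
            raise ValueError(f"cycle in branch parents at {current!r}")
        chain.append(current)
        current = parents.get(current)  # unknown branches are treated as roots
    return chain


def children(parents: Dict[str, Optional[str]], branch: str) -> List[str]:
    """Return the direct children of a branch, sorted by name."""
    return sorted(name for name, parent in parents.items() if parent == branch)


parents = {
    "main": None,
    "session-1": "main",
    "session-1-agent-a": "session-1",
    "session-2": "main",
}
print(lineage(parents, "session-1-agent-a"))  # ['session-1-agent-a', 'session-1', 'main']
print(children(parents, "main"))  # ['session-1', 'session-2']
```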
Why This Matters
These features are not just format-level details. They affect whether branches and tags can serve as application-level reference primitives.
For agent memory systems, references become much more useful when they can carry:
Request
Beyond the Lance format itself, it would also be valuable to bring branch support, or more broadly the full reference mechanism, into LanceDB.