Making Lance branches and tags more usable for agent memory #6361

majin1102 · 2026-03-31T13:08:58Z

majin1102
Mar 31, 2026
Collaborator

Use Cases

Tags as memory checkpoints

We use tags as checkpoints for memory state.

That lets us:

preserve known-good snapshots
restore earlier memory states when needed
support time-travel style debugging with checkout and restore

Branches for session and agent isolation

We use branches to isolate memory across:

different sessions of the same agent
multiple agents within the same deployment
experimental or temporary memory states

This model is also promising for collaborative and multi-agent workflows.

Gaps We Ran Into

1. Tags do not expose timestamps

Tags currently do not include creation or update timestamps.

For checkpoint / backup workflows, timestamps are important because they let users understand when a backup was created without maintaining a separate mapping outside Lance.

Right now, we have to store that information elsewhere.

I proposed a change for this here: #6331
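Here is a minimal sketch of what a tag-level timestamp enables: picking the newest checkpoint before a point in time without an external mapping. The `TagInfo` record and its `created_at` field are hypothetical stand-ins. They are not the structure proposed in #6331.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class TagInfo:
    """Hypothetical tag record: name, dataset version, and creation time."""
    name: str
    version: int
    created_at: datetime  # assumed field, computed locally by the writer


def latest_checkpoint_before(tags: list[TagInfo], cutoff: datetime) -> Optional[TagInfo]:
    """Return the most recent tag created at or before `cutoff`, or None."""
    candidates = [t for t in tags if t.created_at <= cutoff]
    return max(candidates, key=lambda t: t.created_at, default=None)


tags = [
    TagInfo("ckpt-a", 3, datetime(2026, 3, 30, 9, 0, tzinfo=timezone.utc)),
    TagInfo("ckpt-b", 7, datetime(2026, 3, 31, 9, 0, tzinfo=timezone.utc)),
    TagInfo("ckpt-c", 12, datetime(2026, 4, 1, 9, 0, tzinfo=timezone.utc)),
]
restore_from = latest_checkpoint_before(tags, datetime(2026, 3, 31, 12, 0, tzinfo=timezone.utc))
print(restore_from.name)  # ckpt-b
```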

2. Branches and tags do not have a metadata field

In our case, we need to associate a branch with higher-level application concepts such as:

session ID
agent ID
workflow purpose

Today, that binding has to live outside Lance, which means the system ends up with two sources of truth:

the actual branch / tag
an external metadata store

A metadata field on branches and tags, even a simple custom string field, would make these references much easier to use in real applications.

I opened an issue for this here: #6337
I also explored an implementation in this PR: #6364

3. `branch_identifier` is not exposed in Java and Python bindings

On the Rust side, BranchContents includes a branch_identifier field, which is useful for understanding branch relationships.

That information matters if we want to build product-level branch tooling, for example:

branch lineage views
graph-style inspection

At the moment, this field is not exposed in the Java or Python bindings. That was an omission on my side when I originally added it.

I fixed that in this PR: #6360

Why This Matters

These features are not just format-level details. They affect whether branches and tags can serve as application-level reference primitives.

For agent memory systems, references become much more useful when they can carry:

time information
lightweight metadata
branch relationship information across bindings

Request

Beyond the Lance format itself, it would also be valuable to bring branch support, or more broadly the full reference mechanism, into LanceDB.

jackye1995 · 2026-04-01T04:43:38Z

jackye1995
Apr 1, 2026
Maintainer

> For checkpoint / backup workflows, timestamps are important because they let users understand when a backup was created without maintaining a separate mapping outside Lance.
>
> Right now, we have to store that information elsewhere.

This sounds reasonable to me. Tbh clock skew could be a problem here, but I don't think we have a better option, and a locally computed timestamp is likely sufficient for 99.9% of use cases. Per the community guidelines this is a spec change, so I think we need to raise a dedicated voting thread.


jackye1995 · 2026-04-01T04:50:29Z

jackye1995
Apr 1, 2026
Maintainer

> Branches and tags do not have a metadata field

+1 for adding them.

For branches, if we start using them more heavily, I am a bit worried about transient inconsistencies where the branch dataset creation succeeds but the branch contents are lost. If we instead treat the atomic creation of the branch dataset as the source of truth and drop the separate branch contents, we avoid this issue and can store metadata directly inside the branch dataset itself. But that makes listing branches more complicated.

We could make this metadata structured, for example as JSON, so it can hold richer metadata instead of just plain text.

My personal preference is that we just use a string-to-string map, and you can build a custom structure on top if necessary. If you are storing a lot of information, it's probably better to store it in a separate file and just keep a pointer to that file in the metadata anyway.
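Here is one way an application might layer structure on top of a plain string-to-string map, following the suggestion above. Small structured values are JSON-encoded into a single entry. Large payloads are stored elsewhere, and only a pointer is kept in the metadata. The key names (`app.session_id`, `app.agent_id`, `app.purpose`, `app.extra`, `app.transcript_uri`) and the URI are hypothetical conventions, not part of Lance.

```python
import json


def encode_branch_metadata(session_id: str, agent_id: str, purpose: str,
                           extra: dict, transcript_uri: str) -> dict[str, str]:
    """Flatten application data into a plain string-to-string map.

    All key names are hypothetical application-level conventions.
    """
    return {
        "app.session_id": session_id,
        "app.agent_id": agent_id,
        "app.purpose": purpose,
        # Small structured value: serialized as one JSON string.
        "app.extra": json.dumps(extra, sort_keys=True),
        # Large payload: lives in a separate file; only the pointer is stored.
        "app.transcript_uri": transcript_uri,
    }


def decode_extra(metadata: dict[str, str]) -> dict:
    """Recover the structured part, defaulting to an empty dict."""
    return json.loads(metadata.get("app.extra", "{}"))


meta = encode_branch_metadata(
    "sess-42", "planner", "experiment",
    {"temperature": 0.2, "tools": ["search"]},
    "s3://bucket/transcripts/sess-42.jsonl",
)
```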


majin1102 Apr 1, 2026
Collaborator Author

> My personal preference is that we just use string string map

+1

majin1102 Apr 1, 2026
Collaborator Author

> I am a bit more worried if it could lead to transient issues if the branch dataset creation succeeds but branch content is lost. If we just atomically treat the creation of branch datasets as the source of truth and don't have branch content, it could avoid this issue, and we can directly store metadata inside the branch dataset itself

For our memory use case, we need to resolve the correct branch before opening a table or dataset, based on identifiers like session key or agent ID.

Because of that, keeping this metadata only inside the branch dataset's manifest might not be a natural fit. If the metadata is only available after opening a branch, then the application would have to inspect branches first just to figure out which branch to open.

I think BranchContents is still necessary as an index before we open the actual branch dataset (this is related to whether we should use a UUID as the real path of branches, as mentioned in #5009). But I agree that we need some additional mechanisms to keep things consistent.
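To make the ordering concern concrete, here is a minimal sketch of resolving a branch from ref-level metadata before any dataset is opened. `BRANCH_REFS` stands in for whatever a listing of branch refs would return. The branch names, metadata keys, and the `resolve_branch` helper are all hypothetical.

```python
from typing import Optional

# Hypothetical view of branch refs as listed from the ref store, i.e. without
# opening any branch dataset. Branch names and metadata keys are illustrative.
BRANCH_REFS: dict[str, dict[str, str]] = {
    "mem-7f3a": {"session_id": "sess-1", "agent_id": "planner"},
    "mem-91bc": {"session_id": "sess-2", "agent_id": "planner"},
    "mem-c044": {"session_id": "sess-2", "agent_id": "critic"},
}


def resolve_branch(refs: dict[str, dict[str, str]], **wanted: str) -> Optional[str]:
    """Return the single branch whose metadata matches every `wanted` pair.

    Returns None if nothing matches; raises LookupError if several match.
    """
    matches = [
        name for name, meta in refs.items()
        if all(meta.get(k) == v for k, v in wanted.items())
    ]
    if len(matches) > 1:
        raise LookupError(f"ambiguous branch match: {sorted(matches)}")
    return matches[0] if matches else None


branch = resolve_branch(BRANCH_REFS, session_id="sess-2", agent_id="critic")
# Only now would the application open the dataset on `branch`.
```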

majin1102 · 2026-04-01T10:07:48Z

majin1102
Apr 1, 2026
Collaborator Author

Assuming we already have a metadata map in BranchContents, I’d like to explore ways to make branch metadata more consistent, ideally without introducing complex logic or extra coordination mechanisms unless they are absolutely required.

Below is a possible line of reasoning I worked through with ChatGPT. This could become a dedicated discussion and a separate piece of work after we introduce the metadata field. It might be a long shot, though (and I have to admit it seems heavy).

Our main goal should be to make BranchContents the single source of truth for the full branch lifecycle.

Today, branch lifecycle changes can touch both:

branch data under tree/<branch>
branch metadata in _refs/branches/<branch>.json

That creates a few failure modes:

a branch directory can exist without valid branch metadata
branch metadata can exist even though branch creation did not finish
branch deletion can be partially applied across metadata and data
retries cannot tell whether they are continuing the same operation or racing with a different one
the final outcome may depend on timing instead of an explicit branch-level state machine

Proposed approach

We should treat BranchContents as the control-plane source of truth, and use the existing branch metadata map to reserve one internal key:

__lance_state

with values such as:

creating
ready
deleting
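The reserved key and its three values form a small state machine. This is a minimal sketch that assumes the transitions described in this proposal (create, delete, retry/takeover). The helper names `check_transition` and `is_visible` are hypothetical. The `deleting -> deleting` entry reflects the statement that deletes follow the same retry model.

```python
from enum import Enum
from typing import Optional

STATE_KEY = "__lance_state"  # reserved metadata key named in the proposal


class BranchState(str, Enum):
    CREATING = "creating"
    READY = "ready"
    DELETING = "deleting"


# Allowed (old, new) lifecycle transitions; None means "no ref exists".
ALLOWED = {
    (None, BranchState.CREATING),                  # create: write the ref
    (BranchState.CREATING, BranchState.READY),     # create: publish
    (BranchState.CREATING, BranchState.CREATING),  # retry/takeover of a stalled create
    (BranchState.READY, BranchState.DELETING),     # delete: hide the branch
    (BranchState.DELETING, BranchState.DELETING),  # retry/takeover of a stalled delete
    (BranchState.DELETING, None),                  # delete: finalize, ref removed
}


def check_transition(old: Optional[BranchState], new: Optional[BranchState]) -> None:
    """Raise ValueError if `old -> new` is not a legal lifecycle step."""
    if (old, new) not in ALLOWED:
        raise ValueError(f"illegal branch transition: {old} -> {new}")


def is_visible(metadata: dict[str, str]) -> bool:
    """Normal branch APIs only see refs whose state is `ready`."""
    return metadata.get(STATE_KEY) == BranchState.READY.value
```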

State transitions

Create

Resolve the requested source snapshot
Write branch ref with:
- __lance_state=creating
Materialize tree/<branch>
CAS-update the ref to __lance_state=ready

Only branches in ready are visible to normal branch APIs.

Delete

Read the branch ref
CAS-update it from ready to deleting
Remove tree/<branch>
Finalize removal of the ref

A branch in deleting is no longer visible to normal branch APIs.
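The create and delete sequences can be sketched against an in-memory ref store that supports put-if-absent and version-checked CAS. `RefStore`, `create_branch`, `delete_branch`, and the `source_version` key are hypothetical. `materialize` and `remove_tree` are caller-supplied callbacks that stand in for writing and removing `tree/<branch>`.

```python
import threading
from typing import Callable, Optional

STATE = "__lance_state"  # reserved key from the proposal


class RefStore:
    """Hypothetical in-memory stand-in for _refs/branches/ with versioned CAS."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._refs: dict[str, tuple[int, dict[str, str]]] = {}

    def get(self, name: str) -> Optional[tuple[int, dict[str, str]]]:
        with self._lock:
            entry = self._refs.get(name)
            return (entry[0], dict(entry[1])) if entry else None

    def put_if_absent(self, name: str, contents: dict[str, str]) -> bool:
        with self._lock:
            if name in self._refs:
                return False
            self._refs[name] = (1, dict(contents))
            return True

    def cas(self, name: str, expected: int, contents: Optional[dict[str, str]]) -> bool:
        """Replace (or delete, if contents is None) only if the version matches."""
        with self._lock:
            entry = self._refs.get(name)
            if entry is None or entry[0] != expected:
                return False
            if contents is None:
                del self._refs[name]
            else:
                self._refs[name] = (expected + 1, dict(contents))
            return True

    def visible(self) -> list[str]:
        """Normal branch APIs only list refs in the `ready` state."""
        with self._lock:
            return sorted(n for n, (_, c) in self._refs.items() if c.get(STATE) == "ready")


def create_branch(store: RefStore, name: str, source_version: int,
                  materialize: Callable[[str, int], None]) -> None:
    # The caller has already resolved the source snapshot to `source_version`.
    creating = {STATE: "creating", "source_version": str(source_version)}
    if not store.put_if_absent(name, creating):
        raise FileExistsError(f"branch ref {name!r} already exists")
    # A crash here leaves an invisible `creating` ref for a later retry to inspect.
    materialize(name, source_version)
    # We wrote version 1; if a takeover bumped it, this CAS fails instead of clobbering it.
    if not store.cas(name, 1, {**creating, STATE: "ready"}):
        raise RuntimeError(f"lost ownership of {name!r} while creating")


def delete_branch(store: RefStore, name: str, remove_tree: Callable[[str], None]) -> None:
    entry = store.get(name)
    if entry is None or entry[1].get(STATE) != "ready":
        raise LookupError(f"no ready branch named {name!r}")
    version, contents = entry
    # ready -> deleting hides the branch before any data is touched.
    if not store.cas(name, version, {**contents, STATE: "deleting"}):
        raise RuntimeError(f"concurrent change to {name!r}")
    remove_tree(name)
    # Finalize: drop the ref, guarded by the version we just wrote.
    if not store.cas(name, version + 1, None):
        raise RuntimeError(f"concurrent change to {name!r} while deleting")
```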

Retry / conflict handling

If an operation sees an existing branch ref in creating or deleting, it should use the ref state to decide whether to retry, take over, or fail.

For creating:

Compare the requested source against the existing BranchContents
If they differ, return conflict
If they match, treat it as a retry of the same branch creation
If the attempt is stale based on create_at, allow takeover
Takeover must CAS-update the ref before continuing

This gives us a clean rule:

same branch name + same source snapshot => retry is allowed
same branch name + different source snapshot => conflict

The same general model applies to deleting: retries and recovery should be driven by the ref state, not inferred from directory presence alone.
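One reading of the create-side retry rules can be written as a pure decision function. `ExistingRef`, `Decision`, and the `stale_after_s` threshold are hypothetical. `create_at` is the field name used in the proposal and is represented here as seconds since the epoch. The proposal does not say whether a stale attempt with a different source may also be taken over, so this sketch reports a conflict in that case.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    RETRY = "retry"        # same create, still fresh: continue it
    TAKEOVER = "takeover"  # same create, stale: CAS-update the ref, then continue
    CONFLICT = "conflict"  # a different operation owns this branch name


@dataclass(frozen=True)
class ExistingRef:
    state: str           # value of __lance_state in the existing BranchContents
    source_version: int  # source snapshot recorded when the create started
    create_at: float     # start time of that attempt, seconds since the epoch


def decide_on_create(existing: ExistingRef, requested_source: int,
                     now: float, stale_after_s: float) -> Decision:
    """Decide what a create should do when a ref with the same name exists."""
    if existing.state != "creating":
        # `ready` or `deleting`: not an in-flight create; treated as a conflict here.
        return Decision.CONFLICT
    if existing.source_version != requested_source:
        # Same branch name + different source snapshot => conflict.
        return Decision.CONFLICT
    if now - existing.create_at > stale_after_s:
        # Same source, but the original attempt looks abandoned.
        return Decision.TAKEOVER
    # Same branch name + same source snapshot => retry is allowed.
    return Decision.RETRY
```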

Why this helps

This solves the main lifecycle consistency problems:

branch visibility no longer depends on partially written data under tree/<branch>
incomplete create/delete operations are represented explicitly in the ref
retries become well-defined
branch identity no longer depends on timing
create and delete follow the same control-plane model

Most importantly, this lets us treat BranchContents as the single source of truth for branch lifecycle state.

CAS requirement

This design depends on CAS for correctness.

We need CAS when updating an existing branch ref, otherwise a stale writer can overwrite a newer lifecycle transition. In particular, CAS is needed for:

creating -> ready
retry/takeover of an existing creating branch
ready -> deleting
any future mutation of an existing branch ref

Without CAS, BranchContents cannot safely act as the single source of truth under concurrency.
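The stale-writer problem can be shown with a single versioned ref. In this sketch, the `VersionedRef` class is hypothetical and its version counter plays the role of an ETag. Writer A stalls during a create, writer B takes over, and A then tries to finish with `creating -> ready` based on its stale read.

```python
class VersionedRef:
    """Hypothetical single branch ref with a version counter (think: ETag)."""

    def __init__(self, contents: dict[str, str]) -> None:
        self.version = 1
        self.contents = dict(contents)

    def read(self) -> tuple[int, dict[str, str]]:
        return self.version, dict(self.contents)

    def blind_write(self, contents: dict[str, str]) -> None:
        """Unconditional overwrite: the last writer wins."""
        self.version += 1
        self.contents = dict(contents)

    def cas(self, expected: int, contents: dict[str, str]) -> bool:
        """Write only if nobody has written since version `expected` was read."""
        if self.version != expected:
            return False
        self.blind_write(contents)
        return True


ref = VersionedRef({"__lance_state": "creating", "owner": "A"})
a_version, a_view = ref.read()                        # A starts, then stalls
b_version, b_view = ref.read()                        # B decides A is stale
b_ok = ref.cas(b_version, {**b_view, "owner": "B"})   # takeover via CAS
a_ok = ref.cas(a_version, {**a_view, "__lance_state": "ready"})
print(a_ok, ref.read()[1]["owner"])  # False B
```

With `blind_write` instead of `cas`, A's write would silently replace B's takeover and publish a `ready` branch that B is still materializing.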

Summary

The proposal is to use BranchContents itself as the branch lifecycle state machine, with a reserved internal metadata key such as __lance_state to track lifecycle state.

This gives us a small but explicit control-plane model for create/delete, and a clear path to making BranchContents the single source of truth, as long as ref updates are protected by CAS.


jackye1995 Apr 1, 2026
Maintainer

I like that, except that we don't want to depend on the put-if-match (ETag) feature of object stores. If we only depend on put-if-not-exists plus the versioning trick plus listing to find the latest version, essentially like how the table manifest works, would that be acceptable performance-wise?

I feel this pattern is common enough that we could centralize the logic somehow, and also add the latest-version hint optimization so it applies to everything built on the same architecture.
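Here is a minimal local-filesystem sketch of the scheme described in the two paragraphs above. Each ref update writes a new numbered file. A write succeeds only if that number does not exist yet, and readers find the latest version by probing forward from a hint or, without a hint, by listing. `os.link` failing on an existing target stands in for an object store's put-if-not-exists. The `<n>.json` layout and the `_latest_hint` file are hypothetical and are not Lance's actual on-disk format.

```python
import json
import os
import tempfile
from typing import Optional

HINT = "_latest_hint"  # hypothetical hint file name


def _path(ref_dir: str, version: int) -> str:
    return os.path.join(ref_dir, f"{version}.json")


def latest_version(ref_dir: str) -> int:
    """Newest committed version (0 if none).

    The hint is only a lower bound (it is written after a commit), so we
    always probe forward from it; without a hint we fall back to listing.
    """
    try:
        with open(os.path.join(ref_dir, HINT)) as f:
            v = int(f.read())
    except (FileNotFoundError, ValueError):
        names = [n for n in os.listdir(ref_dir) if n.endswith(".json")]
        v = max((int(n[:-5]) for n in names), default=0)
    while os.path.exists(_path(ref_dir, v + 1)):
        v += 1
    return v


def read_ref(ref_dir: str) -> Optional[dict]:
    v = latest_version(ref_dir)
    if v == 0:
        return None
    with open(_path(ref_dir, v)) as f:
        return json.load(f)


def _write_hint(ref_dir: str, version: int) -> None:
    fd, tmp = tempfile.mkstemp(dir=ref_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        f.write(str(version))
    os.replace(tmp, os.path.join(ref_dir, HINT))  # atomic swap of the hint


def commit_ref(ref_dir: str, expected: int, contents: dict) -> bool:
    """Write version expected+1; False if another writer created it first."""
    fd, tmp = tempfile.mkstemp(dir=ref_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(contents, f)
        # Fails with FileExistsError if the target exists: put-if-not-exists.
        os.link(tmp, _path(ref_dir, expected + 1))
    except FileExistsError:
        return False
    finally:
        os.unlink(tmp)
    _write_hint(ref_dir, expected + 1)
    return True
```

In this scheme the "compare" of compare-and-swap is the version number. A writer that read version N succeeds only if it is the first to create N+1. The read-side cost is one existence probe per step past the hint, or a listing when no hint exists.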
