Formalizing the ManifestStore Concept #5849

jackye1995 · 2026-01-29T06:28:46Z

jackye1995
Jan 29, 2026
Maintainer

Background

Lance currently has an ExternalManifestStore trait defined in lance-table/src/io/commit/external_manifest.rs that provides an abstraction for external storage with put-if-not-exists semantics. This trait is used in the commit path to coordinate concurrent writes, ensuring only one writer wins when multiple transactions target the same version.

The current trait definition:

#[async_trait]
pub trait ExternalManifestStore: std::fmt::Debug + Send + Sync {
    /// Get the manifest path for a given base_uri and version
    async fn get(&self, base_uri: &str, version: u64) -> Result<String>;

    /// Get the latest version of a dataset at the base_uri
    async fn get_latest_version(&self, base_uri: &str) -> Result<Option<(u64, String)>>;

    /// Put the manifest path if the version does not already exist
    async fn put_if_not_exists(
        &self,
        base_uri: &str,
        version: u64,
        path: &str,
        size: u64,
        e_tag: Option<String>,
    ) -> Result<()>;

    /// Put the manifest path if the version already exists (for finalization)
    async fn put_if_exists(
        &self,
        base_uri: &str,
        version: u64,
        path: &str,
        size: u64,
        e_tag: Option<String>,
    ) -> Result<()>;

    /// Delete the manifest information for given base_uri
    async fn delete(&self, _base_uri: &str) -> Result<()>;
}
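
To make the put-if-not-exists semantics concrete, here is a minimal in-memory sketch. It is synchronous and std-only, it drops the size and e_tag arguments, and InMemoryManifestStore and StoreError are hypothetical names for illustration, not types from Lance.

```rust
use std::collections::{BTreeMap, HashMap};
use std::sync::Mutex;

/// Errors for this sketch only; Lance's real error type differs.
#[derive(Debug, PartialEq)]
pub enum StoreError {
    AlreadyExists { version: u64 },
    NotFound { version: u64 },
}

/// Hypothetical in-memory store: base_uri -> (version -> manifest path).
#[derive(Debug, Default)]
pub struct InMemoryManifestStore {
    entries: Mutex<HashMap<String, BTreeMap<u64, String>>>,
}

impl InMemoryManifestStore {
    /// Look up the manifest path recorded for one version.
    pub fn get(&self, base_uri: &str, version: u64) -> Result<String, StoreError> {
        let entries = self.entries.lock().unwrap();
        entries
            .get(base_uri)
            .and_then(|versions| versions.get(&version))
            .cloned()
            .ok_or(StoreError::NotFound { version })
    }

    /// The BTreeMap keeps versions sorted, so the last entry is the latest.
    pub fn get_latest_version(&self, base_uri: &str) -> Option<(u64, String)> {
        let entries = self.entries.lock().unwrap();
        entries
            .get(base_uri)
            .and_then(|versions| versions.iter().next_back())
            .map(|(version, path)| (*version, path.clone()))
    }

    /// Only the first writer to claim a version succeeds; the mutex makes
    /// the check and the insert a single atomic step.
    pub fn put_if_not_exists(
        &self,
        base_uri: &str,
        version: u64,
        path: &str,
    ) -> Result<(), StoreError> {
        let mut entries = self.entries.lock().unwrap();
        let versions = entries.entry(base_uri.to_string()).or_default();
        if versions.contains_key(&version) {
            return Err(StoreError::AlreadyExists { version });
        }
        versions.insert(version, path.to_string());
        Ok(())
    }

    /// Replace the path of an already-claimed version (finalization).
    pub fn put_if_exists(
        &self,
        base_uri: &str,
        version: u64,
        path: &str,
    ) -> Result<(), StoreError> {
        let mut entries = self.entries.lock().unwrap();
        match entries.get_mut(base_uri).and_then(|v| v.get_mut(&version)) {
            Some(slot) => {
                *slot = path.to_string();
                Ok(())
            }
            None => Err(StoreError::NotFound { version }),
        }
    }
}
```

When two writers race for the same version, the second put_if_not_exists call returns AlreadyExists, which is the "only one writer wins" property the commit path relies on.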

Today, we have a single production implementation using DynamoDB (DynamoDBExternalManifestStore), which is accessed via the s3+ddb:// URI scheme.

This is also part of our table spec: https://lance.org/format/table/transaction/#external-manifest-store. We have had a few discussions about potentially deprecating it, and I even proposed, for example, manifest scheme v3, which would let the DynamoDB-based solution move to storage-only manifest listing without a performance regression on S3 Express.

Motivation

Over the past few months, I have identified three use cases that would benefit from keeping this concept and formalizing it as a first-class ManifestStore abstraction, and I would like to share them with the community:

1. Supporting Custom Manifest Store Implementations

Context: The Tencent Cloud Object Storage team has reached out with a requirement. Their PUT-IF-NOT-EXISTS feature (conditional writes) is only available for newly created buckets. Existing bucket users cannot leverage this feature, so they would like to implement a custom solution using the external manifest store pattern.

This is likely a common scenario for other cloud providers or self-hosted object storage systems that don't support atomic conditional operations. Having a well-documented, stable ManifestStore interface would enable these users to plug in their own coordination mechanism.

2. Multi-Table Transactions in Directory Namespace

Context: In the Directory Namespace implementation (particularly the partitioned namespace proposal), we've explored how to atomically commit changes to multiple tables (partitions) simultaneously.

The Directory Namespace already uses a __manifest Lance table to track table metadata (name, location, version, etc.). This is essentially a ManifestStore! With features like MergeInsert and unenforced primary key deduplication, we can extend this pattern to atomically increment version numbers for multiple tables in a single operation.

A formalized ManifestStore concept could provide:

CreateTableVersion(table_uri, version, manifest_path) - single table commit
CreateTableVersions([(table_uri_1, version_1, manifest_path_1), ...]) - batch/atomic multi-table commit that translates to a MergeInsert
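
As a rough illustration of the all-or-nothing behavior these two operations imply, the sketch below models the __manifest table as an in-memory map. ManifestTable and VersionEntry are hypothetical names, and the validate-then-apply loop only stands in for what a real MergeInsert against the __manifest Lance table would do.

```rust
use std::collections::{HashMap, HashSet};

/// One requested commit: (table_uri, version, manifest_path).
pub type VersionEntry = (String, u64, String);

/// Hypothetical in-memory stand-in for the `__manifest` table.
#[derive(Debug, Default)]
pub struct ManifestTable {
    rows: HashMap<(String, u64), String>,
}

impl ManifestTable {
    /// Single-table commit: a batch of one.
    pub fn create_table_version(
        &mut self,
        table_uri: &str,
        version: u64,
        manifest_path: &str,
    ) -> Result<(), String> {
        let entry = (table_uri.to_string(), version, manifest_path.to_string());
        self.create_table_versions(&[entry])
    }

    /// Batch commit: validate every entry first, then apply all of them.
    /// If any (table, version) pair is already taken, nothing is written.
    pub fn create_table_versions(&mut self, entries: &[VersionEntry]) -> Result<(), String> {
        let mut seen = HashSet::new();
        for (uri, version, _) in entries {
            let key = (uri.clone(), *version);
            // Reject rows that already exist and duplicates within the batch.
            if self.rows.contains_key(&key) || !seen.insert(key) {
                return Err(format!("version {version} already exists for {uri}"));
            }
        }
        for (uri, version, path) in entries {
            self.rows.insert((uri.clone(), *version), path.clone());
        }
        Ok(())
    }

    /// Look up the manifest path recorded for a table version.
    pub fn get(&self, table_uri: &str, version: u64) -> Option<&String> {
        self.rows.get(&(table_uri.to_string(), version))
    }
}
```

The property that matters is that a conflict on any one table rejects the whole batch, so readers never observe a partially applied multi-table commit.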

This would enable use cases like:

Atomic partition updates in partitioned tables
Coordinated schema evolution across related tables
Transactional writes to multiple tables

3. Catalog Integration (e.g., Polaris)

Context: For better ecosystem integration with catalog systems (Iceberg REST Catalog / Polaris, Unity Catalog, etc.), I have been exploring how catalogs can intercept and coordinate commit requests.

The idea is that a catalog could expose APIs like:

ListTableVersions(table_id)
GetTableVersion(table_id, version)
CreateTableVersion(table_id, version, manifest_location)

In this model, the catalog essentially acts as a ManifestStore for Lance tables, enabling:

Centralized version control and conflict resolution
Integration with catalog-level transactions
Unified governance and access control
Cross-format coordination (e.g., when a single catalog manages both Iceberg and Lance tables)
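
A minimal sketch of how a client might drive these three operations, assuming a synchronous trait and an in-memory catalog. TableVersionCatalog, InMemoryCatalog, VersionConflict, and commit_next_version are hypothetical names, not an existing Polaris or Lance API, and the retry loop is a simplification: real conflict resolution would inspect the intervening versions before retrying.

```rust
use std::collections::{BTreeMap, HashMap};

#[derive(Debug, PartialEq)]
pub struct VersionConflict;

/// Hypothetical catalog-side API, named after the operations listed above.
pub trait TableVersionCatalog {
    fn list_table_versions(&self, table_id: &str) -> Vec<u64>;
    fn get_table_version(&self, table_id: &str, version: u64) -> Option<String>;
    fn create_table_version(
        &mut self,
        table_id: &str,
        version: u64,
        manifest_location: &str,
    ) -> Result<(), VersionConflict>;
}

/// In-memory catalog: table_id -> (version -> manifest location).
#[derive(Default)]
pub struct InMemoryCatalog {
    tables: HashMap<String, BTreeMap<u64, String>>,
}

impl TableVersionCatalog for InMemoryCatalog {
    fn list_table_versions(&self, table_id: &str) -> Vec<u64> {
        // BTreeMap keys are sorted, so versions come back in ascending order.
        self.tables
            .get(table_id)
            .map(|versions| versions.keys().copied().collect())
            .unwrap_or_default()
    }

    fn get_table_version(&self, table_id: &str, version: u64) -> Option<String> {
        self.tables.get(table_id)?.get(&version).cloned()
    }

    fn create_table_version(
        &mut self,
        table_id: &str,
        version: u64,
        manifest_location: &str,
    ) -> Result<(), VersionConflict> {
        let versions = self.tables.entry(table_id.to_string()).or_default();
        if versions.contains_key(&version) {
            return Err(VersionConflict);
        }
        versions.insert(version, manifest_location.to_string());
        Ok(())
    }
}

/// Client-side commit: try to claim latest + 1, re-reading the latest
/// version after each conflict, up to `max_attempts` times.
pub fn commit_next_version<C: TableVersionCatalog>(
    catalog: &mut C,
    table_id: &str,
    manifest_location: &str,
    max_attempts: u32,
) -> Option<u64> {
    for _ in 0..max_attempts {
        let latest = catalog.list_table_versions(table_id).last().copied();
        let next = latest.unwrap_or(0) + 1;
        if catalog
            .create_table_version(table_id, next, manifest_location)
            .is_ok()
        {
            return Some(next);
        }
    }
    None
}
```

Version numbers stay incremental in this model; the catalog only decides which writer gets to claim each one.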

Alternative considered: I also explored adopting a catalog-aware commitTable approach similar to Iceberg, where Lance would use non-incremental metadata pointers (UUIDs) instead of incremental version numbers. More details are in #5229. However, I concluded that this approach is not ideal for the following reasons:

Diverged commit paths: It would create two fundamentally different commit mechanisms - one for catalog-managed tables and one for standalone tables. This increases complexity and maintenance burden.
Transaction conflict resolution: Lance's conflict resolution relies on being able to list and compare older versions. A latest-manifest pointer with a random, non-sequential name does not support that.
Version semantics: Lance's incremental version numbers provide clear semantics for time-travel, rollback, and version comparison. Non-incremental pointers would require additional metadata to maintain these semantics. As we can see in Iceberg, this inflates the manifest (metadata file) size, which is what we want to avoid.

The ManifestStore approach preserves Lance's version semantics while still allowing catalogs to intercept and coordinate commits.

Proposal

In short, I propose we introduce a formal ManifestStore trait. We can provide at least two implementations, DirectoryManifestStore and DynamodbManifestStore, and there can be other implementations such as TencentCosManifestStore and PolarisManifestStore.

I have not thought much yet about how to make it truly pluggable. It feels like the usual trade-off: either we add all dependencies and implementations in Rust, or we need some sort of C interface.

Curious what people think about this!

westonpace · 2026-01-29T13:32:53Z

westonpace
Jan 29, 2026
Maintainer

What's the relationship between ManifestStore and CommitHandler? Is CommitHandler already a formal concept? What is DirectoryManifestStore?

My understanding is...

The base trait is CommitHandler which is "something that can perform put-if-not-exists on a manifest and, when done, the manifest is in object storage".
The ExternalManifestStore is for object storage that cannot perform put-if-not-exists. In this case we atomically store a manifest pointer elsewhere (e.g. dynamodb) and then non-atomically store the manifest in object storage.
There is an ExternalCommitHandler which implements the CommitHandler trait given an ExternalManifestStore. This relies on the external store's ability to atomically store the manifest pointer to implement the CommitHandler trait without relying on atomic primitives from object storage.
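
The layering described above can be sketched roughly as follows. These are simplified, synchronous stand-ins rather than the real Lance traits: PointerStore, MemPointerStore, and PlainObjectStore are hypothetical names, and the staging path layout is an assumption made for illustration.

```rust
use std::collections::HashMap;

/// Simplified stand-in: "commit this manifest as `version`, exactly once".
pub trait CommitHandler {
    fn commit(&mut self, version: u64, manifest: &[u8]) -> Result<(), String>;
}

/// Simplified stand-in for the external store: it holds pointers only.
pub trait PointerStore {
    fn put_if_not_exists(&mut self, version: u64, path: &str) -> Result<(), String>;
}

/// In-memory pointer store (plays the role of DynamoDB in this sketch).
#[derive(Default)]
pub struct MemPointerStore {
    pub pointers: HashMap<u64, String>,
}

impl PointerStore for MemPointerStore {
    fn put_if_not_exists(&mut self, version: u64, path: &str) -> Result<(), String> {
        if self.pointers.contains_key(&version) {
            return Err(format!("version {version} already exists"));
        }
        self.pointers.insert(version, path.to_string());
        Ok(())
    }
}

/// Object storage without conditional writes: every put silently overwrites.
#[derive(Default)]
pub struct PlainObjectStore {
    pub objects: HashMap<String, Vec<u8>>,
}

/// Implements CommitHandler on top of a pointer store, so no atomic
/// primitive is needed from object storage.
pub struct ExternalCommitHandler<S: PointerStore> {
    pub store: S,
    pub objects: PlainObjectStore,
    pub writer_id: String,
}

impl<S: PointerStore> CommitHandler for ExternalCommitHandler<S> {
    fn commit(&mut self, version: u64, manifest: &[u8]) -> Result<(), String> {
        // Non-atomic write to a writer-unique path, so concurrent writers
        // cannot clobber each other's manifest bytes.
        let path = format!("_versions/{version}.manifest-{}", self.writer_id);
        self.objects.objects.insert(path.clone(), manifest.to_vec());
        // The atomic step: only one writer can claim the pointer.
        self.store.put_if_not_exists(version, &path)
    }
}
```

Note that the manifest bytes only ever go to object storage; the external store records which path won.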

So maybe my question is:

If a user is using a ManifestStore do we still store the manifest in object storage? Or is the ManifestStore replacing the object storage for the manifest? For example, with dynamodb, we never actually put the manifest itself into dynamodb, we only store pointers to object storage in dynamodb.

If I look at your trait I think the answer is "the ManifestStore only stores pointers to the manifest" but I want to double check.

6 replies

westonpace Feb 3, 2026
Maintainer

This helps to clarify. If I understand correctly, the goal would also need to include changing the Rust implementation to use the new manifest store trait instead of the commit handler trait. We would be replacing the commit handler with the manifest store. The commit handler trait could then go away. Does that sound correct?

ztorchan Feb 4, 2026

Yes, that is what I think.

jackye1995 Feb 12, 2026
Maintainer Author

Thank you so much @ztorchan for explaining this! Sorry, I was debating with myself for too long on this and hesitated to post a response.

@ztorchan explained very well what I was thinking when I wrote it. I feel a unified interface could be beneficial, instead of the two paths we have today that distinguish the storage-based approach from an approach that treats external systems as second-class citizens.

However, my internal struggle is that if we take a generalized manifest store approach, we lose a lot of good storage-only properties, especially portability. I was trying to scope out what it would actually look like, and I ended up with just another interface like Iceberg Catalog that is very heavy with discovery and commit logic. I think we have achieved a good balance so far with the current storage-only dataset + Lance Namespace, which does very simple basic operations and can easily be implemented on almost any system that supports creating/removing/getting/listing objects. If we move to a generic manifest store, we basically lose that.

At this point, I am leaning more towards keeping the external manifest store concept. When I was writing the proposal, I mixed my idea of refactoring into a manifest store with the user requirements. Technically, the external manifest store should be sufficient for the use cases I have described. We can start by just adding more implementations of the ExternalManifestStore trait in Rust, and see if there is additional need to develop other bindings. We might want to make a few more improvements. For example, we should define the concept of a scheme for the external manifest store, so that a user can, for example, use s3+ns:// to run a Namespace-backed external manifest store.
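
To illustrate the scheme idea, here is a small sketch that splits a composite URI such as s3+ddb:// or s3+ns:// into a storage scheme and a manifest store scheme. split_scheme is a hypothetical helper for illustration and is not how Lance parses URIs today.

```rust
/// Split a composite URI into (storage scheme, manifest store scheme, rest).
///
/// `s3+ddb://bucket/table` -> ("s3", Some("ddb"), "bucket/table")
/// `s3://bucket/table`     -> ("s3", None, "bucket/table")
///
/// Returns None when there is no `://` separator or when either side of
/// the `+` is empty.
pub fn split_scheme(uri: &str) -> Option<(&str, Option<&str>, &str)> {
    let (scheme, rest) = uri.split_once("://")?;
    if scheme.is_empty() {
        return None;
    }
    match scheme.split_once('+') {
        // Composite scheme: object storage plus an external manifest store.
        Some((storage, store)) if !storage.is_empty() && !store.is_empty() => {
            Some((storage, Some(store), rest))
        }
        // A `+` with an empty side is malformed.
        Some(_) => None,
        // Plain storage-only scheme.
        None => Some((scheme, None, rest)),
    }
}
```

A registry keyed by the second component (ddb, ns, and so on) could then pick the ExternalManifestStore implementation, while the first component still selects the object store.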

What do we think? @westonpace @ztorchan

westonpace Feb 13, 2026
Maintainer

> I ended up with just another interface like Iceberg Catalog that is very heavy with discovery and commit logic.

It would help me to understand more about what this means. What does it mean to be heavy with discovery and commit logic?

> I think we have achieved a good balance so far with the current storage-only dataset + Lance Namespace that does very simple basic operations and can easily provide implementation through almost any system that supports creating/removing/getting/listing objects. If we move to a generic manifest store, we basically lose that.

Are you saying the current design forces external stores to be implemented in such a way that they can easily support different filesystems? Is the concern here that a unified interface would make it too easy for an implementation to tie itself into the details of one specific storage backend?

jackye1995 Feb 20, 2026
Maintainer Author

See my final implementation in #5968 for more details.

> What does it mean to be heavy with discovery and commit logic?

I mean that each implementation ends up with its own way to commit the dataset, which technically means you have a different format for each catalog integration.

In the external manifest store case, we are still centered around the fact that storage is the source of truth. We use the external manifest store to commit and to resolve the latest version, but the ground truth is that there will eventually be a storage-only version of the table that is readable without the external manifest store.

> Are you saying the current design forces external stores to be implemented in such a way that they can easily support different filesystems? Is the concern here that a unified interface would make it too easy for an implementation to tie itself into the details of one specific storage backend?

yes

With that being said, I modified the external manifest store contract to be a bit more relaxed. Previously, it had to follow this mechanism:

1. produce a staging manifest file
2. put_if_not_exists the manifest file to the external manifest store at a given version
3. write the actual version manifest to the object store based on the manifest naming scheme
4. put_if_exists the final manifest with the right naming scheme into the external manifest store
5. delete the staged manifest file

Note that everything from step 3 onward is best effort. If a reader finds that any of steps 3-5 failed, it tries to complete them at read time.
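
The five steps and the read-time repair can be sketched with in-memory stand-ins as below. Env, commit, finalize, and read are hypothetical names, the path layout is an assumption for illustration, and the plain map check in step 2 stands in for the store's atomic put_if_not_exists.

```rust
use std::collections::HashMap;

/// In-memory stand-ins; all names and path layouts here are illustrative.
#[derive(Default)]
pub struct Env {
    pub objects: HashMap<String, Vec<u8>>, // object store: path -> bytes
    pub pointers: HashMap<u64, String>,    // external store: version -> path
}

fn staging_path(version: u64, writer: &str) -> String {
    format!("_versions/{version}.manifest-{writer}")
}

fn final_path(version: u64) -> String {
    format!("_versions/{version}.manifest")
}

/// Steps 3-5. Idempotent, so a reader can safely finish what a crashed
/// writer left behind.
pub fn finalize(env: &mut Env, version: u64) -> Result<(), String> {
    let current = env.pointers.get(&version).cloned().ok_or("unknown version")?;
    let target = final_path(version);
    if current == target {
        return Ok(()); // already finalized
    }
    let bytes = env
        .objects
        .get(&current)
        .cloned()
        .ok_or("staged manifest missing")?;
    env.objects.insert(target.clone(), bytes); // step 3: write final manifest
    env.pointers.insert(version, target); // step 4: put_if_exists
    env.objects.remove(&current); // step 5: delete staged file
    Ok(())
}

/// Steps 1-2 decide the commit. `crash_before_finalize` simulates a writer
/// that dies right after step 2.
pub fn commit(
    env: &mut Env,
    version: u64,
    writer: &str,
    manifest: &[u8],
    crash_before_finalize: bool,
) -> Result<(), String> {
    let staged = staging_path(version, writer);
    env.objects.insert(staged.clone(), manifest.to_vec()); // step 1
    if env.pointers.contains_key(&version) {
        // Step 2 lost the race. Cleaning up the loser's staged file is an
        // assumption of this sketch.
        env.objects.remove(&staged);
        return Err(format!("version {version} already committed"));
    }
    env.pointers.insert(version, staged); // step 2 won
    if !crash_before_finalize {
        let _ = finalize(env, version); // best effort: errors are ignored
    }
    Ok(())
}

/// Reader: resolve a version, repairing an unfinished commit on the way.
pub fn read(env: &mut Env, version: u64) -> Result<Vec<u8>, String> {
    finalize(env, version)?;
    env.objects
        .get(&final_path(version))
        .cloned()
        .ok_or_else(|| "manifest missing".to_string())
}
```

In this sketch the commit is decided entirely by step 2, which is why a failure in any later step never loses a committed version.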

I think this is a bit too prescriptive. It should be okay as long as the whole mechanism can guarantee that the manifest is eventually put at the right location based on the manifest naming scheme. So I added a more generic put function to the trait, rather than forcing the manifest store to go through exactly the mechanism above. The mechanism above is the default implementation of the put function, but it can be overridden.
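
The shape of that change, a trait method with a default body that implementations may override, can be sketched as follows. This is a simplified synchronous illustration: the put signature, TwoPhaseStore, and DirectStore are assumptions, and the actual trait in #5968 may look different.

```rust
use std::cell::RefCell;

/// Simplified synchronous trait for illustration only.
pub trait ExternalManifestStore {
    fn put_if_not_exists(&self, version: u64, path: &str) -> Result<(), String>;
    fn put_if_exists(&self, version: u64, path: &str) -> Result<(), String>;

    /// Default: claim the version with the staged path, then repoint it to
    /// the final path. Returns the path the store ends up pointing at.
    fn put(&self, version: u64, staged_path: &str, final_path: &str) -> Result<String, String> {
        self.put_if_not_exists(version, staged_path)?;
        // Copying the staged object to `final_path` would happen here.
        self.put_if_exists(version, final_path)?;
        Ok(final_path.to_string())
    }
}

/// Uses the default `put`; records each call so the flow is observable.
#[derive(Default)]
pub struct TwoPhaseStore {
    pub log: RefCell<Vec<String>>,
}

impl ExternalManifestStore for TwoPhaseStore {
    fn put_if_not_exists(&self, version: u64, path: &str) -> Result<(), String> {
        self.log.borrow_mut().push(format!("put_if_not_exists {version} {path}"));
        Ok(())
    }

    fn put_if_exists(&self, version: u64, path: &str) -> Result<(), String> {
        self.log.borrow_mut().push(format!("put_if_exists {version} {path}"));
        Ok(())
    }
}

/// Overrides `put`: a store that can guarantee the manifest lands at its
/// final name does not need the staged round trip.
#[derive(Default)]
pub struct DirectStore {
    pub log: RefCell<Vec<String>>,
}

impl ExternalManifestStore for DirectStore {
    fn put_if_not_exists(&self, version: u64, path: &str) -> Result<(), String> {
        self.log.borrow_mut().push(format!("put_if_not_exists {version} {path}"));
        Ok(())
    }

    fn put_if_exists(&self, version: u64, path: &str) -> Result<(), String> {
        self.log.borrow_mut().push(format!("put_if_exists {version} {path}"));
        Ok(())
    }

    fn put(&self, version: u64, _staged_path: &str, final_path: &str) -> Result<String, String> {
        self.put_if_not_exists(version, final_path)?;
        Ok(final_path.to_string())
    }
}
```

Callers only ever invoke put, so an implementation is free to choose its own route as long as the manifest eventually sits at the location the naming scheme expects.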

jackye1995 · 2026-02-13T00:18:35Z

jackye1995
Feb 13, 2026
Maintainer Author

With that being said, I feel we might want to rename CommitHandler. It was one of the things that confused me when I first started working on the repo, as it handles a lot more than just the commit path. Maybe we should call it ManifestHandler or ManifestCoordinator.

Also, I see there is a code path in Python today that builds a commit_lock feature on top of commit_handler, exposing a lock mechanism that Python can implement. I am not sure whether it is still used; maybe we should consider deprecating it and consolidating on the defined schemes like ddb or ns.

@wjones127 curious what you think since you originally added those

0 replies

wojiaodoubao · 2026-03-05T15:17:16Z

wojiaodoubao
Mar 5, 2026

This proposal is really great! Let me explain why it is particularly critical for the partitioned namespace use case.

In the partitioned table scenario, there is a fundamental problem: when multiple tables are written at the same time, we need to support atomic commits across those tables. The current idea is to leverage the atomic merge_insert capability of the __manifest table in ManifestNamespace.

In the previous approach, each table would first perform a detached commit individually, and then ManifestNamespace would update the __manifest table to finalize the partitioned namespace commit. However, this approach has an annoying issue: once the tables leave the control of the partitioned namespace, the partitioned table becomes unreadable.

The new approach uses LanceNamespaceExternalManifestStore to manage commits for all partition tables. Internally, ManifestNamespace::CreateTableVersions will handle atomic multi-table commits for the partitioned namespace.

This approach solves both problems at the same time:

It enables atomic commits across multiple tables.
With the help of LanceNamespaceExternalManifestStore, partitioned tables can still be read correctly even outside the partitioned namespace.

0 replies
