Formalizing the ManifestStore Concept #5849
What's the relationship between My understanding is...
So maybe my question is: If a user is using a If I look at your trait I think the answer is "the
With that being said, I feel we might want to rename Also I see there is a code path in python today that does a @wjones127 curious what you think since you originally added those
This proposal is really great! Let me explain why it is particularly critical for the partitioned namespace use case. In the partitioned table scenario, there is a fundamental problem: when multiple tables are written at the same time, we need to support atomic commits across those tables. The current idea is to leverage the atomic merge_insert capability of the __manifest table in ManifestNamespace.
In the previous approach, each table would first perform a detached commit individually, and then ManifestNamespace would update the __manifest table to finalize the partitioned namespace commit. However, this approach has an annoying issue: once the tables leave the control of the partitioned namespace, the partitioned table becomes unreadable.
The new approach uses LanceNamespaceExternalManifestStore to manage commits for all partition tables. Internally, ManifestNamespace::CreateTableVersions will handle the atomic multi-table commits of the partitioned namespace. This approach solves both problems at the same time:
Background
Lance currently has an ExternalManifestStore trait defined in lance-table/src/io/commit/external_manifest.rs that provides an abstraction for external storage with put-if-not-exists semantics. This trait is used in the commit path to coordinate concurrent writes, ensuring only one writer wins when multiple transactions target the same version.
The current trait definition:
Today, we have a single production implementation using DynamoDB (DynamoDBExternalManifestStore), which is accessed via the s3+ddb:// URI scheme.
This is also part of our table spec: https://lance.org/format/table/transaction/#external-manifest-store. However, we have had a few discussions about potentially deprecating it, and I even proposed, for example, a manifest scheme v3 that would allow the DynamoDB-based solution to move to storage-only manifest listing without a performance regression on S3 Express.
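To make the put-if-not-exists semantics concrete, here is a minimal in-memory sketch in Rust. It is not the actual ExternalManifestStore trait: InMemoryStore, CommitError, and the method signatures are hypothetical names chosen for illustration, and a mutex-guarded map stands in for the external store.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical in-memory stand-in for an external manifest store.
/// Keys are (table base URI, version); values are manifest paths.
#[derive(Default)]
pub struct InMemoryStore {
    entries: Mutex<HashMap<(String, u64), String>>,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum CommitError {
    /// Another writer already claimed this version.
    VersionConflict { version: u64 },
}

impl InMemoryStore {
    /// Records `path` for (`base_uri`, `version`) only if no entry exists yet.
    /// The check and the insert happen under one lock, so exactly one writer wins.
    pub fn put_if_not_exists(
        &self,
        base_uri: &str,
        version: u64,
        path: &str,
    ) -> Result<(), CommitError> {
        let mut entries = self.entries.lock().unwrap();
        let key = (base_uri.to_string(), version);
        if entries.contains_key(&key) {
            return Err(CommitError::VersionConflict { version });
        }
        entries.insert(key, path.to_string());
        Ok(())
    }

    /// Looks up the manifest path recorded for a given version, if any.
    pub fn get(&self, base_uri: &str, version: u64) -> Option<String> {
        let entries = self.entries.lock().unwrap();
        entries.get(&(base_uri.to_string(), version)).cloned()
    }
}
```

If two writers both try to commit version 5 of the same table, the first call returns Ok(()) and the second returns a VersionConflict, leaving the first writer's manifest path in place. The losing writer would then have to retry against a newer version.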
Motivation
Over the past few months, I have identified three use cases that would benefit not from removing this concept, but from formalizing it as a first-class ManifestStore abstraction, and I would like to share them with the community:
1. Supporting Custom Manifest Store Implementations
Context: The Tencent Cloud Object Storage team has reached out with a requirement. Their PUT-IF-NOT-EXISTS feature (conditional writes) is only available for newly created buckets. Existing bucket users cannot leverage this feature, so they would like to implement a custom solution using the external manifest store pattern.
This is likely a common scenario for other cloud providers or self-hosted object storage systems that don't support atomic conditional operations. Having a well-documented, stable ManifestStore interface would enable these users to plug in their own coordination mechanism.
2. Multi-Table Transactions in Directory Namespace
Context: In the Directory Namespace implementation (particularly the partitioned namespace proposal), we've explored how to atomically commit changes to multiple tables (partitions) simultaneously.
The Directory Namespace already uses a __manifest Lance table to track table metadata (name, location, version, etc.). This is essentially a ManifestStore! With features like MergeInsert and unenforced primary key deduplication, we can extend this pattern to atomically increment version numbers for multiple tables in a single operation.
A formalized ManifestStore concept could provide:
- CreateTableVersion(table_uri, version, manifest_path) - single table commit
- CreateTableVersions([(table_uri_1, version_1, manifest_path_1), ...]) - batch/atomic multi-table commit that translates to a MergeInsert
This would enable use cases like:
3. Catalog Integration (e.g., Polaris)
Context: For better ecosystem integration with catalog systems (Iceberg REST Catalog / Polaris, Unity Catalog, etc.), I have been exploring how catalogs can intercept and coordinate commit requests.
The idea is that a catalog could expose APIs like:
- ListTableVersions(table_id)
- GetTableVersion(table_id, version)
- CreateTableVersion(table_id, version, manifest_location)
In this model, the catalog essentially acts as a ManifestStore for Lance tables, enabling:
Alternative considered: I also explored adopting a catalog-aware commitTable approach similar to Iceberg, where Lance would use non-incremental metadata pointers (UUIDs) instead of incremental version numbers. More details at #5229. However, this approach was concluded to be non-ideal for the following reasons:
The ManifestStore approach preserves Lance's version semantics while still allowing catalogs to intercept and coordinate commits.
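As a rough illustration of a catalog acting as a ManifestStore while keeping incremental version numbers, here is a small in-memory sketch in Rust. The three methods mirror the ListTableVersions, GetTableVersion, and CreateTableVersion operations named above. CatalogSketch, the error type, and the rule that a new version must be exactly latest + 1 (starting at 1) are assumptions made for this example, not a description of any real catalog API.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical catalog-side state: table_id -> (version -> manifest location).
/// A BTreeMap keeps versions ordered, so the latest one is the last key.
#[derive(Default)]
pub struct CatalogSketch {
    tables: HashMap<String, BTreeMap<u64, String>>,
}

impl CatalogSketch {
    /// Returns all known versions of a table in ascending order.
    pub fn list_table_versions(&self, table_id: &str) -> Vec<u64> {
        self.tables
            .get(table_id)
            .map(|versions| versions.keys().copied().collect())
            .unwrap_or_default()
    }

    /// Returns the manifest location recorded for one version, if present.
    pub fn get_table_version(&self, table_id: &str, version: u64) -> Option<&str> {
        self.tables.get(table_id)?.get(&version).map(String::as_str)
    }

    /// Registers a new version. Assumption for this sketch: the catalog only
    /// accepts `latest + 1`, with the first version being 1.
    pub fn create_table_version(
        &mut self,
        table_id: &str,
        version: u64,
        manifest_location: &str,
    ) -> Result<(), String> {
        let versions = self.tables.entry(table_id.to_string()).or_default();
        let expected = versions.keys().next_back().map_or(1, |latest| latest + 1);
        if version != expected {
            return Err(format!("expected version {expected}, got {version}"));
        }
        versions.insert(version, manifest_location.to_string());
        Ok(())
    }
}
```

In this sketch the catalog is the single place where a version number is claimed, so a second writer proposing an already-taken version is rejected rather than silently overwriting the first.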
Proposal
In short, I propose we introduce a formal ManifestStore trait. We can provide at least two implementations, DirectoryManifestStore and DynamodbManifestStore, and there can be other implementations such as TencentCosManifestStore and PolarisManifestStore.
I have not thought too much about how to make it truly pluggable yet. It feels like it falls into the same bucket as other extension points: either we add all dependencies and implementations in Rust, or we need some sort of C interface.
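To give the discussion something concrete to react to, here is one possible shape for such a trait, sketched in synchronous Rust with an in-memory implementation. Everything here is hypothetical: the trait signature, the VersionEntry alias, and InMemoryManifestStore are illustrative only, and a real implementation would presumably be async and backed by the __manifest table, DynamoDB, or a catalog. The point of the sketch is the all-or-nothing behaviour of the batch call.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// One requested commit: (table URI, version, manifest path).
pub type VersionEntry = (String, u64, String);

/// Hypothetical shape of the proposed trait; not an existing Lance API.
pub trait ManifestStore {
    /// Single-table commit.
    fn create_table_version(
        &self,
        table_uri: &str,
        version: u64,
        manifest_path: &str,
    ) -> Result<(), String>;

    /// Multi-table commit: either every entry is recorded or none is.
    fn create_table_versions(&self, entries: &[VersionEntry]) -> Result<(), String>;
}

/// Illustrative implementation backed by a mutex-guarded map.
#[derive(Default)]
pub struct InMemoryManifestStore {
    rows: Mutex<HashMap<(String, u64), String>>,
}

impl InMemoryManifestStore {
    /// Test helper: returns the manifest path stored for a table version.
    pub fn manifest_path(&self, table_uri: &str, version: u64) -> Option<String> {
        let rows = self.rows.lock().unwrap();
        rows.get(&(table_uri.to_string(), version)).cloned()
    }
}

impl ManifestStore for InMemoryManifestStore {
    fn create_table_version(
        &self,
        table_uri: &str,
        version: u64,
        manifest_path: &str,
    ) -> Result<(), String> {
        // A single commit is just a batch of one.
        self.create_table_versions(&[(
            table_uri.to_string(),
            version,
            manifest_path.to_string(),
        )])
    }

    fn create_table_versions(&self, entries: &[VersionEntry]) -> Result<(), String> {
        let mut rows = self.rows.lock().unwrap();
        // Validate every entry first so a conflict leaves the store untouched.
        for (uri, version, _) in entries {
            if rows.contains_key(&(uri.clone(), *version)) {
                return Err(format!("{uri} version {version} already exists"));
            }
        }
        // No conflicts: record all entries while still holding the lock.
        for (uri, version, path) in entries {
            rows.insert((uri.clone(), *version), path.clone());
        }
        Ok(())
    }
}
```

With this shape, a batch that contains one conflicting table version fails as a whole, and none of the other tables in the batch become visible. That is the property the partitioned namespace use case needs.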
Curious what people think about this!