Clearly separate Lance Namespace and Lance Catalog #6109

jackye1995 · 2026-03-05T19:06:52Z

jackye1995
Mar 5, 2026
Maintainer

The Lance Namespace spec has evolved rapidly and now covers a lot of ground. Readers have been confused about what exactly it is, and for good reason: it has become a basket of different things, as listed in the spec itself:

The Lance Namespace spec consists of four main parts:

1. Client Spec: A consistent abstraction that adapts to various catalog specs,
   allowing users to access and operate on a collection of tables in a multimodal lakehouse.
   This is the core reason why we call it "Namespace" rather than "Catalog" -
   namespace can mean catalog, schema, metastore, database, metalake, etc.,
   and the spec provides a unified interface across all of them.
2. Native Catalog Specs: Natively maintained catalog specs that are compliant with the Lance Namespace client spec:
   - Directory Namespace Spec: A storage-only catalog spec that requires no external metadata service dependencies;
     tables are organized directly on storage (local filesystem, S3, GCS, Azure, etc.)
   - REST Namespace Spec: A REST-based catalog spec ideal for data infrastructure teams that want to develop
     their own custom handling in their specific enterprise environments.
3. Implementation Specs: Defines how a given catalog spec integrates with the client spec.
   It details how an object in a Lance Namespace maps to an object in the specific catalog spec,
   and how each operation in Lance Namespace is fulfilled by the catalog spec.
   The implementation specs for Directory and REST namespaces are part of the native Lance Namespace spec.
   Implementation specs for other catalog specs
   (e.g. Apache Polaris, Unity Catalog, Apache Hive Metastore, Apache Iceberg REST Catalog)
   are considered integrations - anyone can provide additional implementation specs outside Lance Namespace,
   and they can be owned by external parties without needing to go through the Lance community voting process to be adopted.
4. Partitioning Spec: Defines a storage format for partitioned namespaces built on the Directory Namespace.
   It enables organizing data into physically separated units (partitions) that share a common schema,
   with support for partition evolution, pruning, and multi-partition transactions.

Why it is what it is today

This happened organically. The original goal of the namespace spec was indeed to connect to all the existing catalog/metastore/metalake/... systems, but along the way we also developed a storage-only catalog and a REST version for customer adoption and customization, and so far we have simply kept them in the namespace spec itself.

At the same time, because we use OpenAPI to generate the data models for the namespace clients, the REST model serves a double purpose: it is used both by the REST catalog and by the namespace client.

Proposal

I propose we separate the concepts of Lance Catalog and Lance Namespace, using the following architecture:

This clearly separates what Lance Catalog and Lance Namespace mean:

Lance Catalog: The two catalog implementations natively maintained by Lance (directory and REST), so that you can store a list of Lance tables.
Lance Namespace: The client spec for standardizing catalog/metastore/database/namespace/schema interactions, so Lance can work not just with the Lance directory and REST catalogs, but with all the different catalog specs out there.
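
To make the split concrete, here is a minimal sketch, assuming a much smaller surface than the real spec. Every name in it (`NamespaceClient`, `NativeCatalog`, `ExternalCatalogAdapter`, `countTables`) is hypothetical and chosen for illustration only; none of them are taken from the Lance codebase. The interface plays the Lance Namespace role, the first class plays the Lance Catalog role, and the second class stands in for an integration with an external catalog that addresses tables by dotted names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NamespaceSketch {

    // Hypothetical client-side contract: the role Lance Namespace plays.
    interface NamespaceClient {
        void registerTable(List<String> tableId, String location);
        List<String> listTables(List<String> namespaceId);
    }

    // Hypothetical native catalog: the role Lance Catalog plays.
    // An in-memory map stands in for a real directory or REST backend.
    static class NativeCatalog implements NamespaceClient {
        private final Map<List<String>, String> tables = new LinkedHashMap<>();

        @Override
        public void registerTable(List<String> tableId, String location) {
            if (tableId.isEmpty()) throw new IllegalArgumentException("empty table id");
            tables.put(List.copyOf(tableId), location);
        }

        @Override
        public List<String> listTables(List<String> namespaceId) {
            List<String> names = new ArrayList<>();
            for (List<String> id : tables.keySet()) {
                // A table's parent namespace is every part of its id except the last.
                if (id.subList(0, id.size() - 1).equals(namespaceId)) {
                    names.add(id.get(id.size() - 1));
                }
            }
            return names;
        }
    }

    // Hypothetical integration: adapts an external catalog that addresses
    // tables as dotted strings ("db.table") to the same client contract.
    static class ExternalCatalogAdapter implements NamespaceClient {
        private final Map<String, String> external = new LinkedHashMap<>();

        @Override
        public void registerTable(List<String> tableId, String location) {
            if (tableId.isEmpty()) throw new IllegalArgumentException("empty table id");
            external.put(String.join(".", tableId), location);
        }

        @Override
        public List<String> listTables(List<String> namespaceId) {
            String prefix = namespaceId.isEmpty() ? "" : String.join(".", namespaceId) + ".";
            List<String> names = new ArrayList<>();
            for (String key : external.keySet()) {
                // Keep only direct children: the remainder must have no further dots.
                if (key.startsWith(prefix) && !key.substring(prefix.length()).contains(".")) {
                    names.add(key.substring(prefix.length()));
                }
            }
            return names;
        }
    }

    // Engine-side code depends only on the contract, not on which catalog backs it.
    static int countTables(NamespaceClient client, List<String> namespaceId) {
        return client.listTables(namespaceId).size();
    }

    public static void main(String[] args) {
        List<NamespaceClient> clients = List.of(new NativeCatalog(), new ExternalCatalogAdapter());
        for (NamespaceClient client : clients) {
            client.registerTable(List.of("sales", "orders"), "s3://bucket/orders.lance");
            System.out.println(countTables(client, List.of("sales")));
        }
    }
}
```

The point of the sketch is only that `countTables` never needs to know which side of the split it is talking to.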

From a project structure perspective, the directory and REST implementations already live in the lance repo directly, so I don't think we need to change much there.

A few implications:

For the shared OpenAPI model between REST Catalog and Namespace client, we will basically say that the Namespace client takes a dependency on the REST Catalog models.
The partitioning spec is basically a feature of the Lance Directory Catalog, from this separation.

I think this would help avoid a lot of confusion and make the whole lakehouse format architecture clearer.

Curious what people think about this.

beinan · 2026-03-06T00:22:41Z

beinan
Mar 6, 2026

Love it. I’m also thinking it might make sense to add global lock support at the catalog layer—what do you think?

1 reply

jackye1995 Mar 6, 2026
Maintainer Author

Pessimistic locking is something we are trying to avoid. What is your main use case? Recently I have also run into cases where I want some maintenance procedures to run in an exact sequence and coordinate better with writes, but I am trying to achieve that through information in transaction properties. If the use case is similar, I can share more of my current work and thinking.
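
For readers less familiar with the distinction, here is a generic sketch of optimistic coordination: a writer reads the current head, prepares its commit, and publishes it with a compare-and-set, retrying on conflict instead of holding a global lock. This is not Lance's commit protocol and not the design mentioned above, which is not detailed in this thread. The `Commit` and `Table` classes and the `maintenance.sequence` property key are assumptions made up for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

public class OptimisticCommitSketch {

    // Immutable snapshot of the latest commit (hypothetical shape).
    static final class Commit {
        final long version;
        final Map<String, String> properties;

        Commit(long version, Map<String, String> properties) {
            this.version = version;
            this.properties = Map.copyOf(properties);
        }
    }

    static final class Table {
        private final AtomicReference<Commit> head =
            new AtomicReference<>(new Commit(0, Map.of()));

        Commit head() {
            return head.get();
        }

        // Succeeds only if no other commit landed since `base` was read.
        // No lock is held while the writer prepares its changes.
        boolean tryCommit(Commit base, Map<String, String> properties) {
            return head.compareAndSet(base, new Commit(base.version + 1, properties));
        }
    }

    // Hypothetical property key recording the last maintenance step that ran.
    static final String SEQ = "maintenance.sequence";

    // Runs maintenance step `step` only if step - 1 is the last one recorded
    // in the head commit's properties; retries if a concurrent commit wins.
    static boolean runStep(Table table, int step) {
        while (true) {
            Commit base = table.head();
            int done = Integer.parseInt(base.properties.getOrDefault(SEQ, "0"));
            if (done != step - 1) {
                return false; // out of order, so refuse rather than block
            }
            Map<String, String> props = new HashMap<>(base.properties);
            props.put(SEQ, Integer.toString(step));
            if (table.tryCommit(base, props)) {
                return true;
            }
            // Lost the race: loop to reload the head and re-check the sequence.
        }
    }
}
```

In this sketch ordering is enforced by what the head commit records, so a step that arrives early is rejected instead of waiting on a lock.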

wojiaodoubao · 2026-03-06T04:07:28Z

wojiaodoubao
Mar 6, 2026

> The partitioning spec is basically a feature of the Lance Directory Catalog, from this separation.
Is my understanding correct?

Since the partition spec is a feature of the Lance Directory Catalog, does that imply: LanceDirectoryCatalog needs to add partition-related methods, such as: resolve_or_create_partition_table, plan_scan, etc.

At the engine integration layer (for example, in lance-spark), partitioned tables will ultimately be handled through a LanceDirectoryCatalog instance.

For example:

class PartitionedLanceDataset {
  // Partitioned tables are resolved through this directory catalog instance.
  LanceDirectoryCatalog partitionCatalog;
}

> For the shared OpenAPI model between REST Catalog and Namespace client, we will basically say that the Namespace client takes a dependency on the REST Catalog models.

Let me first explain my understanding:

The current lance-namespace project will likely be split into two projects: lance-namespace and lance-catalog, where lance-namespace depends on lance-catalog.
lance-catalog will be defined through OpenAPI, and lance-namespace will reuse the object models defined in lance-catalog for its own implementation.
(Not sure) The lance-namespace implementation will hold a lance-catalog instance, which is used for parsing database namespaces and tables.

However, there is one thing I haven’t fully figured out yet. The abstraction of lance-namespace is that it supports an arbitrarily deep namespace hierarchy, where all leaf nodes are tables. After splitting out lance-catalog, how should the hierarchical abstractions of lance-namespace and lance-catalog be defined?

The definition of lance-namespace will probably remain unchanged. It still needs to support unlimited namespace depth to remain flexible and support concepts such as metalake.

Should lance-catalog be standardized to a two-level abstraction of database and table? Or should lance-catalog, like lance-namespace, also support arbitrary intermediate nodes with tables as leaf nodes?

Or, in the lance-namespace tree structure, would the upper-level nodes be managed by lance-namespace, while some intermediate nodes become lance-catalog nodes, with the entire subtree under each lance-catalog node managed by lance-catalog?

From the perspective of the partition spec, if we restrict lance-catalog to only database and table levels, there may be some conflicts. Since we have decided that partitioning is a feature of DirectoryNamespace, does that mean:

We need to map the partition spec namespace to the database level in lance-catalog?
Should we stop using intermediate namespaces to represent partition fields?

Semantically, this feels slightly awkward. If we use namespace, it is easier to understand because namespace is an abstract concept. However, if we use database to represent the partition spec namespace, it may introduce some confusion.

1 reply

jackye1995 Mar 6, 2026
Maintainer Author

> The current lance-namespace project will likely be split into two projects: lance-namespace and lance-catalog, where lance-namespace depends on lance-catalog.

I don't think we need to split projects again. I think Lance Namespace defines how the catalog is used (the "client interface"), and the directory and REST catalogs define how the catalog is implemented (the "server logic").

It is like having defined ANSI SQL: a database can speak ANSI SQL directly, or it can have its own dialect plus an ANSI mode. We chose to develop two databases that speak ANSI SQL directly.

The proposed divide is mostly on the lakehouse format spec side. From a code perspective, we can still have DirectoryNamespace and RestNamespace, and technically they are already separated correctly.

> Since the partition spec is a feature of the Lance Directory Catalog, does that imply: LanceDirectoryCatalog needs to add partition-related methods, such as: resolve_or_create_partition_table, plan_scan, etc.

I should correct my wording on that: "The partition spec is built on top of the Lance Directory Catalog" is probably a better way to say it.

> Should lance-catalog be standardized to a two-level abstraction of database and table? Or should lance-catalog, like lance-namespace, also support arbitrary intermediate nodes with tables as leaf nodes?

I think it will remain as is, supporting arbitrary intermediate nodes. If people want exactly 3 levels (catalog -> database -> table) or 4 levels (catalog -> database -> schema -> table), I think we can add configs to enable that in the directory catalog.
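
As a rough illustration of what such a config could look like, here is a minimal sketch, assuming a single optional setting that fixes the number of levels in a table identifier. `DepthPolicy` and its methods are hypothetical names invented for this example; no such config exists in the directory catalog today as far as this thread says.

```java
import java.util.List;
import java.util.OptionalInt;

public class DepthPolicySketch {

    // Hypothetical config: empty means arbitrary depth (the current behavior);
    // a value fixes the total number of levels, counting the table itself.
    static final class DepthPolicy {
        private final OptionalInt fixedLevels;

        private DepthPolicy(OptionalInt fixedLevels) {
            this.fixedLevels = fixedLevels;
        }

        static DepthPolicy arbitrary() {
            return new DepthPolicy(OptionalInt.empty());
        }

        static DepthPolicy fixed(int levels) {
            if (levels < 1) {
                throw new IllegalArgumentException("levels must be >= 1");
            }
            return new DepthPolicy(OptionalInt.of(levels));
        }

        // A table id lists every level from the root down to the table name.
        boolean allowsTable(List<String> tableId) {
            if (tableId.isEmpty()) {
                return false;
            }
            return !fixedLevels.isPresent() || tableId.size() == fixedLevels.getAsInt();
        }
    }

    public static void main(String[] args) {
        DepthPolicy threeLevels = DepthPolicy.fixed(3); // catalog -> database -> table
        System.out.println(threeLevels.allowsTable(List.of("cat", "db", "orders")));
        System.out.println(threeLevels.allowsTable(List.of("db", "orders")));
    }
}
```

With `DepthPolicy.arbitrary()` any non-empty identifier is accepted, which matches keeping the default as it is.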

> Semantically, this feels slightly awkward. If we use namespace, it is easier to understand because namespace is an abstract concept.

I think we will continue to use namespace; nothing changes for the ongoing partitioning effort.
