fix(maintenance): open namespace-backed Lance tables via catalog entry #204
Merged
Root cause: maintenance table functions (compact / cleanup /
optimize_index / set_auto_cleanup / show_auto_cleanup) bind by passing
the first argument through ResolveLanceDatasetUri() and then call
LanceOpenDataset(bind_data.dataset_uri) at execute time.
For namespace-backed Lance tables, DefaultCatalogResolver returns
LanceTableEntry::DatasetUri(), which for REST namespaces is the
"virtual" handle 'endpoint + "/" + table_id' (e.g.
'http://host:2333/cat$sch$tbl'). lance-io cannot open that handle
directly -- the real cos:// / file:// location is only reachable through
the namespace REST API via LanceOpenDatasetForTable().
As a result, OPTIMIZE / VACUUM LANCE / SHOW MAINTENANCE / ALTER TABLE
SET AUTO_CLEANUP / ALTER TABLE UNSET AUTO_CLEANUP all fail with
'IO Error: Failed to open Lance dataset: http://...' whenever the
target is a REST namespace table. Directory namespaces are unaffected
because their DatasetUri() is already the real on-disk path.
Fix: at execute time, re-resolve the original first-argument literal as
a Lance catalog entry and route through LanceOpenDatasetForTable(),
which handles the namespace REST API path. Fall back to the old
LanceOpenDataset(dataset_uri) when the input is a plain path literal so
the existing 'open by path' contract is preserved.
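
The execute-time order described above (catalog entry first, path fallback second) can be sketched roughly as follows. This is an illustration only, not the extension's code: `Catalog`, `TableEntry`, `TryResolveEntry`, and `ResolveOpenTarget` are hypothetical stand-ins, and the function returns the URI that would be opened instead of opening a dataset.

```cpp
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a catalog entry; not the real LanceTableEntry.
struct TableEntry {
    std::string real_uri; // location the namespace would hand back
};

// Hypothetical catalog: qualified table name -> entry.
using Catalog = std::unordered_map<std::string, TableEntry>;

// Stand-in for TryResolveLanceTableEntry(): yields no value when the
// input does not name a catalog entry.
std::optional<TableEntry> TryResolveEntry(const Catalog &catalog,
                                          const std::string &input) {
    auto it = catalog.find(input);
    if (it == catalog.end()) {
        return std::nullopt;
    }
    return it->second;
}

// Returns the URI that would be opened for a maintenance call.
std::string ResolveOpenTarget(const Catalog &catalog,
                              const std::string &original_input,
                              const std::string &dataset_uri) {
    if (auto entry = TryResolveEntry(catalog, original_input)) {
        // Catalog route: stands in for LanceOpenDatasetForTable().
        return entry->real_uri;
    }
    // Path route: stands in for LanceOpenDataset(dataset_uri).
    return dataset_uri;
}
```

Under these assumptions, a catalog name resolves to the real location even when the bound `dataset_uri` is a virtual handle, while a path literal passes through untouched.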
Changes:
- Promote TryResolveLanceTableEntry() from a static helper in
lance_search.cpp to a public symbol in lance_common.{hpp,cpp} so the
search and maintenance code paths share one canonical implementation.
The shared version uses OnEntryNotFound::RETURN_NULL and only catches
ParserException, so genuine errors (auth failures, network errors,
catalog corruption) are no longer silently swallowed by 'catch(...)'.
- lance_maintenance.cpp: store the original first-argument literal
alongside dataset_uri in LanceMaintenanceBindData,
LanceAutoCleanupSetBindData, and LanceAutoCleanupShowBindData. At
execute time, try TryOpenDatasetForMaintenanceInput() first
(TryResolveLanceTableEntry + LanceOpenDatasetForTable) and only fall
back to LanceOpenDataset(dataset_uri) when the input does not name a
Lance catalog entry.
- The displayed Target column now reflects the resolved real URI
returned by LanceOpenDatasetForTable() instead of the virtual
namespace handle.
- Add test/sql/namespace_rest_maintenance.test exercising compact,
cleanup, set / unset / show auto_cleanup against a REST namespace
table. Gated by LANCE_TEST_NAMESPACE=1, mirroring
namespace_rest_attach.test.
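
The point about catching only ParserException instead of using 'catch(...)' can be illustrated with a small sketch. The exception types and lookup function below are hypothetical stand-ins, not DuckDB or extension APIs; the sketch only shows that a narrow catch turns "not a table name" into a null result while other failures still reach the caller.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical exception types: ParseError stands in for ParserException,
// NetworkError for any genuine failure (auth, network, catalog).
struct ParseError : std::runtime_error {
    using std::runtime_error::runtime_error;
};
struct NetworkError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical lookup that can fail in two different ways.
bool LookupEntry(const std::string &input) {
    if (input.empty()) {
        throw ParseError("not a qualified name");
    }
    if (input == "unreachable.main.tbl") {
        throw NetworkError("namespace endpoint unreachable");
    }
    return input == "ns.main.tbl";
}

// Narrow catch: a parse failure means "no entry"; anything else propagates.
bool TryLookupEntry(const std::string &input) {
    try {
        return LookupEntry(input);
    } catch (const ParseError &) {
        return false;
    }
}
```

With 'catch(...)' in place of the narrow handler, the NetworkError case would also return false and the caller would fall through to the path route with a misleading error.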
Compatibility:
- Path-based callers (e.g. OPTIMIZE 'cos://bucket/x.lance') are
unchanged: TryResolveLanceTableEntry short-circuits on inputs
containing '/', '\\', or '://' and the executor falls back to the
original LanceOpenDataset() path.
- Directory-namespace callers (e.g. OPTIMIZE dir_ns.main.tbl) are
unchanged: IsNamespaceBacked() is false for dir-namespace tables, so
LanceOpenDatasetForTable() forwards to
LanceOpenDataset(table.DatasetUri()) -- the same real path that the
previous code was already using.
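
The path short-circuit mentioned in the first compatibility item could look something like the check below. `LooksLikePath` is a hypothetical name, and the real TryResolveLanceTableEntry() may differ in detail; this sketch only mirrors the three markers listed above.

```cpp
#include <string>

// Hypothetical helper: true when the input should be treated as a path
// literal and never looked up as a catalog entry.
bool LooksLikePath(const std::string &input) {
    // "://" already contains '/', so the last check is redundant; it is
    // kept only to mirror the three markers named in the description.
    return input.find('/') != std::string::npos ||
           input.find('\\') != std::string::npos ||
           input.find("://") != std::string::npos;
}
```

A dotted name such as dir_ns.main.tbl contains none of these markers, so it proceeds to catalog lookup, while 'cos://bucket/x.lance' goes straight to the path route.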
Xuanwo (Collaborator) approved these changes on May 12, 2026 and left a comment:
Thank you for working on this!