fix(maintenance): open namespace-backed Lance tables via catalog entry #204

Merged
Xuanwo merged 1 commit into lance-format:main from mikewhb:main_fix_maintenance on May 12, 2026

@mikewhb (Contributor) commented May 11, 2026

Root cause: the maintenance table functions (compact / cleanup / optimize_index / set_auto_cleanup / show_auto_cleanup) resolve their first argument through ResolveLanceDatasetUri() at bind time and then call LanceOpenDataset(bind_data.dataset_uri) at execute time.

For namespace-backed Lance tables, DefaultCatalogResolver returns LanceTableEntry::DatasetUri(), which for REST namespaces is the "virtual" handle 'endpoint + "/" + table_id' (e.g. 'http://host:2333/cat$sch$tbl'). lance-io cannot open that handle directly: the real cos:// or file:// location is reachable only through the namespace REST API, via LanceOpenDatasetForTable().
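
To make the mismatch concrete, here is a minimal sketch of how a handle of the form 'endpoint + "/" + table_id' is assembled, assuming (from the example above) that the REST table id joins the identifier parts with '$'. JoinTableId, MakeVirtualHandle, and IsHttpHandle are hypothetical illustration helpers, not functions from the extension:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: joins identifier parts with '$' to form a REST
// namespace table id such as "cat$sch$tbl" (assumed from the example).
std::string JoinTableId(const std::vector<std::string> &parts) {
    std::string id;
    for (size_t i = 0; i < parts.size(); ++i) {
        if (i > 0) {
            id += '$';
        }
        id += parts[i];
    }
    return id;
}

// Hypothetical helper: builds the "virtual" handle endpoint + "/" + table_id.
std::string MakeVirtualHandle(const std::string &endpoint,
                              const std::vector<std::string> &parts) {
    return endpoint + "/" + JoinTableId(parts);
}

// True if the string starts with an http(s) scheme, i.e. it points at the
// REST endpoint rather than at a storage location such as cos:// or file://.
bool IsHttpHandle(const std::string &uri) {
    return uri.rfind("http://", 0) == 0 || uri.rfind("https://", 0) == 0;
}
```

With endpoint 'http://host:2333' and parts {"cat", "sch", "tbl"}, MakeVirtualHandle yields 'http://host:2333/cat$sch$tbl': an address of the namespace service, not of the dataset files, which is why opening it by path fails.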

As a result, OPTIMIZE / VACUUM LANCE / SHOW MAINTENANCE / ALTER TABLE SET AUTO_CLEANUP / ALTER TABLE UNSET AUTO_CLEANUP all fail with 'IO Error: Failed to open Lance dataset: http://...' whenever the target is a REST namespace table. Directory namespaces are unaffected because their DatasetUri() is already the real on-disk path.

Fix: at execute time, re-resolve the original first-argument literal as a Lance catalog entry and route through LanceOpenDatasetForTable(), which handles the namespace REST API path. Fall back to the old LanceOpenDataset(dataset_uri) when the input is a plain path literal so the existing 'open by path' contract is preserved.
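
The execute-time control flow can be sketched as follows. Dataset, TableEntry, ResolveFn, OpenForTable, OpenByPath, and OpenForMaintenance are hypothetical stand-ins for LanceTableEntry, TryResolveLanceTableEntry(), LanceOpenDatasetForTable(), and LanceOpenDataset(); the real signatures are not shown in this PR, and the stubs only record which location was opened:

```cpp
#include <functional>
#include <memory>
#include <string>

// Hypothetical stand-in for an opened dataset: records where it was opened.
struct Dataset {
    std::string uri;
};

// Hypothetical stand-in for a namespace-backed catalog entry.
struct TableEntry {
    std::string virtual_uri;  // e.g. http://host:2333/cat$sch$tbl
    std::string real_uri;     // location reported by the namespace REST API
};

// Returns the entry named by the literal, or nullptr if it names none.
using ResolveFn = std::function<const TableEntry *(const std::string &)>;

// Stand-in for LanceOpenDatasetForTable(): opens at the real location.
std::unique_ptr<Dataset> OpenForTable(const TableEntry &entry) {
    return std::make_unique<Dataset>(Dataset{entry.real_uri});
}

// Stand-in for LanceOpenDataset(uri): opens by path, unchanged behavior.
std::unique_ptr<Dataset> OpenByPath(const std::string &uri) {
    return std::make_unique<Dataset>(Dataset{uri});
}

// Try the original first-argument literal as a catalog entry first; fall
// back to the bind-time dataset_uri only when it names no Lance entry.
std::unique_ptr<Dataset> OpenForMaintenance(const std::string &input_literal,
                                            const std::string &dataset_uri,
                                            const ResolveFn &resolve) {
    if (const TableEntry *entry = resolve(input_literal)) {
        return OpenForTable(*entry);
    }
    return OpenByPath(dataset_uri);
}
```

In this sketch a REST namespace table opens at its real location even though dataset_uri holds the virtual handle, while a path literal such as 'cos://bucket/x.lance' resolves to no entry and takes the open-by-path branch. The returned uri is the resolved location, in line with the Target column change described below.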

Changes:

  • Promote TryResolveLanceTableEntry() from a static helper in lance_search.cpp to a public symbol in lance_common.{hpp,cpp} so the search and maintenance code paths share one canonical implementation. The shared version uses OnEntryNotFound::RETURN_NULL and only catches ParserException, so genuine errors (auth failures, network errors, catalog corruption) are no longer silently swallowed by 'catch(...)'.
  • lance_maintenance.cpp: store the original first-argument literal alongside dataset_uri in LanceMaintenanceBindData, LanceAutoCleanupSetBindData, and LanceAutoCleanupShowBindData. At execute time, try TryOpenDatasetForMaintenanceInput() first (TryResolveLanceTableEntry + LanceOpenDatasetForTable) and only fall back to LanceOpenDataset(dataset_uri) when the input does not name a Lance catalog entry.
  • The displayed Target column now reflects the resolved real URI returned by LanceOpenDatasetForTable() instead of the virtual namespace handle.
  • Add test/sql/namespace_rest_maintenance.test exercising compact, cleanup, set / unset / show auto_cleanup against a REST namespace table. Gated by LANCE_TEST_NAMESPACE=1, mirroring namespace_rest_attach.test.
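
The narrowed error handling in the shared TryResolveLanceTableEntry() can be illustrated with a small sketch. ParserException, Entry, LookupFn, and TryResolveEntry are hypothetical stand-ins (the real code uses DuckDB's ParserException and a catalog lookup with OnEntryNotFound::RETURN_NULL); the sketch shows only which failures are caught:

```cpp
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for DuckDB's ParserException.
struct ParserException : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical catalog entry type.
struct Entry {
    std::string name;
};

// A lookup returns nullptr when no entry exists (RETURN_NULL semantics),
// throws ParserException when the input is not a valid qualified name,
// and may throw other errors (auth, network, catalog corruption).
using LookupFn = std::function<const Entry *(const std::string &)>;

const Entry *TryResolveEntry(const std::string &input, const LookupFn &lookup) {
    try {
        // "Not found" arrives as nullptr, not as an exception.
        return lookup(input);
    } catch (const ParserException &) {
        // Not a parseable table name: report "no entry" so the caller can
        // fall back to treating the input as a path.
        return nullptr;
    }
    // Every other exception type is deliberately not caught and propagates.
}
```

Unlike catch(...), this maps only an unparseable name to "not a catalog entry"; an auth or network failure reaches the user instead of silently triggering the path fallback.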

Compatibility:

  • Path-based callers (e.g. OPTIMIZE 'cos://bucket/x.lance') are unchanged: TryResolveLanceTableEntry short-circuits on inputs containing '/', '\', or '://' and the executor falls back to the original LanceOpenDataset() path.
  • Directory-namespace callers (e.g. OPTIMIZE dir_ns.main.tbl) are unchanged: dir-namespace tables are not IsNamespaceBacked(), so LanceOpenDatasetForTable() forwards to LanceOpenDataset(table.DatasetUri()), the same real path the previous code was already using.
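
The path short-circuit from the first compatibility bullet can be sketched as a substring check. LooksLikePathLiteral is a hypothetical name, and the real implementation may differ in detail:

```cpp
#include <string>

// Returns true for inputs that should be treated as path literals and never
// looked up in the catalog: anything containing '/', '\', or "://".
bool LooksLikePathLiteral(const std::string &input) {
    return input.find('/') != std::string::npos ||
           input.find('\\') != std::string::npos ||
           input.find("://") != std::string::npos;
}
```

Any input containing "://" also contains '/', so the third check is redundant here; it is kept only to mirror the three cases listed above. A dotted name such as dir_ns.main.tbl matches none of them and is resolved through the catalog.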

@Xuanwo (Collaborator) left a comment

Thank you for working on this!

@Xuanwo merged commit afedc0c into lance-format:main on May 12, 2026
5 checks passed