
refactor!: cleanup namespace related APIs#6186

Merged
jackye1995 merged 12 commits intolance-format:mainfrom
jackye1995:latest-so
Apr 2, 2026


@jackye1995
Contributor

@jackye1995 jackye1995 commented Mar 13, 2026

  1. Remove storage_options_provider in Python and Java. To make managed versioning work, we have already updated the codebase to pass the namespace and table ID into the Python and Java binding layers. A language-specific storage_options_provider that is then bound to Rust is therefore unnecessary, because we can construct the Rust StorageOptionsProvider directly from the bound namespace client.
  2. Rename the following: namespace to namespace_client, namespace_impl to namespace_client_impl, namespace_properties to namespace_client_properties, and namespace (where it means the namespace path) to namespace_path. This applies to all code in Rust, Python, and Java. The rename is based on community feedback and aims to distinguish the Namespace Client SDK and its implementations from a namespace path such as ["ns1", "ns2"].
  3. Add vend_input_storage_options and ops_metrics_enabled so that DirectoryNamespace can be used directly to test all of these changes, without relying on an extra tracking namespace. Update all tests to use the new feature.
  4. Fix the known bug where the Python and Java bindings for non-native namespace client implementations do not fully work with managed versioning, due to binding-level model conversion.
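
To make the rename in item 2 concrete, here is a minimal sketch of how old keyword names could be mapped onto the new ones. The helper `migrate_namespace_kwargs` is hypothetical and not part of Lance. Treating a list-valued `namespace` as a path and anything else as a client is an assumption made only for this illustration.

```python
from typing import Any, Dict

# Old keyword -> new keyword, taken from the rename list in item 2.
_RENAMES = {
    "namespace_impl": "namespace_client_impl",
    "namespace_properties": "namespace_client_properties",
}


def migrate_namespace_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: return a copy of kwargs using the new names."""
    out: Dict[str, Any] = {}
    for key, value in kwargs.items():
        if key == "namespace":
            # Assumption for this sketch: a list/tuple was a path such as
            # ["ns1", "ns2"]; anything else was a namespace client object.
            is_path = isinstance(value, (list, tuple))
            out["namespace_path" if is_path else "namespace_client"] = value
        else:
            out[_RENAMES.get(key, key)] = value
    return out
```

The old `namespace` keyword needs a type check here because it carried two meanings, which is the ambiguity the rename is meant to remove.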

@github-actions
Contributor

github-actions bot commented Mar 13, 2026

PR Review: refactor!: cleanup namespace related APIs

Overall this is a clean rename + deprecation PR. A few issues worth addressing:

P1: Double deprecation warnings for storage_options_provider

The storage_options_provider deprecation warning is emitted both in dataset() (in __init__.py) and in LanceDataset.__init__(). When calling dataset(storage_options_provider=...), the user will see the same warning twice. Consider only warning in the public entry point (dataset()) and not in LanceDataset.__init__, or gate the inner warning.
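
One way to gate the inner warning might look like the sketch below. `Dataset` and `dataset` are simplified stand-ins for `LanceDataset` and `dataset()`, and the private `_warned` flag is a hypothetical name introduced only for this example.

```python
import warnings
from typing import Any, Optional


class Dataset:
    """Stand-in for LanceDataset; not the real class."""

    def __init__(self, uri: str, storage_options_provider: Optional[Any] = None,
                 _warned: bool = False):
        # Gate the inner warning: skip it when the public entry point
        # has already warned for this call.
        if storage_options_provider is not None and not _warned:
            warnings.warn("'storage_options_provider' is deprecated.",
                          DeprecationWarning, stacklevel=2)
        self.uri = uri


def dataset(uri: str, storage_options_provider: Optional[Any] = None) -> Dataset:
    """Stand-in for the public dataset() entry point."""
    warned = False
    if storage_options_provider is not None:
        warnings.warn("'storage_options_provider' is deprecated.",
                      DeprecationWarning, stacklevel=2)
        warned = True
    return Dataset(uri, storage_options_provider, _warned=warned)
```

With this shape, a call through `dataset()` warns once, and constructing `Dataset` directly still warns.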

P1: Minor doc inconsistency in write_fragments

In fragment.py, the deprecation notice for storage_options_provider in write_fragments says "When using namespace and table_id" — should be namespace_client and table_id to match the new API.


🤖 Generated with Claude Code

@codecov

codecov bot commented Mar 13, 2026

table_id: Optional[List[str]] = None,
storage_options_provider: Optional[Any] = None,
# Deprecated parameters
namespace: Optional[LanceNamespace] = None,
Member

@westonpace westonpace Mar 13, 2026


Humorous observation: technically moving arguments without a * in the argument list is a breaking change but if anyone was using this argument as a positional argument at this point then I worry
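
The point about the missing `*` can be seen in a small, self-contained sketch. The three functions are hypothetical and only mimic the shape of the signature under review.

```python
from typing import Any, List, Optional


def write_v1(uri: str, table_id: Optional[List[str]] = None,
             namespace: Optional[Any] = None) -> dict:
    return {"table_id": table_id, "namespace": namespace}


# Reordered without '*': the same positional call now binds differently.
def write_v2(uri: str, namespace: Optional[Any] = None,
             table_id: Optional[List[str]] = None) -> dict:
    return {"table_id": table_id, "namespace": namespace}


# With '*', everything after it is keyword-only, so reordering is harmless
# and a positional call fails loudly instead of binding to the wrong name.
def write_kw(uri: str, *, namespace: Optional[Any] = None,
             table_id: Optional[List[str]] = None) -> dict:
    return {"table_id": table_id, "namespace": namespace}
```

Calling `write_v2("u", ["t"])` silently passes the list as `namespace`, whereas `write_kw("u", ["t"])` raises `TypeError`.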

Comment on lines +102 to +103
storage_options_provider: Optional[Any] = None,
# Deprecated parameters
Member


Suggested change
storage_options_provider: Optional[Any] = None,
# Deprecated parameters
# Deprecated parameters
storage_options_provider: Optional[Any] = None,

@@ -431,14 +431,38 @@ def __init__(
read_params: Optional[Dict[str, Any]] = None,
session: Optional[Session] = None,
storage_options_provider: Optional[Any] = None,
Member


Move this to deprecated parameters section?

Comment on lines +3709 to +3717
if storage_options_provider is not None:
warnings.warn(
"The 'storage_options_provider' parameter is deprecated and will be "
"removed in version 5.0.0. When using 'namespace_client' and "
"'table_id', the storage options provider is created automatically. "
"Pass 'storage_options' instead for static or initial storage options.",
DeprecationWarning,
stacklevel=2,
)
Member


This warning is repeated all over the place. In a pre-AI world I'd probably say we should make a helper method. Now I'll just say it might be a good idea.
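
A helper along those lines could look like the sketch below. The function name `warn_storage_options_provider_deprecated` is hypothetical, the message text is copied from the snippet under review, and `write_fragments` is only a stand-in for one call site.

```python
import warnings
from typing import Any, Optional


def warn_storage_options_provider_deprecated(
    provider: Optional[Any], stacklevel: int = 3
) -> None:
    """Hypothetical helper that emits the shared deprecation warning."""
    if provider is None:
        return
    warnings.warn(
        "The 'storage_options_provider' parameter is deprecated and will be "
        "removed in version 5.0.0. When using 'namespace_client' and "
        "'table_id', the storage options provider is created automatically. "
        "Pass 'storage_options' instead for static or initial storage options.",
        DeprecationWarning,
        # 3 = skip this helper and the API function, point at the user's call.
        stacklevel=stacklevel,
    )


def write_fragments(data: Any, storage_options_provider: Optional[Any] = None) -> None:
    """Stand-in for one of the call sites; not the real function body."""
    warn_storage_options_provider_deprecated(storage_options_provider)
```

The one subtlety is `stacklevel`: moving the `warnings.warn` call into a helper adds a frame, so the default is 3 here rather than the 2 used inline.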


**Only valid in CREATE mode**. Will raise an error if used with
APPEND/OVERWRITE modes.
namespace_client : optional, LanceNamespace
Member


Do we document these things in the site docs anywhere? I don't see any changes in this PR to those.

@jackye1995 jackye1995 force-pushed the latest-so branch 2 times, most recently from de2108c to 1c2894a Compare March 31, 2026 17:57
jackye1995 and others added 6 commits March 31, 2026 14:33
…ent in file module

Update LanceFileReader, LanceFileWriter, and LanceFileSession to use
namespace_client and table_id instead of storage_options_provider.
This aligns with the dataset module's approach where the storage options
provider is automatically created from the namespace client in Rust.

- Update Rust file.rs to accept namespace_client and table_id
- Update Python file.py classes to use new parameters
- Update .pyi type stubs
- Update dataset.py to pass namespace_client/table_id to LanceFileSession

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
…f storage_options_provider

Update test_namespace_integration.py to use the new namespace_client
and table_id parameters for file operations instead of creating a
NamespaceStorageOptionsProvider.

- Rename test_file_writer_with_storage_options_provider to test_file_writer_with_namespace_client
- Rename test_file_reader_with_storage_options_provider to test_file_reader_with_namespace_client
- Rename test_file_session_with_storage_options_provider to test_file_session_with_namespace_client
- Remove NamespaceStorageOptionsProvider test helper class

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
Fully remove the Python-level StorageOptionsProvider class and related
code. Users should now use namespace_client and table_id parameters
instead, which internally creates the appropriate storage options
provider in Rust.

Removed:
- StorageOptionsProvider Protocol class from lance.io
- PyStorageOptionsProvider and PyStorageOptionsProviderWrapper from Rust
- py_object_to_storage_options_provider function
- with_provider and with_initial_and_provider methods from PyStorageOptionsAccessor
- storage_options_provider parameter from dataset commit functions

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
The Java SDK uses namespaceClient and tableId approach instead of
exposing StorageOptionsProvider to users. The JNI layer internally
creates LanceNamespaceStorageOptionsProvider from these parameters.

Removed:
- JavaStorageOptionsProvider struct and StorageOptionsProvider impl
- storage_options_provider_obj parameter from extract_write_params
- Related imports and exports

The storage_options.rs module is now empty as storage options providers
are created directly in Rust from namespace clients.

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
Add namespace_client and table_id parameters to LanceFragment.create
for automatic credential refresh via namespace. This makes the API
consistent with write_fragments which already supports these parameters.

Fragment writers only need namespace_client for credential refresh,
not for managed versioning, since they don't commit transactions.

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
jackye1995 and others added 6 commits April 1, 2026 13:26
When a namespace provider (like DirectoryNamespace) doesn't vend
credentials, it returns None from fetch_storage_options(). This is
valid - it means "use default/environment credentials".

Previously, this caused an error "Provider returned no storage options".
Now we return an empty StorageOptions map, which signals to use
default credentials from the environment.

This fixes the case where DirectoryNamespace is used with S3 and
environment credentials (AWS_ACCESS_KEY_ID, etc.) should be used.

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
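
The None-handling fix described in the commit above lives in Rust, but the idea can be sketched in Python. Only `fetch_storage_options` comes from the commit message. The class names and `resolve_storage_options` are hypothetical.

```python
from typing import Dict, Optional, Protocol


class Provider(Protocol):
    def fetch_storage_options(self) -> Optional[Dict[str, str]]: ...


class NoCredentialProvider:
    """Stand-in for a namespace that does not vend credentials."""

    def fetch_storage_options(self) -> Optional[Dict[str, str]]:
        return None


class VendingProvider:
    """Stand-in for a namespace that does vend credentials."""

    def fetch_storage_options(self) -> Optional[Dict[str, str]]:
        return {"aws_region": "us-east-1"}


def resolve_storage_options(provider: Provider) -> Dict[str, str]:
    """Treat None as 'no overrides' rather than as an error."""
    options = provider.fetch_storage_options()
    # An empty map means: fall back to default/environment credentials.
    return {} if options is None else dict(options)
```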
Implements the query_table method for DirectoryNamespace, enabling
server-side query execution. The implementation:

- Loads the dataset for the specified table
- Applies column projection from the request
- Applies filter expressions (SQL WHERE clause)
- Applies limit/offset for pagination
- Returns results serialized as Arrow IPC file format

This enables the REST namespace server backed by DirectoryNamespace
to handle query_table requests from lancedb clients.

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
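
As a toy model of the query_table steps listed in the commit above, the sketch below runs a filter, offset/limit pagination, and column projection over plain Python dicts. It is not the Rust implementation: `query_rows` is a hypothetical name, the filter is a Python callable rather than a SQL expression, the filter-then-project order is a choice made for this sketch, and Arrow IPC serialization is left out.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


def query_rows(
    rows: List[Row],
    columns: Optional[List[str]] = None,
    predicate: Optional[Callable[[Row], bool]] = None,
    limit: Optional[int] = None,
    offset: int = 0,
) -> List[Row]:
    """Filter, paginate, then project the given rows."""
    # Filter first so the predicate can use columns that are projected away.
    matched = [r for r in rows if predicate is None or predicate(r)]
    # Offset/limit give simple pagination over the filtered rows.
    end = None if limit is None else offset + limit
    page = matched[offset:end]
    if columns is None:
        return page
    return [{c: r[c] for c in columns} for r in page]
```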
Extends the DirectoryNamespace query_table implementation to support:

Vector search:
- Query vector (single or multi-vector)
- Vector column selection
- Distance type (L2, cosine, dot, hamming)
- Index parameters: nprobes, ef, refine_factor
- Distance bounds (lower_bound, upper_bound)
- bypass_vector_index, prefilter, fast_search flags

Full text search:
- String query with optional column filter

Other:
- with_row_id flag to return _rowid column

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
…b native implementation

- Change nprobes to use minimum_nprobes() instead of nprobes() to match lancedb behavior
- Add support for column_aliases using project_with_transform()

Generated with [Claude Code](https://claude.ai/code)
via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
@jackye1995
Contributor Author

I ended up making all the backwards-incompatible changes in both Python and Java. Keeping backwards compatibility became too complicated, and it is not worth doing just to keep the old API around for a single version.

@jackye1995 jackye1995 merged commit d13d7d8 into lance-format:main Apr 2, 2026
28 checks passed
jackye1995 added a commit to lancedb/lancedb that referenced this pull request Apr 3, 2026
…ation (#3205)

1. Refactored every client (Rust core, Python, Node/TypeScript) so
“namespace” usage is explicit: code now keeps namespace paths
(namespace_path) separate from namespace clients (namespace_client).
Connections propagate the client, table creation routes through it, and
managed versioning defaults are resolved from namespace metadata. Python
gained LanceNamespaceDBConnection/async counterparts, and the
namespace-focused tests were rewritten to match the clarified API
surface.
2. Synchronized the workspace with Lance 5.0.0-beta.3 (see
lance-format/lance#6186 for the upstream
namespace refactor), updating Cargo/uv lockfiles and ensuring all
bindings align with the new namespace semantics.
3. Added a namespace-backed code path to lancedb.connect() via new
keyword arguments (namespace_client_impl, namespace_client_properties,
plus the existing pushdown-ops flag). When those kwargs are supplied,
connect() delegates to connect_namespace, so users can opt into
namespace clients without changing APIs. (The async helper will gain
parity in a later change)
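
The delegation described in item 3 might be pictured as follows. Both functions are simplified stand-ins that return plain dicts. Their signatures and return values are assumptions for this sketch, not the lancedb API.

```python
from typing import Any, Dict, Optional


def connect_namespace(impl: str, properties: Dict[str, str]) -> Dict[str, Any]:
    """Stand-in that just records how it was called."""
    return {"kind": "namespace", "impl": impl, "properties": properties}


def connect(
    uri: Optional[str] = None,
    *,
    namespace_client_impl: Optional[str] = None,
    namespace_client_properties: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Stand-in for connect(): delegate when namespace kwargs are supplied."""
    if namespace_client_impl is not None:
        return connect_namespace(
            namespace_client_impl, namespace_client_properties or {}
        )
    if uri is None:
        raise ValueError("either uri or namespace_client_impl is required")
    return {"kind": "uri", "uri": uri}
```

Existing callers that pass only a URI are unaffected, which is how the new path stays opt-in.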

2 participants