refactor!: cleanup namespace related APIs #6186
jackye1995 merged 12 commits into lance-format:main
PR Review: refactor!: cleanup namespace related APIs
Overall this is a clean rename + deprecation PR. A few issues worth addressing:
P1: Double deprecation warnings for
python/python/lance/__init__.py
Outdated
    table_id: Optional[List[str]] = None,
    storage_options_provider: Optional[Any] = None,
    # Deprecated parameters
    namespace: Optional[LanceNamespace] = None,
Humorous observation: technically, moving arguments without a * in the argument list is a breaking change, but if anyone was using this argument as a positional argument at this point then I worry.
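The positional-argument concern can be illustrated with a minimal sketch. The three functions below are hypothetical stand-ins, not the real `lance` signatures; they only show why reordering parameters changes the meaning of a positional call and how a bare `*` rules that out.

```python
from typing import Any, List, Optional


# Hypothetical "before" signature: the deprecated parameter comes first.
def open_v1(uri: str, namespace: Optional[Any] = None,
            table_id: Optional[List[str]] = None) -> dict:
    return {"uri": uri, "namespace": namespace, "table_id": table_id}


# Hypothetical "after" signature: same parameter names, different order.
def open_v2(uri: str, table_id: Optional[List[str]] = None,
            namespace: Optional[Any] = None) -> dict:
    return {"uri": uri, "namespace": namespace, "table_id": table_id}


# Keyword-only variant: everything after `*` must be passed by name,
# so a later reordering cannot change what an existing call means.
def open_kwonly(uri: str, *, table_id: Optional[List[str]] = None,
                namespace: Optional[Any] = None) -> dict:
    return {"uri": uri, "namespace": namespace, "table_id": table_id}


# A positional caller written against v1 ...
before = open_v1("s3://bucket/t", "my_namespace")
# ... binds the same value to a different parameter under v2.
after = open_v2("s3://bucket/t", "my_namespace")
```

With the keyword-only variant, the same positional call raises `TypeError` instead of silently binding the value to the wrong parameter.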
python/python/lance/__init__.py
Outdated
    storage_options_provider: Optional[Any] = None,
    # Deprecated parameters
Suggested change:
-    storage_options_provider: Optional[Any] = None,
-    # Deprecated parameters
+    # Deprecated parameters
+    storage_options_provider: Optional[Any] = None,
python/python/lance/dataset.py
Outdated
@@ -431,14 +431,38 @@ def __init__(
    read_params: Optional[Dict[str, Any]] = None,
    session: Optional[Session] = None,
    storage_options_provider: Optional[Any] = None,
Move this to deprecated parameters section?
python/python/lance/dataset.py
Outdated
    if storage_options_provider is not None:
        warnings.warn(
            "The 'storage_options_provider' parameter is deprecated and will be "
            "removed in version 5.0.0. When using 'namespace_client' and "
            "'table_id', the storage options provider is created automatically. "
            "Pass 'storage_options' instead for static or initial storage options.",
            DeprecationWarning,
            stacklevel=2,
        )
This warning is repeated all over the place. In a pre-AI world I'd probably say we should make a helper method. Now I'll just say it might be a good idea.
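One possible shape for such a helper is sketched below. The names `warn_deprecated_param` and `write_something` are hypothetical and do not come from the PR; the message text and the 5.0.0 removal version are taken from the warning quoted above.

```python
import warnings
from typing import Any


def warn_deprecated_param(name: str, value: Any, advice: str,
                          removed_in: str = "5.0.0") -> None:
    """Emit one DeprecationWarning if a deprecated parameter was supplied."""
    if value is None:
        return
    warnings.warn(
        f"The '{name}' parameter is deprecated and will be removed in "
        f"version {removed_in}. {advice}",
        DeprecationWarning,
        # 1 = this helper, 2 = the API function, 3 = the user's call site.
        stacklevel=3,
    )


# Hypothetical API function that delegates to the helper.
def write_something(storage_options_provider: Any = None) -> None:
    warn_deprecated_param(
        "storage_options_provider",
        storage_options_provider,
        "Pass 'storage_options' instead for static or initial storage options.",
    )
```

Because the helper adds one stack frame, it uses `stacklevel=3` where the inline version used `stacklevel=2`, so the warning still points at the caller of the public function.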
    **Only valid in CREATE mode**. Will raise an error if used with
    APPEND/OVERWRITE modes.
    namespace_client : optional, LanceNamespace
Do we document these things in the site docs anywhere? I don't see any changes in this PR to those.
de2108c to 1c2894a
…ent in file module

Update LanceFileReader, LanceFileWriter, and LanceFileSession to use namespace_client and table_id instead of storage_options_provider. This aligns with the dataset module's approach where the storage options provider is automatically created from the namespace client in Rust.

- Update Rust file.rs to accept namespace_client and table_id
- Update Python file.py classes to use new parameters
- Update .pyi type stubs
- Update dataset.py to pass namespace_client/table_id to LanceFileSession

Generated with [Claude Code](https://claude.ai/code) via [Happy](https://happy.engineering)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Happy <yesreply@happy.engineering>
…f storage_options_provider

Update test_namespace_integration.py to use the new namespace_client and table_id parameters for file operations instead of creating a NamespaceStorageOptionsProvider.

- Rename test_file_writer_with_storage_options_provider to test_file_writer_with_namespace_client
- Rename test_file_reader_with_storage_options_provider to test_file_reader_with_namespace_client
- Rename test_file_session_with_storage_options_provider to test_file_session_with_namespace_client
- Remove NamespaceStorageOptionsProvider test helper class
Fully remove the Python-level StorageOptionsProvider class and related code. Users should now use namespace_client and table_id parameters instead, which internally creates the appropriate storage options provider in Rust.

Removed:
- StorageOptionsProvider Protocol class from lance.io
- PyStorageOptionsProvider and PyStorageOptionsProviderWrapper from Rust
- py_object_to_storage_options_provider function
- with_provider and with_initial_and_provider methods from PyStorageOptionsAccessor
- storage_options_provider parameter from dataset commit functions
The Java SDK uses namespaceClient and tableId approach instead of exposing StorageOptionsProvider to users. The JNI layer internally creates LanceNamespaceStorageOptionsProvider from these parameters.

Removed:
- JavaStorageOptionsProvider struct and StorageOptionsProvider impl
- storage_options_provider_obj parameter from extract_write_params
- Related imports and exports

The storage_options.rs module is now empty as storage options providers are created directly in Rust from namespace clients.
Add namespace_client and table_id parameters to LanceFragment.create for automatic credential refresh via namespace. This makes the API consistent with write_fragments which already supports these parameters.

Fragment writers only need namespace_client for credential refresh, not for managed versioning, since they don't commit transactions.
When a namespace provider (like DirectoryNamespace) doesn't vend credentials, it returns None from fetch_storage_options(). This is valid - it means "use default/environment credentials".

Previously, this caused an error "Provider returned no storage options". Now we return an empty StorageOptions map, which signals to use default credentials from the environment.

This fixes the case where DirectoryNamespace is used with S3 and environment credentials (AWS_ACCESS_KEY_ID, etc.) should be used.
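The behavior change in this commit can be sketched in Python as follows. The actual fix lives in Rust; `normalize_fetched_options` is a hypothetical name used only for illustration, and the sketch assumes storage options are a plain string-to-string mapping.

```python
from typing import Dict, Optional


def normalize_fetched_options(
    fetched: Optional[Dict[str, str]],
) -> Dict[str, str]:
    """Treat a missing result as 'no overrides' rather than as an error."""
    if fetched is None:
        # Nothing was vended: an empty map means
        # "use default/environment credentials".
        return {}
    # Copy so callers cannot mutate the provider's own dictionary.
    return dict(fetched)
```

Under this sketch, a provider that returns `None` and a provider that returns an empty mapping are handled identically downstream.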
Implements the query_table method for DirectoryNamespace, enabling server-side query execution.

The implementation:
- Loads the dataset for the specified table
- Applies column projection from the request
- Applies filter expressions (SQL WHERE clause)
- Applies limit/offset for pagination
- Returns results serialized as Arrow IPC file format

This enables the REST namespace server backed by DirectoryNamespace to handle query_table requests from lancedb clients.
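The interplay of projection, filtering, and limit/offset listed above can be sketched over plain Python rows. This is not the DirectoryNamespace implementation: `query_rows` is a hypothetical function, a Python callable stands in for the SQL WHERE clause, and Arrow IPC serialization is omitted. The sketch filters before projecting so that a predicate may reference a column that is not returned; the commit message does not state which order the real code uses.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


def query_rows(
    rows: List[Row],
    columns: Optional[List[str]] = None,
    predicate: Optional[Callable[[Row], bool]] = None,
    limit: Optional[int] = None,
    offset: int = 0,
) -> List[Row]:
    # Filter on full rows so the predicate may use non-projected columns.
    kept = [r for r in rows if predicate is None or predicate(r)]
    # Pagination: skip `offset` rows, then take at most `limit` rows.
    end = None if limit is None else offset + limit
    page = kept[offset:end]
    # Projection: keep only the requested columns, in the requested order.
    if columns is None:
        return [dict(r) for r in page]
    return [{c: r[c] for c in columns} for r in page]


rows = [{"id": i, "score": i * 10} for i in range(6)]
result = query_rows(rows, columns=["id"],
                    predicate=lambda r: r["score"] >= 20,
                    limit=2, offset=1)
# result == [{"id": 3}, {"id": 4}]
```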
Extends the DirectoryNamespace query_table implementation to support:

Vector search:
- Query vector (single or multi-vector)
- Vector column selection
- Distance type (L2, cosine, dot, hamming)
- Index parameters: nprobes, ef, refine_factor
- Distance bounds (lower_bound, upper_bound)
- bypass_vector_index, prefilter, fast_search flags

Full text search:
- String query with optional column filter

Other:
- with_row_id flag to return _rowid column
…b native implementation

- Change nprobes to use minimum_nprobes() instead of nprobes() to match lancedb behavior
- Add support for column_aliases using project_with_transform()
I ended up making all the backwards-incompatible changes in both Python and Java. Keeping backwards compatibility becomes too complicated, and it is not worth doing just to keep the old API around for a single version.
…ation (#3205)

1. Refactored every client (Rust core, Python, Node/TypeScript) so “namespace” usage is explicit: code now keeps namespace paths (namespace_path) separate from namespace clients (namespace_client). Connections propagate the client, table creation routes through it, and managed versioning defaults are resolved from namespace metadata. Python gained LanceNamespaceDBConnection/async counterparts, and the namespace-focused tests were rewritten to match the clarified API surface.
2. Synchronized the workspace with Lance 5.0.0-beta.3 (see lance-format/lance#6186 for the upstream namespace refactor), updating Cargo/uv lockfiles and ensuring all bindings align with the new namespace semantics.
3. Added a namespace-backed code path to lancedb.connect() via new keyword arguments (namespace_client_impl, namespace_client_properties, plus the existing pushdown-ops flag). When those kwargs are supplied, connect() delegates to connect_namespace, so users can opt into namespace clients without changing APIs. (The async helper will gain parity in a later change.)
- Remove `storage_options_provider` in Python and Java. To make managed versioning work, we have updated the codebase to pass the namespace and table ID into the Python and Java binding layers. It is therefore unnecessary to build a language-specific `storage_options_provider` and bind it to Rust, because we can construct the Rust `StorageOptionsProvider` directly from the bound namespace client.
- Rename `namespace` to `namespace_client`, `namespace_impl` to `namespace_client_impl`, `namespace_properties` to `namespace_client_properties`, and `namespace`, where it means the namespace path, to `namespace_path`. This is done for all code in Rust, Python, and Java. This rename is based on community feedback and aims to clarify the concept of the Namespace Client SDK and its implementations versus a namespace path like `["ns1", "ns2"]`.
- With `vend_input_storage_options` and `ops_metrics_enabled`, we can now use DirectoryNamespace directly for testing all of these changes, without relying on an extra tracking namespace. All tests are updated accordingly to use the new feature.
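For callers updating their own wrappers, the renames described above could be applied with a small caller-side shim such as the sketch below. It is not part of Lance: `RENAMED_KWARGS` and `migrate_kwargs` are hypothetical names, and only the two unambiguous renames are included.

```python
import warnings
from typing import Any, Dict

# Old keyword -> new keyword, taken from the renames listed above.
# The bare `namespace` keyword is left out on purpose: depending on the call
# site it maps to either `namespace_client` or `namespace_path`.
RENAMED_KWARGS = {
    "namespace_impl": "namespace_client_impl",
    "namespace_properties": "namespace_client_properties",
}


def migrate_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of kwargs with old names replaced by the new ones."""
    migrated = dict(kwargs)
    for old, new in RENAMED_KWARGS.items():
        if old not in migrated:
            continue
        if new in migrated:
            raise TypeError(f"Pass only '{new}', not both '{old}' and '{new}'.")
        warnings.warn(f"'{old}' was renamed to '{new}'.",
                      DeprecationWarning, stacklevel=2)
        migrated[new] = migrated.pop(old)
    return migrated
```

A bare `namespace` argument has to be migrated by hand, since only the call site shows whether it held a client object or a path such as `["ns1", "ns2"]`.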