@bossbeagle1509

Support special stores in blosc2.open() and add context managers

This PR enhances blosc2.open() to support opening DictStore and EmbedStore files directly via file extensions. It also implements context manager support for these stores and refactors the open() function to improve code maintainability.

Key Changes:

  1. Unified blosc2.open() API:

    • Added support for detecting and opening DictStore files with extensions .b2z and .b2d
    • Added support for detecting and opening EmbedStore files with extension .b2e
    • Users can now use blosc2.open("data.b2z") instead of importing specific store classes
  2. Refactoring (Code Quality):

    • Refactored src/blosc2/schunk.py:open() to address high cyclomatic complexity (Ruff C901) triggered by changes introduced by this commit
    • Extracted logic into private helper functions:
      • _open_special_store(): Handles extension-based dispatch for special stores
      • _set_default_dparams(): Centralizes default decompression parameter logic
      • _process_opened_object(): Handles post-open logic for Proxy and LazyArray
  3. Context Manager Support:

    • Implemented __enter__(), __exit__(), and close() methods for DictStore and EmbedStore
    • Enables usage of the with statement for safer resource management (e.g., with blosc2.open("file.b2z") as store:)
  4. Bug Fixes:

    • Fixed an infinite recursion bug where DictStore and EmbedStore internally called blosc2.open(), creating a loop. These now use the lower-level blosc2_ext.open()
  5. Testing:

    • Added test_open_context_manager() to tests/test_dict_store.py and tests/test_embed_store.py to verify the new functionality
    • All existing tests continue to pass
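As a rough illustration of the extension-based dispatch and the context-manager protocol described above, here is a minimal, self-contained sketch. The stub classes and the `open_special_store()` name are hypothetical stand-ins so the example runs without blosc2; they are not the code from this PR.

```python
from pathlib import Path


class _StubStore:
    """Hypothetical stand-in for a store class; not the blosc2 implementation."""

    def __init__(self, path, mode="r"):
        self.path = path
        self.mode = mode
        self.closed = False

    def close(self):
        # Safe to call more than once.
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False  # do not swallow exceptions raised inside the with block


class StubDictStore(_StubStore):
    pass


class StubEmbedStore(_StubStore):
    pass


# Extension -> store class, mirroring the extensions listed in this PR.
_SPECIAL_STORES = {
    ".b2z": StubDictStore,
    ".b2d": StubDictStore,
    ".b2e": StubEmbedStore,
}


def open_special_store(path, mode="r"):
    """Return a store for a recognised extension, or None to fall through."""
    store_cls = _SPECIAL_STORES.get(Path(path).suffix)
    return None if store_cls is None else store_cls(path, mode=mode)
```

Returning `None` for unrecognised extensions lets the caller fall back to its regular open path, which is one way to keep the top-level function's branching small.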

Member

@FrancescAlted FrancescAlted left a comment

Sounds good for a first iteration. I wonder why you did not include TreeStore here.

Also, it would be nice to overhaul the docs and use this new handy capability where it would be applicable. A specific section in some tutorial about this would be fine too.

Thanks for your time!

@bossbeagle1509
Author

Thank you for your feedback, and thank you for taking the time to review my PR.

The TreeStore completely slipped my mind because I saw that the DictStore and the EmbedStore were covering the specified file formats.

I'm not sure how to decide whether to open a file as a TreeStore or a DictStore. Off the top of my head, I could either:

  • Check the keys to see if there is some hierarchy showing in them

    import blosc2

    def is_treestore(path):
        """Heuristically detect if a store is likely a TreeStore."""
        with blosc2.DictStore(path, mode="r") as store:
            # Keys with more than one '/' (e.g. "/group/leaf") suggest a hierarchy
            has_hierarchy = any(key.count("/") > 1 for key in store.keys())

            # Check for vlmeta keys (TreeStore-specific pattern)
            has_vlmeta = any(key.endswith("/__vlmeta__") for key in store._estore.keys())

            return has_hierarchy or has_vlmeta
  • Or add an extra piece of metadata to vlmeta like so

    import blosc2
    
    # When creating a TreeStore
    with blosc2.TreeStore("my_store.b2z", mode="w") as tstore:
        tstore.vlmeta["__store_type__"] = "TreeStore"
        # rest of the code...
    
    # When opening an unknown store
    def open_store(path, mode="r"):
        """Open a store and detect if it's a TreeStore or DictStore."""
        # First, try opening as DictStore (parent class)
        store = blosc2.DictStore(path, mode=mode)
        
        # Look for the TreeStore marker; a failed lookup just means "no marker"
        is_treestore = False
        try:
            if store._estore and "/__vlmeta__" in store._estore:
                vlmeta = store._estore["/__vlmeta__"].vlmeta
                is_treestore = (
                    "__store_type__" in vlmeta
                    and vlmeta["__store_type__"] == "TreeStore"
                )
        except Exception:
            pass

        # Reopen outside the try block so errors from TreeStore are not swallowed
        if is_treestore:
            store.close()
            return blosc2.TreeStore(path, mode=mode)

        return store

    I think the main disadvantage of this method is that TreeStores created before this release won't have the new metadata, so the library won't be able to reliably detect them with this strategy. So perhaps my first suggestion is better?
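One way to reconcile the two ideas is to prefer the explicit marker and fall back to the key heuristic for older files. The sketch below shows only that decision logic on plain Python data: `detect_store_type()` and `has_nested_keys()` are hypothetical names, and `keys` and `vlmeta` stand in for whatever the real store exposes.

```python
STORE_TYPE_KEY = "__store_type__"  # marker name proposed above


def has_nested_keys(keys):
    """True if any key contains more than one '/', e.g. '/group/leaf'."""
    return any(key.count("/") > 1 for key in keys)


def detect_store_type(keys, vlmeta):
    """Guess the store type from its keys and a vlmeta-like mapping.

    An explicit marker wins; stores written before the marker existed
    fall back to the heuristic, which can misclassify a plain store
    that happens to use nested-looking keys.
    """
    keys = list(keys)  # allow one-shot iterables
    marker = vlmeta.get(STORE_TYPE_KEY)
    if marker is not None:
        return marker
    if has_nested_keys(keys) or any(k.endswith("/__vlmeta__") for k in keys):
        return "TreeStore"
    return "DictStore"
```

With this ordering, new files are classified deterministically and only legacy files depend on the heuristic.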

What do you think is a better fit? Or perhaps you have another perspective on how to approach this?
