[Feature Request] Add glob/iglob API for pattern-based file listing across different storage backends #1634

@shenshanf

Description

Currently, mmengine.fileio.list_dir_or_file doesn't support glob pattern matching when listing files. Python's built-in glob.glob does, but it only works with the local filesystem and cannot be used with other storage backends.

Proposed Solution

Add two new API functions:

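def glob(pattern, *, recursive=False, backend_args=None):
    """Return a list of paths matching a pathname pattern.

    Intended to mirror Python's built-in glob.glob: if recursive is
    True, '**' matches any files and zero or more directories.
    backend_args are the arguments used to instantiate the storage
    backend.
    """
    pass

def iglob(pattern, *, recursive=False, backend_args=None):
    """Return an iterator yielding paths matching a pathname pattern.

    Accepts the same arguments as glob.
    """
    pass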

Example usage:

from mmengine.fileio import glob

# List all jpg files in a directory
files = glob('s3://bucket/path/*.jpg', backend_args={'access_key': '...'})

# Recursively find all .png files 
files = glob('local/path/**/*.png', recursive=True)

The current workaround requires listing everything and filtering manually:

from mmengine.fileio import list_dir_or_file
import fnmatch

files = list_dir_or_file('s3://path/', list_dir=False)
jpg_files = [f for f in files if fnmatch.fnmatch(f, '*.jpg')] 

Having a backend-agnostic glob implementation would:

  1. Provide consistent pattern matching across different storage backends
  2. Simplify file filtering without manual pattern matching
  3. Match the functionality users expect from standard file operations
  4. Improve code readability when working with specific file patterns
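A rough sketch of how this could be layered on top of a recursive file listing, to make the proposal more concrete. Everything here is illustrative: `glob_sketch`, `iglob_sketch`, `split_static_prefix`, and the `Lister` callback are hypothetical names, not existing mmengine API. The sketch assumes '/'-separated paths, returns files only, always lists the whole subtree before filtering, and ignores the special handling of dotfiles in Python's built-in glob.

```python
import fnmatch
import re
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple

# Hypothetical callback: given a directory prefix, yield every file beneath
# it (recursively) as a '/'-separated path relative to that prefix.
# With mmengine this could presumably wrap
# list_dir_or_file(prefix, list_dir=False, recursive=True, backend_args=...).
Lister = Callable[[str], Iterable[str]]

_WILDCARD = re.compile(r'[*?\[]')


def split_static_prefix(pattern: str) -> Tuple[str, str]:
    """Split a pattern into (literal directory prefix, remaining glob)."""
    parts = pattern.split('/')
    # Index of the first component containing a wildcard; if there is none,
    # treat the last component as the file name to match.
    first = next((i for i, p in enumerate(parts) if _WILDCARD.search(p)),
                 len(parts) - 1)
    return '/'.join(parts[:first]), '/'.join(parts[first:])


def _match(parts: Sequence[str], pats: Sequence[str], recursive: bool) -> bool:
    """Match path components against pattern components, one by one."""
    if not pats:
        return not parts
    head, rest = pats[0], pats[1:]
    if recursive and head == '**':
        # '**' may consume zero or more leading directory components.
        return any(_match(parts[k:], rest, recursive)
                   for k in range(len(parts) + 1))
    # Any other component (including '**' when recursive is False) is an
    # ordinary fnmatch pattern that must match exactly one path component.
    return (bool(parts) and fnmatch.fnmatchcase(parts[0], head)
            and _match(parts[1:], rest, recursive))


def iglob_sketch(pattern: str, list_files: Lister, *,
                 recursive: bool = False) -> Iterator[str]:
    """Yield full paths of files under the static prefix that match."""
    prefix, rest = split_static_prefix(pattern)
    pats = rest.split('/')
    for rel in list_files(prefix):
        if _match(rel.split('/'), pats, recursive):
            yield f'{prefix}/{rel}' if prefix else rel


def glob_sketch(pattern: str, list_files: Lister, *,
                recursive: bool = False) -> List[str]:
    """List variant of iglob_sketch."""
    return list(iglob_sketch(pattern, list_files, recursive=recursive))
```

Taking the listing function as a parameter keeps the matching logic independent of any particular backend. A real implementation would likely avoid listing the whole subtree for non-recursive patterns.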

I would appreciate feedback on this proposal. Thank you!
