@XinyuZeng

Also include a uv.lock for easier reproduction.

XinyuZeng and others added 2 commits November 8, 2025 16:17
This commit adds VLA and LeRobot loaders and a benchmarking script to
evaluate loading performance across different robotics data formats (VLA, HDF5,
RLDS, LeRobot/HuggingFace). The VLA loader comes in shuffled (with
multiprocessing) and non-shuffled variants for flexible data loading workflows.

Key additions:
- VLALoader: Shuffled loader with multiprocessing and prefetch buffer
- NonShuffleVLALoader: Sequential loader for deterministic iteration
- LeRobotLoader: Support for HuggingFace-format datasets
- benchmarks/openx.py: Performance benchmarking across formats
- examples: Format conversion utilities (RLDS->VLA, VLA->HDF5)
- HDF5Loader: Added split parameter for train/val splits

The benchmark script measures loading times, average trajectory sizes, and
per-batch performance metrics with configurable batch sizes and format selection.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
for batch_num, data in enumerate(loader):
    if batch_num >= self.num_batches:
        break
    # self._recursively_load_data(data)
Author

TBH I do not fully understand the purpose of this function here. Is it just to ensure the data is loaded correctly (for debugging)?
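One possible reading, offered as an assumption rather than the original intent: if a loader returns lazy handles, a benchmark loop that never touches the batch may time only the iterator overhead, and a recursive walk would force every leaf to be read. A minimal sketch of such a walk follows; `force_load` and its `materialize` hook are hypothetical names, not the repository's `_recursively_load_data`.

```python
from collections.abc import Mapping, Sequence


def force_load(data, materialize=lambda leaf: leaf):
    """Visit every leaf of a nested batch and return how many were touched.

    `materialize` is a hypothetical hook: pass a callable that forces a
    lazy leaf into memory. The default returns the leaf unchanged.
    """
    if isinstance(data, Mapping):
        return sum(force_load(v, materialize) for v in data.values())
    # str/bytes are sequences too, but here they count as leaves.
    if isinstance(data, Sequence) and not isinstance(data, (str, bytes, bytearray)):
        return sum(force_load(v, materialize) for v in data)
    materialize(data)
    return 1


# Toy batch standing in for one trajectory (structure is an assumption).
batch = {"obs": {"image": b"\x00\x01", "state": [1.0, 2.0]}, "action": (0.5,)}
touched = []
count = force_load(batch, materialize=touched.append)
```

If this reading is right, leaving the call commented out would mean the benchmark measures iteration only, which may or may not be what is wanted for each format.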

logger = logging.getLogger(__name__)


class LeRobotLoader(BaseLoader):
Author

This code is from the mkv branch. Unlike the other loaders, which include random shuffling, I think the LeRobotLoader does not shuffle. Maybe we should add it?
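If shuffling were added, one simple option is to shuffle episode indices once per epoch and leave the underlying dataset untouched. The sketch below assumes only that the dataset supports `len()` and integer indexing; `ShuffledView` is a hypothetical wrapper, not part of LeRobotLoader or the other loaders in this PR.

```python
import random


class ShuffledView:
    """Iterate over an indexable dataset in a fresh random order each epoch."""

    def __init__(self, dataset, seed=None):
        self.dataset = dataset
        # A private RNG keeps the order reproducible when a seed is given.
        self.rng = random.Random(seed)

    def __len__(self):
        return len(self.dataset)

    def __iter__(self):
        order = list(range(len(self.dataset)))
        self.rng.shuffle(order)
        for idx in order:
            yield self.dataset[idx]


# Stand-in for a list of episodes.
episodes = ["ep0", "ep1", "ep2", "ep3"]
view = ShuffledView(episodes, seed=0)
first_epoch = list(view)
```

Shuffling at the index level keeps the benchmark comparable across formats, since every loader would then pay for random access rather than sequential reads.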

super(HDF5Loader, self).__init__(path)
self.files = glob.glob(self.path, recursive=True)

# Handle split parameter similar to VLA loader
Author

@XinyuZeng Nov 8, 2025

This differs from the code in the mkv branch. For HDF5 and VLA, I assume the data is partitioned into train and test directories; e.g., ls robodm/vla/nyu_door_opening_surprising_effectiveness/ lists two directories, train and test. The same applies to HDF5.
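The directory assumption above can be sketched as a small helper that resolves a split name to the files beneath it. `list_split_files` and the `*.vla` pattern are illustrative assumptions, not the loader's actual code, and the example builds its own temporary layout.

```python
import glob
import os
import tempfile


def list_split_files(root, split, pattern="*.vla"):
    """Return the sorted files under <root>/<split>/ that match `pattern`."""
    split_dir = os.path.join(root, split)
    if not os.path.isdir(split_dir):
        # Fail loudly if the dataset is not partitioned as assumed.
        raise FileNotFoundError(f"no '{split}' directory under {root}")
    return sorted(glob.glob(os.path.join(split_dir, pattern)))


# Build a throwaway dataset with train/ and test/ subdirectories.
with tempfile.TemporaryDirectory() as root:
    for split, names in {"train": ["a.vla", "b.vla"], "test": ["c.vla"]}.items():
        os.makedirs(os.path.join(root, split))
        for name in names:
            open(os.path.join(root, split, name), "w").close()
    train_files = [os.path.basename(p) for p in list_split_files(root, "train")]
    test_files = [os.path.basename(p) for p in list_split_files(root, "test")]
```

Raising on a missing split directory makes a mismatch with the mkv branch layout visible immediately instead of silently yielding an empty file list.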

Author

We need to check that the versions are the ones we want, and probably update pyproject.toml as well.
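One way to do that check, sketched under assumptions: compare a table of expected versions against what is installed, using the standard library's `importlib.metadata`. The package names and version numbers below are placeholders, not the project's real pins, and `find_version_mismatches` is a hypothetical helper.

```python
from importlib import metadata


def find_version_mismatches(expected, lookup=metadata.version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


# Fake environment so the example runs anywhere; all values are placeholders.
fake_env = {"numpy": "1.26.4", "h5py": "3.10.0"}


def fake_lookup(name):
    if name not in fake_env:
        raise metadata.PackageNotFoundError(name)
    return fake_env[name]


report = find_version_mismatches(
    {"numpy": "1.26.4", "h5py": "3.11.0", "av": "12.0.0"}, lookup=fake_lookup
)
```

In a real environment the default `lookup` queries the installed distributions, so the same function could be run against the versions the team settles on.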
