Bring benchmark code back to the latest code #45
base: main
This commit adds VLA and LeRobot loaders and a comprehensive benchmarking script to evaluate loading performance across different robotics data formats (VLA, HDF5, RLDS, LeRobot/HuggingFace). The VLA loader includes both a shuffled variant (with multiprocessing) and a non-shuffled variant for flexible data-loading workflows.

Key additions:

- `VLALoader`: shuffled loader with multiprocessing and a prefetch buffer
- `NonShuffleVLALoader`: sequential loader for deterministic iteration
- `LeRobotLoader`: support for HuggingFace-format datasets
- `benchmarks/openx.py`: performance benchmarking across formats
- `examples`: format-conversion utilities (RLDS->VLA, VLA->HDF5)
- `HDF5Loader`: added a `split` parameter for train/val splits

The benchmark script measures loading times, average trajectory sizes, and per-batch performance metrics, with configurable batch sizes and format selection.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
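The per-batch timing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `benchmarks/openx.py`: `benchmark_loader` and `BenchmarkResult` are hypothetical names, and the only assumption about a loader is that it is an iterable yielding batches. Using `itertools.islice` stops after `num_batches` without pulling one extra batch, which the `enumerate` + `break` pattern in the diff below does (that extra fetch matters if the timer wraps the whole loop).

```python
import itertools
import time
from dataclasses import dataclass, field
from typing import Any, Iterable, List


@dataclass
class BenchmarkResult:
    # Hypothetical result container: one wall-clock duration per batch.
    batch_times_s: List[float] = field(default_factory=list)

    @property
    def num_batches(self) -> int:
        return len(self.batch_times_s)

    @property
    def total_s(self) -> float:
        return sum(self.batch_times_s)

    @property
    def avg_batch_s(self) -> float:
        return self.total_s / self.num_batches if self.num_batches else 0.0


def benchmark_loader(loader: Iterable[Any], num_batches: int) -> BenchmarkResult:
    """Hypothetical helper: time how long the loader takes to yield each batch."""
    result = BenchmarkResult()
    # islice stops after num_batches items without fetching an extra batch.
    it = itertools.islice(loader, num_batches)
    while True:
        start = time.perf_counter()
        try:
            next(it)
        except StopIteration:
            break
        result.batch_times_s.append(time.perf_counter() - start)
    return result
```

For example, after `benchmark_loader(it, num_batches=3)` on `it = iter(range(10))`, the result reports 3 batches and `next(it)` returns `3`, so the underlying iterator was not over-consumed.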
```python
for batch_num, data in enumerate(loader):
    if batch_num >= self.num_batches:
        break
    # self._recursively_load_data(data)
```
TBH, I do not fully understand the purpose of this function here. Is it just for ensuring that the data is correctly loaded (i.e., for debugging)?
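One plausible purpose for a helper like `_recursively_load_data`, stated here only as an assumption, is to force every leaf of a nested batch to actually be read, so that a benchmark loop measures real I/O rather than the creation of lazy views. The sketch below is hypothetical (`recursively_load` is not the PR's function): it walks dicts, lists, and tuples, and copies any leaf that supports the buffer protocol (bytes, memory-mapped buffers, NumPy arrays) via `memoryview(...).tobytes()`.

```python
from collections.abc import Mapping
from typing import Any


def recursively_load(data: Any) -> int:
    """Hypothetical helper: copy every buffer-backed leaf of a nested batch.

    Copying forces lazily read data (e.g. memory-mapped buffers) to be read.
    Returns the total number of bytes copied.
    """
    if isinstance(data, Mapping):
        return sum(recursively_load(v) for v in data.values())
    if isinstance(data, (list, tuple)):
        return sum(recursively_load(v) for v in data)
    try:
        view = memoryview(data)
    except TypeError:
        return 0  # scalars, strings, etc. expose no buffer to read
    return len(view.tobytes())
```

If this is the intent, calling it inside the timed loop (rather than leaving it commented out) is what makes the measured times comparable across formats that load eagerly and formats that load lazily.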
```python
logger = logging.getLogger(__name__)


class LeRobotLoader(BaseLoader):
```
This code is from the mkv branch. Unlike the other loaders, which include random shuffling, I think `LeRobotLoader` does not shuffle. Maybe we should add it?
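If shuffling is added, one simple option is to iterate over a seeded permutation of indices. This is a minimal sketch assuming only that the wrapped dataset supports `len()` and integer indexing; `ShuffledIndexLoader` is a hypothetical name, not part of the PR or of the LeRobot API. For a map-style PyTorch dataset, `torch.utils.data.DataLoader(dataset, shuffle=True)` achieves the same effect.

```python
import random
from typing import Generic, Iterator, Optional, Sequence, TypeVar

T = TypeVar("T")


class ShuffledIndexLoader(Generic[T]):
    """Hypothetical wrapper: iterate an indexable dataset in shuffled order."""

    def __init__(self, dataset: Sequence[T], shuffle: bool = True,
                 seed: Optional[int] = None) -> None:
        self.dataset = dataset
        self.shuffle = shuffle
        self.rng = random.Random(seed)  # seeded for reproducible runs

    def __len__(self) -> int:
        return len(self.dataset)

    def __iter__(self) -> Iterator[T]:
        indices = list(range(len(self.dataset)))
        if self.shuffle:
            # The RNG state advances, so each new pass gets a fresh permutation.
            self.rng.shuffle(indices)
        for i in indices:
            yield self.dataset[i]
```

Two loaders built with the same seed yield the same order, and `shuffle=False` keeps the original order, which would mirror the shuffled/non-shuffled split the PR already makes for VLA.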
```python
super(HDF5Loader, self).__init__(path)
self.files = glob.glob(self.path, recursive=True)

# Handle split parameter similar to VLA loader
```
This is different from the code in the mkv branch. For HDF5 and VLA, I assume there is a directory partition for train and test; e.g., `ls robodm/vla/nyu_door_opening_surprising_effectiveness/` will list two directories, `train` and `test`. The same holds for HDF5.
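Under the directory-partition layout described above (`<dataset>/train/` and `<dataset>/test/`), selecting a split reduces to globbing inside the matching subdirectory. This is a sketch under that assumption; `files_for_split` is a hypothetical helper, and the file pattern (e.g. `*.h5`) is a placeholder rather than the PR's actual extension.

```python
import glob
import os
from typing import List


def files_for_split(root: str, split: str, pattern: str = "*") -> List[str]:
    """Hypothetical helper: list files under <root>/<split>/ matching pattern."""
    split_dir = os.path.join(root, split)
    if not os.path.isdir(split_dir):
        raise FileNotFoundError(f"no {split!r} directory under {root!r}")
    # "**" with recursive=True matches zero or more nested directories.
    matches = glob.glob(os.path.join(split_dir, "**", pattern), recursive=True)
    return sorted(p for p in matches if os.path.isfile(p))
```

Raising on a missing split directory, rather than silently returning an empty list, makes a layout mismatch between branches show up immediately.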
We need to check that the versions are the ones we want, and probably also update `pyproject.toml`. Also include a `uv.lock` for easier reproduction.
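A committed `uv.lock` (written by `uv lock` and installed with `uv sync`) is the actual reproducibility mechanism. As a complementary runtime sanity check, the installed versions can be compared against the ones we want using the standard-library `importlib.metadata`. This is a hypothetical sketch: `check_versions` is not part of the PR, and it compares exact version strings; for version ranges, `packaging.specifiers.SpecifierSet` would be the usual tool.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict


def check_versions(expected: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical helper: report packages whose installed version differs.

    Returns {package: problem}; an empty dict means everything matches exactly.
    """
    problems: Dict[str, str] = {}
    for name, want in expected.items():
        try:
            have = version(name)  # version of the installed distribution
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if have != want:
            problems[name] = f"installed {have}, expected {want}"
    return problems
```

The expected mapping would be filled in with the package names and versions the team settles on; none are assumed here.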