PointCloud-GeoLab keeps large datasets out of git. The repository contains
small synthetic demo data for CI, while real data should live under
`data/external/` or another local path passed to the examples.

| Dataset | Official Source | Local Use | Notes |
|---|---|---|---|
| Stanford Bunny / Armadillo | Stanford 3D Scanning Repository | registration, reconstruction, OBB/PCA | Bunny range data and reconstruction are small enough for laptop demos; Armadillo is larger. |
| KITTI Velodyne | KITTI Raw Data | LiDAR ground removal and clustering | KITTI stores Velodyne frames as repeated float32 x y z intensity tuples in .bin. |
| ModelNet10/40 | Princeton ModelNet download page | primitive/PCA demos and optional ML | ModelNet meshes are OFF files; convert a few samples to point clouds before running examples. |
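
The KITTI row above describes the `.bin` layout as repeated float32 `x y z intensity` tuples. A minimal, stdlib-only sketch of reading that layout and writing an ASCII PLY follows. The function names `read_kitti_bin` and `write_ascii_ply` are hypothetical and are not the repository's `convert-kitti-bin` implementation; little-endian byte order is an assumption.

```python
import struct
from pathlib import Path

# Assumption: each point is four little-endian float32 values (x, y, z, intensity).
POINT = struct.Struct("<4f")


def read_kitti_bin(path):
    """Return a list of (x, y, z, intensity) tuples from a Velodyne .bin frame."""
    data = Path(path).read_bytes()
    if len(data) % POINT.size != 0:
        raise ValueError(
            f"{path}: {len(data)} bytes is not a multiple of {POINT.size}"
        )
    return list(POINT.iter_unpack(data))


def write_ascii_ply(points, path):
    """Write points as an ASCII PLY with x, y, z, intensity vertex properties."""
    header = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(points)}",
        "property float x",
        "property float y",
        "property float z",
        "property float intensity",
        "end_header",
    ]
    rows = [f"{x:.6f} {y:.6f} {z:.6f} {i:.6f}" for x, y, z, i in points]
    Path(path).write_text("\n".join(header + rows) + "\n")
```

The size check catches truncated downloads early, since a valid frame is always a whole number of 16-byte records.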

Expected layout:

    data/external/
      stanford/
        bunny/
          bunny.ply
        bunny_pair/
          bunny_source.ply
          bunny_target.ply
        armadillo/
          Armadillo.ply
      kitti/
        velodyne/
          000000.bin
      modelnet_small/
        chair.off
        sample.xyz
      manifest.json

Do not commit these files. Commit only scripts, docs, and tiny deterministic fixtures.
Print the expected layout:

    python scripts/prepare_datasets.py summary

Validate what is present locally:

    python scripts/prepare_datasets.py validate --write-manifest data/external/manifest.json

Convert one KITTI Velodyne frame to PLY:

    python scripts/prepare_datasets.py convert-kitti-bin \
      --input data/external/kitti/velodyne/000000.bin \
      --output data/external/kitti/velodyne/000000.ply

Sample a ModelNet OFF mesh to an XYZ point cloud:

    python scripts/prepare_datasets.py convert-modelnet-off \
      --input data/external/modelnet_small/chair.off \
      --output data/external/modelnet_small/sample.xyz \
      --points 2048

Create a deterministic Bunny source/target pair from one PLY:

    python scripts/prepare_datasets.py make-bunny-pair \
      --input data/external/stanford/bunny/bunny.ply \
      --output-dir data/external/stanford/bunny_pair

Write checksums for reproducibility:

    python scripts/prepare_datasets.py checksum \
      --input data/external \
      --output data/external/manifest.json

Run the examples against the prepared data:

    python examples/real_bunny_registration.py \
      --data-dir data/external/stanford/bunny_pair \
      --output-dir outputs/real_bunny

    python examples/kitti_lidar_segmentation.py \
      --frame data/external/kitti/velodyne/000000.bin \
      --output-dir outputs/kitti_segmentation

    python examples/modelnet_primitive_demo.py \
      --input data/external/modelnet_small/sample.xyz \
      --output-dir outputs/modelnet_demo

Each example exits with code 2 and a specific preparation hint if the expected
data is missing.
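
The missing-data convention described above (exit code 2 plus a preparation hint) could be sketched as follows. This is an illustration only: `require_input`, `main`, and the hint wording are hypothetical and do not reproduce the actual example scripts.

```python
import argparse
import sys
from pathlib import Path


def require_input(path, hint):
    """Return `path` as a Path if it exists; otherwise print a hint and exit with code 2."""
    path = Path(path)
    if not path.exists():
        print(f"error: missing input: {path}", file=sys.stderr)
        print(f"hint: {hint}", file=sys.stderr)
        raise SystemExit(2)
    return path


def main(argv=None):
    """Hypothetical entry point showing where the check would sit."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--frame", default="data/external/kitti/velodyne/000000.bin")
    args = parser.parse_args(argv)
    frame = require_input(
        args.frame,
        "place a KITTI Velodyne frame under data/external/kitti/velodyne/",
    )
    print(f"found {frame}")
    return 0
```

Failing before any processing starts keeps the error message about data preparation rather than a traceback from deep inside the pipeline.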
Use checksums instead of committing large files:

    python scripts/prepare_datasets.py checksum \
      --input data/external \
      --output data/external/manifest.json

The manifest records relative path, byte size, and SHA256 digest. This makes it possible to reproduce benchmark inputs while keeping the repository small.
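
To illustrate what a manifest with relative path, byte size, and SHA256 digest involves, here is a minimal sketch using only the standard library. The field names (`path`, `bytes`, `sha256`), the helper names, and the choice to skip `manifest.json` itself are assumptions for illustration; the real `checksum` subcommand may use a different schema.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large scans do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root, skip_name="manifest.json"):
    """Return one entry per file under root, sorted by path for stable output."""
    root = Path(root)
    entries = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        if path.name == skip_name:  # assumption: the manifest does not list itself
            continue
        entries.append(
            {
                "path": path.relative_to(root).as_posix(),
                "bytes": path.stat().st_size,
                "sha256": sha256_of(path),
            }
        )
    return entries


def verify_manifest(root, entries):
    """Return the relative paths that are missing or no longer match."""
    root = Path(root)
    mismatched = []
    for entry in entries:
        path = root / entry["path"]
        if (
            not path.is_file()
            or path.stat().st_size != entry["bytes"]
            or sha256_of(path) != entry["sha256"]
        ):
            mismatched.append(entry["path"])
    return mismatched
```

The entries can be serialized with `json.dumps(entries, indent=2)`. Sorting paths and storing them relative to the data root keeps the manifest identical across machines that hold the same files.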