Hardik Shah1 . Jiaxu Xing2 . Nico Messikommer2 . Boyang Sun1 . Marc Pollefeys1, 3 . Davide Scaramuzza2
Computer Vision and Pattern Recognition (CVPR) Workshop 2025 - OpenSun3D
1ETH Zürich · 2University of Zürich · 3Microsoft Spatial AI Lab
Understanding how humans leverage prior knowledge to navigate unseen environments while making exploratory decisions is essential for developing autonomous robots with similar abilities. In this work, we propose ForesightNav, a novel exploration strategy inspired by human imagination and reasoning. Our approach equips robotic agents with the capability to predict contextual information, such as occupancy and semantic details, for unexplored regions. These predictions enable the robot to efficiently select meaningful long-term navigation goals, significantly enhancing exploration in unseen environments. We validate our imagination-based approach using the Structured3D dataset, demonstrating accurate occupancy prediction and superior performance in anticipating unseen scene geometry. Our experiments show that the imagination module improves the efficiency of exploration in unseen environments, achieving a 100% completion rate for PointNav and an SPL of 67% for ObjectNav on the Structured3D validation split. These contributions demonstrate the power of imagination-driven reasoning for autonomous systems to enable generalizable and efficient exploration.
The code has been tested on:
Ubuntu: 20.04 LTS
Python: 3.12.3
CUDA: 12.6
GPU: Tesla T4 (for inference), A100 (for training)
Clone the repo and set it up as follows:
$ git clone [email protected]:uzh-rpg/foresight-nav.git
$ cd foresight-nav
$ conda env create -f req.yml
$ conda activate foresight
Since we use the ViT models from the mae repository, an older version of timm is required. Fix the timm installation by changing the `torch._six` import to `collections.abc`, as described in this issue.
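A minimal sketch of that fix, assuming the broken import lives in timm's `helpers.py` (the exact file and path may differ across timm versions):

```python
# In timm/models/layers/helpers.py (path may vary with the timm version),
# replace the import that no longer exists in recent PyTorch releases:
#
#   from torch._six import container_abcs
#
# with the standard-library equivalent:
import collections.abc as container_abcs
```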
We provide all required checkpoints here. Download the checkpoints and place them in the `checkpoints` directory. The directory structure should look like this:
foresight-nav
└── checkpoints
├── objectnav
│ ├── unet_cosine_sim.pth
│ └── unet_category_cosine_sim.pth
└── LSeg
    └── lseg_checkpoint.pth
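A quick sanity check of this layout (a minimal sketch; the paths are exactly the ones listed above):

```python
from pathlib import Path

# Verify that the downloaded checkpoints sit where the code expects them.
expected = [
    "checkpoints/objectnav/unet_cosine_sim.pth",
    "checkpoints/objectnav/unet_category_cosine_sim.pth",
    "checkpoints/LSeg/lseg_checkpoint.pth",
]
for rel_path in expected:
    status = "ok" if Path(rel_path).is_file() else "MISSING"
    print(f"[{status}] {rel_path}")
```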
See DATA.MD for detailed instructions on data download, preparation, and preprocessing. Bash scripts are provided to download and prepare the data. Change the `DATA_DIR` variable in the scripts to your desired data directory, and add the URL for the Structured3D dataset in `datagen_imagination/download_struct3d.py` after following the instructions for downloading the Structured3D dataset here.
Set your environment variables with:
export DATA_DIR=/path/to/your/data
export SCENE_DIR="${DATA_DIR}/Structured3D"
export UTILS_DIR="${DATA_DIR}/training_utils"
$ bash scripts/download_structured3d.sh
$ bash scripts/occupancy_datagen.sh
$ bash scripts/geosem_datagen.sh
$ bash scripts/prepare_training_utils.sh
The above scripts will download the Structured3D dataset and generate the occupancy maps and GeoSem maps. Below is a visualization of the generated maps for a scene in the Structured3D dataset.
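As a quick sanity check after the scripts finish, you can verify that a scene's GeoSem map was written out. This is a minimal sketch using the per-scene layout assumed by the visualization command below:

```python
import os
from pathlib import Path

# The visualization command in the next section reads from
# ${SCENE_DIR}/scene_00000/GeoSemMap, so check that this directory exists.
scene_dir = Path(os.environ["SCENE_DIR"])
geosem_dir = scene_dir / "scene_00000" / "GeoSemMap"
if geosem_dir.is_dir():
    print(f"GeoSem map directory found: {geosem_dir}")
else:
    print(f"GeoSem map directory missing: {geosem_dir}")
```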
The generated GeoSem maps can be used to localize objects in the scene from language queries. We provide a script that prompts the user for a language query and visualizes the localization of that query on the GeoSem map. Results are saved in the specified directory.
$ mkdir ./langq_geosem
$ python -m datagen_imagination.geosem_map_generation.viz_openlang_heatmap \
--root_path="${SCENE_DIR}/scene_00000/GeoSemMap" \
--vis_path=./langq_geosem
*Left: raw similarity scores visualized as a heatmap. Right: localization of the query after processing the raw scores.*
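The raw similarity heatmap above is conceptually a cosine similarity between the query's text embedding and the per-cell features stored in the GeoSem map. The sketch below illustrates that computation with NumPy only; the array shapes and variable names are assumptions for illustration, not the script's actual interface:

```python
import numpy as np

def query_heatmap(geosem_features: np.ndarray, text_embedding: np.ndarray) -> np.ndarray:
    """Cosine similarity between each map cell's feature and a query embedding.

    geosem_features: (H, W, D) per-cell visual-language features (e.g. from LSeg).
    text_embedding:  (D,) embedding of the language query from the same encoder.
    Returns an (H, W) heatmap of raw similarity scores.
    """
    feats = geosem_features / (np.linalg.norm(geosem_features, axis=-1, keepdims=True) + 1e-8)
    query = text_embedding / (np.linalg.norm(text_embedding) + 1e-8)
    return feats @ query

# Example with random data; in practice the features come from the generated GeoSem map.
heatmap = query_heatmap(np.random.randn(64, 64, 512), np.random.randn(512))
print(heatmap.shape)  # (64, 64)
```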
We used the training script from the mae repository to train the imagination module. We have not released it yet because the mae code is distributed under a different license. We are migrating the training code to a new training setup and will release it soon! Losses, metrics, and logging are already available in `train_utils`.
For evaluating the imagination models in an ObjectNav task on the Structured3D validation scenes, use the following command:
$ python -m evaluation.evaluate_imagine --conf='configs/evaluation_conf.yaml'
Performance of the agent is logged to wandb as the evaluation progresses, so per-scene and per-object-category metrics can be analysed in real time. Make sure to update the `wandb.init()` call in `evaluation/evaluate_imagine.py` with an appropriate entity/project name. Below is an example of a logged image showing a top-down view of the scene with the trajectories of different agents exploring to locate a 'TV'.
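A minimal sketch of that change; the entity and project names below are placeholders, not values from this repository:

```python
import wandb

# Replace these placeholders with your own wandb entity and project names.
wandb.init(
    entity="your-wandb-entity",
    project="foresightnav-objectnav-eval",
)
```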
- Release data generation code
- Release ObjectNav models and evaluation code
- Release training code
- Release PointNav models and evaluation code (using occupancy prediction only)
If you have any questions regarding this project, please use the GitHub issue tracker or contact Hardik Shah.
We thank the authors of Structured3D, RoomFormer, VLMaps, and mae for open-sourcing their codebases. We also thank the authors of CrossOver for this README template!
If you find this work and/or code useful, please cite our paper:
@inproceedings{shah2025foresightnav,
  title={ForesightNav: Learning Scene Imagination for Efficient Exploration},
  author={Shah, Hardik and Xing, Jiaxu and Messikommer, Nico and Sun, Boyang and Pollefeys, Marc and Scaramuzza, Davide},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)},
  year={2025}
}