Hardik Shah1 . Jiaxu Xing2 . Nico Messikommer2 . Boyang Sun1 . Marc Pollefeys1, 3 . Davide Scaramuzza2
Computer Vision and Pattern Recognition (CVPR) Workshop 2025 - OpenSun3D
1ETH Zürich · 2University of Zürich · 3Microsoft Spatial AI Lab
Understanding how humans leverage prior knowledge to navigate unseen environments while making exploratory decisions is essential for developing autonomous robots with similar abilities. In this work, we propose ForesightNav, a novel exploration strategy inspired by human imagination and reasoning. Our approach equips robotic agents with the capability to predict contextual information, such as occupancy and semantic details, for unexplored regions. These predictions enable the robot to efficiently select meaningful long-term navigation goals, significantly enhancing exploration in unseen environments. We validate our imagination-based approach using the Structured3D dataset, demonstrating accurate occupancy prediction and superior performance in anticipating unseen scene geometry. Our experiments show that the imagination module improves the efficiency of exploration in unseen environments, achieving a 100% completion rate for PointNav and an SPL of 67% for ObjectNav on the Structured3D validation split. These contributions demonstrate the power of imagination-driven reasoning for autonomous systems to enable generalizable and efficient exploration.
The code has been tested on:
Ubuntu: 20.04 LTS
Python: 3.12.3
CUDA: 12.6
GPU: Tesla T4 (for inference), A100 (for training)
Clone the repo and set it up as follows:
$ git clone [email protected]:uzh-rpg/foresight-nav.git
$ cd foresight-nav
$ conda env create -f req.yml
$ conda activate foresight
Since we use the ViT models from the mae repository, an older version of timm is required. Fix the timm installation by changing the `torch._six` import to `collections.abc`, as described in this issue.
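A minimal sketch of that fix, assuming the broken import lives in timm's `helpers.py` (the exact file and path may differ across timm versions):

```python
# In timm/models/layers/helpers.py (path may vary with the timm version),
# replace the import that no longer exists in recent PyTorch releases:
#
#   from torch._six import container_abcs
#
# with the standard-library equivalent:
import collections.abc as container_abcs
```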
We provide all required checkpoints here. Download the checkpoints and place them in the `checkpoints` directory. The directory structure should look like this:
foresight-nav
└── checkpoints
├── objectnav
│ ├── unet_cosine_sim.pth
│ └── unet_category_cosine_sim.pth
└── LSeg
    └── lseg_checkpoint.pth
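A quick sanity check of this layout (a minimal sketch; the paths are exactly the ones listed above):

```python
from pathlib import Path

# Verify that the downloaded checkpoints sit where the code expects them.
expected = [
    "checkpoints/objectnav/unet_cosine_sim.pth",
    "checkpoints/objectnav/unet_category_cosine_sim.pth",
    "checkpoints/LSeg/lseg_checkpoint.pth",
]
for rel_path in expected:
    status = "ok" if Path(rel_path).is_file() else "MISSING"
    print(f"[{status}] {rel_path}")
```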
See DATA.MD for detailed instructions on data download, preparation, and preprocessing. Bash scripts are provided to download and prepare the data. Change the `DATA_DIR` variable in the scripts to your desired data directory, and add the URL for the Structured3D dataset in `datagen_imagination/download_struct3d.py` after following the instructions for downloading the Structured3D dataset here.
Set your environment variables with:
export DATA_DIR=/path/to/your/data
export SCENE_DIR="${DATA_DIR}/Structured3D"
export UTILS_DIR="${DATA_DIR}/training_utils"
$ bash scripts/download_structured3d.sh
$ bash scripts/occupancy_datagen.sh
$ bash scripts/geosem_datagen.sh
$ bash scripts/prepare_training_utils.sh
The above scripts will download the Structured3D dataset and generate the occupancy maps and GeoSem maps. Below is a visualization of the generated maps for a scene in the Structured3D dataset.
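As a quick sanity check after the scripts finish, you can verify that a scene's GeoSem map was written out. This is a minimal sketch using the per-scene layout assumed by the visualization command below:

```python
import os
from pathlib import Path

# The visualization command in the next section reads from
# ${SCENE_DIR}/scene_00000/GeoSemMap, so check that this directory exists.
scene_dir = Path(os.environ["SCENE_DIR"])
geosem_dir = scene_dir / "scene_00000" / "GeoSemMap"
if geosem_dir.is_dir():
    print(f"GeoSem map directory found: {geosem_dir}")
else:
    print(f"GeoSem map directory missing: {geosem_dir}")
```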
The generated GeoSem maps can be used to localize objects in the scene from language queries. We provide a script that prompts the user for a language query and visualizes the localization of that query on the GeoSem map. Results are saved in the specified directory.
$ mkdir ./langq_geosem
$ python -m datagen_imagination.geosem_map_generation.viz_openlang_heatmap \
--root_path="${SCENE_DIR}/scene_00000/GeoSemMap" \
--vis_path=./langq_geosem
*Left: raw similarity scores visualized as a heatmap. Right: localization of the query after processing the raw scores.*
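The raw similarity heatmap above is conceptually a cosine similarity between the query's text embedding and the per-cell features stored in the GeoSem map. The sketch below illustrates that computation with NumPy only; the array shapes and variable names are assumptions for illustration, not the script's actual interface:

```python
import numpy as np

def query_heatmap(geosem_features: np.ndarray, text_embedding: np.ndarray) -> np.ndarray:
    """Cosine similarity between each map cell's feature and a query embedding.

    geosem_features: (H, W, D) per-cell visual-language features (e.g. from LSeg).
    text_embedding:  (D,) embedding of the language query from the same encoder.
    Returns an (H, W) heatmap of raw similarity scores.
    """
    feats = geosem_features / (np.linalg.norm(geosem_features, axis=-1, keepdims=True) + 1e-8)
    query = text_embedding / (np.linalg.norm(text_embedding) + 1e-8)
    return feats @ query

# Example with random data; in practice the features come from the generated GeoSem map.
heatmap = query_heatmap(np.random.randn(64, 64, 512), np.random.randn(512))
print(heatmap.shape)  # (64, 64)
```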
We used the training script from the mae repository to train the imagination module. We have not released it yet because the mae code is distributed under a different license. We are migrating the training code to a new training setup and will release it soon! Losses, metrics, and logging are already available in `train_utils`.
For evaluating the imagination models in an ObjectNav task on the Structured3D validation scenes, use the following command:
$ python -m evaluation.evaluate_imagine --conf='configs/evaluation_conf.yaml'
Performance of the agent is logged to wandb as the evaluation progresses, so per-scene and per-object-category metrics can be analysed in real time. Make sure to update the `wandb.init()` call in `evaluation/evaluate_imagine.py` with an appropriate entity/project name. Below is an example of a logged image showing a top-down view of the scene with the trajectories of different agents exploring to locate a 'TV'.
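A minimal sketch of that change; the entity and project names below are placeholders, not values from this repository:

```python
import wandb

# Replace these placeholders with your own wandb entity and project names.
wandb.init(
    entity="your-wandb-entity",
    project="foresightnav-objectnav-eval",
)
```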
- Release data generation code
- Release ObjectNav models and evaluation code
- Release training code
- Release PointNav models and evaluation code (using occupancy prediction only)
If you have any questions regarding this project, please use the GitHub issue tracker or contact Hardik Shah.
We thank the authors of Structured3D, RoomFormer, VLMaps, and mae for open-sourcing their codebases. We also thank the authors of CrossOver for this README template!
If you find this work and/or code useful, please cite our paper:
@inproceedings{shah2025foresightnav,
  title={ForesightNav: Learning Scene Imagination for Efficient Exploration},
  author={Shah, Hardik and Xing, Jiaxu and Messikommer, Nico and Sun, Boyang and Pollefeys, Marc and Scaramuzza, Davide},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)},
  year={2025}
}