This repository contains the official implementation of the paper DynamicVis: An Efficient and General Visual Foundation Model for Remote Sensing Image Understanding, developed based on the OpenMMLab framework.
DynamicVis is a dynamic visual perception foundation model for remote sensing. Through a selective region-aware architecture and multi-instance meta-embedding learning, it parses ultra-large images efficiently with low resources: processing a 2048x2048-pixel image requires only about 800 MB of GPU memory. The model performs strongly across nine remote sensing downstream tasks, with roughly 20x higher computational efficiency and ~97% lower memory usage than ViT, enabling cross-task understanding of high-resolution remote sensing imagery.
The current branch has been tested on Linux systems with PyTorch 2.x and CUDA 12.1, supporting Python 3.10+ and compatible with most CUDA versions.
If you find this project helpful, please give us a star ⭐️. Your support is our greatest motivation.
Main Features
- API interfaces and usage methods highly consistent with OpenMMLab
- Open-sourced DynamicVis models and weights of different scales as described in the paper
- Supports fine-tuning and testing for nine remote sensing downstream tasks mentioned in the paper
🌟 2025.03.20 Released DynamicVis project.
🌟 2025.03.21 Updated DynamicVis pretraining code.
🌟 2025.03.22 Updated the scene classification fine-tuning code for DynamicVis.
🌟 2025.03.22 Updated the tiny object detection fine-tuning code for DynamicVis.
🌟 2025.03.31 Updated the fine-tuning code for all tasks of DynamicVis.
🌟 2025.04.01 Uploaded the pretraining weights of DynamicVis.
- Organize DynamicVis pretraining code
- Organize fine-tuning and testing code for nine tasks in the paper
- Upload DynamicVis model weights
- Upload DynamicVis model weights without token selection
- Introduction
- Changelog
- TODO
- Table of Contents
- Installation
- Dataset Preparation
- Model Pretraining
- Model Fine-tuning
- Pretrained Weights Download
- FAQ
- Acknowledgements
- Citation
- License
- Contact
- Linux OS (Windows not supported for Mamba)
- Python 3.10+ (3.11 recommended)
- PyTorch 2.0+ (2.4 recommended)
- CUDA 11.7+ (12.1 recommended)
- MMCV 2.0+ (2.2 recommended)
- Mamba 2.2.4
We recommend using Miniconda for installation. The following commands create a virtual environment named dynamicvis and install PyTorch and MMCV. These instructions assume CUDA 12.1; adjust them accordingly if you use a different CUDA version.
Step 0: Install Miniconda.
Step 1: Create and activate a virtual environment:
conda create -n dynamicvis python=3.11 -y
conda activate dynamicvis
Step 2: Install PyTorch 2.4.x.
Linux:
pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu121
# OR
conda install pytorch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 pytorch-cuda=12.1 -c pytorch -c nvidia
Step 3: Install MMCV 2.2.x.
pip install -U openmim
mim install mmcv==2.2.0
# OR
pip install mmcv==2.2.0 -f https://download.openmmlab.com/mmcv/dist/cu121/torch2.4/index.html
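Optionally, verify that PyTorch and MMCV installed correctly and that CUDA is visible (a quick sanity check, not part of the official steps):
# Print versions and CUDA availability
import torch
import mmcv

print(torch.__version__, torch.cuda.is_available())
print(mmcv.__version__)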
Step 4: Install causal-conv1d and Mamba 2.2.4:
# Install causal-conv1d
wget https://github.com/Dao-AILab/causal-conv1d/releases/download/v1.5.0.post8/causal_conv1d-1.5.0.post8+cu12torch2.4cxx11abiTRUE-cp311-cp311-linux_x86_64.whl
pip install causal_conv1d-1.5.0.post8+cu12torch2.4cxx11abiTRUE-cp311-cp311-linux_x86_64.whl
# Install mamba
wget https://github.com/state-spaces/mamba/releases/download/v2.2.4/mamba_ssm-2.2.4+cu12torch2.4cxx11abiTRUE-cp311-cp311-linux_x86_64.whl
pip install mamba_ssm-2.2.4+cu12torch2.4cxx11abiTRUE-cp311-cp311-linux_x86_64.whl
Note: If you encounter issues with the prebuilt wheels, try installing directly from PyPI:
pip install causal-conv1d==1.5.0.post8
pip install mamba-ssm==2.2.4
# OR
# Compile and install first, and keep the whl file for future installations to avoid recompilation
# pip wheel --wheel-dir=../../software/mamba2-2.2.4/ causal-conv1d==1.5.0.post8 -i https://pypi.org/simple
# pip wheel --wheel-dir=../../software/mamba2-2.2.4/ mamba-ssm==2.2.4 -i https://pypi.org/simple
# pip install ../../software/mamba2-2.2.4/causal_conv1d-1.5.0.post8-cp311-cp311-linux_x86_64.whl
# pip install ../../software/mamba2-2.2.4/mamba_ssm-2.2.4-cp311-cp311-linux_x86_64.whl
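To confirm the Mamba kernels built correctly, a minimal smoke test (assumes a CUDA GPU is available):
# Run a tiny Mamba layer on random input
import torch
from mamba_ssm import Mamba

layer = Mamba(d_model=64).cuda()
x = torch.randn(1, 128, 64, device='cuda')  # (batch, seq_len, d_model)
print(layer(x).shape)  # expected: torch.Size([1, 128, 64])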
Step 5: Install other dependencies:
pip install transformers==4.50.0
pip install -U ipdb braceexpand mat4py pycocotools shapely ftfy scipy terminaltables wandb prettytable torchmetrics importlib_metadata einops
Download or clone the DynamicVis repository.
git clone [email protected]:KyanChen/DynamicVis.git
cd DynamicVis
- Dataset: fMoW Dataset
- Download the fMoW-rgb subset:
pip install awscli
# Delete AWS configuration file
rm -rf ~/.aws
# List files
aws s3 ls --no-sign-request s3://spacenet-dataset/Hosted-Datasets/fmow/fmow-rgb/
# Download files
aws s3 sync --no-sign-request s3://spacenet-dataset/Hosted-Datasets/fmow/fmow-rgb/ ./data/fmow-rgb/
We use WebDataset to organize the large-scale pretraining data. WebDataset is a data-loading library that reads large datasets efficiently from sharded tar archives.
# Reorganize the data into Tar package format required by WebDataset
python tools_DynamicVis/tools_data/fMoW/get_fmow_train_val_data.py
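As a rough illustration, the resulting shards can be read with the webdataset library as sketched below. The shard name pattern and the sample keys ("jpg", "cls") are assumptions; check the output of get_fmow_train_val_data.py for the actual names.
# Minimal sketch of reading WebDataset shards (paths and keys are assumptions)
import webdataset as wds

dataset = (
    wds.WebDataset("data/fmow-rgb/train-{000000..000099}.tar")
    .decode("pil")            # decode image bytes into PIL images
    .to_tuple("jpg", "cls")   # yield (image, label) pairs
)

for image, label in dataset:
    print(image.size, label)
    break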
${DATASET_ROOT} # Dataset root directory, e.g., /home/username/data/UC
├── airplane
│ ├── airplane01.tif
│ ├── airplane02.tif
│ └── ...
├── ...
├── ...
├── ...
└── ...
Note: We provide the split files for the datasets in the project's datainfo folder. You can also split the datasets yourself with a Python script, as sketched below.
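For reference, a minimal split script might look like this; the 8:1:1 ratio, the .tif extension, and the output file names are assumptions, not the repository's exact script.
# Hypothetical split script for a <root>/<class>/<image> classification layout
import random
from pathlib import Path

root = Path('/home/username/data/UC')
samples = sorted(str(p.relative_to(root)) for p in root.rglob('*.tif'))
random.seed(0)
random.shuffle(samples)

n = len(samples)
splits = {
    'train.txt': samples[:int(0.8 * n)],
    'val.txt': samples[int(0.8 * n):int(0.9 * n)],
    'test.txt': samples[int(0.9 * n):],
}
for name, lines in splits.items():
    Path(name).write_text('\n'.join(lines))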
- Dataset: Levir-Ship. Download the COCO-format dataset annotations.
${DATASET_ROOT} # Dataset root directory, e.g., /home/username/data/Levir-Ship
├── images
│   ├── train
│   │   ├── train_0.jpg
│   │   ├── ...
│   │   └── ...
│   ├── val
│   │   ├── val_0.jpg
│   │   ├── ...
│   │   └── ...
│   └── test
│       ├── test_0.jpg
│       ├── ...
│       └── ...
└── annotations
    ├── train.json
    ├── val.json
    └── test.json
Note: We provide the instance annotations for the above datasets in the project's datainfo folder; you can use them directly. A quick way to sanity-check the annotation files is shown below.
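The COCO-format annotation files can be inspected with pycocotools (installed in Step 5); the path below is an example.
# Verify a COCO-format annotation file loads and count its contents
from pycocotools.coco import COCO

coco = COCO('/home/username/data/Levir-Ship/annotations/train.json')
print(len(coco.getImgIds()), 'images,', len(coco.getAnnIds()), 'annotations')
print(coco.loadCats(coco.getCatIds()))  # category names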
${DATASET_ROOT} # Dataset root directory, e.g., /home/username/data/NWPU
├── images
│   ├── train
│   │   ├── train_0.jpg
│   │   ├── ...
│   │   └── ...
│   ├── val
│   │   ├── val_0.jpg
│   │   ├── ...
│   │   └── ...
│   └── test
│       ├── test_0.jpg
│       ├── ...
│       └── ...
└── annotations
    ├── train.json
    ├── val.json
    └── test.json
${DATASET_ROOT} # Dataset root directory, e.g., /home/username/data/Massachusetts
├── train_imgs
│ ├── train_0.tif
│ ├── ...
│ └── ...
├── val_imgs
│ ├── val_0.tif
│ ├── ...
│ └── ...
├── test_imgs
│ ├── test_0.tif
│ ├── ...
│ └── ...
├── train_labels
│ ├── train_0.tif
│ ├── ...
│ └── ...
├── val_labels
│ ├── val_0.tif
│ ├── ...
│ └── ...
├── test_labels
│ ├── test_0.tif
│ ├── ...
│ └── ...
${DATASET_ROOT} # Dataset root directory, e.g., /home/username/data/LEVIR-CD
├── A
│ ├── A_0.jpg
│ ├── ...
│ └── ...
├── B
│ ├── B_0.jpg
│ ├── ...
│ └── ...
├── labels
│ ├── A_0.png
│ ├── ...
│ └── ...
Note: We provide the relevant processing code for the data in the tools_DynamicVis/tools_data folder.
We provide configuration files for DynamicVis models of different parameter sizes as described in the paper. You can find them in the configs_DynamicVis/fMoW folder. The config files maintain consistent API interfaces and usage methods with OpenMMLab. Below are some key parameter explanations. For more information on the parameters, refer to the OpenMMLab documentation.
Parameter Explanation:
- `work_dir`: Output path for model training; generally no need to modify.
- `data_root`: Dataset root directory; change to the absolute path of your dataset root.
- `code_root`: Code root directory; change to the absolute path of your code root.
- `batch_size`: Batch size per GPU; adjust according to GPU memory.
- `max_epochs`: Maximum number of training epochs; generally no need to modify.
- `val_interval`: Validation interval; generally no need to modify.
- `vis_backends/WandbVisBackend`: Configuration of the web-based visualization backend; after uncommenting it, register an account on the wandb website to view training visualizations in a browser.
- `load_from`: Path to the model's pretraining checkpoint; generally no need to modify.
- `resume`: Whether to resume training from a checkpoint; generally no need to modify.
- `default_hooks/CheckpointHook`: Configuration for saving model checkpoints during training; generally no need to modify.
- `model/backbone`: Visual backbone of the DynamicVis model; modify as needed.
- `model/backbone/arch`: Architecture of the main network; modify as needed.
- `model/backbone/spatial_token_keep_ratios`: Spatial token retention ratios; modify as needed.
- `model/pre_neck`: FPN neck of the DynamicVis model.
- `model/neck`: Region feature extractor of the DynamicVis model; generally no need to modify.
- `model/head`: Classification head of the DynamicVis model; generally no need to modify.
- `optim_wrapper`: Optimizer configuration; generally no need to modify.
- `data_preprocessor/mean/std`: Mean and standard deviation for data preprocessing; generally no need to modify.
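For orientation, an illustrative config fragment is sketched below; the field names follow the list above, but every value is a placeholder rather than the repository's actual setting.
# Illustrative OpenMMLab-style config fragment (all values are placeholders)
work_dir = 'work_dirs/dynamicvis_fmow'
data_root = '/home/username/data/fmow-rgb'  # absolute dataset root
code_root = '/home/username/DynamicVis'     # absolute code root

batch_size = 32    # per-GPU batch size; reduce if GPU memory is tight
max_epochs = 100
val_interval = 1

load_from = None   # path to a pretraining checkpoint, if any
resume = False

model = dict(
    backbone=dict(
        arch='b',                              # e.g. base / large
        spatial_token_keep_ratios=[0.5, 0.5],  # fraction of spatial tokens kept
    ),
)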
# Single GPU
python tools_mmpretrain/train.py configs_DynamicVis/fMoW/name_to_config.py
# Multi-GPU
sh tools_mmpretrain/dist_train.sh configs_DynamicVis/fMoW/name_to_config.py ${GPU_NUM}
# Single GPU
python tools_mmpretrain/test.py configs_DynamicVis/fMoW/name_to_config.py ${CHECKPOINT_FILE}
# Multi-GPU
sh tools_mmpretrain/dist_test.sh configs_DynamicVis/fMoW/name_to_config.py ${CHECKPOINT_FILE} ${GPU_NUM}
We provide configuration files for the UC Merced and AID datasets mentioned in the paper. You can find them in the UC configuration file and AID configuration file folders.
The following explains key parameters not already covered in the pretraining config.
Parameter Explanation:
- `pretrained_ckpt`: Path to the pretrained checkpoint for model fine-tuning; modify according to your setup.
# Single GPU
python tools_mmpretrain/train.py configs_DynamicVis/UCMerced/name_to_config.py # name_to_config.py is the config file you want to use
# Multi-GPU
sh tools_mmpretrain/dist_train.sh configs_DynamicVis/UCMerced/name_to_config.py ${GPU_NUM} # name_to_config.py is the config file you want to use, GPU_NUM is the number of GPUs used
# Single GPU
python tools_mmpretrain/test.py configs_DynamicVis/UCMerced/name_to_config.py ${CHECKPOINT_FILE} # name_to_config.py is the config file you want to use, CHECKPOINT_FILE is the checkpoint file you want to use
# Multi-GPU
sh tools_mmpretrain/dist_test.sh configs_DynamicVis/UCMerced/name_to_config.py ${CHECKPOINT_FILE} ${GPU_NUM} # name_to_config.py is the config file you want to use, CHECKPOINT_FILE is the checkpoint file you want to use, GPU_NUM is the number of GPUs used
We provide configuration files for the Levir-Ship dataset mentioned in the paper. You can find them in the Levir-Ship configuration file folder.
The following explains key parameters not already covered in the pretraining config.
Parameter Explanation:
- `pretrained_ckpt`: Path to the pretrained checkpoint for model fine-tuning; modify according to your setup.
- `default_hooks/visualization`: Controls whether to visualize during val and test; modify the `draw` and `interval` parameters as needed.
# Single GPU
python tools_mmdet/train.py configs_DynamicVis/Levir-Ship/name_to_config.py # name_to_config.py is the config file you want to use
# Multi-GPU
sh tools_mmdet/dist_train.sh configs_DynamicVis/Levir-Ship/name_to_config.py ${GPU_NUM} # name_to_config.py is the config file you want to use, GPU_NUM is the number of GPUs used
# Single GPU
python tools_mmdet/test.py configs_DynamicVis/Levir-Ship/name_to_config.py ${CHECKPOINT_FILE} # name_to_config.py is the config file you want to use, CHECKPOINT_FILE is the checkpoint file you want to use
# Multi-GPU
sh tools_mmdet/dist_test.sh configs_DynamicVis/Levir-Ship/name_to_config.py ${CHECKPOINT_FILE} ${GPU_NUM} # name_to_config.py is the config file you want to use, CHECKPOINT_FILE is the checkpoint file you want to use, GPU_NUM is the number of GPUs used
We provide configuration files for the NWPU and SSDD datasets mentioned in the paper. You can find them in the NWPU configuration file and SSDD configuration file folders.
The following explains key parameters not already covered in the pretraining config.
Parameter Explanation:
- `pretrained_ckpt`: Path to the pretrained checkpoint for model fine-tuning; modify according to your setup.
- `default_hooks/visualization`: Controls whether to visualize during val and test; modify the `draw` and `interval` parameters as needed.
- `visualizer`: Visualization parameters such as `line_width` and `alpha`; modify as needed.
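For orientation, an illustrative detection-config fragment is sketched below; the type names follow MMDetection conventions, and all values are placeholders rather than this repository's actual settings.
# Illustrative fragment (placeholder values)
pretrained_ckpt = 'path/to/dynamicvis_pretrain.pth'  # placeholder path

default_hooks = dict(
    visualization=dict(
        type='DetVisualizationHook',
        draw=True,     # draw predictions during val/test
        interval=10,   # visualize every 10th sample
    ),
)
visualizer = dict(
    type='DetLocalVisualizer',
    line_width=2,  # bounding-box line width
    alpha=0.8,     # transparency of drawn boxes
)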
# Single GPU
python tools_mmdet/train.py configs_DynamicVis/NWPU/name_to_config.py # name_to_config.py is the config file you want to use
# Multi-GPU
sh tools_mmdet/dist_train.sh configs_DynamicVis/NWPU/name_to_config.py ${GPU_NUM} # name_to_config.py is the config file you want to use, GPU_NUM is the number of GPUs used
# Single GPU
python tools_mmdet/test.py configs_DynamicVis/NWPU/name_to_config.py ${CHECKPOINT_FILE} # name_to_config.py is the config file you want to use, CHECKPOINT_FILE is the checkpoint file you want to use
# Multi-GPU
sh tools_mmdet/dist_test.sh configs_DynamicVis/NWPU/name_to_config.py ${CHECKPOINT_FILE} ${GPU_NUM} # name_to_config.py is the config file you want to use, CHECKPOINT_FILE is the checkpoint file you want to use, GPU_NUM is the number of GPUs used
We provide configuration files for the Massachusetts and WHU datasets mentioned in the paper. You can find them in the Massachusetts configuration file and WHU configuration file folders.
The following explains key parameters not already covered in the pretraining config.
Parameter Explanation:
- `pretrained_ckpt`: Path to the pretrained checkpoint for model fine-tuning; modify according to your setup.
- `default_hooks/visualization`: Controls whether to visualize during val and test; modify the `draw` and `interval` parameters as needed.
- `visualizer`: Visualization parameters such as `alpha`; modify as needed.
# Single GPU
python tools_mmseg/train.py configs_DynamicVis/Massachusetts/name_to_config.py # name_to_config.py is the config file you want to use
# Multi-GPU
sh tools_mmseg/dist_train.sh configs_DynamicVis/Massachusetts/name_to_config.py ${GPU_NUM} # name_to_config.py is the config file you want to use, GPU_NUM is the number of GPUs used
# Single GPU
python tools_mmseg/test.py configs_DynamicVis/Massachusetts/name_to_config.py ${CHECKPOINT_FILE} # name_to_config.py is the config file you want to use, CHECKPOINT_FILE is the checkpoint file you want to use
# Multi-GPU
sh tools_mmseg/dist_test.sh configs_DynamicVis/Massachusetts/name_to_config.py ${CHECKPOINT_FILE} ${GPU_NUM} # name_to_config.py is the config file you want to use, CHECKPOINT_FILE is the checkpoint file you want to use, GPU_NUM is the number of GPUs used
We provide configuration files for the LEVIR-CD, WHU-CD, and OSCD datasets mentioned in the paper. You can find them in the LEVIR-CD configuration file, WHU-CD configuration file, and OSCD configuration file folders.
The following explains key parameters not already covered in the pretraining config.
Parameter Explanation:
- `pretrained_ckpt`: Path to the pretrained checkpoint for model fine-tuning; modify according to your setup.
- `default_hooks/visualization`: Controls whether to visualize during val and test; modify the `draw` and `interval` parameters as needed.
- `visualizer`: Visualization parameters such as `alpha`; modify as needed.
# Single GPU
python tools_opencd/train.py configs_DynamicVis/LEVIR-CD/name_to_config.py # name_to_config.py is the config file you want to use
# Multi-GPU
sh tools_opencd/dist_train.sh configs_DynamicVis/LEVIR-CD/name_to_config.py ${GPU_NUM} # name_to_config.py is the config file you want to use, GPU_NUM is the number of GPUs used
# Single GPU
python tools_opencd/test.py configs_DynamicVis/LEVIR-CD/name_to_config.py ${CHECKPOINT_FILE} # name_to_config.py is the config file you want to use, CHECKPOINT_FILE is the checkpoint file you want to use
# Multi-GPU
sh tools_opencd/dist_test.sh configs_DynamicVis/LEVIR-CD/name_to_config.py ${CHECKPOINT_FILE} ${GPU_NUM} # name_to_config.py is the config file you want to use, CHECKPOINT_FILE is the checkpoint file you want to use, GPU_NUM is the number of GPUs used
We provide the configuration files for image retrieval in the image retrieval configuration file folder.
You can download the pretrained weights from Hugging Face.
- `b` and `l` denote the model size, corresponding to `base` and `large`, respectively.
- `wo-token-selection` indicates that the model does not use the selective region-aware architecture.
- `X-epoch` indicates the model weights at the `X`-th training epoch.
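For example, a checkpoint can be fetched programmatically with huggingface_hub; the repo id and file name below are placeholders, so use the ones listed on the project's Hugging Face page.
# Download a checkpoint from Hugging Face (placeholder repo id and file name)
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id='KyanChen/DynamicVis',          # placeholder repo id
    filename='dynamicvis-b-100-epoch.pth',  # placeholder file name
)
print(ckpt_path)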
We list some common problems and their solutions here. If you find any missing, feel free to submit a PR to enrich this list. If you cannot find help here, please open an issue; filling in all the required information in the template will help us locate the problem more quickly.
We recommend that you do not install the MM series packages (such as MMDet) yourself, as this repository already includes everything you need; installing them may cause errors when running the code. If you encounter an error saying a module has not been registered, check the following (a sketch of how these pieces fit together follows the list):
- Whether the module comes from a package that needs to be installed; if so, install it
- Whether any MM series packages are installed; if so, uninstall them
- Whether `@MODELS.register_module()` is added above the class definition; if not, add it
- Whether `from .xxx import xxx` is added in `__init__.py`; if not, add it
- Whether `custom_imports = dict(imports=['dynamicvis'], allow_failed_imports=False)` is added in the config file; if not, add it
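A minimal sketch of the registration pattern, using a hypothetical MyBackbone module; the registry import path may differ in this repository.
# dynamicvis/my_backbone.py -- hypothetical module illustrating the checks above
import torch.nn as nn
from mmengine.registry import MODELS  # the actual registry import may differ here

@MODELS.register_module()  # register the class so configs can reference it by name
class MyBackbone(nn.Module):
    def forward(self, x):
        return x

# dynamicvis/__init__.py
# from .my_backbone import MyBackbone

# In the config file:
# custom_imports = dict(imports=['dynamicvis'], allow_failed_imports=False)
# model = dict(backbone=dict(type='MyBackbone'))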
If you encounter a `Bad substitution` error when running `dist_train.sh`, run the script with `bash dist_train.sh` instead.
This project is built upon OpenMMLab. We thank the OpenMMLab developers.
If you use DynamicVis in your research, please cite:
@article{chen2025dynamicvis,
title={DynamicVis: An Efficient and General Visual Foundation Model for Remote Sensing Image Understanding},
author={Chen, Keyan and Liu, Chenyang and Chen, Bowen and Li, Wenyuan and Zou, Zhengxia and Shi, Zhenwei},
journal={arXiv preprint arXiv:2503.16426},
year={2025}
}
This project is licensed under the Apache 2.0 License.
For further questions❓, feel free to contact us 👬