The repository provides a collection of vision language models, benchmarks, and related applications, released as part of Project MONAI (Medical Open Network for Artificial Intelligence).
- [2024/10/31] We released the VILA-M3-3B, VILA-M3-8B, and VILA-M3-13B checkpoints on Hugging Face.
- [2024/10/24] We presented VILA-M3 and the VLM module in MONAI at MONAI Day (slides, recording).
- [2024/10/24] Interactive VILA-M3 Demo is available online!
VILA-M3 is a vision language model designed specifically for medical applications. It focuses on addressing the unique challenges faced by general-purpose vision-language models when applied to the medical domain and integrated with existing expert segmentation and classification models.
For details, see here.
Please visit the VILA-M3 Demo to try out a preview version of the model.
NOTE: The URL https://vila-m3-demo.monai.ngc.nvidia.com/ is temporarily unavailable. Please use the new URL https://20.163.25.224/ instead.
- To run the demo, we recommend building a Docker container with all the requirements. The image is built on a base image with CUDA preinstalled.
docker build --network=host --progress=plain -t monai-m3:latest -f m3/demo/Dockerfile .
- Run the container
docker run -it --rm --ipc host --gpus all --net host monai-m3:latest bash
Note: To load your own VILA checkpoint in the demo, mount a folder by adding
-v <your_ckpts_dir>:/data/checkpoints
to your docker run command.
- Next, follow the steps below to start the Gradio demo.
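If you start the container from a script instead of typing the command, you can build the docker run invocation above, including the optional checkpoint mount, as an argument list. This is only an illustrative sketch. The names docker_run_command and /home/user/vila_ckpts are hypothetical and are not part of the repository.

```python
from __future__ import annotations

import shlex


def docker_run_command(image: str = "monai-m3:latest", ckpt_dir: str | None = None) -> list[str]:
    """Build the `docker run` argument list for the demo container (hypothetical helper).

    If ckpt_dir is given, it is mounted at /data/checkpoints so the demo can
    load a local VILA checkpoint from there.
    """
    cmd = ["docker", "run", "-it", "--rm", "--ipc", "host", "--gpus", "all", "--net", "host"]
    if ckpt_dir is not None:
        # Docker options such as -v must come before the image name.
        cmd += ["-v", f"{ckpt_dir}:/data/checkpoints"]
    cmd += [image, "bash"]
    return cmd


if __name__ == "__main__":
    # Print a shell-quoted command line; the list itself can be passed to subprocess.run().
    print(shlex.join(docker_run_command(ckpt_dir="/home/user/vila_ckpts")))
```

An argument list avoids shell quoting problems when the checkpoint path contains spaces. shlex.join is used only to display the command.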
- Linux operating system
- CUDA Toolkit 12.2 (with nvcc) for VILA. To verify the CUDA installation, run:
nvcc --version
If CUDA is not installed, use one of the following methods:
- Recommended: Use the Docker image nvidia/cuda:12.2.2-devel-ubuntu22.04 by running:
docker run -it --rm --ipc host --gpus all --net host nvidia/cuda:12.2.2-devel-ubuntu22.04 bash
- Manual installation (not recommended): Download the appropriate package from the official NVIDIA page.
- Python 3.10, Git, Wget, and Unzip:
To install these, run
sudo apt-get update
sudo apt-get install -y wget python3.10 python3.10-venv python3.10-dev git unzip
NOTE: The commands are tailored for the Docker image nvidia/cuda:12.2.2-devel-ubuntu22.04. If you use a different setup, adjust the commands accordingly.
- GPU memory: Ensure that the GPU has sufficient memory to run the models:
- VILA-M3: 8B: ~18GB, 13B: ~30GB
- CXR: This expert dynamically loads various TorchXRayVision models and performs ensemble predictions. The memory requirement is roughly 1.5GB in total.
- VISTA3D: This expert model dynamically loads the VISTA3D model to segment a 3D-CT volume. The memory requirement is roughly 12GB, and peak memory usage can be higher, depending on the input size of the 3D volume.
- BRATS: (TBD)
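You can script the CUDA and GPU memory checks above as a pre-flight step. The sketch below is a minimal example under several assumptions. It reads the release number from nvcc output of the form "release 12.2". It hard-codes the approximate figures from the list above, so VILA-M3-3B and BRATS are left out because no figures are given. It also adds up memory for whichever components you expect to be loaded at the same time. Because the experts load dynamically, that sum is a worst-case estimate, not a measured value. All function names are hypothetical.

```python
from __future__ import annotations

import re
import subprocess

# Approximate GPU memory per component in GB, taken from the list above.
# These are rough figures; VISTA3D can peak higher for large input volumes.
APPROX_GPU_MEMORY_GB = {
    "VILA-M3-8B": 18.0,
    "VILA-M3-13B": 30.0,
    "CXR": 1.5,
    "VISTA3D": 12.0,
}


def parse_nvcc_release(output: str) -> tuple[int, int] | None:
    """Return (major, minor) from `nvcc --version` output, or None if not found."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_cuda_release() -> tuple[int, int] | None:
    """Run `nvcc --version`; return None if nvcc is missing or fails."""
    try:
        result = subprocess.run(["nvcc", "--version"], capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_nvcc_release(result.stdout)


def required_gpu_memory_gb(components: list[str]) -> float:
    """Worst-case estimate: sum the memory of components resident at the same time."""
    return sum(APPROX_GPU_MEMORY_GB[name] for name in components)


if __name__ == "__main__":
    print("CUDA release:", installed_cuda_release())  # should report (12, 2) for CUDA Toolkit 12.2
    print("VILA-M3-8B + VISTA3D:", required_gpu_memory_gb(["VILA-M3-8B", "VISTA3D"]), "GB")
```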
- Set up the environment: Clone the repository, create a virtual environment, and download the experts' checkpoints:
git clone https://github.com/Project-MONAI/VLM --recursive
cd VLM
python3.10 -m venv .venv
source .venv/bin/activate
make demo_m3
- Navigate to the demo directory:
cd m3/demo
- Start the Gradio demo. This automatically downloads the default VILA-M3 checkpoint from Hugging Face:
python gradio_m3.py
- Alternative: Start the Gradio demo with a local checkpoint, e.g.:
python gradio_m3.py \
  --source local \
  --modelpath /data/checkpoints/<8B-checkpoint-name> \
  --convmode llama_3
For details, see the available command-line arguments.
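A launcher script can choose between the two start commands above with a small sketch like the following. It uses only the flags shown above (--source, --modelpath, --convmode). The default conversation mode llama_3 comes from the 8B example. Other checkpoints may need a different mode, so check the command-line arguments before reusing it. The names gradio_command and start_demo are hypothetical helpers, not part of the repository.

```python
from __future__ import annotations

import subprocess


def gradio_command(modelpath: str | None = None, convmode: str = "llama_3") -> list[str]:
    """Build the command that starts the Gradio demo (hypothetical helper).

    Without modelpath, the demo downloads the default VILA-M3 checkpoint from
    Hugging Face; with modelpath, it loads a local checkpoint instead.
    """
    cmd = ["python", "gradio_m3.py"]
    if modelpath is not None:
        cmd += ["--source", "local", "--modelpath", modelpath, "--convmode", convmode]
    return cmd


def start_demo(modelpath: str | None = None, convmode: str = "llama_3", demo_dir: str = "m3/demo") -> None:
    """Launch the demo from the demo directory; blocks until the server exits."""
    subprocess.run(gradio_command(modelpath, convmode), cwd=demo_dir, check=True)
```

Running start_demo() from the repository root inside the activated virtual environment corresponds to the first command above. Passing a path under /data/checkpoints corresponds to the alternative.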
- This is still a work in progress. Please refer to the README for more details.
To lint the code, please install these packages:
pip install -r requirements-ci.txt
Then run the following commands:
isort --check-only --diff . # using the configuration in pyproject.toml
black . --check # using the configuration in pyproject.toml
ruff check . # using the configuration in ruff.toml
To auto-format the code, run the following command:
isort . && black . && ruff format .
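The three lint checks above can be wrapped in a single script, for example for a pre-commit hook. The sketch below assumes the tools are installed from requirements-ci.txt and that it runs from the repository root. It runs every check even after one fails, so you see all problems in one pass. A tool that is not installed counts as a failure. run_lint_checks and LINT_CHECKS are hypothetical names.

```python
from __future__ import annotations

import subprocess
from typing import Callable

# The same three checks listed above; each tool picks up its own config file.
LINT_CHECKS: list[list[str]] = [
    ["isort", "--check-only", "--diff", "."],
    ["black", ".", "--check"],
    ["ruff", "check", "."],
]


def run_lint_checks(
    runner: Callable[[list[str]], subprocess.CompletedProcess] = subprocess.run,
) -> list[str]:
    """Run every check, even after one fails, and return the names of failing tools."""
    failed: list[str] = []
    for cmd in LINT_CHECKS:
        try:
            result = runner(cmd)
        except OSError:
            # Tool not installed or not executable: count it as a failure.
            failed.append(cmd[0])
            continue
        if result.returncode != 0:
            failed.append(cmd[0])
    return failed
```

A calling script can exit with a non-zero status whenever the returned list is not empty. The runner parameter exists only so the function can be exercised without the tools installed.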