Long-VITA
git clone https://github.com/VITA-MLLM/Long-VITA.git
cd Long-VITA
git submodule update --init --recursive
pip install -r requirement.txt
pip install -e .
- Download the LLM from https://huggingface.co/Qwen/Qwen2.5-14B-Instruct.
- Download the ViT from https://huggingface.co/OpenGVLab/InternViT-300M-448px.
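One way to fetch both weight folders from the command line is sketched below. It assumes the huggingface-cli tool from the huggingface_hub package is installed; WEIGHTS_DIR and download_weights are hypothetical names used only for illustration and are not part of this repository.

```shell
# Assumption: huggingface-cli is available (pip install -U "huggingface_hub[cli]").
# Newer huggingface_hub releases offer the same command as "hf download".
# WEIGHTS_DIR is a hypothetical local folder; pick any path with enough disk space.
WEIGHTS_DIR="${WEIGHTS_DIR:-$PWD/weights}"

download_weights() {
    # $1: Hugging Face repo id, e.g. Qwen/Qwen2.5-14B-Instruct
    repo_id="$1"
    # Use the part after the last slash as the local folder name.
    target="$WEIGHTS_DIR/${repo_id##*/}"
    mkdir -p "$target"
    huggingface-cli download "$repo_id" --local-dir "$target"
}

# Usage (the downloads are large, so the calls are left commented out):
# download_weights Qwen/Qwen2.5-14B-Instruct
# download_weights OpenGVLab/InternViT-300M-448px
```

The resulting folders under WEIGHTS_DIR are the ones the training scripts need to point at.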
This stage requires at least 8 GPUs, each with at least 96 GB of memory. (We have only tested on 96 GB GPUs; 80 GB GPUs may also work.)
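A quick pre-flight check of the GPU requirement could look like the sketch below. The helper name count_big_gpus and the 80000 MiB threshold are illustrative assumptions, not part of this repository; adjust the threshold to the memory size you want to require.

```shell
# Count GPUs whose total memory (in MiB, one value per line on stdin)
# is at least the threshold given as the first argument.
count_big_gpus() {
    min_mib="$1"
    awk -v min="$min_mib" '$1 + 0 >= min { n++ } END { print n + 0 }'
}

# Usage on a machine with NVIDIA drivers installed
# (nvidia-smi reports memory.total in MiB when nounits is set):
# nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | count_big_gpus 80000
```

If the printed count is below 8, the stage is unlikely to run as configured.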
bash scripts/deepspeed/qwen25/finetune_qwen25_14b_intern_300m_stage1.sh 16384 `date +'%Y%m%d_%H'`0000
The above script may need some adjustment.
- Set MODEL_NAME_OR_PATH to the LLM Hugging Face weight folder.
- Set VISION_MODEL_NAME_OR_PATH to the ViT Hugging Face weight folder.
- Modify other variables to suit the environment.
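Before launching, it can help to confirm that the two folders you set really contain Hugging Face weights. The sketch below only checks for a config.json file, which both model repositories ship; check_hf_folder and the example paths are hypothetical and not part of this repository.

```shell
# Succeed if the folder looks like a Hugging Face weight folder
# (contains config.json); otherwise print a message to stderr and fail.
check_hf_folder() {
    if [ -f "$1/config.json" ]; then
        echo "ok: $1"
    else
        echo "missing config.json: $1" >&2
        return 1
    fi
}

# Usage with placeholder paths; replace them with the values
# you assigned to MODEL_NAME_OR_PATH and VISION_MODEL_NAME_OR_PATH:
# check_hf_folder /path/to/Qwen2.5-14B-Instruct
# check_hf_folder /path/to/InternViT-300M-448px
```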
This stage requires at least 8 GPUs, each with at least 96 GB of memory. (We have only tested on 96 GB GPUs; 80 GB GPUs may also work.)
bash scripts/megatron/qwen25/finetune_qwen25_14b_intern_300m_stage2.sh 16384 `date +'%Y%m%d_%H'`0000
The above script may need some adjustment.
- Set MODEL_NAME_OR_PATH to the output folder of Stage-1.
- Modify other variables to suit the environment.
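Both launch commands pass a timestamp as their second argument, built from a backtick date call followed by a literal 0000. The sketch below reproduces that expansion so you can see the value the script receives; make_timestamp is a hypothetical helper name, and how the script uses the value is defined by the script itself.

```shell
# Build the timestamp argument the same way the launch commands do.
make_timestamp() {
    # date prints YYYYMMDD_HH; the literal 0000 fills in minutes and seconds,
    # so the value changes only once per hour.
    echo "$(date +'%Y%m%d_%H')0000"
}

TIMESTAMP="$(make_timestamp)"
echo "$TIMESTAMP"
```

Because the value is truncated to the hour, two launches within the same hour receive the same timestamp.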
Not implemented yet.
Long-context training is currently supported only with the MindSpeed and Megatron frameworks.
We provide Hugging Face weights converted from the MindSpeed weights at:
- https://huggingface.co/VITA-MLLM/Long-VITA-16K_HF
- https://huggingface.co/VITA-MLLM/Long-VITA-128K_HF
- https://huggingface.co/VITA-MLLM/Long-VITA-1M_HF
We provide a simple script for inference:
python tools/inference_long_vita.py VITA-MLLM/Long-VITA-16K_HF
python tools/inference_long_vita.py VITA-MLLM/Long-VITA-128K_HF
python tools/inference_long_vita.py VITA-MLLM/Long-VITA-1M_HF
Evaluate with VLMEvalKit
bash VLMEvalKit/evaluation_long_vita_hf.sh