This repository contains all of our ProgLoRA code. We sincerely thank Chen et al. for their repository, on which this codebase is built.
- Install Package
conda create -n prog python=3.10 -y
conda activate prog
pip install --upgrade pip
pip install -e .
- Install additional packages for training cases
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
This repo is based on CoIN. If you run into a problem, you may find a solution in its issues.
Please download the images from the constituent datasets: ScienceQA, VQAv2, VizWiz, TextVQA, GQA, OCR-VQA, ImageNet, RefCOCO, RefCOCO+, and RefCOCOg.
| Image Source | Download Path |
|---|---|
| COCO | train2014, test2015, val2014 |
| RefCOCO | annotation |
| RefCOCO+ | annotation |
| RefCOCOg | annotation |
| ImageNet | images |
| OCR-VQA | images |
| GQA | images |
| TextVQA | train, test |
| ScienceQA | images |
| VizWiz | train, val, test |
After downloading all of them, organize the data as follows:
├── COCO2014
│   └── train2014
├── GQA
│   └── images
├── OCR-VQA
│   └── images
├── TextVQA
│   ├── train_images
│   └── test_images
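If your raw downloads live elsewhere, you can symlink them into this layout. A minimal sketch, assuming a data root of ./playground/data (this path and the download locations are placeholders, not fixed by this repo):

```bash
# Minimal sketch: link already-downloaded image folders into the expected layout.
# DATA_ROOT and DOWNLOAD_DIR are placeholders; adjust them to your setup.
DATA_ROOT=./playground/data
DOWNLOAD_DIR=/path/to/downloads

mkdir -p "$DATA_ROOT"/{COCO2014,GQA,OCR-VQA,TextVQA}

ln -s "$DOWNLOAD_DIR/coco/train2014"        "$DATA_ROOT/COCO2014/train2014"
ln -s "$DOWNLOAD_DIR/gqa/images"            "$DATA_ROOT/GQA/images"
ln -s "$DOWNLOAD_DIR/ocrvqa/images"         "$DATA_ROOT/OCR-VQA/images"
ln -s "$DOWNLOAD_DIR/textvqa/train_images"  "$DATA_ROOT/TextVQA/train_images"
ln -s "$DOWNLOAD_DIR/textvqa/test_images"   "$DATA_ROOT/TextVQA/test_images"
```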
Then, please download the instructions from CoIN_Dataset and organize them as follows:
├── Instruction_Original
│   ├── GQA
│   │   ├── train.json
│   │   └── test.json
│   └── ScienceQA
│       ├── train.json
│       └── test.json
├── Instruction_Type2
│   └── GQA
│       ├── train.json
│       └── test.json
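As a quick sanity check, the sketch below reports any missing train.json or test.json; the instruction root path is an assumption, so point it at wherever you placed the CoIN_Dataset instructions:

```bash
# Sanity check: report any missing train.json/test.json under each instruction type.
# INSTRUCT_ROOT is a placeholder path.
INSTRUCT_ROOT=./playground/Instructions

for type_dir in "$INSTRUCT_ROOT"/Instruction_*/; do
  for task_dir in "$type_dir"*/; do
    for f in train.json test.json; do
      [ -f "$task_dir$f" ] || echo "missing: $task_dir$f"
    done
  done
done
```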
First, download the pretrained projectors from the LLaVA Model Zoo and set pretrain_mm_mlp_adapter accordingly.
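A minimal sketch of how to do this, assuming the parameter appears inside the training scripts; the example checkpoint path is a placeholder:

```bash
# Sketch: locate where pretrain_mm_mlp_adapter is set in the training scripts,
# then point it at the projector downloaded from the LLaVA Model Zoo, e.g.
#   --pretrain_mm_mlp_adapter ./checkpoints/llava-pretrain/mm_projector.bin   (placeholder path)
grep -rn "pretrain_mm_mlp_adapter" scripts/LLaVA/Train_MOE_dynamic_share/
```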
We provide the training scripts in scripts/LLaVA/Train_MOE_dynamic_share.
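Continual learning proceeds task by task, so the per-task scripts are launched in order. The script names below are placeholders; use the files that actually exist in that folder:

```bash
# Sketch: run the per-task training scripts in the continual-learning order.
# Script names are placeholders; list the folder to see the real ones.
ls scripts/LLaVA/Train_MOE_dynamic_share/
bash scripts/LLaVA/Train_MOE_dynamic_share/ScienceQA.sh   # placeholder name
bash scripts/LLaVA/Train_MOE_dynamic_share/TextVQA.sh     # placeholder name
```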
We have prepared the scripts to evaluate the trained model in scripts/LLaVA/Eval_dynamic_share.
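Evaluation follows the same pattern; again, the script name is a placeholder, so check the folder for the actual files:

```bash
# Sketch: evaluate the trained model on a given task.
ls scripts/LLaVA/Eval_dynamic_share/
bash scripts/LLaVA/Eval_dynamic_share/ScienceQA.sh   # placeholder name
```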