# Scene Consistency Representation Learning for Video Scene Segmentation (CVPR 2022)
This is the official PyTorch implementation of SCRL. The CVPR 2022 paper is available [here](https://openaccess.thecvf.com/content/CVPR2022/html/Wu_Scene_Consistency_Representation_Learning_for_Video_Scene_Segmentation_CVPR_2022_paper.html).

# Getting Started

## Data Preparation
### MovieNet Dataset
Download the MovieNet dataset from its [official website](https://movienet.github.io/).
### SceneSeg318 Dataset
Download the annotations of [SceneSeg318](https://drive.google.com/drive/folders/1NFyL_IZvr1mQR3vR63XMYITU7rq9geY_?usp=sharing); download instructions can be found in the [LGSS](https://github.com/AnyiRao/SceneSeg/blob/master/docs/INSTALL.md) repository.

### Make Puzzles for Pre-training
To reduce the number of I/O accesses and perform data augmentation (a.k.a. *Scene Agnostic Clip-Shuffling* in the paper) at the same time, we suggest stitching 16 shots into one image (a puzzle) during the pre-training stage. You can make the data yourself:
```
python ./data/data_preparation.py
```
The processed data will be saved in `./compressed_shot_images/`; see an example puzzle [figure](./figures/puzzle_example.jpg).
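For reference, the core stitching step can be sketched as follows. This is an illustration only: the 4x4 grid layout, tile size, and helper function are assumptions, and `./data/data_preparation.py` remains the authoritative implementation.
```python
# Minimal sketch of a "puzzle": stitch 16 shot keyframes into one image.
# The 4x4 layout and 256x256 tile size are assumptions for illustration;
# see ./data/data_preparation.py for the actual pipeline.
import cv2
import numpy as np

def make_puzzle(shot_paths):
    assert len(shot_paths) == 16
    # read each keyframe and resize it to a uniform tile size
    tiles = [cv2.resize(cv2.imread(p), (256, 256)) for p in shot_paths]
    # stitch tiles into 4 rows, then stack the rows vertically
    rows = [np.concatenate(tiles[r * 4:(r + 1) * 4], axis=1) for r in range(4)]
    return np.concatenate(rows, axis=0)  # shape: (1024, 1024, 3)
```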
<!-- Or download the processed data in [here](). -->


### Load the Data into Memory [Optional]
We **strongly recommend** loading the data into memory to speed up pre-training; this requires your device to have at least 100 GB of RAM.
```
mkdir /tmpdata
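# note: mounting tmpfs typically requires root privileges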
mount tmpfs /tmpdata -t tmpfs -o size=100G
cp -r ./compressed_shot_images/ /tmpdata/
```


## Initialization Weights Preparation
Download the ResNet-50 weights pre-trained on ImageNet-1k ([resnet50-19c8e357.pth](https://download.pytorch.org/models/resnet50-19c8e357.pth)) and save them in the `./pretrain/` folder.
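To quickly verify the download, the checkpoint can be loaded as a plain state dict; a minimal sanity check, assuming the file is saved under `./pretrain/` as above:
```python
# Sanity-check the downloaded ImageNet-1k weights (sketch only; the path
# follows the README's ./pretrain/ convention).
import torch

state_dict = torch.load('./pretrain/resnet50-19c8e357.pth', map_location='cpu')
print(f"loaded {len(state_dict)} entries, e.g. {next(iter(state_dict))}")
```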

## Prerequisites

* Python >= 3.6
* PyTorch >= 1.6
* OpenCV (`opencv-python`)
* pickle (Python standard library)
* NumPy
* PyYAML
* scikit-learn


## Usage
### STEP 1: Encoder Pre-training
Use the default configuration to pre-train the model. Make sure the data path is correct and that sufficient GPUs are available (e.g. 8 NVIDIA V100 GPUs):
```
python pretrain_main.py --config ./config/SCRL_pretrain_default.yaml
```
The checkpoint, a copy of the config, and the log will be saved in `./output/`.

### STEP 2: Feature Extraction

```
python extract_embeddings.py $CKP_PATH --shot_img_path $SHOT_PATH --Type all --gpu-id 0
```
`$CKP_PATH` is the path to an encoder checkpoint, and `$SHOT_PATH` is the path to the MovieNet keyframes.
The extracted embeddings (in pickle format) and the log will be saved in `./embeddings/`.
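The pickle files can be inspected directly; a minimal sketch, where the file name is hypothetical and the exact layout of the dump depends on the extraction run:
```python
# Peek at the extracted shot embeddings (sketch; file name is hypothetical,
# and the object's structure depends on how extract_embeddings.py dumps it).
import pickle

with open('./embeddings/embeddings_all.pkl', 'rb') as f:
    embeddings = pickle.load(f)
print(type(embeddings), len(embeddings))
```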

### STEP 3: Video Scene Segmentation Evaluation

```
cd SceneSeg

python main.py \
    -train $TRAIN_PKL_PATH \
    -test $TEST_PKL_PATH \
    -val $VAL_PKL_PATH \
    --seq-len 40 \
    --gpu-id 0
```

The checkpoints and log will be saved in `./SceneSeg/output/`.
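For reference, the AP and F1 figures reported below are standard boundary-classification metrics; a minimal sketch with scikit-learn, where the arrays and the 0.5 threshold are illustrative assumptions:
```python
# Sketch of AP / F1 over per-shot boundary predictions with scikit-learn
# (hypothetical arrays and threshold; the repo's evaluation code is authoritative).
import numpy as np
from sklearn.metrics import average_precision_score, f1_score

y_true = np.array([0, 1, 0, 0, 1, 0])               # ground-truth scene boundaries
y_score = np.array([0.1, 0.8, 0.3, 0.2, 0.7, 0.4])  # predicted probabilities

print("AP:", average_precision_score(y_true, y_score))
print("F1:", f1_score(y_true, y_score > 0.5))       # 0.5 threshold is an assumption
```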

## Models
We provide checkpoints, logs, and results under two pre-training settings: with and without ImageNet-1k initialization.

| Initialization | AP | F1 | Config File | Pre-training <br> STEP 1 | Embeddings <br> STEP 2 | Fine-tuning <br> STEP 3 |
| :----- | :---- | :---- | :---- | :----- | :---- | :---- |
| w/o ImageNet-1k | 55.16 | 51.32 | `SCRL_pretrain_without_imagenet1k.yaml` | [ckp and log](https://drive.google.com/drive/folders/1ZYg9PFRU_lt3G5qJrldkguA52T2oxErR?usp=sharing) | [embeddings](https://drive.google.com/drive/folders/1uen_HP3BZu8bcrPBikkgV3j9wzUjQ0C1?usp=sharing) | [log](https://drive.google.com/drive/folders/1rJbOnVbqTdPmnh2grIkePXOmwpNELnrK?usp=sharing) |
| w/ ImageNet-1k | 56.65 | 52.45 | `SCRL_pretrain_with_imagenet1k.yaml` | [ckp and log](https://drive.google.com/drive/folders/1BG5ZLqrPKKGTtDIZj8aps_QuWc6K3c3V?usp=sharing) | [embeddings](https://drive.google.com/drive/folders/1NFvGhkvRxpmEJYNjRnwp3ybuHQaG25gW?usp=sharing) | [log](https://drive.google.com/drive/folders/1dE0JFi-MDua70_CgI1CvyLNRnhwLjaUV?usp=sharing) |


## License
Please see the [LICENSE](./LICENSE) file for details.

## Acknowledgments
Parts of the code are borrowed from the following repositories:
* [MoCo](https://github.com/facebookresearch/moco)
* [LGSS](https://github.com/AnyiRao/SceneSeg)

## Citation
Please cite our work if it's useful for your research.
```
@InProceedings{Wu_2022_CVPR,
    author    = {Wu, Haoqian and Chen, Keyu and Luo, Yanan and Qiao, Ruizhi and Ren, Bo and Liu, Haozhe and Xie, Weicheng and Shen, Linlin},
    title     = {Scene Consistency Representation Learning for Video Scene Segmentation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {14021-14030}
}
```