# SparseTrack
#### SparseTrack is a simple and strong multi-object tracker.

[](https://paperswithcode.com/sota/multi-object-tracking-on-mot20-1?p=sparsetrack-multi-object-tracking-by)

[](https://paperswithcode.com/sota/multi-object-tracking-on-mot17?p=sparsetrack-multi-object-tracking-by)

> [**SparseTrack: Multi-Object Tracking by Performing Scene Decomposition based on Pseudo-Depth**](https://arxiv.org/abs/2306.05238)
>
> Zelin Liu, Xinggang Wang, Cheng Wang, Wenyu Liu, Xiang Bai
>
> *[arXiv 2306.05238](https://arxiv.org/abs/2306.05238)*

## Abstract
Exploring robust and efficient association methods has always been an important issue in multiple-object tracking (MOT). Although existing tracking methods have achieved impressive performance, congestion and frequent occlusions still pose challenging problems in multi-object tracking. We reveal that performing sparse decomposition on dense scenes is a crucial step to enhance the performance of associating occluded targets. To this end, we propose a pseudo-depth estimation method for obtaining the relative depth of targets from 2D images. We further design a depth cascading matching (DCM) algorithm, which uses the obtained depth information to convert a dense target set into multiple sparse target subsets and performs data association on these sparse subsets in order from near to far. By integrating the pseudo-depth method and the DCM strategy into the data association process, we propose a new tracker, called SparseTrack. SparseTrack provides a new perspective for solving the challenging problem of MOT in crowded scenes. Using only IoU matching, SparseTrack achieves performance comparable to state-of-the-art (SOTA) methods on the MOT17 and MOT20 benchmarks.

<p align="center"><img src="assets/DCM.png" width="500"/></p>
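
To make the pipeline above concrete, here is a minimal, illustrative sketch of pseudo-depth plus depth cascading matching. It is not the repository's implementation: it assumes pseudo-depth is approximated from 2D box geometry (a box whose bottom edge sits lower in the image is treated as closer), the helper names (`pseudo_depth`, `iou_matrix`, `dcm_associate`) are hypothetical, and SciPy's Hungarian solver stands in for the assignment step.

```python
# Illustrative sketch of pseudo-depth + depth cascading matching (DCM).
# Not the repository's implementation; helper names are hypothetical.
import numpy as np
from scipy.optimize import linear_sum_assignment


def pseudo_depth(boxes, img_h):
    """Approximate relative depth from 2D boxes [x1, y1, x2, y2]:
    a lower bottom edge (larger y2) gives a smaller value, i.e. a nearer target."""
    return img_h - boxes[:, 3]


def iou_matrix(a, b):
    """Pairwise IoU between box sets of shape (N, 4) and (M, 4)."""
    tl = np.maximum(a[:, None, :2], b[None, :, :2])
    br = np.minimum(a[:, None, 2:], b[None, :, 2:])
    inter = np.prod(np.clip(br - tl, 0, None), axis=2)
    area_a = np.prod(a[:, 2:] - a[:, :2], axis=1)
    area_b = np.prod(b[:, 2:] - b[:, :2], axis=1)
    return inter / (area_a[:, None] + area_b[None, :] - inter + 1e-9)


def dcm_associate(trk_boxes, det_boxes, img_h, levels=3, iou_thr=0.3):
    """Split tracks and detections into depth levels and match them from near to far;
    unmatched items cascade into the next (farther) level."""
    edges = np.linspace(0, img_h, levels + 1)[1:-1]
    trk_lvl = np.digitize(pseudo_depth(trk_boxes, img_h), edges)
    det_lvl = np.digitize(pseudo_depth(det_boxes, img_h), edges)
    matches, carry_t, carry_d = [], [], []
    for lvl in range(levels):
        t_idx = carry_t + np.flatnonzero(trk_lvl == lvl).tolist()
        d_idx = carry_d + np.flatnonzero(det_lvl == lvl).tolist()
        if t_idx and d_idx:
            cost = 1.0 - iou_matrix(trk_boxes[t_idx], det_boxes[d_idx])
            rows, cols = linear_sum_assignment(cost)
            keep = cost[rows, cols] <= 1.0 - iou_thr
            matches += [(t_idx[r], d_idx[c]) for r, c in zip(rows[keep], cols[keep])]
            carry_t = [t_idx[r] for r in range(len(t_idx)) if r not in set(rows[keep])]
            carry_d = [d_idx[c] for c in range(len(d_idx)) if c not in set(cols[keep])]
        else:
            carry_t, carry_d = t_idx, d_idx
    return matches, carry_t, carry_d
```

In the full tracker the cascade operates on motion-compensated, Kalman-predicted track boxes rather than raw previous-frame boxes; see the paper for the exact formulation.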

## Tracking performance
### Results on MOT challenge test set
| Dataset | HOTA | MOTA | IDF1 | MT | ML | FP | FN | IDs |
|------------|-------|-------|------|------|-------|-------|------|------|
|MOT17 | 65.1 | 81.0 | 80.1 | 54.6% | 14.3% | 23904 | 81927 | 1170 |
|MOT20 | 63.4 | 78.2 | 77.3 | 69.9% | 9.2% | 25108 | 86720 | 1116 |

### Comparison on DanceTrack test set
| Method | HOTA | DetA | AssA | MOTA | IDF1 |
|------------|-------|-------|------|------|-------|
| SparseTrack | 55.5 (**+7.8**) | 78.9 (**+7.9**) | 39.1 (**+7.0**) | 91.3 (**+1.7**) | 58.3 (**+4.4**) |
| ByteTrack | 47.7 | 71.0 | 32.1 | 89.6 | 53.9 |

**Notes**:
- All inference experiments are performed on a single NVIDIA GeForce RTX 3090 GPU.
- Each experiment uses the **same detector and model weights** as [ByteTrack](https://github.com/ifzhang/ByteTrack).
- SparseTrack relies only on IoU-distance association and does not use any appearance embedding, learnable motion, or attention components.

## Installation
#### Dependencies
This project is built on [Detectron2](https://github.com/facebookresearch/detectron2) and requires compiled [OpenCV](https://opencv.org/) and [Boost](https://www.boost.org) libraries.

#### Compile the GMC (Global Motion Compensation) module
>step 1: Download [pbcvt](https://github.com/Algomorph/pyboostcvconverter) and copy [python_module.cpp](https://github.com/hustvl/SparseTrack/blob/main/python_module.cpp) to the path **<[pbcvt](https://github.com/Algomorph/pyboostcvconverter)/src/>**.
>
>step 2: Add the required OpenCV modules to the pbcvt/CMakeLists.txt file: locate the line `find_package(OpenCV COMPONENTS REQUIRED)` and replace it with `find_package(OpenCV COMPONENTS core highgui video videoio videostab REQUIRED)`.
>
>step 3: Before compiling pbcvt, update the compilation paths in the Makefile, in particular the CMAKE_SOURCE_DIR, CMAKE_BINARY_DIR, and cmake_progress_start entries.
>
>step 4: Compile [pbcvt](https://github.com/Algomorph/pyboostcvconverter).
>
>step 5: Copy the compiled "pbcvt.xxxxxx.so" file to the **<ROOT/SparseTrack/tracker/>** directory (a quick import check is sketched below).
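
After step 5, a quick sanity check is to import the compiled module. This is a minimal sketch, assuming the shared object keeps the default `pbcvt` module name and that you run it from the repository root:

```python
# Sanity check that the compiled GMC module is importable.
# Assumes the .so keeps the default "pbcvt" name and sits in <ROOT/SparseTrack/tracker/>.
import sys

sys.path.insert(0, "tracker")  # run from the SparseTrack repository root

try:
    import pbcvt  # the Boost.Python / OpenCV converter module compiled above
    print("pbcvt loaded from:", pbcvt.__file__)
except ImportError as err:
    print("pbcvt is not importable, re-check the compilation steps:", err)
```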

#### Install
```shell
git clone https://github.com/hustvl/SparseTrack.git
cd SparseTrack
pip install -r requirements.txt
pip install Cython
pip install cython_bbox
```

## Data preparation
Download [MOT17](https://motchallenge.net/), [MOT20](https://motchallenge.net/), [CrowdHuman](https://www.crowdhuman.org/), [Cityperson](https://github.com/Zhongdao/Towards-Realtime-MOT/blob/master/DATASET_ZOO.md), [ETHZ](https://github.com/Zhongdao/Towards-Realtime-MOT/blob/master/DATASET_ZOO.md) and put them under ROOT/ in the following structure:
```
ROOT
   |
   |——————SparseTrack(repo)
   |           └—————mix
   |                  └——————mix_17/annotations
   |                  └——————mix_20/annotations
   |                  └——————ablation_17/annotations
   |                  └——————ablation_20/annotations
   |——————MOT17
   |        └——————train
   |        └——————test
   └——————crowdhuman
   |         └——————Crowdhuman_train
   |         └——————Crowdhuman_val
   |         └——————annotation_train.odgt
   |         └——————annotation_val.odgt
   └——————MOT20
   |        └——————train
   |        └——————test
   └——————Citypersons
   |        └——————images
   |        └——————labels_with_ids
   └——————ETHZ
            └——————eth01
            └——————...
            └——————eth07
```
Then, convert the datasets to COCO format and mix the different training data:
```
cd <ROOT>/SparseTrack
python3 tools/convert_mot17_to_coco.py
python3 tools/convert_mot20_to_coco.py
python3 tools/convert_crowdhuman_to_coco.py
python3 tools/convert_cityperson_to_coco.py
python3 tools/convert_ethz_to_coco.py
```
Create the different training data mixtures:
```
cd <ROOT>/SparseTrack

# train on CrowdHuman and MOT17 half train, evaluate on MOT17 half val
python3 tools/mix_data_ablation.py

# train on CrowdHuman and MOT20 half train, evaluate on MOT20 half val
python3 tools/mix_data_ablation_20.py

# train on MOT17, CrowdHuman, ETHZ, and Citypersons, evaluate on MOT17 train
python3 tools/mix_data_test_mot17.py

# train on MOT20 and CrowdHuman, evaluate on MOT20 train
python3 tools/mix_data_test_mot20.py
```

## Model zoo
See [ByteTrack.model_zoo](https://github.com/ifzhang/ByteTrack#model-zoo). We use the publicly available YOLOX detection models from the ByteTrack model zoo, trained for MOT17, MOT20, and the ablation study.

Additionally, we conducted joint training on the MOT20 train half and CrowdHuman, and evaluated on the MOT20 val half. The resulting model is available here: [yolox_x_mot20_ablation](https://drive.google.com/file/d/1F2XwyYKj1kefLPUFRHxgnpaAmEwyoocw/view?usp=drive_link)

The model trained on DanceTrack is available at [yolox_x_dancetrack](https://drive.google.com/drive/folders/1-uxcNTi7dhuDNGC5MmzXyllLzmVbzXay?usp=sharing).

## Training
All training is launched from a unified script. You need to set **VAL_JSON** and **VAL_PATH** in [register_data.py](https://github.com/hustvl/SparseTrack/blob/main/register_data.py), and then run:
```
# train on MOT17, CrowdHuman, ETHZ, and Citypersons, evaluate on the MOT17 train set
CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --num-gpus 4 --config-file mot17_train_config.py

# train on MOT20 and CrowdHuman, evaluate on the MOT20 train set
CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --num-gpus 4 --config-file mot20_train_config.py
```
**Notes**:
For MOT20, you need to clip the bounding boxes so that they lie inside the image.

Add the clip operation at lines 138-139 in [data_augment.py](https://github.com/hustvl/SparseTrack/blob/main/datasets/data/data_augment.py), lines 118-121 and 213-221 in [mosaicdetection.py](https://github.com/hustvl/SparseTrack/blob/main/datasets/data/datasets/mosaicdetection.py), and lines 115-118 in [boxes.py](https://github.com/hustvl/SparseTrack/blob/main/utils/boxes.py).
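
The clip operation itself is simple; the sketch below is illustrative only (the variable names at the referenced lines differ), but it shows the intended effect:

```python
import numpy as np

def clip_boxes(boxes, img_w, img_h):
    """Clip [x1, y1, x2, y2] boxes in place so they stay inside the image (needed for MOT20)."""
    boxes[:, 0::2] = np.clip(boxes[:, 0::2], 0, img_w - 1)  # x coordinates
    boxes[:, 1::2] = np.clip(boxes[:, 1::2], 0, img_h - 1)  # y coordinates
    return boxes
```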

## Tracking
All tracking experiments are run from the same script. First place the model weights under **<ROOT/SparseTrack/pretrain/>** and set **VAL_JSON** and **VAL_PATH** in [register_data.py](https://github.com/hustvl/SparseTrack/blob/main/register_data.py), then run:
```
# tracking on the MOT17 train or test set
CUDA_VISIBLE_DEVICES=0 python3 track.py --num-gpus 1 --config-file mot17_track_cfg.py

# tracking on the MOT20 train or test set
CUDA_VISIBLE_DEVICES=0 python3 track.py --num-gpus 1 --config-file mot20_track_cfg.py

# tracking on the MOT17 val_half set
CUDA_VISIBLE_DEVICES=0 python3 track.py --num-gpus 1 --config-file mot17_ab_track_cfg.py

# tracking on the MOT20 val_half set
CUDA_VISIBLE_DEVICES=0 python3 track.py --num-gpus 1 --config-file mot20_ab_track_cfg.py
```

## Citation
If you find SparseTrack useful in your research or applications, please consider giving us a star 🌟 and citing it with the following BibTeX entry.
```bibtex
@article{SparseTrack,
  title={SparseTrack: Multi-Object Tracking by Performing Scene Decomposition based on Pseudo-Depth},
  author={Liu, Zelin and Wang, Xinggang and Wang, Cheng and Liu, Wenyu and Bai, Xiang},
  journal={arXiv preprint arXiv:2306.05238},
  year={2023}
}
```

## Acknowledgements
A large part of the code is borrowed from [YOLOX](https://github.com/Megvii-BaseDetection/YOLOX), [FairMOT](https://github.com/ifzhang/FairMOT), [ByteTrack](https://github.com/ifzhang/ByteTrack), [BoT-SORT](https://github.com/NirAharon/BOT-SORT), and [Detectron2](https://github.com/facebookresearch/detectron2). Many thanks for their wonderful work.