Commit e792e41

[MOT] add bytetrack yolo configs and deploy (PaddlePaddle#5377)
* add bytetrack yolov3 ppyoloe cfgs
* add bytetrack reid
* fix sde bytetrack reid
* fix bytetrack readme
* add jdetracker mtmct
* fix bytetrack reid
* fix deploy track_config
* fix doc readme
1 parent (9111122), commit e792e41

25 files changed: +837 −72 lines

configs/mot/README.md (+1)

@@ -58,6 +58,7 @@ pip install -r requirements.txt
 ## Model Zoo
 - Base models
+  - [ByteTrack](bytetrack/README_cn.md)
   - [DeepSORT](deepsort/README_cn.md)
   - [JDE](jde/README_cn.md)
   - [FairMOT](fairmot/README_cn.md)

configs/mot/README_en.md (+1)

@@ -60,6 +60,7 @@ pip install -r requirements.txt
 ## Model Zoo
 - Base models
+  - [ByteTrack](bytetrack/README.md)
   - [DeepSORT](deepsort/README.md)
   - [JDE](jde/README.md)
   - [FairMOT](fairmot/README.md)

configs/mot/bytetrack/README.md (+1)

@@ -0,0 +1 @@
+README_cn.md

configs/mot/bytetrack/README_cn.md (+105)

@@ -0,0 +1,105 @@
Simplified Chinese | [English](README.md)

# ByteTrack (ByteTrack: Multi-Object Tracking by Associating Every Detection Box)

## Contents
- [Introduction](#introduction)
- [Model Zoo](#model-zoo)
- [Getting Started](#getting-started)
- [Citation](#citation)

## Introduction
[ByteTrack](https://arxiv.org/abs/2110.06864) (ByteTrack: Multi-Object Tracking by Associating Every Detection Box) tracks by associating every detection box rather than only the high-score ones. For low-score detection boxes, it uses their similarity to existing tracklets to recover true objects and to filter out background detections. Configurations for several commonly used detectors are provided here for reference. Differences in training dataset, input scale, number of training epochs, NMS threshold settings, and so on all affect model accuracy and performance, so please adapt these configs to your own needs.
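For intuition, the following is a minimal, self-contained sketch of the BYTE association idea just described: high-score detections are matched to tracklets first, and low-score detections are then used to rescue tracklets that would otherwise be lost. This is an illustration only, not PaddleDetection's implementation; the helper names and the greedy matching are simplifications, the score thresholds mirror the `conf_thres`/`low_conf_thres` values in the JDETracker config added by this PR, and `match_thres` here is just an arbitrary IoU cutoff for the sketch.

```python
# Minimal, illustrative sketch of BYTE association (not PaddleDetection's code).
import numpy as np

def iou_matrix(tracks, dets):
    """Pairwise IoU between track boxes and detection boxes ([x1, y1, x2, y2] rows)."""
    ious = np.zeros((len(tracks), len(dets)))
    for i, t in enumerate(tracks):
        for j, d in enumerate(dets):
            ix1, iy1 = max(t[0], d[0]), max(t[1], d[1])
            ix2, iy2 = min(t[2], d[2]), min(t[3], d[3])
            inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
            union = ((t[2] - t[0]) * (t[3] - t[1])
                     + (d[2] - d[0]) * (d[3] - d[1]) - inter)
            ious[i, j] = inter / (union + 1e-9)
    return ious

def greedy_match(ious, thresh):
    """Greedy stand-in for the Hungarian assignment a real tracker would use."""
    pairs, ious = [], ious.copy()
    while ious.size and ious.max() > thresh:
        i, j = np.unravel_index(ious.argmax(), ious.shape)
        pairs.append((int(i), int(j)))
        ious[i, :], ious[:, j] = -1, -1
    return pairs

def byte_associate(tracks, boxes, scores,
                   conf_thres=0.2, low_conf_thres=0.1, match_thres=0.3):
    tracks, boxes = np.asarray(tracks, float), np.asarray(boxes, float)
    scores = np.asarray(scores, float)
    # Round 1: associate tracks with HIGH-score detections only.
    high = boxes[scores >= conf_thres]
    matched = greedy_match(iou_matrix(tracks, high), match_thres)
    unmatched = [i for i in range(len(tracks)) if i not in {m[0] for m in matched}]
    # Round 2: rescue still-unmatched tracks with LOW-score detections instead of
    # discarding those boxes outright -- the core idea of BYTE.
    low = boxes[(scores >= low_conf_thres) & (scores < conf_thres)]
    rescued = greedy_match(iou_matrix(tracks[unmatched], low), match_thres)
    # Row indices in `rescued` refer to positions in `unmatched`.
    return matched, rescued, unmatched
```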
## Model Zoo

### ByteTrack results on the MOT-17 half val set

| Detector training dataset | Detector | Input size | ReID | Detection mAP | MOTA | IDF1 | FPS | Config |
| :-------- | :----- | :----: | :----: | :------: | :----: | :-----: | :----: | :----: |
| MOT-17 half train | YOLOv3 | 608x608 | - | 42.7 | 49.5 | 54.8 | - | [config](./bytetrack_yolov3.yml) |
| MOT-17 half train | PPYOLOe | 640x640 | - | 52.9 | 50.4 | 59.7 | - | [config](./bytetrack_ppyoloe.yml) |
| MOT-17 half train | PPYOLOe | 640x640 | PPLCNet | 52.9 | 51.7 | 58.8 | - | [config](./bytetrack_ppyoloe_pplcnet.yml) |

**Notes:**
- The model weight download links are the `det_weights` and `reid_weights` entries in the config files; running the evaluation commands downloads them automatically.
- For ByteTrack, training means training the detector alone on the MOT dataset; inference assembles a tracker on top of it to evaluate MOT metrics. The standalone detection model can also be evaluated with detection metrics.
- For export and deployment, ByteTrack exports the detection model alone and then assembles the tracker at runtime; see [PP-Tracking](../../../deploy/pptracking/python/README.md).

## Getting Started

### 1. Training
Start training and evaluation in one step with the following command:
```bash
python -m paddle.distributed.launch --log_dir=ppyoloe --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml --eval --amp --fleet
```

### 2. Evaluation
#### 2.1 Evaluate detection
```bash
CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml
```

**Notes:**
- Detection is evaluated with `tools/eval.py`; tracking is evaluated with `tools/eval_mot.py`.

#### 2.2 Evaluate tracking
```bash
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/bytetrack/bytetrack_yolov3.yml --scaled=True
# or
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/bytetrack/bytetrack_ppyoloe.yml --scaled=True
# or
CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/bytetrack/bytetrack_ppyoloe_pplcnet.yml --scaled=True
```
**Notes:**
- `--scaled` indicates whether the coordinates in the model output have already been scaled back to the original image. Set it to False when the detection model is JDE YOLOv3 and to True for general detection models; the default is False.
- Tracking results are saved in `{output_dir}/mot_results/`, one txt per video sequence. Each line of a txt file is `frame,id,x1,y1,w,h,score,-1,-1,-1`; `{output_dir}` can be set with `--output_dir` (see the parsing sketch after this list).
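Since the result format above is plain CSV-style text, a small reader is easy to sketch. This is illustrative only, not a PaddleDetection utility, and the file path below is a placeholder.

```python
# Illustrative reader for the MOT result txt format described above
# (frame,id,x1,y1,w,h,score,-1,-1,-1). Not a PaddleDetection utility.
from collections import defaultdict

def load_mot_results(txt_path):
    """Return {frame: [(track_id, x1, y1, w, h, score), ...]}."""
    per_frame = defaultdict(list)
    with open(txt_path) as f:
        for line in f:
            fields = line.strip().split(',')
            if len(fields) < 7:
                continue
            frame, tid = int(float(fields[0])), int(float(fields[1]))
            x1, y1, w, h, score = map(float, fields[2:7])
            per_frame[frame].append((tid, x1, y1, w, h, score))
    return per_frame

results = load_mot_results('output/mot_results/MOT17-02-SDP.txt')  # placeholder path
```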

### 3. Inference

Use a single GPU to run inference on a video and save the result as a video:

```bash
CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/bytetrack/bytetrack_ppyoloe.yml --video_file={your video name}.mp4 --scaled=True --save_videos
```

**Notes:**
- Make sure [ffmpeg](https://ffmpeg.org/ffmpeg.html) is installed first. On Linux (Ubuntu) it can be installed directly with `apt-get update && apt-get install -y ffmpeg`.
- `--scaled` indicates whether the coordinates in the model output have already been scaled back to the original image: False when the detection model is JDE's YOLOv3, True for general detection models (see the rescaling sketch after this list).
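To make the `--scaled` flag concrete: when the detector outputs boxes in network-input coordinates, they must be mapped back to original-image coordinates before tracking results make sense. A minimal sketch, assuming a plain (non-letterboxed) resize as in the `keep_ratio: False` readers of this PR; the function name is hypothetical.

```python
# Minimal sketch of scaling boxes from network-input coordinates back to the
# original image, assuming a plain resize with keep_ratio: False. Illustrative only.
import numpy as np

def rescale_boxes(boxes, input_size, orig_size):
    """boxes: (N, 4) [x1, y1, x2, y2] in input_size coords; returns original-image coords."""
    in_h, in_w = input_size
    orig_h, orig_w = orig_size
    scale = np.array([orig_w / in_w, orig_h / in_h, orig_w / in_w, orig_h / in_h])
    return boxes * scale

boxes_640 = np.array([[100., 150., 200., 400.]])           # detector output at 640x640
print(rescale_boxes(boxes_640, (640, 640), (1080, 1920)))  # back to a 1920x1080 frame
```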
### 4. Export the inference model

Step 1: export the detection model
```bash
# Export the PPYOLOe pedestrian detection model
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/bytetrack/detector/ppyoloe_crn_l_36e_640x640_mot17half.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/ppyoloe_crn_l_36e_640x640_mot17half.pdparams
```

Step 2: export the ReID model (optional; not needed by default)
```bash
# Export the PPLCNet ReID model
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/deepsort/reid/deepsort_pplcnet.yml -o reid_weights=https://paddledet.bj.bcebos.com/models/mot/deepsort/deepsort_pplcnet.pdparams
```

### 5. Predict with the exported model in Python

```bash
python deploy/pptracking/python/mot_sde_infer.py --model_dir=output_inference/ppyoloe_crn_l_36e_640x640_mot17half/ --tracker_config=tracker_config.yml --video_file={your video name}.mp4 --device=GPU --scaled=True --save_mot_txts
```
**Notes:**
- The tracking model runs on videos; single-image prediction is not supported. By default the visualized tracking result is saved as a video. Add `--save_mot_txts` (one txt per video) or `--save_mot_txt_per_img` (one txt per image) to save tracking results as txt files, or `--save_images` to save visualized images.
- Each line of the tracking result txt is `frame,id,x1,y1,w,h,score,-1,-1,-1`.
- `--scaled` indicates whether the coordinates in the model output have already been scaled back to the original image: False when the detection model is JDE's YOLOv3, True for general detection models.

## Citation
```
@article{zhang2021bytetrack,
  title={ByteTrack: Multi-Object Tracking by Associating Every Detection Box},
  author={Zhang, Yifu and Sun, Peize and Jiang, Yi and Yu, Dongdong and Yuan, Zehuan and Luo, Ping and Liu, Wenyu and Wang, Xinggang},
  journal={arXiv preprint arXiv:2110.06864},
  year={2021}
}
```
configs/mot/bytetrack/_base_/mot17.yml (+33)

@@ -0,0 +1,33 @@
metric: COCO
num_classes: 1

# Detection Dataset for training
TrainDataset:
  !COCODataSet
    dataset_dir: dataset/mot/MOT17
    anno_path: annotations/train_half.json
    image_dir: images/train
    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']

EvalDataset:
  !COCODataSet
    dataset_dir: dataset/mot/MOT17
    anno_path: annotations/val_half.json
    image_dir: images/train

TestDataset:
  !ImageFolder
    anno_path: annotations/val_half.json


# MOTDataset for MOT evaluation and inference
EvalMOTDataset:
  !MOTImageFolder
    dataset_dir: dataset/mot
    data_root: MOT17/images/half
    keep_ori_im: True # set as True in DeepSORT and ByteTrack

TestMOTDataset:
  !MOTImageFolder
    dataset_dir: dataset/mot
    keep_ori_im: True # set True if save visualization images or video
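For orientation, `anno_path` and `image_dir` in the COCO-style entries above are resolved relative to `dataset_dir`. A small sketch of the resulting layout; the composition rule matches how these fields are conventionally combined in PaddleDetection configs, but the snippet itself is illustrative, not a ppdet API.

```python
# Illustrative only: how the dataset fields above compose into concrete paths
# (annotation file and image folder both live under dataset_dir).
import os

dataset_dir = 'dataset/mot/MOT17'
anno_path = 'annotations/train_half.json'
image_dir = 'images/train'

print(os.path.join(dataset_dir, anno_path))  # dataset/mot/MOT17/annotations/train_half.json
print(os.path.join(dataset_dir, image_dir))  # dataset/mot/MOT17/images/train
```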
configs/mot/bytetrack/_base_/ppyoloe_mot_reader_640x640.yml (+56)

@@ -0,0 +1,56 @@
worker_num: 8
TrainReader:
  sample_transforms:
    - Decode: {}
    - RandomDistort: {}
    - RandomExpand: {fill_value: [123.675, 116.28, 103.53]}
    - RandomCrop: {}
    - RandomFlip: {}
  batch_transforms:
    - BatchRandomResize: {target_size: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608, 640, 672, 704, 736, 768], random_size: True, random_interp: True, keep_ratio: False}
    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
    - Permute: {}
    - PadGT: {}
  batch_size: 8
  shuffle: true
  drop_last: true
  use_shared_memory: true
  collate_batch: true

EvalReader:
  sample_transforms:
    - Decode: {}
    - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2}
    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
    - Permute: {}
  batch_size: 8

TestReader:
  inputs_def:
    image_shape: [3, 640, 640]
  sample_transforms:
    - Decode: {}
    - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2}
    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
    - Permute: {}
  batch_size: 1


# add MOTReader for MOT evaluation and inference, note batch_size should be 1 in MOT
EvalMOTReader:
  sample_transforms:
    - Decode: {}
    - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2}
    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
    - Permute: {}
  batch_size: 1

TestMOTReader:
  inputs_def:
    image_shape: [3, 640, 640]
  sample_transforms:
    - Decode: {}
    - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2}
    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
    - Permute: {}
  batch_size: 1
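The eval/test pipelines above boil down to a fixed resize, ImageNet-style normalization, and an HWC→CHW permute. A minimal NumPy/OpenCV sketch of the same three steps on one image follows; it is illustrative only (the real transforms are ppdet data ops), and mapping `interp: 2` to `cv2.INTER_CUBIC` is an assumption based on OpenCV's interpolation flag values.

```python
# Minimal NumPy/OpenCV sketch of the Resize / NormalizeImage / Permute steps above.
# Illustrative only; PaddleDetection implements these as ppdet data transforms.
import numpy as np
import cv2  # assumed available; interp: 2 taken to mean cv2.INTER_CUBIC

def preprocess(img_bgr):
    img = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (640, 640), interpolation=cv2.INTER_CUBIC)  # keep_ratio: False
    img = img.astype(np.float32) / 255.0                              # is_scale: True
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    img = (img - mean) / std                                          # NormalizeImage
    return img.transpose(2, 0, 1)[None]                               # Permute + batch dim -> (1, 3, 640, 640)
```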
configs/mot/bytetrack/_base_/yolov3_mot_reader_608x608.yml (+66)

@@ -0,0 +1,66 @@
worker_num: 2
TrainReader:
  inputs_def:
    num_max_boxes: 50
  sample_transforms:
    - Decode: {}
    - Mixup: {alpha: 1.5, beta: 1.5}
    - RandomDistort: {}
    - RandomExpand: {fill_value: [123.675, 116.28, 103.53]}
    - RandomCrop: {}
    - RandomFlip: {}
  batch_transforms:
    - BatchRandomResize: {target_size: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608], random_size: True, random_interp: True, keep_ratio: False}
    - NormalizeBox: {}
    - PadBox: {num_max_boxes: 50}
    - BboxXYXY2XYWH: {}
    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
    - Permute: {}
    - Gt2YoloTarget: {anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]], anchors: [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45], [59, 119], [116, 90], [156, 198], [373, 326]], downsample_ratios: [32, 16, 8]}
  batch_size: 8
  shuffle: true
  drop_last: true
  mixup_epoch: 250
  use_shared_memory: true

EvalReader:
  inputs_def:
    num_max_boxes: 50
  sample_transforms:
    - Decode: {}
    - Resize: {target_size: [608, 608], keep_ratio: False, interp: 2}
    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
    - Permute: {}
  batch_size: 8

TestReader:
  inputs_def:
    image_shape: [3, 608, 608]
  sample_transforms:
    - Decode: {}
    - Resize: {target_size: [608, 608], keep_ratio: False, interp: 2}
    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
    - Permute: {}
  batch_size: 1


# add MOTReader for MOT evaluation and inference, note batch_size should be 1 in MOT
EvalMOTReader:
  inputs_def:
    num_max_boxes: 50
  sample_transforms:
    - Decode: {}
    - Resize: {target_size: [608, 608], keep_ratio: False, interp: 2}
    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
    - Permute: {}
  batch_size: 1

TestMOTReader:
  inputs_def:
    image_shape: [3, 608, 608]
  sample_transforms:
    - Decode: {}
    - Resize: {target_size: [608, 608], keep_ratio: False, interp: 2}
    - NormalizeImage: {mean: [0.485, 0.456, 0.406], std: [0.229, 0.224, 0.225], is_scale: True}
    - Permute: {}
  batch_size: 1
configs/mot/bytetrack/bytetrack_ppyoloe.yml (+60)

@@ -0,0 +1,60 @@
# This config is an assembled config for ByteTrack MOT, used as eval/infer mode for MOT.
_BASE_: [
  'detector/ppyoloe_crn_l_36e_640x640_mot17half.yml',
  '_base_/mot17.yml',
  '_base_/ppyoloe_mot_reader_640x640.yml'
]
weights: output/bytetrack_ppyoloe/model_final
log_iter: 20
snapshot_epoch: 2

metric: MOT # eval/infer mode
num_classes: 1

architecture: ByteTrack
pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/ppyoloe_crn_l_300e_coco.pdparams
ByteTrack:
  detector: YOLOv3 # PPYOLOe version
  reid: None
  tracker: JDETracker
  det_weights: https://bj.bcebos.com/v1/paddledet/models/mot/ppyoloe_crn_l_36e_640x640_mot17half.pdparams
  reid_weights: None

YOLOv3:
  backbone: CSPResNet
  neck: CustomCSPPAN
  yolo_head: PPYOLOEHead
  post_process: ~

# Tracking requires higher quality boxes, so NMS score_threshold will be higher
PPYOLOEHead:
  fpn_strides: [32, 16, 8]
  grid_cell_scale: 5.0
  grid_cell_offset: 0.5
  static_assigner_epoch: -1 # 100
  use_varifocal_loss: True
  eval_input_size: [640, 640]
  loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5}
  static_assigner:
    name: ATSSAssigner
    topk: 9
  assigner:
    name: TaskAlignedAssigner
    topk: 13
    alpha: 1.0
    beta: 6.0
  nms:
    name: MultiClassNMS
    nms_top_k: 1000
    keep_top_k: 100
    score_threshold: 0.1 # 0.01 in original detector
    nms_threshold: 0.4 # 0.6 in original detector

# BYTETracker
JDETracker:
  use_byte: True
  match_thres: 0.9
  conf_thres: 0.2
  low_conf_thres: 0.1
  min_box_area: 100
  vertical_ratio: 1.6 # for pedestrian
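The JDETracker entry above is what turns the exported detector into ByteTrack: with `use_byte: True`, `conf_thres` and `low_conf_thres` split detections into the high- and low-score sets that BYTE associates in two rounds, while `min_box_area` and `vertical_ratio` discard implausible pedestrian boxes. A hedged sketch of those two roles follows; interpreting `vertical_ratio` as a cap on box width/height follows the original ByteTrack pedestrian filter and should be treated as an assumption here, not a statement about ppdet's exact code.

```python
# Illustrative sketch of how the JDETracker thresholds above partition and
# filter detections. Not PaddleDetection's code; see ppdet's JDETracker for the real logic.
import numpy as np

def split_by_score(boxes, scores, conf_thres=0.2, low_conf_thres=0.1):
    """High-score detections start/extend tracks; low-score ones only rescue
    unmatched tracks in BYTE's second association round."""
    high = scores >= conf_thres
    low = (scores >= low_conf_thres) & (scores < conf_thres)
    return boxes[high], boxes[low]

def keep_plausible_pedestrians(boxes, min_box_area=100, vertical_ratio=1.6):
    """Drop tiny boxes, and drop overly wide boxes (w/h > vertical_ratio),
    assuming the original ByteTrack pedestrian filter convention."""
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    keep = (w * h > min_box_area) & (w / np.maximum(h, 1e-9) <= vertical_ratio)
    return boxes[keep]
```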
