Commit 4c04ce7

keypoint petr (PaddlePaddle#7774)
* petr train ok train ok refix augsize affine size fix update msdeformable fix flip/affine fix clip add resize area add distortion debug mode fix pos_inds update edge joints update word mistake
* delete extra codes;adapt transformer modify;update code format
* reverse old transformer modify
* integrate datasets
1 parent 2685fb5 commit 4c04ce7

27 files changed, +5215 -162 lines changed

.gitignore

+3 -3
@@ -18,9 +18,9 @@ __pycache__/
 
 # Distribution / packaging
 /bin/
-/build/
+*build/
 /develop-eggs/
-/dist/
+*dist/
 /eggs/
 /lib/
 /lib64/
@@ -30,7 +30,7 @@ __pycache__/
 /parts/
 /sdist/
 /var/
-/*.egg-info/
+*.egg-info/
 /.installed.cfg
 /*.egg
 /.eggs
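
Review note on the `.gitignore` hunk: dropping the leading slash changes which directories the patterns cover. In gitignore syntax a leading `/` anchors a pattern to the directory holding the `.gitignore`, while a pattern with no slash other than a trailing one matches at any depth. The sketch below approximates that rule for directory-only patterns like the ones in this hunk. The helper `dir_is_ignored` and the example paths are hypothetical, and this is not Git's actual matcher.

```python
from fnmatch import fnmatchcase


def dir_is_ignored(pattern, dir_path):
    """Approximate gitignore matching for a directory-only pattern.

    Only handles patterns of the form '/name/' or 'name/' (glob allowed in
    'name', no other slashes). This is an illustration, not Git's matcher.
    """
    if not pattern.endswith("/"):
        raise ValueError("sketch only supports directory patterns ending in '/'")
    body = pattern[:-1]
    parts = dir_path.strip("/").split("/")
    if body.startswith("/"):
        # Anchored: only the top-level path component can match.
        return fnmatchcase(parts[0], body[1:])
    # Unanchored: a match on any path component ignores that subtree.
    return any(fnmatchcase(part, body) for part in parts)


if __name__ == "__main__":
    # Hypothetical paths, chosen only to show the difference.
    for pattern in ("/build/", "*build/"):
        for path in ("build", "ppdet/ext_op/build", "cmake-build"):
            print(pattern, path, dir_is_ignored(pattern, path))
```

Under this reading, `*build/` also ignores any directory whose name merely ends in `build`, which is broader than the old `/build/`.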

configs/keypoint/README.md

+2
@@ -56,8 +56,10 @@ The keypoint detection part of PaddleDetection closely follows state-of-the-art algorithms, including Top
 ## Model Zoo
 
 COCO Dataset
+
 | Model | Approach | Input Size | AP(coco val) | Model Download | Config File |
 | :---------------- | -------- | :----------: | :----------------------------------------------------------: | ----------------------------------------------------| ------- |
+| PETR_Res50 |One-Stage| 512 | 65.5 | [petr_res50.pdparams](https://bj.bcebos.com/v1/paddledet/models/keypoint/petr_resnet50_16x2_coco.pdparams) | [config](./petr/petr_resnet50_16x2_coco.yml) |
 | HigherHRNet-w32 |Bottom-Up| 512 | 67.1 | [higherhrnet_hrnet_w32_512.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/higherhrnet_hrnet_w32_512.pdparams) | [config](./higherhrnet/higherhrnet_hrnet_w32_512.yml) |
 | HigherHRNet-w32 | Bottom-Up| 640 | 68.3 | [higherhrnet_hrnet_w32_640.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/higherhrnet_hrnet_w32_640.pdparams) | [config](./higherhrnet/higherhrnet_hrnet_w32_640.yml) |
 | HigherHRNet-w32+SWAHR |Bottom-Up| 512 | 68.9 | [higherhrnet_hrnet_w32_512_swahr.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/higherhrnet_hrnet_w32_512_swahr.pdparams) | [config](./higherhrnet/higherhrnet_hrnet_w32_512_swahr.yml) |

configs/keypoint/README_en.md

+1
@@ -62,6 +62,7 @@ At the same time, PaddleDetection provides a self-developed real-time keypoint d
 COCO Dataset
 | Model | Input Size | AP(coco val) | Model Download | Config File |
 | :---------------- | -------- | :----------: | :----------------------------------------------------------: | ----------------------------------------------------------- |
+| PETR_Res50 |One-Stage| 512 | 65.5 | [petr_res50.pdparams](https://bj.bcebos.com/v1/paddledet/models/keypoint/petr_resnet50_16x2_coco.pdparams) | [config](./petr/petr_resnet50_16x2_coco.yml) |
 | HigherHRNet-w32 | 512 | 67.1 | [higherhrnet_hrnet_w32_512.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/higherhrnet_hrnet_w32_512.pdparams) | [config](./higherhrnet/higherhrnet_hrnet_w32_512.yml) |
 | HigherHRNet-w32 | 640 | 68.3 | [higherhrnet_hrnet_w32_640.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/higherhrnet_hrnet_w32_640.pdparams) | [config](./higherhrnet/higherhrnet_hrnet_w32_640.yml) |
 | HigherHRNet-w32+SWAHR | 512 | 68.9 | [higherhrnet_hrnet_w32_512_swahr.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/higherhrnet_hrnet_w32_512_swahr.pdparams) | [config](./higherhrnet/higherhrnet_hrnet_w32_512_swahr.yml) |

configs/keypoint/petr/petr_resnet50_16x2_coco.yml

+255
@@ -0,0 +1,255 @@
+use_gpu: true
+log_iter: 50
+save_dir: output
+snapshot_epoch: 1
+weights: output/petr_resnet50_16x2_coco/model_final
+epoch: 100
+num_joints: &num_joints 17
+pixel_std: &pixel_std 200
+metric: COCO
+num_classes: 1
+trainsize: &trainsize 512
+flip_perm: &flip_perm [0, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 16, 15]
+find_unused_parameters: False
+
+#####model
+architecture: PETR
+pretrain_weights: https://bj.bcebos.com/v1/paddledet/models/pretrained/PETR_pretrained.pdparams
+
+PETR:
+  backbone:
+    name: ResNet
+    depth: 50
+    variant: b
+    norm_type: bn
+    freeze_norm: True
+    freeze_at: 0
+    return_idx: [1,2,3]
+    num_stages: 4
+    lr_mult_list: [0.1, 0.1, 0.1, 0.1]
+  neck:
+    name: ChannelMapper
+    in_channels: [512, 1024, 2048]
+    kernel_size: 1
+    out_channels: 256
+    norm_type: "gn"
+    norm_groups: 32
+    act: None
+    num_outs: 4
+  bbox_head:
+    name: PETRHead
+    num_query: 300
+    num_classes: 1 # only person
+    in_channels: 2048
+    sync_cls_avg_factor: true
+    with_kpt_refine: true
+    transformer:
+      name: PETRTransformer
+      as_two_stage: true
+      encoder:
+        name: TransformerEncoder
+        encoder_layer:
+          name: TransformerEncoderLayer
+          d_model: 256
+          attn:
+            name: MSDeformableAttention
+            embed_dim: 256
+            num_heads: 8
+            num_levels: 4
+            num_points: 4
+          dim_feedforward: 1024
+          dropout: 0.1
+        num_layers: 6
+      decoder:
+        name: PETR_TransformerDecoder
+        num_layers: 3
+        return_intermediate: true
+        decoder_layer:
+          name: PETR_TransformerDecoderLayer
+          d_model: 256
+          dim_feedforward: 1024
+          dropout: 0.1
+          self_attn:
+            name: MultiHeadAttention
+            embed_dim: 256
+            num_heads: 8
+            dropout: 0.1
+          cross_attn:
+            name: MultiScaleDeformablePoseAttention
+            embed_dims: 256
+            num_heads: 8
+            num_levels: 4
+            num_points: 17
+      hm_encoder:
+        name: TransformerEncoder
+        encoder_layer:
+          name: TransformerEncoderLayer
+          d_model: 256
+          attn:
+            name: MSDeformableAttention
+            embed_dim: 256
+            num_heads: 8
+            num_levels: 1
+            num_points: 4
+          dim_feedforward: 1024
+          dropout: 0.1
+        num_layers: 1
+      refine_decoder:
+        name: PETR_DeformableDetrTransformerDecoder
+        num_layers: 2
+        return_intermediate: true
+        decoder_layer:
+          name: PETR_TransformerDecoderLayer
+          d_model: 256
+          dim_feedforward: 1024
+          dropout: 0.1
+          self_attn:
+            name: MultiHeadAttention
+            embed_dim: 256
+            num_heads: 8
+            dropout: 0.1
+          cross_attn:
+            name: MSDeformableAttention
+            embed_dim: 256
+            num_levels: 4
+    positional_encoding:
+      name: PositionEmbedding
+      num_pos_feats: 128
+      normalize: true
+      offset: -0.5
+    loss_cls:
+      name: Weighted_FocalLoss
+      use_sigmoid: true
+      gamma: 2.0
+      alpha: 0.25
+      loss_weight: 2.0
+      reduction: "mean"
+    loss_kpt:
+      name: L1Loss
+      loss_weight: 70.0
+    loss_kpt_rpn:
+      name: L1Loss
+      loss_weight: 70.0
+    loss_oks:
+      name: OKSLoss
+      loss_weight: 2.0
+    loss_hm:
+      name: CenterFocalLoss
+      loss_weight: 4.0
+    loss_kpt_refine:
+      name: L1Loss
+      loss_weight: 80.0
+    loss_oks_refine:
+      name: OKSLoss
+      loss_weight: 3.0
+    assigner:
+      name: PoseHungarianAssigner
+      cls_cost:
+        name: FocalLossCost
+        weight: 2.0
+      kpt_cost:
+        name: KptL1Cost
+        weight: 70.0
+      oks_cost:
+        name: OksCost
+        weight: 7.0
+
+#####optimizer
+LearningRate:
+  base_lr: 0.0002
+  schedulers:
+  - !PiecewiseDecay
+    milestones: [80]
+    gamma: 0.1
+    use_warmup: false
+  # - !LinearWarmup
+  #   start_factor: 0.001
+  #   steps: 1000
+
+OptimizerBuilder:
+  clip_grad_by_norm: 0.1
+  optimizer:
+    type: AdamW
+  regularizer:
+    factor: 0.0001
+    type: L2
+
+
+#####data
+TrainDataset:
+  !KeypointBottomUpCocoDataset
+    image_dir: train2017
+    anno_path: annotations/person_keypoints_train2017.json
+    dataset_dir: dataset/coco
+    num_joints: *num_joints
+    return_mask: false
+
+EvalDataset:
+  !KeypointBottomUpCocoDataset
+    image_dir: val2017
+    anno_path: annotations/person_keypoints_val2017.json
+    dataset_dir: dataset/coco
+    num_joints: *num_joints
+    test_mode: true
+    return_mask: false
+
+TestDataset:
+  !ImageFolder
+    anno_path: dataset/coco/keypoint_imagelist.txt
+
+worker_num: 2
+global_mean: &global_mean [0.485, 0.456, 0.406]
+global_std: &global_std [0.229, 0.224, 0.225]
+TrainReader:
+  sample_transforms:
+    - Decode: {}
+    - PhotoMetricDistortion:
+        brightness_delta: 32
+        contrast_range: [0.5, 1.5]
+        saturation_range: [0.5, 1.5]
+        hue_delta: 18
+    - KeyPointFlip:
+        flip_prob: 0.5
+        flip_permutation: *flip_perm
+    - RandomAffine:
+        max_degree: 30
+        scale: [1.0, 1.0]
+        max_shift: 0.
+        trainsize: -1
+    - RandomSelect: { transforms1: [ RandomShortSideRangeResize: { scales: [[400, 1400], [1400, 1400]]} ],
+                      transforms2: [
+                        RandomShortSideResize: { short_side_sizes: [ 400, 500, 600 ] },
+                        RandomSizeCrop: { min_size: 384, max_size: 600},
+                        RandomShortSideRangeResize: { scales: [[400, 1400], [1400, 1400]]} ]}
+  batch_transforms:
+    - NormalizeImage: {mean: *global_mean, std: *global_std, is_scale: True}
+    - PadGT: {pad_img: True, minimum_gtnum: 1}
+    - Permute: {}
+  batch_size: 2
+  shuffle: true
+  drop_last: true
+  use_shared_memory: true
+  collate_batch: true
+
+EvalReader:
+  sample_transforms:
+    - PETR_Resize: {img_scale: [[800, 1333]], keep_ratio: True}
+    # - MultiscaleTestResize: {origin_target_size: [[800, 1333]], use_flip: false}
+    - NormalizeImage:
+        mean: *global_mean
+        std: *global_std
+        is_scale: true
+    - Permute: {}
+  batch_size: 1
+
+TestReader:
+  sample_transforms:
+    - Decode: {}
+    - EvalAffine:
+        size: *trainsize
+    - NormalizeImage:
+        mean: *global_mean
+        std: *global_std
+        is_scale: true
+    - Permute: {}
+  batch_size: 1
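
Review note on the config above: two of its settings are easier to check with a small worked example. `flip_perm` lists, for each of the 17 joints, the index that takes its place after a horizontal flip (left and right joints swap, index 0 stays). The `PiecewiseDecay` entry drops `base_lr` by `gamma` at epoch 80. The sketch below illustrates both under stated assumptions: the mirror convention `image_width - 1 - x`, the 0-based epoch count, and the helper names `flip_keypoints` and `lr_at_epoch` are all hypothetical and are not taken from the PaddleDetection code.

```python
# Values copied from the config above.
FLIP_PERM = [0, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 16, 15]
BASE_LR = 0.0002
MILESTONES = [80]
GAMMA = 0.1


def flip_keypoints(joints, image_width, flip_perm=FLIP_PERM):
    """Mirror (x, y) joints horizontally, then swap left/right joint slots.

    Assumption: x is mirrored as image_width - 1 - x; the real KeyPointFlip
    transform may use a different pixel convention.
    """
    mirrored = [(image_width - 1 - x, y) for x, y in joints]
    return [mirrored[src] for src in flip_perm]


def lr_at_epoch(epoch, base_lr=BASE_LR, milestones=MILESTONES, gamma=GAMMA):
    """Learning rate for a 0-based epoch under a simple step decay.

    Assumption: the rate is multiplied by gamma once per milestone reached.
    With the single milestone in this config that gives base_lr * gamma
    from epoch 80 onward.
    """
    reached = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** reached


if __name__ == "__main__":
    # Hypothetical joints: joint i sits at x == i on a 100-pixel-wide image.
    joints = [(float(i), 0.0) for i in range(17)]
    flipped = flip_keypoints(joints, image_width=100)
    print(flipped[1])  # slot 1 now holds the mirrored original joint 2
    print(lr_at_epoch(79), lr_at_epoch(80))
```

Because the permutation only swaps pairs, flipping twice returns the original joints, which is a quick sanity check for any edited `flip_perm`.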

docs/tutorials/data/PrepareKeypointDataSet.md

+1 -1
@@ -82,7 +82,7 @@ MPII keypoint indexes:
 ```
 {
     'joints_vis': [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1],
-    'joints': [
+    'gt_joints': [
         [-1.0, -1.0],
         [-1.0, -1.0],
         [-1.0, -1.0],

docs/tutorials/data/PrepareKeypointDataSet_en.md

+1 -1
@@ -82,7 +82,7 @@ The following example takes a parsed annotation information to illustrate the co
 ```
 {
     'joints_vis': [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1],
-    'joints': [
+    'gt_joints': [
         [-1.0, -1.0],
         [-1.0, -1.0],
         [-1.0, -1.0],
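
Review note on the `joints` to `gt_joints` rename in both tutorials: code that reads parsed annotation records written under the old key would need to look up the new one. A minimal sketch of a tolerant reader follows. The helpers `read_joints` and `visible_joints`, the fallback to the old key, and the 3-joint record are hypothetical illustrations, not part of this commit.

```python
def read_joints(record):
    """Return the joint list from a parsed annotation record.

    Prefers the 'gt_joints' key shown in the updated docs and falls back to
    the older 'joints' key (the fallback is an assumption for illustration).
    """
    for key in ("gt_joints", "joints"):
        if key in record:
            return record[key]
    raise KeyError("record has neither 'gt_joints' nor 'joints'")


def visible_joints(record):
    """Pair each joint with its index, keeping only joints flagged visible."""
    joints = read_joints(record)
    flags = record["joints_vis"]
    return [(i, tuple(xy)) for i, (xy, vis) in enumerate(zip(joints, flags)) if vis]


# Hypothetical 3-joint record in the shape shown in the docs.
record = {
    "joints_vis": [0, 1, 1],
    "gt_joints": [[-1.0, -1.0], [10.0, 20.0], [30.0, 40.0]],
}
print(visible_joints(record))  # [(1, (10.0, 20.0)), (2, (30.0, 40.0))]
```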
