
Can I use a human-body keypoint detection pre-trained model for another task? #9343


whoisltd opened this issue Mar 31, 2025 · 7 comments

@whoisltd

whoisltd commented Mar 31, 2025

Search before asking

  • I have searched existing issues and found no related answer.

Please ask your question

I have a dataset for keypoint detection, e.g., detecting 2 points of a pen. Can I use HRNet with the pre-trained human-body weights you provide?

I tried with around 1k samples and got poor performance. What should I do? Thank you.

@BluebirdStory
Collaborator

Yes, of course you can.
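One caveat worth planning for: the human-pose checkpoint's final heatmap layer predicts 17 joints, so those tensors cannot be copied into a 2-joint head. PaddleDetection's pretrain_weights loading skips shape-mismatched tensors internally, but here is a minimal sketch of what that filtering looks like with plain Paddle APIs (my own illustration; `model` stands for any constructed `paddle.nn.Layer`, e.g. a `TopDownHRNet`):

```python
import paddle

def load_matching_weights(model: paddle.nn.Layer, weights_path: str):
    """Copy pretrained tensors into `model`, skipping any whose shape
    no longer matches -- e.g. the 17-channel human-pose heatmap head
    when the new head only predicts 2 keypoints."""
    pretrained = paddle.load(weights_path)  # state dict from a .pdparams file
    current = model.state_dict()
    matched = {
        k: v for k, v in pretrained.items()
        if k in current and list(v.shape) == list(current[k].shape)
    }
    model.set_state_dict(matched)
    skipped = sorted(set(pretrained) - set(matched))
    print(f"loaded {len(matched)} tensors, skipped {len(skipped)}: {skipped}")
```

The backbone weights transfer cleanly this way; only the randomly re-initialized head then needs to be learned from your pen data.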

@whoisltd
Author

whoisltd commented Mar 31, 2025

Yes, of course you can.

Yes, I tried, but I got poor performance. What do you think the problem is?

The loss went down quickly, but inference on both the train and test datasets failed.

[image: training loss curve]

My training config for hrnet:

use_gpu: true
log_iter: 5
save_dir: /datasets/output
snapshot_epoch: 10
weights: /datasets/output/hrnet_w32_384x288/model_final
epoch: 210
num_joints: &num_joints 2
pixel_std: &pixel_std 200
metric: KeyPointTopDownCOCOEval
num_classes: 1
train_height: &train_height 384
train_width: &train_width 288
trainsize: &trainsize [*train_width, *train_height]
hmsize: &hmsize [72, 96]
flip_perm: &flip_perm [[0, 1]]  # joint indices are 0-based; unused here since flip is false below
wandb:
  project: PaddleDetectionHrnet

#####model
architecture: TopDownHRNet
pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/Trunc_HRNet_W32_C_pretrained.pdparams

TopDownHRNet:
  backbone: HRNet
  post_process: HRNetPostProcess
  flip_perm: *flip_perm
  num_joints: *num_joints
  width: &width 32
  loss: KeyPointMSELoss
  flip: false

HRNet:
  width: *width
  freeze_at: -1
  freeze_norm: false
  return_idx: [0]

KeyPointMSELoss:
  use_target_weight: true


#####optimizer
LearningRate:
  base_lr: 0.005
  schedulers:
  - !PiecewiseDecay
    milestones: [50, 100]
    gamma: 0.1
  - !LinearWarmup
    start_factor: 0.01
    steps: 1000

OptimizerBuilder:
  optimizer:
    type: Adam
  regularizer:
    factor: 0.0
    type: L2


#####data
TrainDataset:
  !KeypointTopDownCocoDataset
    image_dir: images
    anno_path: annotations/train.json
    dataset_dir: /datasets/dataset_coco/keypoint_square_analog
    num_joints: *num_joints
    trainsize: *trainsize
    pixel_std: *pixel_std
    use_gt_bbox: True


EvalDataset:
  !KeypointTopDownCocoDataset
    image_dir: images
    anno_path: annotations/val.json
    dataset_dir: /datasets/dataset_coco/keypoint_square_analog
    # bbox_file: bbox.json
    num_joints: *num_joints
    trainsize: *trainsize
    pixel_std: *pixel_std
    use_gt_bbox: True
    image_thre: 0.0


TestDataset:
  !ImageFolder
    anno_path: /datasets/dataset_coco/keypoint_square_analog/annotations/val.json

worker_num: 2
global_mean: &global_mean [0.485, 0.456, 0.406]
global_std: &global_std [0.229, 0.224, 0.225]
TrainReader:
  sample_transforms:
    - TopDownAffine:
        trainsize: *trainsize
    - ToHeatmapsTopDown:
        hmsize: *hmsize
        sigma: 2
  batch_transforms:
    - NormalizeImage:
        mean: *global_mean
        std: *global_std
        is_scale: true
    - Permute: {}
  batch_size: 16
  shuffle: true
  drop_last: false

EvalReader:
  sample_transforms:
    - TopDownAffine:
        trainsize: *trainsize
  batch_transforms:
    - NormalizeImage:
        mean: *global_mean
        std: *global_std
        is_scale: true
    - Permute: {}
  batch_size: 16

TestReader:
  inputs_def:
    image_shape: [3, *train_height, *train_width]
  sample_transforms:
    - Decode: {}
    - TopDownEvalAffine:
        trainsize: *trainsize
    - NormalizeImage:
        mean: *global_mean
        std: *global_std
        is_scale: true
    - Permute: {}
  batch_size: 1
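
For reference on the target encoding above: ToHeatmapsTopDown renders each annotated joint as a small Gaussian on the 72×96 heatmap grid (input resolution / 4), and KeyPointMSELoss regresses the network output against those maps. A minimal sketch of that target generation, my own approximation rather than PaddleDetection's exact implementation:

```python
import numpy as np

def make_target_heatmaps(keypoints, trainsize=(288, 384),
                         hmsize=(72, 96), sigma=2.0):
    """keypoints: (num_joints, 2) array of (x, y) in input-crop pixels.
    Returns (num_joints, hm_h, hm_w) float32 heatmaps; matches the
    config above, where heatmap resolution is input resolution / 4."""
    in_w, in_h = trainsize
    hm_w, hm_h = hmsize
    stride = in_w / hm_w  # 4 for 288 -> 72
    xs = np.arange(hm_w, dtype=np.float32)
    ys = np.arange(hm_h, dtype=np.float32)[:, None]
    heatmaps = np.zeros((len(keypoints), hm_h, hm_w), dtype=np.float32)
    for j, (x, y) in enumerate(keypoints):
        cx, cy = x / stride, y / stride
        heatmaps[j] = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
    return heatmaps

# e.g. the two pen endpoints somewhere in a 288x384 crop:
targets = make_target_heatmaps(np.array([[100.0, 50.0], [180.0, 300.0]]))
print(targets.shape)  # (2, 96, 72)
```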

@BluebirdStory
Collaborator

It seems that your training dataset is very limited... only 11 iterations per epoch?
Besides, your learning rate is too small...
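
(For context, 11 iterations at batch_size 16 means only about 176 instances are reaching the loader, far fewer than the ~1k mentioned above.) A quick way to confirm is to count annotations directly; a minimal sketch using pycocotools, with the paths taken from the config in this thread. Note that a top-down dataset yields one sample per annotated instance, not per image:

```python
from pycocotools.coco import COCO

ANNO = "/datasets/dataset_coco/keypoint_square_analog/annotations/train.json"
BATCH_SIZE = 16

coco = COCO(ANNO)
num_images = len(coco.getImgIds())
num_instances = len(coco.getAnnIds())  # one top-down sample per instance
print(f"{num_images} images, {num_instances} annotated instances")
print(f"~{num_instances // BATCH_SIZE} iterations per epoch at batch_size={BATCH_SIZE}")
```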

@whoisltd
Author

whoisltd commented Apr 2, 2025

It seems that your training dataset is very limited... only 11 iterations per epoch? Besides, your learning rate is too small...

Thank you. I did some research and learned that because HRNet is a top-down model, at inference time I need a model to detect the pen's bounding box before detecting its 2 keypoints on the cropped image, right? Can't I infer the keypoints directly on a raw image with background?
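
That is the standard top-down flow. A minimal sketch of the chaining, where `detect_fn` and `keypoint_fn` are placeholders for whatever detector and pose model you deploy (in a real PaddleDetection run, TopDownAffine handles the crop-and-warp step more carefully than the plain resize here):

```python
import cv2
import numpy as np

def topdown_infer(image, detect_fn, keypoint_fn,
                  trainsize=(288, 384), hmsize=(72, 96)):
    """Top-down pipeline: detect boxes, crop each box, run the keypoint
    model on the crop, map heatmap peaks back to the original image.
    detect_fn(image) -> list of (x0, y0, x1, y1) boxes;
    keypoint_fn(crop) -> (num_joints, hm_h, hm_w) heatmaps."""
    in_w, in_h = trainsize
    hm_w, hm_h = hmsize
    results = []
    for x0, y0, x1, y1 in detect_fn(image):
        crop = image[int(y0):int(y1), int(x0):int(x1)]
        crop = cv2.resize(crop, (in_w, in_h))
        joints = []
        for hm in keypoint_fn(crop):
            py, px = np.unravel_index(np.argmax(hm), hm.shape)
            # heatmap coords -> fraction of the box -> original image pixels
            joints.append((x0 + px * (x1 - x0) / hm_w,
                           y0 + py * (y1 - y0) / hm_h))
        results.append(joints)
    return results
```

For what it's worth, PaddleDetection's deploy tools also ship a combined script, deploy/python/det_keypoint_unite_infer.py, that chains a detector and a top-down keypoint model in this way.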

@BluebirdStory
Collaborator

Of course you can; there are many multi-task models that can output bounding boxes and keypoints at the same time.
They do not need a "first bbox, then keypoints" pipeline.

Do you understand what I am saying?

@whoisltd
Author

whoisltd commented Apr 2, 2025

Of course you can; there are many multi-task models that can output bounding boxes and keypoints at the same time. They do not need a "first bbox, then keypoints" pipeline.

Do you understand what I am saying?

Yes, I understand. Like YOLOv8-pose, right? I see it supports detecting the bbox first and then the keypoints; does Paddle have a pipeline like that?

If not, I think I will try the bottom-up approach; I hope that's fine 🤔

Edit:

I saw this model (petr_resnet50); it includes bounding-box detection internally, right?

[image: petr_resnet50 model reference]

@BluebirdStory
Copy link
Collaborator

Yes, it includes bounding-box detection.
