Commit ff2374f: Add double anchor
1 parent: 797651e

6 files changed: +50 -6

README.md (+6)
@@ -105,8 +105,14 @@ semi-supervised training](http://openaccess.thecvf.com/content_CVPR_2019/papers/
  - [Adaptive NMS: Refining Pedestrian Detection in a Crowd](https://arxiv.org/abs/1904.03629) [[Notes](paper_notes/adaptive_nms.md)] <kbd>CVPR 2019 oral</kbd> [crowd detection, NMS]
  - [Occlusion-aware R-CNN: Detecting Pedestrians in a Crowd](https://arxiv.org/abs/1807.08407) [[Notes](paper_notes/orcnn.md)] <kbd>ECCV 2018</kbd> [crowd detection]
  - [CrowdDet: Detection in Crowded Scenes: One Proposal, Multiple Predictions](https://arxiv.org/abs/2003.09163) [[Notes](paper_notes/crowd_det.md)] <kbd>CVPR 2020 oral</kbd> [crowd detection, Megvii]
+ - [RR-NMS: NMS by Representative Region: Towards Crowded Pedestrian Detection by Proposal Pairing](https://arxiv.org/abs/2003.12729) [[Notes](paper_notes/rr_nms.md)] <kbd>CVPR 2020</kbd>
+ - [Double Anchor R-CNN for Human Detection in a Crowd](https://arxiv.org/abs/1909.09998) [[Notes](paper_notes/double_anchor.md)] [head-body bundle]
+ - [Review: AP vs MR](paper_notes/ap_mr.md)
+ - [Precise Detection in Densely Packed Scenes](https://arxiv.org/abs/1904.00853) <kbd>CVPR 2019</kbd> [crowd detection, no occlusion]
  - [TLL: Small-scale Pedestrian Detection Based on Somatic Topology Localization and Temporal Feature Aggregation](https://arxiv.org/abs/1807.01438) <kbd>ECCV 2018</kbd>
  - [Learning Monocular 3D Vehicle Detection without 3D Bounding Box Labels](https://arxiv.org/abs/2010.03506) [mono3D, Daniel Cremers, TUM]
+ - [ViT: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://openreview.net/forum?id=YicbFdNTTy) [[Notes](paper_notes/vit.md)] <kbd>ICLR 2021</kbd>
+ - [BYOL: Bootstrap your own latent: A new approach to self-supervised Learning](https://arxiv.org/abs/2006.07733) [self-supervised]
  - [SAFENet: Self-Supervised Monocular Depth Estimation with Semantic-Aware Feature Extraction](https://arxiv.org/abs/2010.02893) [Monodepth, semantics, Naver labs]
  - [Toward Interactive Self-Annotation For Video Object Bounding Box: Recurrent Self-Learning And Hierarchical Annotation Based Framework](https://openaccess.thecvf.com/content_WACV_2020/papers/Le_Toward_Interactive_Self-Annotation_For_Video_Object_Bounding_Box_Recurrent_Self-Learning_WACV_2020_paper.pdf) <kbd>WACV 2020</kbd>

paper_notes/adaptive_nms.md (+2 -1)
@@ -17,7 +17,8 @@ Both [RepLoss](rep_loss.md) and [Occlusion aware R-CNN](orcnn.md) proposes addit
  - On top of RPN for two-stage detectors, taking the objectness predictions, bounding box predictions and conv features as input.

  #### Technical details
- - Combining adaptive NMS and soft-NMS has minor or even negative improvements on the MR^-2 metric (0.01 to 1 FPPI). The reason may be that the benefit happens beyond 1 FPPI and thus does not improve the metric.
+ - [AP vs MR](ap_mr.md) in object detection.
+ - Combining adaptive NMS and soft-NMS has minor or even negative improvements on the MR^-2 metric (0.01 to 1 FPPI). The reason may be that the benefit happens beyond 1 FPPI and thus does not improve the metric.
  - Reasonable: Bare (0 to 0.1), Partial (0.1 to 0.35), Heavy (0.35 to 1).

  #### Notes

paper_notes/crowd_det.md (+3 -2)
@@ -28,8 +28,9 @@ Current works are either too complex or less effective for handling highly overl

  #### Technical details
  - Tested on COCO to verify that there is no performance degradation, rather than to show significant performance improvement.
- - AP is more sensitive to recall. MR is very sensitive to FP with high confidence.
+ - [AP vs MR](ap_mr.md) in object detection.
+ - AP is more sensitive to recall. MR is very sensitive to FP with high confidence.

  #### Notes
- - Questions and notes on how to improve/revise the current work
+ - [Pytorch code on Github](https://github.com/Purkialo/CrowdDet)

paper_notes/crowdhuman.md (+1 -2)
@@ -22,5 +22,4 @@ Previous datasets are more likely to annotate crowd human as a whole ignored reg
  - Previous datasets (CityPerson) annotate from the top of the head to the middle of the feet and generate a full bbox with a fixed aspect ratio of 0.41.

  #### Notes
- - Questions and notes on how to improve/revise the current work
-
+ - [AP vs MR](ap_mr.md) in object detection.

paper_notes/double_anchor.md (new file, +36)
@@ -0,0 +1,36 @@
# [Double Anchor R-CNN for Human Detection in a Crowd](https://arxiv.org/abs/1909.09998)

_October 2020_

tl;dr: Double Anchor RPN is developed to capture body and head parts in pairs.

#### Overall impression
Crowd occlusion is challenging for two reasons:

- When people overlap largely with each other, the semantic features of different instances interweave, making it hard for detectors to discriminate instance boundaries.
- Even when a detector successfully differentiates and detects the instances, the detections may be suppressed by NMS.

The intuition behind the paper is simple: compared with the human body, the head usually has a smaller scale, less overlap and a better view in real-world images, and thus is more robust to pose variations and crowd occlusions.

One main challenge in crowd detection is high-score false positives. --> However, safety-wise this does not seem to be an issue for autonomous driving.
#### Key ideas
- **Double Anchor RPN** outputs two regressed offsets (one for the body, one for the head) and a single score per anchor.
- Proposal Crossover:
    - Two branches: a head-body branch that regresses head and body from the head anchor, and a body-head branch that regresses head and body from the body anchor.
    - Body proposals from the head-body branch are of low quality. Thus, perform an IoU check of the body proposals between the two branches, and replace the body proposal from the head-body branch (lower quality) with the one from the body-head branch (higher quality).
- Feature aggregation:
    - Perform RoIAlign on the two proposals separately, then concatenate.
    - Predict head bbox loc/score and body bbox loc/score.
- Joint NMS (see the sketches after this list):
    - Weighted score from both the head bbox score and the body bbox score.
    - Suppress a pair if either its head IoU or its body IoU exceeds a certain threshold.
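Two minimal sketches of the ideas above, assuming the standard `torchvision` RoIAlign op and NumPy; the function names, the weight `alpha`, the 7x7 RoI size, the 1/16 feature stride, and the thresholds are illustrative assumptions, not values from the paper.

Feature aggregation for a batch of head-body proposal pairs:

```python
import torch
from torchvision.ops import roi_align

def aggregate_pair_features(fmap, head_rois, body_rois):
    """RoIAlign the head and body proposals of each pair separately,
    then concatenate along the channel dimension."""
    # fmap: [N, C, H, W]; rois: [K, 5] rows of (batch_idx, x1, y1, x2, y2)
    head_feat = roi_align(fmap, head_rois, output_size=(7, 7), spatial_scale=1.0 / 16)
    body_feat = roi_align(fmap, body_rois, output_size=(7, 7), spatial_scale=1.0 / 16)
    return torch.cat([head_feat, body_feat], dim=1)  # [K, 2C, 7, 7]
```

Joint NMS as greedy suppression on the weighted pair score:

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box [x1, y1, x2, y2] and an array of boxes [K, 4]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def joint_nms(head_boxes, body_boxes, head_scores, body_scores,
              alpha=0.5, iou_thresh=0.5):
    """Rank head-body pairs by a weighted score, then suppress a pair if
    EITHER its head IoU or its body IoU with a kept pair is too high."""
    scores = alpha * head_scores + (1.0 - alpha) * body_scores
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        overlap = np.maximum(iou(head_boxes[i], head_boxes[rest]),
                             iou(body_boxes[i], body_boxes[rest]))
        order = rest[overlap <= iou_thresh]
    return keep
```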
#### Technical details
- [AP vs MR](ap_mr.md) in object detection.
- Soft-NMS keeps lots of long-tail detection results to improve recall, at the expense of bringing in more false positives, which leads to a negative impact on human detection, especially on the MR metric (where high-score FPs are the bottleneck).
- Note that in deployment, neither MR nor AP is a good metric, as we have to select one working point.

#### Notes
- [Review on Zhihu](https://zhuanlan.zhihu.com/p/95253096)

paper_notes/rep_loss.md (+2 -1)
@@ -25,7 +25,8 @@ Visualization before NMS seems to be a powerful debugging tool.
  - Pred bboxes are much denser than the GT boxes, so a pair of two pred bboxes is more likely to have a large overlap than a pair of one predicted box and one GT box. Thus RepBox is more likely to have outliers than RepGT.

  #### Technical details
- - Log-average miss rate over false positives per image (MR^-2) is usually the KPI for pedestrian detection. It looks like a FROC curve. Miss rate = 1 - recall. The MR curve is plotted with both log-x and log-y axes. The lower the better.
+ - [AP vs MR](ap_mr.md) in object detection.
+ - Log-average miss rate over false positives per image (MR^-2) is usually the KPI for pedestrian detection. It looks like a FROC curve. Miss rate = 1 - recall. The MR curve is plotted with both log-x and log-y axes. The lower the better (see the sketch after this list).
  - Occlusion: occ > 0.1. Occ is calculated as 1 - (visible bbox area / full bbox area). Crowd occlusion: occ > 0.1, IoU > 0.1.
  - Occlusion < 35%. [0, 10%]: bare, [10%, 35%]: partial, [35%, 1): heavy. Bare and partial occlusions are **reasonable** occlusions.
  - FP: background (0 GT under 0.1 IoU), localization error (1 GT), and crowd error (2+ GT).
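A minimal sketch of MR^-2, assuming the Caltech-style convention of nine FPPI reference points log-spaced on [1e-2, 1e0]; interpolation details vary across implementations, and both function names here are illustrative, not from the paper.

```python
import numpy as np

def log_average_miss_rate(fppi, miss_rate):
    """MR^-2: sample the miss-rate curve at nine FPPI points evenly spaced
    in log space on [1e-2, 1e0], average in log space, and exponentiate
    back. Inputs trace the curve, sorted by increasing FPPI. Lower is better."""
    fppi, miss_rate = np.asarray(fppi), np.asarray(miss_rate)
    samples = []
    for ref in np.logspace(-2.0, 0.0, num=9):
        below = np.where(fppi <= ref)[0]
        # Use the miss rate at the largest FPPI not exceeding the reference;
        # fall back to the first curve point when none qualifies.
        samples.append(miss_rate[below[-1]] if below.size else miss_rate[0])
    return np.exp(np.mean(np.log(np.maximum(samples, 1e-10))))

def occlusion(visible_area, full_area):
    """occ = 1 - (visible bbox area / full bbox area), as defined above."""
    return 1.0 - visible_area / full_area
```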
