README.md (+6 -4)
@@ -39,7 +39,7 @@ Here is [a list of trustworthy sources of papers](trusty.md) in case I ran out o
 ## My Review Posts by Topics
 I regularly update [my blog in Toward Data Science](https://medium.com/@patrickllgc).
 
-- [Object detection in crowded scenes](??) ([related paper notes](topic_crowd_detection.md))
+- [Deep-Learning based Object detection in Crowded Scenes](https://towardsdatascience.com/deep-learning-based-object-detection-in-crowded-scenes-1c9fddbd7bc4) ([related paper notes](topic_crowd_detection.md))
 - [Monocular Bird’s-Eye-View Semantic Segmentation for Autonomous Driving](https://towardsdatascience.com/monocular-birds-eye-view-semantic-segmentation-for-autonomous-driving-ee2f771afb59)
 - [Deep Learning in Mapping for Autonomous Driving](https://towardsdatascience.com/deep-learning-in-mapping-for-autonomous-driving-9e33ee951a44)
 - [Monocular Dynamic Object SLAM in Autonomous Driving](https://towardsdatascience.com/monocular-dynamic-object-slam-in-autonomous-driving-f12249052bf1)
@@ -56,6 +56,8 @@ I regularly update [my blog in Toward Data Science](https://medium.com/@patrickl
 - [MoNet3D: Towards Accurate Monocular 3D Object Localization in Real Time](https://arxiv.org/abs/2006.16007)[[Notes](paper_notes/monet3d.md)] <kbd>ICML 2020</kbd> [Mono3D, pairwise relationship]
 - [Argoverse: 3D Tracking and Forecasting with Rich Maps](https://arxiv.org/abs/1911.02620)[[Notes](paper_notes/argoverse.md)] <kbd>CVPR 2019</kbd> [HD maps, dataset, CV lidar]
@@ -144,7 +146,7 @@ Feature Extraction](https://arxiv.org/abs/2010.02893) [monodepth, semantics, Nav
 - [Locating Objects Without Bounding Boxes](Locating Objects Without Bounding Boxes) <kbd>CVPR 2019</kbd>
 - [SfMLearner: Unsupervised Learning of Depth and Ego-Motion from Video](https://people.eecs.berkeley.edu/~tinghuiz/projects/SfMLearner/cvpr17_sfm_final.pdf)[[Notes](paper_notes/sfm_learner.md)] <kbd>CVPR 2017</kbd>
 - [Monodepth2: Digging Into Self-Supervised Monocular Depth Estimation](https://arxiv.org/abs/1806.01260)[[Notes](paper_notes/monodepth2.md)] <kbd>ICCV 2019</kbd> [Niantic]
 - [DeepSignals: Predicting Intent of Drivers Through Visual Signals](https://arxiv.org/pdf/1905.01333.pdf)[[Notes](paper_notes/deep_signals.md)] <kbd>ICRA 2019</kbd> (@Uber, turn signal detection)
 - [Pseudo-LiDAR++: Accurate Depth for 3D Object Detection in Autonomous Driving](https://arxiv.org/abs/1906.06310)[[Notes](paper_notes/pseudo_lidar++.md)] <kbd>ICLR 2020</kbd>
 - [MMF: Multi-Task Multi-Sensor Fusion for 3D Object Detection](http://www.cs.toronto.edu/~byang/papers/mmf.pdf)[[Notes](paper_notes/mmf.md)] <kbd>CVPR 2019</kbd> (@Uber, sensor fusion)
@@ -612,7 +614,7 @@ for Road Detection Algorithms](http://www.cvlibs.net/publications/Fritsch2013ITS
 - [MultiNet: Real-time Joint Semantic Reasoning for Autonomous Driving](https://arxiv.org/pdf/1612.07695.pdf)[[Notes](paper_notes/multinet_raquel.md)]
 - [Optimizing the Trade-off between Single-Stage and Two-Stage Object Detectors using Image Difficulty Prediction](https://arxiv.org/pdf/1803.08707.pdf) (Very nice illustration of 1- and 2-stage object detection)
 - [Light-Head R-CNN: In Defense of Two-Stage Object Detector](https://arxiv.org/pdf/1711.07264.pdf)[[Notes](paper_notes/lighthead_rcnn.md)] (from Megvii)
-- [CSP: High-level Semantic Feature Detection: A New Perspective for Pedestrian Detection](https://arxiv.org/pdf/1904.02948.pdf)[[Notes](paper_notes/csp_pedestrian.md)] <kbd>CVPR 2019</kbd> [center and scale prediction, anchor-free, near SOTA pedestrian]
+- [CSP: High-level Semantic Feature Detection: A New Perspective for Pedestrian Detection](https://arxiv.org/abs/1904.02948)[[Notes](paper_notes/csp_pedestrian.md)] <kbd>CVPR 2019</kbd> [center and scale prediction, anchor-free, near SOTA pedestrian]
 - [Review of Anchor-free methods (知乎/Zhihu blog): "目标检测:Anchor-Free时代" (Object Detection: The Anchor-Free Era)](https://zhuanlan.zhihu.com/p/62103812), ["Anchor free深度学习的目标检测方法" (Anchor-free deep-learning object detection methods)](https://zhuanlan.zhihu.com/p/64563186), [My Slides on CSP](https://docs.google.com/presentation/d/1_dUfxv63108bZXUnVYPIOAdEIkRZw5BR9-rOp-Ni0X0/)
 - [DenseBox: Unifying Landmark Localization with End to End Object Detection](https://arxiv.org/pdf/1509.04874.pdf)
 - [CornerNet: Detecting Objects as Paired Keypoints](https://arxiv.org/pdf/1808.01244.pdf)[[Notes](paper_notes/cornernet.md)] <kbd>ECCV 2018</kbd>
paper_notes/agg_loss.md (+1 -1)
@@ -11,7 +11,7 @@ Both [RepLoss](rep_loss.md) and [AggLoss](agg_loss.md) propose additional penal
 
 #### Key ideas
 - **AggLoss**
-  - for GT subset associated with more than one anchor, enforce SL1 loss between the avg prediction of the anchors and the corresponding GT.
+  - If one GT bbox is associated with more than one anchor, AggLoss encourages the predictions from all these anchors to be the same: it enforces SL1 loss between the avg prediction of the anchors and the corresponding GT. --> There seems to be something wrong in the paper's formulation. Shouldn't this take the avg of the abs of the diffs (~SL1 loss), rather than the abs of the avg diff? See the sketch after this diff for the two readings.
 - **PORoI** (Part occlusion-aware RoI pooling)
   - A part-based model: an inductive bias that introduces prior structural information of the human body, together with visibility prediction, into the network.
   - The human body is divided into 5 parts, and each part region U is compared with the visible region V of the bbox to compute IoU (intersection over U), generating a binary visibility score.
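To make the question above concrete, here is a minimal numpy sketch (my own illustration, not the paper's code; function names and shapes are assumptions) contrasting the two readings: SL1 of the averaged prediction vs. the average of the per-anchor SL1 terms.

```python
import numpy as np

def smooth_l1(x, beta=1.0):
    """Elementwise smooth L1 (SL1)."""
    ax = np.abs(x)
    return np.where(ax < beta, 0.5 * ax**2 / beta, ax - 0.5 * beta)

def agg_paper(preds, gt):
    """Paper's reading: SL1 of the *average* prediction vs the GT."""
    return smooth_l1(preds.mean(axis=0) - gt).sum()

def agg_alternative(preds, gt):
    """Alternative reading: average of the per-anchor SL1 terms."""
    return smooth_l1(preds - gt).sum(axis=1).mean()

# Two anchors with opposite errors: the paper's version cancels to 0,
# while the per-anchor average still penalizes the disagreement.
preds = np.array([[ 0.1, 0.0, 1.0, 1.0],
                  [-0.1, 0.0, 1.0, 1.0]])
gt = np.array([0.0, 0.0, 1.0, 1.0])
print(agg_paper(preds, gt), agg_alternative(preds, gt))  # 0.0 vs 0.005
```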
paper_notes/crowd_det.md (+1 -1)
@@ -16,7 +16,7 @@ Current works are either too complex or less effective for handling highly overl
 #### Key ideas
 - Multiple instance prediction: The predictions of nearby proposals are expected to infer the **same set of instances**, rather than distinguishing individuals.
   - Some cases are inherently difficult and ambiguous to detect and differentiate, such as ◩ or ◪.
-  - this also greatly eases the learning ini crowded scene.
+  - this also greatly eases the learning in crowded scenes.
 - Each anchor predicts K (K=2) bboxes. When K=1, CrowdDet reduces to normal object detection.
 - EMD (earth mover's distance) loss
   - For all permutations of matching, select the one with the smallest loss (see the sketch after this diff).
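A minimal sketch of the min-over-permutations ("EMD-style") matching for one proposal, assuming K predictions and K (possibly dummy) GT targets and using SL1 as the per-pair regression cost; the real loss also includes a classification term, omitted here.

```python
from itertools import permutations
import numpy as np

def smooth_l1_sum(x, beta=1.0):
    ax = np.abs(x)
    return np.where(ax < beta, 0.5 * ax**2 / beta, ax - 0.5 * beta).sum()

def emd_style_loss(preds, gts):
    """preds: (K, 4) boxes from one proposal; gts: (K, 4) targets,
    padded with dummy/background targets if fewer than K GTs apply.
    Returns the loss of the best prediction-to-target assignment."""
    K = len(preds)
    return min(
        sum(smooth_l1_sum(preds[i] - gts[j]) for i, j in enumerate(perm))
        for perm in permutations(range(K))
    )

# With K=2 there are only two possible matchings; the crossed one wins here.
preds = np.array([[0., 0., 1., 1.], [1., 0., 2., 1.]])
gts   = np.array([[1., 0., 2., 1.], [0., 0., 1., 1.]])
print(emd_style_loss(preds, gts))  # 0.0
```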
paper_notes/csp_pedestrian.md (+1 -1)
@@ -1,4 +1,4 @@
-# [High-level Semantic Feature Detection: A New Perspective for Pedestrian Detection](https://arxiv.org/pdf/1904.02948.pdf) (center and scale prediction)
+# [High-level Semantic Feature Detection: A New Perspective for Pedestrian Detection](https://arxiv.org/abs/1904.02948) (center and scale prediction)
paper_notes/r2_nms.md (+1 -1)
@@ -8,7 +8,7 @@ tl;dr: Predict both full bbox and visible region and use visible region for NMS.
 This paper resembles [Visibility guided NMS](vg_nms.md), which focuses on vehicle detection in crowded scenes. The contribution of this paper is the introduction of the PPFE (paired proposal feature extractor) module, which takes the feature aggregation in [Double Anchor](double_anchor.md) to a new level.
 
 #### Key ideas
-- Visible parts of pedestrians by definition **suffer much less from occlusio**n, a relative low IoU thresh sufficiently removes the redundant bboxes locating the same pedestrian, and meanwhile avoids the large number of FP. --> This has the same motivation as [Double Anchor](double_anchor.md).
+- Visible parts of pedestrians by definition **suffer much less from occlusion**: a relatively low IoU threshold suffices to remove the redundant bboxes locating the same pedestrian while avoiding a large number of FPs (see the sketch after this diff). --> This has the same motivation as [Double Anchor](double_anchor.md).
 - The IoU between the visible regions of two bboxes is a better indicator of whether two full-body bboxes belong to the same pedestrian.
 - The visible bbox and full bbox have high overlap and are not as different as head-body bboxes, and thus can be regressed more reliably from the same anchor, as compared to [Double Anchor](double_anchor.md).
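Below is a minimal greedy hard-NMS sketch of this shared R2-NMS/VG-NMS idea (my illustration under stated assumptions: plain numpy, (x1, y1, x2, y2) boxes, one paired visible box per full-body box): overlaps are computed on the visible regions, but the surviving detections are returned as full-body boxes.

```python
import numpy as np

def iou_one_vs_many(box, boxes):
    """IoU between one (4,) box and an (N, 4) array, boxes as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter + 1e-9)

def visible_region_nms(full_boxes, visible_boxes, scores, iou_thresh=0.5):
    """Greedy NMS where suppression uses the *visible* boxes but the
    kept detections are reported as *full-body* boxes."""
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Occluded pedestrians overlap little in their visible parts, so a
        # moderate threshold separates two people without hurting recall.
        overlaps = iou_one_vs_many(visible_boxes[i], visible_boxes[order[1:]])
        order = order[1:][overlaps <= iou_thresh]
    return full_boxes[np.array(keep)]
```

This is also why a lower threshold can be used than in full-box NMS: visible regions of distinct pedestrians rarely overlap heavily.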
paper_notes/rep_loss.md (+1 -1)
@@ -19,7 +19,7 @@ Visualization before NMS seems to be a powerful debugging tool.
 - Occlusion: inter-class and intra-class. Intra-class occlusion is also named crowd occlusion, which happens when an object is occluded by objects of the same category.
 - Repulsion terms:
   - **RepGT**: Intersection over GT (to prevent the prediction from cheating by enlarging the pred bbox), with the smooth ln loss from [UnitBox](https://arxiv.org/abs/1608.01471). It penalizes overlap with non-target GT objects.
-  - **RepBox**: the IoU region between two predicted bboxes with different designated targets needs to be small. This means the predicted bboes with diff regression targets are more likely to be merged into one after NMS.
+  - **RepBox**: encourages the IoU between two predicted bboxes with different designated targets to be small. This means the predicted bboxes with diff regression targets are less likely to be merged into one after NMS.
   - The selection of IoU or IoG is due to their boundedness within [0, 1].
 - Smooth ln loss: more robust to outliers (see the sketch after this diff).
   - Pred bboxes are much denser than GT bboxes, so a pair of pred bboxes is more likely to have a large overlap than a pair of one pred bbox and one GT bbox. Thus RepBox is more likely to have outliers than RepGT.
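For reference, a small numpy sketch of the smooth ln penalty on an overlap ratio x in [0, 1) (IoG for RepGT, IoU for RepBox); the piecewise form follows my reading of the paper, so treat the exact formula as an assumption. It grows like -ln(1 - x) for small overlaps but switches to a linear tail beyond sigma, which is what makes it more robust to outlier pairs with very large overlap.

```python
import numpy as np

def smooth_ln(x, sigma=0.5):
    """Smooth ln penalty: -ln(1 - x) up to sigma, then a linear tail,
    continuous and differentiable at x = sigma."""
    x = np.asarray(x, dtype=float)
    return np.where(
        x <= sigma,
        -np.log1p(-x),                                     # -ln(1 - x)
        (x - sigma) / (1.0 - sigma) - np.log(1.0 - sigma)  # linear tail
    )

print(smooth_ln(np.array([0.1, 0.5, 0.9])))  # tail caps the 0.9 outlier
```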
paper_notes/vg_nms.md (+2 -2)
@@ -1,4 +1,4 @@
-# [Visibility Guided NMS: Efficient Boosting of Amodal Object Detection in Crowded Traffic Scenes](https://ml4ad.github.io/files/papers/Visibility%20Guided%20NMS:%20Efficient%20Boosting%20of%20Amodal%20Object%20Detection%20in%20Crowded%20Traffic%20Scenes.pdf)
+# [Visibility Guided NMS: Efficient Boosting of Amodal Object Detection in Crowded Traffic Scenes](https://arxiv.org/abs/2006.08547)
 
 _June 2020_
 
@@ -14,7 +14,7 @@ This is very similar to [R2 NMS](r2_nms.md) in CVPR 2020, which focuses on crowd
 #### Key ideas
 - Training the object detector with 4 additional attributes, so that it predicts both the visible part (pixel-based bbox) and the entire object (amodal bbox).
 - VG-NMS: NMS is performed on the pixel-based bboxes that describe the actually visible parts, but outputs the amodal bboxes whose indices are retained during pixel-based NMS.
-- Pixel based modal bbox can be generated from segmentation mask or ordered amodal bbox.
+- Pixel-based modal bboxes can be generated from segmentation masks. --> Or they could be generated from the ordering of amodal bboxes based on geometric priors; for example, a bbox with a larger ymax is closer to the camera (see the sketch after this diff).
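A hedged sketch of that geometric-prior idea (my illustration, not the paper's algorithm): paint amodal boxes far-to-near onto an index mask, assuming a larger ymax means closer to the camera, then read each box's visible region back off the mask.

```python
import numpy as np

def visible_from_amodal(amodal, h, w):
    """amodal: (N, 4) integer boxes (x1, y1, x2, y2) in an h x w image.
    Returns (N, 4) pixel-based (visible) boxes; a fully occluded box
    keeps its amodal extent as a fallback."""
    canvas = np.full((h, w), -1, dtype=int)
    # Paint far-to-near: larger ymax (bottom edge) ~ closer to the camera,
    # so nearer boxes overwrite farther ones.
    for i in np.argsort(amodal[:, 3]):
        x1, y1, x2, y2 = amodal[i]
        canvas[y1:y2, x1:x2] = i
    visible = amodal.copy()
    for i in range(len(amodal)):
        ys, xs = np.nonzero(canvas == i)
        if len(ys):  # tight box around the pixels still visible
            visible[i] = [xs.min(), ys.min(), xs.max() + 1, ys.max() + 1]
    return visible
```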
topic_crowd_detection.md

 - [RepLoss: Repulsion Loss: Detecting Pedestrians in a Crowd](https://arxiv.org/abs/1711.07752)[[Notes](paper_notes/rep_loss.md)] <kbd>CVPR 2018</kbd> [crowd detection, Megvii]
-- [Adaptive NMS: Refining Pedestrian Detection in a Crowd](https://arxiv.org/abs/1904.03629)[[Notes](paper_notes/adaptive_nms.md)] <kbd>CVPR 2019 oral</kbd> [crowd detection, NMS]
 - [AggLoss: Occlusion-aware R-CNN: Detecting Pedestrians in a Crowd](https://arxiv.org/abs/1807.08407)[[Notes](paper_notes/agg_loss.md)] <kbd>ECCV 2018</kbd> [crowd detection]
-- [CrowdDet: Detection in Crowded Scenes: One Proposal, Multiple Predictions](https://arxiv.org/abs/2003.09163)[[Notes](paper_notes/crowd_det.md)] <kbd>CVPR 2020 oral</kbd> [crowd detection, Megvii]
-- [R2-NMS: NMS by Representative Region: Towards Crowded Pedestrian Detection by Proposal Pairing](https://arxiv.org/abs/2003.12729)[[Notes](paper_notes/r2_nms.md)] <kbd>CVPR 2020</kbd>
+- [Adaptive NMS: Refining Pedestrian Detection in a Crowd](https://arxiv.org/abs/1904.03629)[[Notes](paper_notes/adaptive_nms.md)] <kbd>CVPR 2019 oral</kbd> [crowd detection, NMS]
 - [Double Anchor R-CNN for Human Detection in a Crowd](https://arxiv.org/abs/1909.09998)[[Notes](paper_notes/double_anchor.md)] [head-body bundle]
+- [R2-NMS: NMS by Representative Region: Towards Crowded Pedestrian Detection by Proposal Pairing](https://arxiv.org/abs/2003.12729)[[Notes](paper_notes/r2_nms.md)] <kbd>CVPR 2020</kbd>
+- [CrowdDet: Detection in Crowded Scenes: One Proposal, Multiple Predictions](https://arxiv.org/abs/2003.09163)[[Notes](paper_notes/crowd_det.md)] <kbd>CVPR 2020 oral</kbd> [crowd detection, Megvii]
+- [CSP: High-level Semantic Feature Detection: A New Perspective for Pedestrian Detection](https://arxiv.org/abs/1904.02948)[[Notes](paper_notes/csp_pedestrian.md)] <kbd>CVPR 2019</kbd> [center and scale prediction, anchor-free, near SOTA pedestrian]