Commit d2b28e3

Add MoNet-3D
1 parent 8ea2838 commit d2b28e3

3 files changed, +10 -8 lines changed
README.md

+2 -2
@@ -96,7 +96,7 @@ semi-supervised training](http://openaccess.thecvf.com/content_CVPR_2019/papers/
 
 ## 2020-11 (2)
 - [Unsupervised Monocular Depth Learning in Dynamic Scenes](https://arxiv.org/abs/2010.16404) [[Notes](paper_notes/learn_depth_and_motion.md)] <kbd>CoRL 2020</kbd> [LearnK improved ver, Google]
-- [MoNet3D: Towards Accurate Monocular 3D Object Localization in Real Time](https://arxiv.org/abs/2006.16007) [[Notes](paper_notes/monet3d.md)] <kbd>ICML 2020</kbd> [mono3D]
+- [MoNet3D: Towards Accurate Monocular 3D Object Localization in Real Time](https://arxiv.org/abs/2006.16007) [[Notes](paper_notes/monet3d.md)] <kbd>ICML 2020</kbd> [Mono3D, pairwise relationship]
 - [Object-Aware Centroid Voting for Monocular 3D Object Detection](https://arxiv.org/abs/2007.09836) <kbd>IROS 2020</kbd> [mono3D]
 - [Center3D: Center-based Monocular 3D Object Detection with Joint Depth Understanding](https://arxiv.org/abs/2005.13423) [mono3D]
 - [SAFENet: Self-Supervised Monocular Depth Estimation with Semantic-Aware
@@ -296,7 +296,7 @@ Crosswalk Behavior](http://openaccess.thecvf.com/content_ICCV_2017_workshops/pap
 - [DETR: End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) [[Notes](paper_notes/detr.md)] <kbd>ECCV 2020 oral</kbd> [FAIR]
 - [Transformer: Attention Is All You Need](https://arxiv.org/abs/1706.03762) [[Notes](paper_notes/transformer.md)] <kbd>NIPS 2017</kbd>
 - [SpeedNet: Learning the Speediness in Videos](https://arxiv.org/abs/2004.06130) [[Notes](paper_notes/speednet.md)] <kbd>CVPR 2020 oral</kbd>
-- [MonoPair: Monocular 3D Object Detection Using Pairwise Spatial Relationships](https://arxiv.org/abs/2003.00504) [[Notes](paper_notes/monopair.md)] <kbd>CVPR 2020</kbd> [Mono3D]
+- [MonoPair: Monocular 3D Object Detection Using Pairwise Spatial Relationships](https://arxiv.org/abs/2003.00504) [[Notes](paper_notes/monopair.md)] <kbd>CVPR 2020</kbd> [Mono3D, pairwise relationship]
 - [SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation](https://arxiv.org/abs/2002.10111) [[Notes](paper_notes/smoke.md)] <kbd>CVPRW 2020</kbd> [Mono3D, Zongmu]
 - [Vehicle Re-ID for Surround-view Camera System](https://drive.google.com/file/d/1e6y8wtHAricaEHS9CpasSGOx0aAxCGib/view) [[Notes](paper_notes/reid_surround_fisheye.md)] <kbd>CVPRW 2020</kbd> [tireline, vehicle ReID, Zongmu]
 - [PSDet: Efficient and Universal Parking Slot Detection](https://arxiv.org/abs/2005.05528) <kbd>IV 2020</kbd> [Zongmu, Parking]

paper_notes/monet3d.md

+5 -3
@@ -5,12 +5,14 @@ _November 2020_
 tl;dr: Encodes the local geometric consistency (spatial correlation of neighboring objects) into learning.
 
 #### Overall impression
-The idea is similar to enforcing certain order in prediction. It learns the second degree of information hidden in the GT labels. It incorporates prior knowledge of geometric locality as regularization in the training module.
+The idea is similar to enforcing certain order in prediction. It learns the second degree of information hidden in the GT labels. It incorporates prior knowledge of geometric locality as regularization in the training module. The mining of pair-wise relationship is similar to [MonoPair](monopair.md).
 
-The writing is actually quite bad with heavy use of non-standard terminology.
+The writing is actually quite bad with heavy use of non-standard terminology. No ablation study on the effect of this newly introduced regularization.
 
 #### Key ideas
-- Local similarity constraints as additional regularization. If two objects are similar (close-by) in GT, then they should be similar in prediction as well.
+- Local similarity constraints as additional regularization. If two objects are similar (close-by) in GT, then they should be similar in prediction as well.
+- The similarity is defined as $s_{ij} = \exp (-\Delta u_{ij}^2 - \Delta z_{ij}^2/\lambda)$
+- The differences between the outputs for different vehicles are penalized according to this metric.
 
 #### Technical details
 - Summary of technical details
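
The similarity weighting described in the MoNet3D key ideas can be sketched in a few lines of Python. This is only an illustration, not the paper's implementation: the formula is copied as written in the note (with $\lambda$ dividing only the $\Delta z^2$ term), while the squared-difference penalty, the scalar per-object output, and the names `similarity` and `pairwise_penalty` are assumptions, since the note only says that output differences are penalized according to $s_{ij}$.

```python
import math


def similarity(u_i, z_i, u_j, z_j, lam=1.0):
    """s_ij = exp(-du^2 - dz^2 / lam), as written in the note."""
    du = u_i - u_j
    dz = z_i - z_j
    return math.exp(-du ** 2 - dz ** 2 / lam)


def pairwise_penalty(gt, pred, lam=1.0):
    """Similarity-weighted penalty over all unordered object pairs.

    gt:   list of (u, z) ground-truth tuples, one per object.
    pred: list of scalar network outputs, one per object (hypothetical).
    The squared difference is an assumed choice of penalty.
    """
    total = 0.0
    for i in range(len(gt)):
        for j in range(i + 1, len(gt)):
            s = similarity(gt[i][0], gt[i][1], gt[j][0], gt[j][1], lam)
            total += s * (pred[i] - pred[j]) ** 2
    return total
```

Objects that are close in GT get a weight near 1, so disagreeing predictions are penalized heavily; distant objects get a weight near 0 and are effectively unconstrained.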

paper_notes/monopair.md

+3 -3
@@ -7,19 +7,19 @@ tl;dr: mono3D with pair wise relation and non-linear optimization.
 #### Overall impression
 This work is inspired by [CenterNet](centernet.md). it not only predicts the 3d bbox from the center of the bbox (similar to [RTM3D](RTM3D) but without predicting the eight points directly). It is similar to the popular solutions to the [Kaggle mono3D competition](https://www.kaggle.com/c/pku-autonomous-driving).
 
-The main idea is to predict distance of each instance and relative distance between neighboring pairs, and their corresponding uncertainties, then use nonlinear optimization (with g2o) for joint optimization. It refines the detection results based on spatial relationships.
+The main idea is to predict distance of each instance and relative distance between neighboring pairs, and their corresponding uncertainties, then use nonlinear optimization (with g2o) for joint optimization. It refines the detection results based on spatial relationships. The mining of pair-wise relationship is similar to [MoNet-3D](monet3d.md).
 
 MonoPair improved accuracy dramatically, especially for heavily occluded scenario.
 
 #### Key ideas
-- Range circle: diameter is set up
+- Range circle: diameter is set up to connect the center of the two instances
 - Predicting relative distance is in local coordinate. This is a brilliant idea as this makes the regression target to be invariant to global azimuth. Regression target is multiplied by the rotational matrix of the azimuth angle.
 - Predict uncertainty helps depth estimation greatly, as shown in Table 5.
 - The joint optimization does not lead to too much improvement as shown in Table 6.
 
 
 #### Technical details
-- Regressing depth target $z = 1-\sigma(\hat{z})-1$
+- Regressing depth target $z = 1/\sigma(\hat{z})-1$
 - Weight matrix is diagonal of predicted uncertainties of diff bits. $W = \text{diag}(1/\sigma_i) $. The authors tried various weighting strategies but no improvement.
 - For images with more pair constraints, the performance is better, even before
 - The addition of uncertainty to depth leads to the biggest improvement.
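
The corrected depth target $z = 1/\sigma(\hat{z})-1$ from the MonoPair technical details can be checked numerically. The sketch below assumes $\sigma$ is the logistic sigmoid; the names `sigmoid` and `decode_depth` are hypothetical and not taken from the MonoPair code. Since $1/\sigma(x) = 1 + e^{-x}$, the decoding reduces to $z = e^{-\hat{z}}$, which is always positive.

```python
import math


def sigmoid(x):
    """Logistic sigmoid (assumed meaning of sigma in the note)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_depth(z_hat):
    """Map raw network output z_hat to depth: z = 1 / sigmoid(z_hat) - 1."""
    return 1.0 / sigmoid(z_hat) - 1.0


# The decoded depth agrees with exp(-z_hat) up to floating-point error.
for z_hat in (-2.0, 0.0, 1.5):
    assert abs(decode_depth(z_hat) - math.exp(-z_hat)) < 1e-9
```

The earlier form in the diff, $1-\sigma(\hat{z})-1$, simplifies to $-\sigma(\hat{z})$ and would be negative for every input, which is consistent with it being a typo.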
