Add MoNet-3D

patrick-llgc · patrick-llgc · commit d2b28e39a3d5 · 2020-11-05T15:57:33.000-08:00
diff --git a/README.md b/README.md
@@ -96,7 +96,7 @@ semi-supervised training](http://openaccess.thecvf.com/content_CVPR_2019/papers/
 
 ## 2020-11 (2)
 - [Unsupervised Monocular Depth Learning in Dynamic Scenes](https://arxiv.org/abs/2010.16404) [[Notes](paper_notes/learn_depth_and_motion.md)] <kbd>CoRL 2020</kbd> [LearnK improved ver, Google]
-- [MoNet3D: Towards Accurate Monocular 3D Object Localization in Real Time](https://arxiv.org/abs/2006.16007) [[Notes](paper_notes/monet3d.md)] <kbd>ICML 2020</kbd> [mono3D]
+- [MoNet3D: Towards Accurate Monocular 3D Object Localization in Real Time](https://arxiv.org/abs/2006.16007) [[Notes](paper_notes/monet3d.md)] <kbd>ICML 2020</kbd> [Mono3D, pairwise relationship]
 - [Object-Aware Centroid Voting for Monocular 3D Object Detection](https://arxiv.org/abs/2007.09836) <kbd>IROS 2020</kbd> [mono3D]
 - [Center3D: Center-based Monocular 3D Object Detection with Joint Depth Understanding](https://arxiv.org/abs/2005.13423) [mono3D]
 - [SAFENet: Self-Supervised Monocular Depth Estimation with Semantic-Aware
@@ -296,7 +296,7 @@ Crosswalk Behavior](http://openaccess.thecvf.com/content_ICCV_2017_workshops/pap
 - [DETR: End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) [[Notes](paper_notes/detr.md)] <kbd>ECCV 2020 oral</kbd> [FAIR]
 - [Transformer: Attention Is All You Need](https://arxiv.org/abs/1706.03762) [[Notes](paper_notes/transformer.md)] <kbd>NIPS 2017</kbd>
 - [SpeedNet: Learning the Speediness in Videos](https://arxiv.org/abs/2004.06130) [[Notes](paper_notes/speednet.md)] <kbd>CVPR 2020 oral</kbd>
-- [MonoPair: Monocular 3D Object Detection Using Pairwise Spatial Relationships](https://arxiv.org/abs/2003.00504) [[Notes](paper_notes/monopair.md)] <kbd>CVPR 2020</kbd> [Mono3D]
+- [MonoPair: Monocular 3D Object Detection Using Pairwise Spatial Relationships](https://arxiv.org/abs/2003.00504) [[Notes](paper_notes/monopair.md)] <kbd>CVPR 2020</kbd> [Mono3D, pairwise relationship]
 - [SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation](https://arxiv.org/abs/2002.10111) [[Notes](paper_notes/smoke.md)] <kbd>CVPRW 2020</kbd> [Mono3D, Zongmu]
 - [Vehicle Re-ID for Surround-view Camera System](https://drive.google.com/file/d/1e6y8wtHAricaEHS9CpasSGOx0aAxCGib/view) [[Notes](paper_notes/reid_surround_fisheye.md)] <kbd>CVPRW 2020</kbd> [tireline, vehicle ReID, Zongmu]
 - [PSDet: Efficient and Universal Parking Slot Detection](https://arxiv.org/abs/2005.05528) <kbd>IV 2020</kbd> [Zongmu, Parking]
diff --git a/paper_notes/monet3d.md b/paper_notes/monet3d.md
@@ -5,12 +5,14 @@ _November 2020_
 tl;dr: Encodes the local geometric consistency (spatial correlation of neighboring objects) into learning.
 
 #### Overall impression
-The idea is similar to enforcing certain order in prediction. It learns the second degree of information hidden in the GT labels. It incorporates prior knowledge of geometric locality as regularization in the training module.
+The idea is similar to enforcing certain order in prediction. It learns the second degree of information hidden in the GT labels. It incorporates prior knowledge of geometric locality as regularization in the training module. The mining of pairwise relationships is similar to [MonoPair](monopair.md).
 
-The writing is actually quite bad with heavy use of non-standard terminology.
+The writing is actually quite bad with heavy use of non-standard terminology. There is no ablation study on the effect of the newly introduced regularization.
 
 #### Key ideas
-- Local similarity constraints as additional regularization. If two objects are similar (close-by) in GT, then they should be similar in prediction as well.
+- Local similarity constraints as additional regularization. If two objects are similar (close-by) in GT, then they should be similar in prediction as well. 
+- The similarity is defined as $s_{ij} = \exp (-\Delta u_{ij}^2 - \Delta z_{ij}^2/\lambda)$
+- The differences between the outputs for different vehicles are penalized according to this similarity metric.
 
 #### Technical details
 - Summary of technical details
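
The similarity weighting described in the MoNet3D key ideas can be sketched as follows. This is a minimal illustration, not the authors' implementation: the formula for $s_{ij}$ is taken from the note, while the function names, the squared-difference penalty, and the averaging over pairs are assumptions made for the example. The note does not state the units of $u$ and $z$, so the inputs here are arbitrary.

```python
import numpy as np

def pairwise_similarity(u, z, lam=1.0):
    """s_ij = exp(-du_ij^2 - dz_ij^2 / lam), as written in the note."""
    u = np.asarray(u, dtype=float)
    z = np.asarray(z, dtype=float)
    du = u[:, None] - u[None, :]  # pairwise horizontal offsets
    dz = z[:, None] - z[None, :]  # pairwise depth offsets
    return np.exp(-du**2 - dz**2 / lam)

def similarity_penalty(pred, s):
    """Hypothetical regularizer: similarity-weighted mean of squared
    differences between per-object outputs (assumed form)."""
    pred = np.asarray(pred, dtype=float)
    if pred.ndim == 1:
        pred = pred[:, None]
    diff = pred[:, None, :] - pred[None, :, :]
    sq_dist = (diff**2).sum(axis=-1)
    off_diag = ~np.eye(len(pred), dtype=bool)  # ignore i == j
    return float((s * sq_dist)[off_diag].sum() / max(off_diag.sum(), 1))

# Two nearby objects and one far away (made-up values).
s = pairwise_similarity(u=[0.0, 0.1, 5.0], z=[10.0, 10.2, 40.0], lam=1.0)
penalty = similarity_penalty([1.0, 1.5, 9.0], s)
```

Only the nearby pair contributes noticeably to `penalty`, because $s_{ij}$ decays quickly with distance; disagreement between far-apart objects is effectively unpenalized.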
diff --git a/paper_notes/monopair.md b/paper_notes/monopair.md
@@ -7,19 +7,19 @@ tl;dr: mono3D with pair wise relation and non-linear optimization.
 #### Overall impression
 This work is inspired by [CenterNet](centernet.md). it not only predicts the 3d bbox from the center of the bbox (similar to [RTM3D](RTM3D) but without predicting the eight points directly). It is similar to the popular solutions to the [Kaggle mono3D competition](https://www.kaggle.com/c/pku-autonomous-driving).
 
-The main idea is to predict distance of each instance and relative distance between neighboring pairs, and their corresponding uncertainties, then use nonlinear optimization (with g2o) for joint optimization. It refines the detection results based on spatial relationships.
+The main idea is to predict distance of each instance and relative distance between neighboring pairs, and their corresponding uncertainties, then use nonlinear optimization (with g2o) for joint optimization. It refines the detection results based on spatial relationships. The mining of pairwise relationships is similar to [MoNet-3D](monet3d.md).
 
 MonoPair improved accuracy dramatically, especially for heavily occluded scenario.
 
 #### Key ideas
-- Range circle: diameter is set up 
+- Range circle: the circle whose diameter is the segment connecting the centers of the two instances
 - Predicting relative distance is in local coordinate. This is a brilliant idea as this makes the regression target to be invariant to global azimuth. Regression target is multiplied by the rotational matrix of the azimuth angle.
 - Predict uncertainty helps depth estimation greatly, as shown in Table 5.
 - The joint optimization does not lead to too much improvement as shown in Table 6.
 
 
 #### Technical details
-- Regressing depth target $z = 1-\sigma(\hat{z})-1$
+- Regressing depth target $z = 1/\sigma(\hat{z})-1$
 - Weight matrix is diagonal of predicted uncertainties of diff bits. $W = \text{diag}(1/\sigma_i) $. The authors tried various weighting strategies but no improvement.
 - For images with more pair constraints, the performance is better, even before 
 - The addition of uncertainty to depth leads to the biggest improvement.
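
The depth decoding and the uncertainty weighting in the MonoPair technical details can be sketched as below. This is a hedged illustration under stated assumptions: $\sigma(\cdot)$ in the depth formula is taken to be the logistic sigmoid, in which case $1/\sigma(\hat{z}) - 1 = e^{-\hat{z}}$. The fusion step uses plain linear weighted least squares with $W = \text{diag}(1/\sigma_i)$ as a stand-in for the g2o nonlinear optimization mentioned in the note, and the function names and toy numbers are hypothetical.

```python
import numpy as np

def decode_depth(z_hat):
    """z = 1/sigmoid(z_hat) - 1, which equals exp(-z_hat)."""
    z_hat = np.asarray(z_hat, dtype=float)
    sigmoid = 1.0 / (1.0 + np.exp(-z_hat))
    return 1.0 / sigmoid - 1.0

def fuse_depths(z_obs, pair_idx, pair_dz, sigma_obs, sigma_pair):
    """Toy joint refinement (assumed linear form): minimize r^T W r with
    W = diag(1/sigma_i), where r stacks per-object depth residuals and
    pairwise relative-depth residuals (z_j - z_i - dz_ij)."""
    z_obs = np.asarray(z_obs, dtype=float)
    n = len(z_obs)
    rows = [np.eye(n)]  # one unary constraint per object
    for i, j in pair_idx:
        r = np.zeros((1, n))
        r[0, i], r[0, j] = -1.0, 1.0  # pairwise constraint z_j - z_i
        rows.append(r)
    A = np.vstack(rows)
    b = np.concatenate([z_obs, np.asarray(pair_dz, dtype=float)])
    sigma = np.concatenate([np.asarray(sigma_obs, dtype=float),
                            np.asarray(sigma_pair, dtype=float)])
    sqrt_w = np.sqrt(1.0 / sigma)  # square root of the diagonal of W
    z, *_ = np.linalg.lstsq(A * sqrt_w[:, None], b * sqrt_w, rcond=None)
    return z

# Two objects whose individual depths disagree with a confident pair estimate.
refined = fuse_depths(z_obs=[10.0, 14.0], pair_idx=[(0, 1)], pair_dz=[2.0],
                      sigma_obs=[1.0, 1.0], sigma_pair=[1e-6])
```

With a very small pairwise uncertainty, the refined depths are pulled toward values that respect the predicted relative distance, which is the intuition behind weighting each constraint by its predicted uncertainty.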