I regularly update [my blog on Towards Data Science](https://medium.com/@patrickllgc).
@@ -66,7 +35,7 @@
## 2022-07 (3)
- [PersFormer: 3D Lane Detection via Perspective Transformer and the OpenLane Benchmark](https://arxiv.org/abs/2203.11089) [[Notes](paper_notes/persformer.md)] [BEVNet, lane line]
- [VectorMapNet: End-to-end Vectorized HD Map Learning](https://arxiv.org/abs/2206.08920) [[Notes](paper_notes/vectormapnet.md)] [BEVNet, LLD, Hang Zhao]
-- [PETR: Position Embedding Transformation for Multi-View 3D Object Detection](https://arxiv.org/abs/2203.05625) [[Notes](paper_notes/petr.md)] [BEVNet]
+- [PETR: Position Embedding Transformation for Multi-View 3D Object Detection](https://arxiv.org/abs/2203.05625) [[Notes](paper_notes/petr.md)] <kbd>ECCV 2022</kbd> [BEVNet]
- [PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images](https://arxiv.org/abs/2206.01256) [BEVNet, MegVii]
- [LaRa: Latents and Rays for Multi-Camera Bird's-Eye-View Semantic Segmentation](https://arxiv.org/abs/2206.13294) [Valeo]
paper_notes/petr.md (+2 -2)
@@ -22,8 +22,8 @@ The PETR idea resembles [CoordConv](coord_conv.md) and [CamConv](cam_conv.md), but ...
- [PETR](petr.md) converges slower than [DETR3D](detr3d.md). The authors argue that PETR learns the 3D correlation through global attention, while DETR3D perceives the 3D scene within local regions (with the help of explicit 3D-2D feature projection).
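To make the contrast concrete, below is a minimal PyTorch-style sketch of the "local" alternative being described: a DETR3D-style object query keeps a 3D reference point, projects it into the image with the camera matrix, and samples the feature only at that projected pixel, instead of attending globally over all image tokens as PETR does. This is an illustrative sketch, not code from either paper; the function name `sample_at_projection`, the `lidar2img` matrix convention, and the single-camera, single-scale setup are assumptions made for the example.

```python
import torch
import torch.nn.functional as F

def sample_at_projection(feat, ref_points_3d, lidar2img):
    """feat: (B, C, H, W) image features of one camera, assumed to be at the
        resolution that lidar2img maps to.
    ref_points_3d: (B, Q, 3) per-query 3D reference points in ego/world coordinates.
    lidar2img: (B, 4, 4) projection from those 3D coordinates to pixel coordinates.
    Returns (B, C, Q): one feature vector per query, gathered only at the projected point.
    """
    B, C, H, W = feat.shape
    ones = torch.ones_like(ref_points_3d[..., :1])
    pts = torch.cat([ref_points_3d, ones], dim=-1)                # (B, Q, 4) homogeneous
    cam = torch.einsum("bij,bqj->bqi", lidar2img, pts)            # project into the image
    uv = cam[..., :2] / cam[..., 2:3].clamp(min=1e-5)             # perspective divide
    # normalize pixel coordinates to [-1, 1] as expected by grid_sample
    grid = torch.stack([uv[..., 0] / (W - 1), uv[..., 1] / (H - 1)], dim=-1) * 2 - 1
    sampled = F.grid_sample(feat, grid.unsqueeze(2), align_corners=True)  # (B, C, Q, 1)
    return sampled.squeeze(-1)
```

In DETR3D proper this sampling is repeated over multiple cameras and feature levels and the gathered features update the queries layer by layer; the point here is only how localized the 3D-2D feature lookup is.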
#### Technical details
-- The authors argue that in [DETR3D](detr3d.md) only the image feature at the projected point is collected, which fails to perform representation learning from a global view. --> Actually this may not be that big an issue for BEV perception, especially for object detection, which requires very localized attention. **I would rather consider this an advantage** of [DETR3D](detr3d.md) and similar methods, such as [BEVFormer](bevformer.md).
-- The parameter settings in many of the experiments do not matter that much, and may in part reflect the authors' lack of domain knowledge in 3D object detection for autonomous driving. (The authors are, admittedly, top experts in 2D object detection.) For example, the Table 4 ablation study is not necessary, in particular the Z range of -10 to 10 meters.
+- The authors argue that in [DETR3D](detr3d.md) only the image feature at the projected point is collected, which fails to perform representation learning from a global view. --> Actually this may not be that big an issue for BEV perception, especially for object detection, which requires very localized attention. **I would rather consider this an advantage** of [DETR3D](detr3d.md) and similar methods, such as [BEVFormer](bevformer.md). --> Maybe adding this explicit 2D-3D link would boost the performance even further, with faster convergence?
+- The parameter settings in many of the experiments do not matter that much. For example, the Table 4 ablation study is not necessary, in particular the Z range of -10 to 10 meters.
- In Fig. 3, FC seems to stand for "fully convolutional". It is chosen to be a 1x1 convolution in the ablation study in Table 5. **What is surprising is that if 3x3 is used instead of 1x1 in the feature blending, the network cannot converge.** --> The authors argue that this breaks the correspondence between the 2D feature and the 3D position. This is fishy.
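For reference, here is a rough sketch of the step these notes keep coming back to: PETR lifts every 2D feature location to a set of candidate depths along its camera ray, normalizes the resulting 3D coordinates by the perception range, and maps them through a small 1x1-conv head (the "FC" of Fig. 3) into a 3D position embedding that is added to the image features before the DETR-style decoder. This is a hand-written approximation under assumed shapes and names (`make_3d_pe`, `img2world`, `pe_head`), not the authors' implementation.

```python
import torch
import torch.nn as nn

def make_3d_pe(feat, img2world, depth_bins, pc_range, pe_head):
    """feat: (B, C, H, W) image features from one camera.
    img2world: (B, 4, 4) mapping from (u*d, v*d, d, 1) to homogeneous world coords (assumed given).
    depth_bins: (D,) candidate depths along each pixel ray.
    pc_range: (x_min, y_min, z_min, x_max, y_max, z_max) perception range.
    pe_head: small conv head mapping D*3 channels -> C channels (the "FC" in Fig. 3).
    """
    B, C, H, W = feat.shape
    D = depth_bins.numel()
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    u = u.float().unsqueeze(0).expand(D, H, W)
    v = v.float().unsqueeze(0).expand(D, H, W)
    d = depth_bins.view(D, 1, 1).expand(D, H, W)
    # frustum points in homogeneous image coordinates: (u*d, v*d, d, 1)
    pts = torch.stack([u * d, v * d, d, torch.ones_like(d)], dim=-1)   # (D, H, W, 4)
    pts = pts.unsqueeze(0).expand(B, D, H, W, 4)
    # lift to 3D world coordinates with the per-camera inverse projection
    world = torch.einsum("bij,bdhwj->bdhwi", img2world, pts)[..., :3]  # (B, D, H, W, 3)
    # normalize into [0, 1] by the perception range
    lo = world.new_tensor(pc_range[:3])
    hi = world.new_tensor(pc_range[3:])
    world = (world - lo) / (hi - lo)
    # stack the D*3 coordinate channels and turn them into a C-channel embedding
    coords = world.permute(0, 1, 4, 2, 3).reshape(B, D * 3, H, W)
    return feat + pe_head(coords)  # position-aware features fed to the DETR-style decoder

# The "FC" head of Fig. 3: the ablation finds 1x1 convs work, while 3x3 does not converge.
# D = 64 depth bins and C = 256 feature channels are assumed values for this example.
pe_head = nn.Sequential(nn.Conv2d(64 * 3, 256, 1), nn.ReLU(), nn.Conv2d(256, 256, 1))
```

A 1x1 head keeps each position embedding a function of its own ray's coordinates only; a 3x3 kernel would mix coordinates from neighboring rays, which seems to be the "broken 2D-3D correspondence" the authors blame for the non-convergence.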