Commit a0eb042

Add EfficientDet
1 parent 1893861 commit a0eb042

3 files changed: +34 -3 lines changed

README.md (+4, -3)
@@ -60,10 +60,13 @@ I regularly update [my blog in Toward Data Science](https://medium.com/@patrickl
 - [Paper Reading in 2019](https://towardsdatascience.com/the-200-deep-learning-papers-i-read-in-2019-7fb7034f05f7?source=friends_link&sk=7628c5be39f876b2c05e43c13d0b48a3)
 
 
-## 2021-09 (2)
+## 2021-09 (3)
 - [DD3D: Is Pseudo-Lidar needed for Monocular 3D Object detection?](https://arxiv.org/abs/2108.06417) [[Notes](paper_notes/dd3d.md)] [mono3D, Toyota]
+- [EfficientDet: Scalable and Efficient Object Detection](https://arxiv.org/abs/1911.09070) [[Notes](paper_notes/efficientdet.md)] <kbd>CVPR 2020</kbd> [BiFPN, Tesla AI day]
 - [PnPNet: End-to-End Perception and Prediction with Tracking in the Loop](https://arxiv.org/abs/2005.14711) [[Notes](paper_notes/pnpnet.md)] <kbd>CVPR 2020</kbd> [Uber ATG]
 - [MP3: A Unified Model to Map, Perceive, Predict and Plan](https://arxiv.org/abs/2101.06806) [[Notes](paper_notes/mp3.md)] <kbd>CVPR 2021</kbd> [Uber, planning]
+- [PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection](https://arxiv.org/abs/1912.13192) <kbd>CVPR 2020</kbd> [Waymo challenge 2nd place]
+- [LiDAR R-CNN: An Efficient and Universal 3D Object Detector](https://arxiv.org/abs/2103.15297) <kbd>CVPR 2021</kbd> [TuSimple, Naiyan Wang]
 
 
 ## 2021-08 (11)
@@ -387,7 +390,6 @@ Crosswalk Behavior](http://openaccess.thecvf.com/content_ICCV_2017_workshops/pap
 - [Stitcher: Feedback-driven Data Provider for Object Detection](https://arxiv.org/abs/2004.12432) [[Notes](paper_notes/stitcher.md)]
 - [SKNet: Selective Kernel Networks](https://arxiv.org/abs/1903.06586) [[Notes](paper_notes/sknet.md)] <kbd>CVPR 2019</kbd>
 - [CBAM: Convolutional Block Attention Module](https://arxiv.org/abs/1807.06521) [[Notes](paper_notes/cbam.md)] <kbd>ECCV 2018</kbd>
-- [EfficientDet: Scalable and Efficient Object Detection](https://arxiv.org/abs/1911.09070) <kbd>CVPR 2020</kbd>
 - [ResNeSt: Split-Attention Networks](https://arxiv.org/abs/2004.08955) [[Notes](paper_notes/resnest.md)]
 
 ## 2020-04 (14)
@@ -904,7 +906,6 @@ Traffic Sign and Light Detection](https://arxiv.org/abs/1806.07987) <kbd>IEEE CR
 - [PointFusion: Deep Sensor Fusion for 3D Bounding Box Estimation](https://arxiv.org/abs/1711.10871) <kbd>CVPR 2018</kbd> [sensor fusion, Zoox]
 - [Deep Hough Voting for 3D Object Detection in Point Clouds](https://arxiv.org/abs/1904.09664) <kbd>ICCV 2019</kbd> [Charles Qi]
 - [StixelNet: A Deep Convolutional Network for Obstacle Detection and Road Segmentation](http://www.bmva.org/bmvc/2015/papers/paper109/paper109.pdf)
-- [PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection](https://arxiv.org/abs/1912.13192) <kbd>CVPR 2020</kbd> [Waymo challenge 2nd place]
 - [PolarNet: An Improved Grid Representation for Online LiDAR Point Clouds Semantic Segmentation](https://arxiv.org/abs/2003.14032) <kbd>CVPR 2020</kbd>
 - [Depth Sensing Beyond LiDAR Range](https://arxiv.org/abs/2004.03048) <kbd>CVPR 2020</kbd> [wide baseline stereo with trifocal]
 - [Probabilistic Semantic Mapping for Urban Autonomous Driving Applications](https://arxiv.org/abs/2006.04894) <kbd>IROS 2020</kbd> [lidar mapping]

paper_notes/efficientdet.md (+28)
@@ -0,0 +1,28 @@
+# [EfficientDet: Scalable and Efficient Object Detection](https://arxiv.org/abs/1911.09070)
+
+_September 2021_
+
+tl;dr: BiFPN and multidimensional scaling of object detection.
+
+#### Overall impression
+This paper follows up on the work of [EfficientNet](efficientnet.md). The FPN neck is essentially multi-scale feature fusion: it aims to find a transformation that effectively aggregates different features and outputs a list of new features.
+
+#### Key ideas
+- BiFPN (bidirectional FPN) (<-- PANet <-- FPN)
+  - [PANet](panet.md) reintroduces the bottom-up pathway.
+  - **Remove nodes** from PANet that have only one input edge.
+  - **Add a skip connection** from the original input to the output node if they are at the same level.
+  - **Repeat** the resulting BiFPN block several times.
+- Weighted feature fusion (see the sketch below)
+  - The baseline is to resize and sum up the features. Each feature may contribute with a different weight (feature-level attention).
+  - Softmax works, but a simpler linear weighting normalization works about as well.
+- Multidimensional/compound scaling of resolution, depth and width is more effective than single-dimension scaling.
+
+#### Technical details
+- [NAS-FPN](nas_fpn.md) has repeated irregular blocks.
+- Simply repeating FPN blocks does not bring much benefit. Repeating PANet blocks works better, and repeated BiFPN yields similar results with much less computation.
+- This still needs object assignment, like [RetinaNet](retinanet.md).
+
+#### Notes
+- [Github](https://github.com/google/automl)
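
To make the weighted feature fusion bullet above concrete, here is a minimal PyTorch sketch of the "fast normalized fusion" variant (ReLU-ed learnable weights divided by their sum plus a small epsilon) rather than the softmax version. This is not the official google/automl implementation; the class name `WeightedFusion` and the toy P4/P5 tensors are assumptions made only for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class WeightedFusion(nn.Module):
    """Fuse several same-shape feature maps with learned, normalized weights (sketch)."""

    def __init__(self, num_inputs: int, eps: float = 1e-4):
        super().__init__()
        # One learnable scalar per input feature map.
        self.weights = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps

    def forward(self, feats):
        # ReLU keeps the weights non-negative; dividing by their sum (+ eps)
        # normalizes them to roughly [0, 1]. This is the cheap alternative to softmax.
        w = F.relu(self.weights)
        w = w / (w.sum() + self.eps)
        return sum(wi * fi for wi, fi in zip(w, feats))


# Toy usage: fuse an upsampled higher-level feature with the lateral feature at the same level.
p5 = torch.randn(1, 64, 8, 8)                              # fake P5 feature
p5_up = F.interpolate(p5, scale_factor=2, mode="nearest")  # resize to P4 resolution
p4_in = torch.randn(1, 64, 16, 16)                         # fake P4 lateral input
p4_td = WeightedFusion(num_inputs=2)([p4_in, p5_up])       # weighted sum, shape (1, 64, 16, 16)
```

In the full BiFPN, each such fusion node is followed by a depthwise separable convolution and activation, and the whole bidirectional block is repeated several times; that plumbing is omitted here.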

paper_notes/efficientnet.md (+2)
@@ -11,6 +11,8 @@ The paper proposed a simple yet principled method to scale up networks. The main
 
 On the other hand, the mobilenets papers ([v1](mobilenets_v1.md), [v2](mobilenets_v2.md) and [v3](mobilenets_v3.md)) goes the other way round. They start with an efficient network and scales it down further. The channel and resolution scaling factors are usually smaller than 1. Note that **MobileNetv3-Large optimizes based on MnasNet**. Therefore EfficientNet-B* is really all about how to scale up MobileNet, and tells us that a beefed-up MobileNet is better than ResNet. In the original [MobileNetsv1](mobilenets_v1.md)
 
+This paper inspired follow-up work [EfficientDet](efficientdet.md), also by Quoc Le's team.
+
 #### Key ideas
 - The balance of width/depth/resolution can be achieved by simply scaling each of them with constant ratio.
 - Deeper network captures richer and more complex features
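
The "constant ratio" scaling in the Key ideas above boils down to three multipliers driven by a single compound coefficient. Below is a tiny illustrative sketch using the α = 1.2, β = 1.1, γ = 1.15 values reported in the EfficientNet paper (chosen so that α·β²·γ² ≈ 2, i.e. FLOPs roughly double per step of φ); the function name and the rounding-free arithmetic are simplifications of mine, not the official implementation.

```python
def compound_scale(phi: int, alpha: float = 1.2, beta: float = 1.1, gamma: float = 1.15):
    """Return (depth, width, resolution) multipliers for compound coefficient phi.

    Depth scales as alpha**phi, channel width as beta**phi, and input resolution
    as gamma**phi; with alpha * beta**2 * gamma**2 ~= 2, FLOPs grow roughly 2**phi.
    """
    return alpha ** phi, beta ** phi, gamma ** phi


if __name__ == "__main__":
    # Roughly how the B0 baseline scales up (real models also round depths/channels).
    for phi in range(4):
        d, w, r = compound_scale(phi)
        print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}")
```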

0 commit comments