Commit 45bf922

Reorg notes

1 parent 2ce1eb0 commit 45bf922

13 files changed: +790 −776 lines

README.md (+10 −6)
@@ -6,13 +6,14 @@ This repository contains my paper reading notes on deep learning and machine learning.
 ## Summary of Topics
 The summary of the papers read in 2019 can be found [here on Towards Data Science](https://towardsdatascience.com/the-200-deep-learning-papers-i-read-in-2019-7fb7034f05f7?source=friends_link&sk=7628c5be39f876b2c05e43c13d0b48a3).
 
-The sections below records paper reading activity in chronological order. See notes organized according to subfields [here](organized.md) (up to 06-2019).
+The sections below record paper reading activity in chronological order.
 
 ## What to read
-### Source of papers
+### Where to start?
+If you are new to deep learning in computer vision and don't know where to start, I suggest spending your first month or so diving deep into [this list of papers](start/first_cnn_papers.md). I did so ([see my notes](start/first_cnn_papers_notes.md)) and it served me well.
+
 Here is [a list of trustworthy sources of papers](trusty.md) in case I run out of papers to read.
 
-The list of resource in this [link](https://autonomous-driving.org/front/resources/) talks about various topics in Autonomous Driving.
 
 ### Github repos
 - [MMAction2](https://github.com/open-mmlab/mmaction2) [268 stars]
@@ -32,7 +33,6 @@ The list of resource in this [link](https://autonomous-driving.org/front/resources/) talks about various topics in Autonomous Driving.
 - [ORB SLAM2](https://github.com/raulmur/ORB_SLAM2) and [Docker version](https://github.com/yuyou/ORB_SLAM2#build-docker-image)
 - [PySLAM v2](https://github.com/luigifreda/pyslam)
 
-
 ### Youtube channels
 - [Modern C++ for computer vision](https://www.youtube.com/playlist?list=PLgnQpQtFTOGR50iIOtO36nK6aNPtVq98C)
 - [SLAM by Cyrill Stachniss](https://www.youtube.com/playlist?list=PLgnQpQtFTOGQrZ4O5QzbIHgl3b1JHimN_)
@@ -41,6 +41,7 @@ The list of resource in this [link](https://autonomous-driving.org/front/resources/) talks about various topics in Autonomous Driving.
 - [Andrej Karpathy's Talks](./talk_notes/andrej.md)
 
 ## My Review Posts by Topics
+- [Object detection in crowded scenes](??) ([related paper notes](topic_crowd_detection.md))
 - [Monocular Bird’s-Eye-View Semantic Segmentation for Autonomous Driving](https://towardsdatascience.com/monocular-birds-eye-view-semantic-segmentation-for-autonomous-driving-ee2f771afb59)
 - [Deep Learning in Mapping for Autonomous Driving](https://towardsdatascience.com/deep-learning-in-mapping-for-autonomous-driving-9e33ee951a44)
 - [Monocular Dynamic Object SLAM in Autonomous Driving](https://towardsdatascience.com/monocular-dynamic-object-slam-in-autonomous-driving-f12249052bf1)
@@ -57,6 +58,7 @@ The list of resource in this [link](https://autonomous-driving.org/front/resources/) talks about various topics in Autonomous Driving.
 ## 2020-11 (16)
 - [Scaled-YOLOv4: Scaling Cross Stage Partial Network](https://arxiv.org/abs/2011.08036) [[Notes](paper_notes/scaled_yolov4.md)] [yolo]
 - [PP-YOLO: An Effective and Efficient Implementation of Object Detector](https://arxiv.org/abs/2007.12099) [yolo, paddle-paddle, baidu]
+- [Sparse R-CNN: End-to-End Object Detection with Learnable Proposals](https://arxiv.org/abs/2011.12450) [Transformer head]
 - [Unsupervised Monocular Depth Learning in Dynamic Scenes](https://arxiv.org/abs/2010.16404) [[Notes](paper_notes/learn_depth_and_motion.md)] <kbd>CoRL 2020</kbd> [LearnK improved ver, Google]
 - [MoNet3D: Towards Accurate Monocular 3D Object Localization in Real Time](https://arxiv.org/abs/2006.16007) [[Notes](paper_notes/monet3d.md)] <kbd>ICML 2020</kbd> [Mono3D, pairwise relationship]
 - [Argoverse: 3D Tracking and Forecasting with Rich Maps](https://arxiv.org/abs/1911.02620) [[Notes](paper_notes/argoverse.md)] <kbd>CVPR 2019</kbd> [HD maps, dataset, CV lidar]
@@ -117,10 +119,10 @@ Feature Extraction](https://arxiv.org/abs/2010.02893) [monodepth, semantics, Nav
 - [VDO-SLAM: A Visual Dynamic Object-aware SLAM System](https://arxiv.org/abs/2005.11052) <kbd>IJRR 2020</kbd>
 - [Dynamic SLAM: The Need For Speed](https://arxiv.org/abs/2002.08584)
 - [Pseudo RGB-D for Self-Improving Monocular SLAM and Depth Prediction](https://arxiv.org/abs/2004.10681) <kbd>ECCV 2020</kbd>
-- [Traffic Light Mapping and Detection](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/37259.pdf) [[Notes](paper_notes/tfl_mapping_google.md)] <kbd>ICRA 2011</kbd> [traffic light, Google]
+- [Traffic Light Mapping and Detection](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/37259.pdf) [[Notes](paper_notes/tfl_mapping_google.md)] <kbd>ICRA 2011</kbd> [traffic light, Google, Chris Urmson]
+- [Traffic Light Mapping, Localization, and State Detection for Autonomous Vehicles](http://driving.stanford.edu/papers/ICRA2011.pdf) <kbd>ICRA 2011</kbd> [traffic light, Sebastian Thrun]
 - [Traffic light recognition exploiting map and localization at every stage](https://web.yonsei.ac.kr/jksuhr/papers/Traffic%20light%20recognition%20exploiting%20map%20and%20localization%20at%20every%20stage.pdf) [[Notes](paper_notes/tfl_exploting_map.md)] <kbd>Expert Systems 2017</kbd> [traffic light, Myoungho Sunwoo, Jae Kyu Suhr, Ho Gi Jung]
 - [Traffic Light Recognition Using Deep Learning and Prior Maps for Autonomous Cars](https://arxiv.org/abs/1906.11886) [[Notes](paper_notes/tfl_lidar_map_building.md)] <kbd>IJCNN 2019</kbd> [traffic light, Espirito Santo Brazil]
-- [Traffic Light Mapping, Localization, and State Detection for Autonomous Vehicles](http://driving.stanford.edu/papers/ICRA2011.pdf) <kbd>ICRA 2011</kbd> [traffic light]
 - [Evaluating State-of-the-art Object Detector on Challenging Traffic Light Data](https://openaccess.thecvf.com/content_cvpr_2017_workshops/w9/papers/Jensen_Evaluating_State-Of-The-Art_Object_CVPR_2017_paper.pdf) <kbd>CVPR 2017 workshop</kbd>
 - [Traffic light recognition in varying illumination using deep learning and saliency map](https://www.researchgate.net/profile/Vijay_John3/publication/265014373_Traffic_Light_Recognition_in_Varying_Illumination_using_Deep_Learning_and_Saliency_Map/links/56aac00408ae8f3865666102.pdf) <kbd>ITSC 2014</kbd> [traffic light]
 - [Traffic light recognition using high-definition map features](https://sci-hub.st/https://www.sciencedirect.com/science/article/abs/pii/S0921889018301234) <kbd>RAS 2019</kbd>
@@ -144,6 +146,8 @@ Feature Extraction](https://arxiv.org/abs/2010.02893) [monodepth, semantics, Nav
 - [NeurAll: Towards a Unified Model for Visual Perception in Automated Driving](https://arxiv.org/abs/1902.03589) <kbd>ITSC 2019 oral</kbd> [MTL]
 - [Locating Objects Without Bounding Boxes](https://arxiv.org/abs/1806.07564) <kbd>CVPR 2019</kbd>
 - [Deep Evidential Regression](https://papers.nips.cc/paper/2020/file/aab085461de182608ee9f607f3f7d18f-Paper.pdf) <kbd>NeurIPS 2020</kbd> [one-pass aleatoric/epistemic uncertainty]
+- [Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection](https://arxiv.org/abs/2011.12885) [focal loss]
+- [Rethinking Transformer-based Set Prediction for Object Detection](https://arxiv.org/abs/2011.10881) [transformers, Kris Kitani]
 
 ## 2020-10 (14)
 - [TSM: Temporal Shift Module for Efficient Video Understanding](https://arxiv.org/abs/1811.08383) [[Notes](paper_notes/tsm.md)] <kbd>ICCV 2019</kbd> [Song Han, video, object detection]
2 files renamed without changes.

Learning_notes.md renamed to code_notes/old_tf_notes/tf_learning_notes.md (+9 −9)
@@ -134,22 +134,22 @@ Tensorflow's broadcasting rules are designed to follow those of `numpy`'s.
 When operating on two arrays, NumPy compares their shapes element-wise. It starts with the **trailing dimensions** and works its way forward. Two dimensions are compatible when
 
 - they are equal, or
 - one of them is 1 (see the example below)
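
A minimal NumPy sketch of these rules (the shapes here are arbitrary examples, not taken from the notes):

```python
import numpy as np

# Dimensions are compared trailing-first; each pair must be equal or contain a 1.
a = np.ones((8, 1, 6, 1))
b = np.ones((7, 1, 5))
print((a * b).shape)  # (8, 7, 6, 5): every mismatched pair contained a 1

# Incompatible trailing dims (4 vs 3, neither is 1) raise an error.
try:
    np.ones((2, 4)) + np.ones((3,))
except ValueError as e:
    print(e)  # operands could not be broadcast together ...
```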

 ### Memory requirements imposed by conv layers
 - An example taken from O'Reilly's *Hands-on machine learning with scikit-learn and tensorflow*: input 150x100 RGB image, one conv layer with 200 5x5 filters, 1x1 stride and `SAME` padding. The output is 200 feature maps of size 150x100, with a total of (5x5x3+1)x200 = 15,200 parameters (the arithmetic is sanity-checked in the sketch below).
 - Computation: each of the 200 feature maps contains 150x100 neurons, and each neuron computes a weighted sum of 5x5x3 inputs; that is (5x5x3)x150x100x200 = 225 million multiplications. Including the same number of additions, this requires 450 million flops.
 - Storage: if each value is stored as a 32-bit float, the output features take 200x150x100x32/8 ≈ 11.4 MB of RAM per instance. If a training batch contains 100 instances, this layer alone takes up over 1 GB of RAM!
 - During training, every layer computed during the forward pass needs to be preserved for back-propagation, so the RAM needed is at least the total amount required by all layers.
 - During inference, the RAM occupied by one layer can be released as soon as the next layer has been computed, so only as much RAM as required by two consecutive layers is needed.
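
These numbers are easy to sanity-check in plain Python; this sketch just recomputes the book example:

```python
H, W, C_in = 150, 100, 3  # input: 150x100 RGB image
K, C_out = 5, 200         # 200 filters of 5x5, stride 1, SAME padding

params = (K * K * C_in + 1) * C_out     # +1 for each filter's bias
mults = (K * K * C_in) * H * W * C_out  # one weighted sum per output neuron
feature_bytes = H * W * C_out * 4       # 32-bit floats -> 4 bytes each

print(params)                       # 15200
print(mults / 1e6)                  # 225.0 (million multiplications)
print(feature_bytes / 2**20)        # ~11.44 MB per instance
print(100 * feature_bytes / 2**30)  # ~1.12 GB for a batch of 100
```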

 ### Pooling layers
 - Pooling reduces the input image size and also makes the NN tolerate a bit more image shift (location invariance).
 - Pooling works on every input channel independently. Generally you can pool over the height and width in each channel, or pool over the channels; you cannot currently do both at once in tensorflow (see the sketch below).
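
A minimal TF 2.x sketch of the two options, under the assumption that the depthwise window evenly divides the channel count (shapes here are arbitrary NHWC examples):

```python
import tensorflow as tf

x = tf.random.normal((1, 6, 6, 4))  # NHWC

# Spatial pooling: over height and width, each channel independently.
spatial = tf.nn.max_pool2d(x, ksize=2, strides=2, padding="SAME")
print(spatial.shape)  # (1, 3, 3, 4)

# Depthwise pooling: over channels only, with a 1x1 spatial window;
# the channel window (here 2) must evenly divide the number of channels.
depthwise = tf.nn.max_pool(x, ksize=[1, 1, 1, 2], strides=[1, 1, 1, 2],
                           padding="VALID")
print(depthwise.shape)  # (1, 6, 6, 2)
```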

 ### CNN architecture
 - Typical CNN architectures stack a few convolutional layers (each one followed by a ReLU layer) and then a pooling layer. The image gets smaller and smaller but also deeper and deeper.
 - A common mistake is to make kernels too large. We can get the same receptive field as one 5x5 kernel by stacking two 3x3 kernels, with fewer parameters (see the sketch below).
 - Cross entropy cost function is preferred as it penalizes bad predictions much more, producing larger gradients and thus converging faster.

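A small Keras sketch of this point (64 channels is an arbitrary choice): two stacked 3x3 convolutions cover the same 5x5 receptive field as one 5x5 convolution, with fewer parameters and an extra nonlinearity in between:

```python
import tensorflow as tf

C = 64
inp = tf.keras.Input((32, 32, C))

# One 5x5 conv: (5*5*64 + 1) * 64 = 102,464 parameters.
one_5x5 = tf.keras.Model(inp, tf.keras.layers.Conv2D(C, 5, padding="same")(inp))

# Two stacked 3x3 convs: 2 * (3*3*64 + 1) * 64 = 73,856 parameters.
x = tf.keras.layers.Conv2D(C, 3, padding="same", activation="relu")(inp)
two_3x3 = tf.keras.Model(inp, tf.keras.layers.Conv2D(C, 3, padding="same")(x))

print(one_5x5.count_params())  # 102464
print(two_3x3.count_params())  # 73856
```
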
### Random seed (graph level and op level)
2 files renamed without changes.

nlp.md renamed to nlp/nlp.md

3 files renamed without changes.

where_to_start.md renamed to start/first_cnn_papers.md (+1 −1)
@@ -1,5 +1,5 @@
 # Papers and books to read to start deep learning
-This list of papers provide a good introduction to deep learning in computer vision field.
+This list of papers provides a good introduction to deep learning in the computer vision field. My notes on these papers are [here](first_cnn_papers_notes.md).
 
 ## Most popular network architectures
 - AlexNet https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks
