A list of papers that I believe are either important or useful for understanding deep learning.
The landmark papers in deep learning. Read these papers in order.
| Paper | Description |
|---|---|
| Backpropagation | The paper that popularized backpropagation, the algorithm used to compute the gradients that train modern deep learning models. |
| AlexNet | Often considered the paper that kickstarted the modern era of deep learning, this paper showed that stacking more layers into a deep convolutional network, trained on GPUs, substantially improved ImageNet classification accuracy. |
| The Adam Optimizer | A key landmark in optimization algorithms for deep learning models, this paper proposes an optimizer that adapts each parameter's step size using running estimates of the first and second moments of its gradient. |
| Long Short-Term Memory | A huge breakthrough in sequence modeling for deep learning models, introducing a gated memory cell that lets them retain, update, and discard information from earlier in a sequence and use it for future predictions. |
| Attention is All You Need | Arguably one of the two most important papers in modern deep learning, along with AlexNet, this paper proposed the Transformer, the building block of large language models and a huge milestone in language understanding for deep learning models. |
| Deep Reinforcement Learning | A key breakthrough in reinforcement learning, this paper combined deep neural networks with reward-driven learning, in which an agent learns from the rewards its actions earn rather than from labeled examples. |
| Denoising Diffusion Models | This paper proposed a method for image generation that learns to reverse a gradual noising process, producing highly lifelike images and marking a key landmark in image generation. |
| Language Models are Few-Shot Learners | This is the paper that introduced GPT-3, showing that very large language models can perform tasks they were not explicitly trained on when given only a few examples in the prompt. |
Important works in computer vision.
| Paper |
|---|
| Residual Networks |
Papers about pre-transformer NLP breakthroughs.
| Paper |
|---|
| word2vec |
| Nucleus Sampling |
Papers about transformers and their applications.
| Paper |
|---|
| BERT |
| Vision Transformers |
Papers about large language models and related works on them.
| Paper |
|---|
| Chain of Thought Reasoning |
| Instruction Tuning |
| Speculative Decoding |
Historically famous papers whose techniques are still used today, but that are not essential reading.
| Paper |
|---|
| ReLU |
| UNet |
| XGBoost |
| Batch Normalization |
Papers about deep generative models.
| Paper |
|---|
| Variational Autoencoders |
| GANs |
Papers about reinforcement learning.
| Paper |
|---|
| Proximal Policy Optimization |
Papers that study deep learning itself.
| Paper |
|---|
| Contrastive Representation Learning (CLIP) |
| The Lottery Ticket Hypothesis |
Papers using deep learning to solve huge problems.
| Paper |
|---|
| AlphaFold |
Papers exploring the math of deep learning advancements.
| Paper |
|---|
| Dropout |
| Low Rank Adaptation |
| GANs as a Nash Equilibrium |