
Collection of research papers

Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks
arxiv.org 2021 Link
Torsten Hoefler, et al.
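
The survey covers many pruning and regrowth criteria; as a concrete reference point, here is a minimal global magnitude-pruning sketch in PyTorch. This is my own illustration of the basic technique, not code from the paper.

```python
import torch
import torch.nn as nn

def magnitude_prune(model: nn.Module, sparsity: float = 0.9) -> None:
    """Zero out the globally smallest-magnitude weights (illustrative only)."""
    # Gather all weight magnitudes to compute one global threshold.
    all_weights = torch.cat([p.detach().abs().flatten()
                             for p in model.parameters() if p.dim() > 1])
    threshold = torch.quantile(all_weights, sparsity)
    with torch.no_grad():
        for p in model.parameters():
            if p.dim() > 1:  # prune weight matrices, keep biases/norms dense
                p.mul_((p.abs() > threshold).float())

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
magnitude_prune(model, sparsity=0.9)
```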

Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing
arxiv.org 2020 Link
Google AI Brain Team, Zihang Dai, et al.
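
The core idea is to shrink the sequence length of the hidden states in deeper blocks (and re-expand them with a small decoder when token-level outputs are needed). Below is a hedged sketch of the pooling step only, using plain strided mean pooling rather than the paper's exact pool-query-only attention.

```python
import torch

def pool_sequence(hidden: torch.Tensor, stride: int = 2) -> torch.Tensor:
    """Mean-pool hidden states along the sequence dimension (batch, seq, dim)."""
    batch, seq, dim = hidden.shape
    seq = seq - seq % stride  # drop a ragged tail for simplicity
    hidden = hidden[:, :seq, :].reshape(batch, seq // stride, stride, dim)
    return hidden.mean(dim=2)  # halves the sequence length for stride=2

h = torch.randn(8, 512, 768)      # e.g. BERT-base sized hidden states
print(pool_sequence(h).shape)     # torch.Size([8, 256, 768])
```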

Training Compute-Optimal Large Language Models
Chinchilla: compute-optimal training uses smaller models trained on more tokens for a given compute budget (see the sketch below)
arxiv.org 2022 Link
DeepMind, Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, et al.
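
For reference, the paper fits a parametric loss in N (parameters) and D (training tokens) and concludes that, for a compute budget C, both should be scaled roughly in equal proportion (often summarized as on the order of 20 tokens per parameter). E, A, B, α, β below stand for the fitted constants; only the general form is reproduced here, see the paper for the actual values.

```math
L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},
\qquad C \approx 6\,N D,
\qquad N_{\mathrm{opt}} \propto C^{a},\; D_{\mathrm{opt}} \propto C^{b},\; a \approx b \approx 0.5
```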

RoBERTa: A Robustly Optimized BERT Pretraining Approach
On the importance of the pretraining setup; useful background when compressing models
arxiv.org 2019 Link
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, et al.

Quadapter: Adapter for GPT-2 Quantization
GPT-2 and similar decoder-based models are hard to quantize; proposes ideas for preventing overfitting during fine-tuning (see the sketch below)
arxiv.org 2022 Link
Qualcomm AI Research, Minseop Park, Jaeseong You, Markus Nagel, Simyung Chang
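
A hedged sketch of the general idea as I understand it: a learnable per-channel scale flattens activation outliers before quantization and is undone around the following linear layer, so the full-precision function is unchanged. This is my own simplification, not the paper's exact formulation or training objective.

```python
import torch
import torch.nn as nn

def fake_quant(x: torch.Tensor, bits: int = 8) -> torch.Tensor:
    """Symmetric per-tensor fake quantization (round() blocks gradients;
    a straight-through estimator would be used in real training)."""
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    return torch.round(x / scale).clamp(-qmax - 1, qmax) * scale

class ScaledQuantLinear(nn.Module):
    """Linear layer with a learnable per-input-channel scale (adapter-style)."""
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.alpha = nn.Parameter(torch.ones(in_features))  # per-channel scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        xq = fake_quant(x / self.alpha)       # rescale channels before quantization
        return self.linear(xq * self.alpha)   # undo the scale; could be folded into the weights
```

In the paper the adapter parameters are, as I recall, trained so that quantized block outputs stay close to the full-precision ones, which is what limits overfitting to the fine-tuning set; treat the details above as an assumption and check the paper.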
