Papers
Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks
arxiv.org 2021 Link
Torsten Hoefler, et al.
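For context, a minimal sketch of unstructured magnitude pruning, one of the techniques the survey covers (the helper name and the 90% sparsity target are illustrative choices, not taken from the paper):

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude entries until `sparsity` of them are zero."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    mask = np.abs(weights) > threshold  # keep only weights above the cutoff
    return weights * mask

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256))
w_pruned = magnitude_prune(w, sparsity=0.9)
print(f"achieved sparsity: {1 - np.count_nonzero(w_pruned) / w_pruned.size:.2f}")
```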
Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing
arxiv.org 2020 Link
Google AI Brain Team, Zihang Dai, et al.
Training Compute-Optimal Large Language Models
Chinchilla: compute-optimal training with smaller models and more data
arxiv.org 2022 Link
DeepMind, Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, et al.
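The Chinchilla result reduces to a simple sizing rule; a hedged back-of-the-envelope sketch (the ~20 tokens-per-parameter constant and the C ≈ 6·N·D FLOP approximation are the commonly cited rules of thumb, not exact values from the paper):

```python
def compute_optimal_size(compute_flops: float, tokens_per_param: float = 20.0):
    """Split a training FLOP budget C between parameters N and tokens D, assuming C ≈ 6 * N * D."""
    n_params = (compute_flops / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# An illustrative budget in the ballpark of Chinchilla's training compute.
n, d = compute_optimal_size(5.9e23)
print(f"~{n / 1e9:.0f}B parameters, ~{d / 1e12:.1f}T tokens")
```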
RoBERTa: A Robustly Optimized BERT Pretraining Approach
On the importance of pretraining when compressing models
arxiv.org 2019 Link
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, et al.
Quadapter: Adapter for GPT-2 Quantization
It is hard to quantize GPT-2 and similar decoder-based models; ideas for preventing overfitting during fine-tuning
arxiv.org 2022 Link
Qualcomm AI Research, Minseop Park, Jaeseong You, Markus Nagel, Simyung Chang
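For illustration only, a minimal PyTorch sketch in the spirit of the idea: a small set of learnable per-channel scales is trained for quantization while the pretrained weights stay frozen, which limits the overfitting risk noted above. This is a hedged sketch, not the paper's exact formulation, and `fake_quant` / `ScaledQuantLinear` are made-up names.

```python
import torch
import torch.nn as nn

def fake_quant(x: torch.Tensor, bits: int = 8) -> torch.Tensor:
    """Symmetric per-tensor fake-quantization with a straight-through estimator."""
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    q = (x / scale).round().clamp(-qmax, qmax) * scale
    return x + (q - x).detach()  # pass gradients straight through the rounding

class ScaledQuantLinear(nn.Module):
    def __init__(self, linear: nn.Linear, bits: int = 8):
        super().__init__()
        self.linear = linear
        for p in self.linear.parameters():  # freeze the pretrained weights
            p.requires_grad = False
        # one learnable scale per input channel: the only trained parameters
        self.channel_scale = nn.Parameter(torch.ones(linear.in_features))
        self.bits = bits

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # scale activations up and weights down so the product is unchanged,
        # but both become easier to quantize
        x_q = fake_quant(x * self.channel_scale, self.bits)
        w_q = fake_quant(self.linear.weight / self.channel_scale, self.bits)
        return nn.functional.linear(x_q, w_q, self.linear.bias)

layer = ScaledQuantLinear(nn.Linear(768, 768))
out = layer(torch.randn(4, 768))
print(out.shape)  # torch.Size([4, 768])
```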