From ed44c96ad21adc870e6cd7113c8b6345894985ea Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E6=B6=A6=E5=BF=83?=
Date: Sun, 14 Jan 2024 16:35:00 +0100
Subject: [PATCH] fix: translation in moe.md

The Chinese translation in the figure caption is not consistent with the
paragraph above it.
---
 zh/moe.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/zh/moe.md b/zh/moe.md
index ebe289dc1a..62f4d74ac3 100644
--- a/zh/moe.md
+++ b/zh/moe.md
@@ -251,7 +251,7 @@ Switch Transformers 的作者观察到，在相同的预训练困惑度下，稀
   <img alt="Table comparing fine-tuning batch size and learning rate between dense and sparse models." src="...">
-  <figcaption>降低学习率和调大批量可以提升稀疏模型微调质量。该图来自 ST-MoE 论文</figcaption>
+  <figcaption>提高学习率和降低批量大小可以提升稀疏模型微调质量。该图来自 ST-MoE 论文</figcaption>
 
 此时，您可能会对人们微调 MoE 中遇到的这些挑战而感到沮丧，但最近的一篇论文 [《MoEs Meets Instruction Tuning》](https://arxiv.org/pdf/2305.14705.pdf) (2023 年 7 月) 带来了令人兴奋的发现。这篇论文进行了以下实验：
@@ -379,4 +379,4 @@ MoE 的 **量化** 也是一个有趣的研究领域。例如，[QMoE](https://a
 
 ```
 Sanseviero, et al., "Mixture of Experts Explained", Hugging Face Blog, 2023.
-```
\ No newline at end of file
+```
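
To make the corrected caption concrete: per the ST-MoE finding it describes, sparse (MoE) models tend to fine-tune better with a higher learning rate and a smaller batch size than comparable dense models. The sketch below is purely illustrative; the configuration names and every numeric value are hypothetical and are not taken from the ST-MoE paper.

```python
# Illustrative only: hypothetical fine-tuning hyperparameters contrasting a dense
# model with a sparse (MoE) model, reflecting the corrected caption -- the sparse
# model gets a *higher* learning rate and a *smaller* batch size.
# All numbers are made up for the sketch, not values from the ST-MoE paper.

dense_ft_config = {
    "learning_rate": 1e-5,  # hypothetical baseline for a dense model
    "batch_size": 64,       # hypothetical baseline for a dense model
}

sparse_ft_config = {
    "learning_rate": 1e-4,  # raised relative to the dense baseline
    "batch_size": 16,       # lowered relative to the dense baseline
}


def describe(name: str, cfg: dict) -> None:
    """Print the fine-tuning hyperparameters for one model family."""
    print(f"{name}: lr={cfg['learning_rate']}, batch_size={cfg['batch_size']}")


if __name__ == "__main__":
    describe("dense", dense_ft_config)
    describe("sparse (MoE)", sparse_ft_config)
```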