From ed44c96ad21adc870e6cd7113c8b6345894985ea Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E6=B6=A6=E5=BF=83?=
Date: Sun, 14 Jan 2024 16:35:00 +0100
Subject: [PATCH] fix: translation in moe.md

The Chinese translation in the figure caption is not consistent with the
paragraph above it.
---
 zh/moe.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/zh/moe.md b/zh/moe.md
index ebe289dc1a..62f4d74ac3 100644
--- a/zh/moe.md
+++ b/zh/moe.md
@@ -251,7 +251,7 @@ Switch Transformers 的作者观察到，在相同的预训练困惑度下，稀
   <img alt="Table comparing fine-tuning batch size and learning rate between dense and sparse models." src="...">
-  <figcaption>降低学习率和调大批量可以提升稀疏模型微调质量。该图来自 ST-MoE 论文</figcaption>
+  <figcaption>提高学习率和降低批量大小可以提升稀疏模型微调质量。该图来自 ST-MoE 论文</figcaption>
 
 此时，您可能会对人们微调 MoE 中遇到的这些挑战而感到沮丧，但最近的一篇论文 [《MoEs Meets Instruction Tuning》](https://arxiv.org/pdf/2305.14705.pdf) (2023 年 7 月) 带来了令人兴奋的发现。这篇论文进行了以下实验：
@@ -379,4 +379,4 @@ MoE 的 **量化** 也是一个有趣的研究领域。例如，[QMoE](https://a
 
 ```
 Sanseviero, et al., "Mixture of Experts Explained", Hugging Face Blog, 2023.
-```
\ No newline at end of file
+```
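
To make the corrected caption concrete: per the ST-MoE finding it describes, sparse (MoE) models tend to fine-tune better with a higher learning rate and a smaller batch size than comparable dense models. The sketch below is purely illustrative; the configuration names and every numeric value are hypothetical and are not taken from the ST-MoE paper.

```python
# Illustrative only: hypothetical fine-tuning hyperparameters contrasting a dense
# model with a sparse (MoE) model, reflecting the corrected caption -- the sparse
# model gets a *higher* learning rate and a *smaller* batch size.
# All numbers are made up for the sketch, not values from the ST-MoE paper.

dense_ft_config = {
    "learning_rate": 1e-5,  # hypothetical baseline for a dense model
    "batch_size": 64,       # hypothetical baseline for a dense model
}

sparse_ft_config = {
    "learning_rate": 1e-4,  # raised relative to the dense baseline
    "batch_size": 16,       # lowered relative to the dense baseline
}


def describe(name: str, cfg: dict) -> None:
    """Print the fine-tuning hyperparameters for one model family."""
    print(f"{name}: lr={cfg['learning_rate']}, batch_size={cfg['batch_size']}")


if __name__ == "__main__":
    describe("dense", dense_ft_config)
    describe("sparse (MoE)", sparse_ft_config)
```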