From 800aeeeee178ab3bc7851543bbe1a0e9d665e080 Mon Sep 17 00:00:00 2001
From: Haoyu Wang
Date: Fri, 17 Mar 2023 14:17:47 -0400
Subject: [PATCH] update notes for training slowdown

---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.md b/README.md
index 550ce51f..ad39f92f 100644
--- a/README.md
+++ b/README.md
@@ -178,6 +178,8 @@ torchrun --nproc_per_node=4 --master_port= train.py \
 Note the given training script is meant to be simple and easy to use, and is not particularly optimized.
 To run on more gpus, you may prefer to turn down `gradient_accumulation_steps` to keep a global batch size of 128. Global batch size has not been tested for optimality.
 
+In case of a slowdown during multi-A100 training, please consider changing `--fsdp` from `"full_shard auto_wrap"` to `"shard_grad_op auto_wrap"`.
+
 ### Authors
 All grad students below contributed equally and the order is determined by random draw.
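
For context (not part of the patch itself): a minimal sketch of how the amended flag would appear in the README's training command. The GPU count, port number, and `train.py` entry point are illustrative placeholders carried over from the surrounding README context; the remaining training arguments are omitted and should stay as in the original command.

```bash
# Sketch only: the sole change versus the README command is the --fsdp value.
# 29500 is an arbitrary example port; keep the rest of your training flags unchanged.
torchrun --nproc_per_node=4 --master_port=29500 train.py \
    --fsdp "shard_grad_op auto_wrap"
```

The `shard_grad_op` strategy shards only gradients and optimizer states (ZeRO-2 style) rather than full parameters, trading a bit of memory savings for less communication, which is why it can relieve the multi-A100 slowdown.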