Commit eb0702b

ytivy (Yuma Tsuta) and Yuma Tsuta authored
Remove exclusive option in SLURM job submission (#27)
Co-authored-by: Yuma Tsuta <[email protected]>
1 parent 68a5d31 commit eb0702b

2 files changed (+10 −7 lines)

evaluation/installers/llm-jp-eval-v1.3.1/README.md

+8 −5
@@ -59,6 +59,9 @@ huggingface-cli login
 If necessary, rewrite the variables in `run_llm-jp-eval.sh` and `resources/config_base.yaml`
 - Changing the tokenizer, wandb entity, or wandb project only requires editing `run_llm-jp-eval.sh`
 - For other changes, edit `resources/config_base.yaml` and point to that file inside `run_llm-jp-eval.sh`
+
+VRAM of 2.5-3.5x the model size is required (e.g., 13B model -> 33GB-45GB)<br>
+When running under SLURM, the default is `--gpus 1`, so set it together with `--mem` to sizes appropriate for your cluster
 ```shell
 cd ~/myspace
 # (Optional) If you need to change variables
@@ -73,11 +76,11 @@ CUDA_VISIBLE_DEVICES={num} bash run_llm-jp-eval.sh {path/to/model} {wandb.run_na
 ```
 
 #### Sample code
-```shell
-# For a cluster with SLURM
-sbatch --partition {partition} run_llm-jp-eval.sh llm-jp/llm-jp-13b-v2.0 test-$(whoami)
-# For a cluster without SLURM
-CUDA_VISIBLE_DEVICES=0 bash run_llm-jp-eval.sh llm-jp/llm-jp-13b-v2.0 test-$(whoami)
+```shell
+# Evaluate 70B model on a cluster with SLURM using H100 (VRAM: 80GB)
+sbatch --partition {partition} --gpus 4 --mem 8G run_llm-jp-eval.sh sbintuitions/sarashina2-70b test-$(whoami)
+# Evaluate 13B model on a cluster without SLURM using A100 (VRAM: 40GB)
+CUDA_VISIBLE_DEVICES=0,1 bash run_llm-jp-eval.sh llm-jp/llm-jp-13b-v2.0 test-$(whoami)
 ```
 
 ## For developers: command to create resources/sha256sums.csv
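The 2.5-3.5x sizing rule added above is easy to apply by hand. A minimal sketch of the arithmetic, with hypothetical helper variables that are not part of the installer:

```shell
# Hypothetical sizing helper based on the README's 2.5-3.5x rule of thumb.
MODEL_SIZE_B=70      # model parameters, in billions
GPU_VRAM_GB=80       # per-GPU VRAM, e.g. an H100
NEED_GB=$(( MODEL_SIZE_B * 35 / 10 ))                  # worst case 3.5x -> 245GB
GPUS=$(( (NEED_GB + GPU_VRAM_GB - 1) / GPU_VRAM_GB ))  # ceiling division -> 4 GPUs
echo "~${NEED_GB}GB VRAM -> request --gpus ${GPUS}"
```

This matches the sample command above: a 70B model at 3.5x needs roughly 245GB of VRAM, i.e. four 80GB H100s.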

evaluation/installers/llm-jp-eval-v1.3.1/scripts/run_llm-jp-eval.sh

+2 −2
@@ -1,10 +1,10 @@
 #!/bin/bash
 #SBATCH --job-name=llm-jp-eval
 #SBATCH --partition=<partition>
-#SBATCH --exclusive
 #SBATCH --nodes=1
+#SBATCH --cpus-per-task=8
 #SBATCH --gpus=1
-#SBATCH --ntasks-per-node=8
+#SBATCH --mem=200G
 #SBATCH --output=logs/%x-%j.out
 #SBATCH --error=logs/%x-%j.err
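Because sbatch options passed on the command line take precedence over a script's `#SBATCH` directives, the new `--gpus=1` and `--mem=200G` defaults can be raised per job without editing the script. A sketch using the README's `{partition}` / `{path/to/model}` / `{wandb.run_name}` placeholders and illustrative resource values:

```shell
# Command-line flags override the #SBATCH defaults baked into the script,
# so larger models can request more resources at submission time.
sbatch --partition {partition} --gpus 2 --mem 100G run_llm-jp-eval.sh {path/to/model} {wandb.run_name}
```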