Commit eb0702b

ytivy (Yuma Tsuta) and Yuma Tsuta authored
Remove exclusive option in SLURM job submission (#27)
Co-authored-by: Yuma Tsuta <[email protected]>
1 parent 68a5d31 commit eb0702b

2 files changed (+10 −7 lines)

evaluation/installers/llm-jp-eval-v1.3.1/README.md

+8 −5
@@ -59,6 +59,9 @@ huggingface-cli login
 If necessary, rewrite the variables in `run_llm-jp-eval.sh` and `resources/config_base.yaml`
 - Changing the tokenizer, wandb entity, or wandb project only requires editing `run_llm-jp-eval.sh`
 - For other changes, edit `resources/config_base.yaml` and point to that file inside `run_llm-jp-eval.sh`
+
+VRAM of 2.5-3.5x the model size is required (e.g., 13B model -> 33GB-45GB)<br>
+When running under SLURM, the default is `--gpus 1`, so set it together with `--mem` to sizes appropriate for your cluster
 ```shell
 cd ~/myspace
 # (Optional) If you need to change variables
@@ -73,11 +76,11 @@ CUDA_VISIBLE_DEVICES={num} bash run_llm-jp-eval.sh {path/to/model} {wandb.run_na
 ```
 
 #### Sample code
-```shell
-# For a cluster with SLURM
-sbatch --partition {partition} run_llm-jp-eval.sh llm-jp/llm-jp-13b-v2.0 test-$(whoami)
-# For a cluster without SLURM
-CUDA_VISIBLE_DEVICES=0 bash run_llm-jp-eval.sh llm-jp/llm-jp-13b-v2.0 test-$(whoami)
+```shell
+# Evaluate 70B model on a cluster with SLURM using H100 (VRAM: 80GB)
+sbatch --partition {partition} --gpus 4 --mem 8G run_llm-jp-eval.sh sbintuitions/sarashina2-70b test-$(whoami)
+# Evaluate 13B model on a cluster without SLURM using A100 (VRAM: 40GB)
+CUDA_VISIBLE_DEVICES=0,1 bash run_llm-jp-eval.sh llm-jp/llm-jp-13b-v2.0 test-$(whoami)
 ```
 
 ## For developers: command to create resources/sha256sums.csv
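The 2.5-3.5x sizing rule added above is easy to apply by hand. A minimal sketch of the arithmetic, with hypothetical helper variables that are not part of the installer:

```shell
# Hypothetical sizing helper based on the README's 2.5-3.5x rule of thumb.
MODEL_SIZE_B=70      # model parameters, in billions
GPU_VRAM_GB=80       # per-GPU VRAM, e.g. an H100
NEED_GB=$(( MODEL_SIZE_B * 35 / 10 ))                  # worst case 3.5x -> 245GB
GPUS=$(( (NEED_GB + GPU_VRAM_GB - 1) / GPU_VRAM_GB ))  # ceiling division -> 4 GPUs
echo "~${NEED_GB}GB VRAM -> request --gpus ${GPUS}"
```

This matches the sample command above: a 70B model at 3.5x needs roughly 245GB of VRAM, i.e. four 80GB H100s.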

evaluation/installers/llm-jp-eval-v1.3.1/scripts/run_llm-jp-eval.sh

+2 −2
@@ -1,10 +1,10 @@
 #!/bin/bash
 #SBATCH --job-name=llm-jp-eval
 #SBATCH --partition=<partition>
-#SBATCH --exclusive
 #SBATCH --nodes=1
+#SBATCH --cpus-per-task=8
 #SBATCH --gpus=1
-#SBATCH --ntasks-per-node=8
+#SBATCH --mem=200G
 #SBATCH --output=logs/%x-%j.out
 #SBATCH --error=logs/%x-%j.err
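Because sbatch options passed on the command line take precedence over a script's `#SBATCH` directives, the new `--gpus=1` and `--mem=200G` defaults can be raised per job without editing the script. A sketch using the README's `{partition}` / `{path/to/model}` / `{wandb.run_name}` placeholders and illustrative resource values:

```shell
# Command-line flags override the #SBATCH defaults baked into the script,
# so larger models can request more resources at submission time.
sbatch --partition {partition} --gpus 2 --mem 100G run_llm-jp-eval.sh {path/to/model} {wandb.run_name}
```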