* new 32b script
* new 32b script
* beaker eval freq not upstreaming
* new var
* longer timeout on capturing cuda
* longer timeout on capturing cuda
* update params
* reduce more
* no optim
* working script
* zpg inc
* newer changes
* higher zpg
* changes
* fix
* zpg as arg
* debug
* update
* update
* del tmp script
`open_instruct/grpo_fast.py` — 17 additions, 3 deletions
```diff
@@ -351,6 +351,12 @@ class Args:
     """vLLM top p for nucleus sampling"""
     deepspeed_stage: int = 0
     """the deepspeed stage"""
+    deepspeed_zpg: int = 8
+    """the deepspeed zpg value. Higher values are more memory efficient but slower. Set to 1 to disable zpg, which uses less memory but is significantly slower. Ideally is set to the number of GPUs per node (usually 8, default)."""
+    deepspeed_offload_param: bool = False
+    """whether to offload parameters to CPU (reduces GPU memory usage)"""
+    deepspeed_offload_optimizer: bool = False
+    """whether to offload optimizer states to CPU (reduces GPU memory usage)"""
     gather_whole_model: bool = True
     """whether to gather the whole model to boardcast (not doable for 70B but can be faster for 8B)"""
```
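To illustrate what these new `Args` fields control, here is a minimal sketch of how flags like these typically map onto a DeepSpeed ZeRO config dict. The `build_ds_config` helper is hypothetical (it does not appear in this diff); the dict keys (`zero_hpz_partition_size` for ZeRO++ zpg, `offload_param`, `offload_optimizer`) are standard DeepSpeed `zero_optimization` config keys, but how `grpo_fast.py` actually wires them up may differ.

```python
# Hypothetical helper: not part of this PR. Shows one plausible way the new
# Args fields could be translated into a DeepSpeed "zero_optimization" config.
def build_ds_config(
    deepspeed_stage: int = 0,
    deepspeed_zpg: int = 8,
    deepspeed_offload_param: bool = False,
    deepspeed_offload_optimizer: bool = False,
) -> dict:
    zero_opt: dict = {"stage": deepspeed_stage}
    if deepspeed_zpg > 1:
        # ZeRO++ hierarchical partitioning (hpZ): hold a secondary copy of the
        # weights sharded across `zpg` ranks, typically the GPUs of one node,
        # to avoid slow inter-node all-gathers.
        zero_opt["zero_hpz_partition_size"] = deepspeed_zpg
    if deepspeed_offload_param:
        # Move parameters to CPU memory to reduce GPU memory pressure.
        zero_opt["offload_param"] = {"device": "cpu"}
    if deepspeed_offload_optimizer:
        # Move optimizer states to CPU memory (slower steps, less GPU memory).
        zero_opt["offload_optimizer"] = {"device": "cpu"}
    return {"zero_optimization": zero_opt}
```

With this shape, `deepspeed_zpg=1` simply omits the hpZ key, falling back to plain ZeRO sharding, which matches the docstring's note that disabling zpg saves memory at the cost of speed.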