Description
When I train with GRPO, checkpoints are only saved at multiples of save_steps. Is there a way to also save a checkpoint at the final step?
Train: 99%|█████████▉| 80/81 [4:49:11<03:45, 225.62s/it]
Train: 99%|█████████▉| 80/81 [4:49:11<03:36, 216.89s/it]
[INFO:swift] last_model_checkpoint: /home/output_checkpoints/grpo/v6-20250609-104706/checkpoint-80
[INFO:swift] best_model_checkpoint: None
Possibly relevant configuration parameters:
"save_strategy": "steps",
"save_steps": 20.0,
"save_total_limit": 2,
"save_safetensors": true,
"save_on_each_node": false,
"save_only_model": false,
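To make the gap concrete, here is a minimal sketch of the arithmetic behind what I am seeing. It assumes the save rule is simply "save whenever the step number is a multiple of save_steps" and that save_total_limit keeps only the most recent checkpoints; both are my reading of the behaviour, not confirmed from the swift source. The helper name `checkpoint_steps` is made up for illustration.

```python
def checkpoint_steps(max_steps: int, save_steps: int) -> list[int]:
    """Return the steps at which a 'save every N steps' rule would fire.

    Assumption: a checkpoint is written only when step % save_steps == 0.
    """
    return [step for step in range(1, max_steps + 1) if step % save_steps == 0]


# Values taken from the log and config above.
max_steps = 81          # total steps shown in the progress bar
save_steps = 20         # "save_steps": 20.0
save_total_limit = 2    # "save_total_limit": 2

saved = checkpoint_steps(max_steps, save_steps)

# Assumption: only the newest `save_total_limit` checkpoints remain on disk.
kept = saved[-save_total_limit:]

# The final step is saved only if it happens to be a multiple of save_steps.
final_step_saved = max_steps in saved

print("saved at steps:", saved)
print("kept on disk:", kept)
print("final step saved:", final_step_saved)
```

Under these assumptions the last checkpoint is checkpoint-80, which matches the last_model_checkpoint line in the log, and step 81 is never written.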