This repository was archived by the owner on Jun 16, 2025. It is now read-only.
Hi, I have run into a problem. I have a single server with 8 GPUs, running Ubuntu 16.04, PyTorch 1.4, and CUDA 10.0.
The problem is that I get an error when I run the following commands:
CUDA_VISIBLE_DEVICES=0 python3 scripts/train.py --dist_url 'file:///data/luwantong/nonexistent_file' --cfgs_file cfgs/yc2.yml \
    --checkpoint_path ./checkpoint/$id --batch_size 14 --world_size 4 \
    --cuda --sent_weight 0.25 | tee log/$id-0 &
CUDA_VISIBLE_DEVICES=1 python3 scripts/train.py --dist_url 'file:///data/luwantong/nonexistent_file' --cfgs_file cfgs/yc2.yml \
    --checkpoint_path ./checkpoint/$id --batch_size 14 --world_size 4 \
    --cuda --sent_weight 0.25 | tee log/$id-1 &
CUDA_VISIBLE_DEVICES=2 python3 scripts/train.py --dist_url 'file:///data/luwantong/nonexistent_file' --cfgs_file cfgs/yc2.yml \
    --checkpoint_path ./checkpoint/$id --batch_size 14 --world_size 4 \
    --cuda --sent_weight 0.25 | tee log/$id-2 &
CUDA_VISIBLE_DEVICES=3 python3 scripts/train.py --dist_url 'file:///data/luwantong/nonexistent_file' --cfgs_file cfgs/yc2.yml \
    --checkpoint_path ./checkpoint/$id --batch_size 14 --world_size 4 \
    --cuda --sent_weight 0.25 | tee log/$id-3
ValueError: Error initializing torch.distributed using file:// rendezvous: rank parameter missing
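For context, the message suggests that the `file://` rendezvous was never given a rank for the process. In recent PyTorch versions, `torch.distributed.init_process_group` takes `rank` and `world_size` as explicit arguments. Below is a minimal sketch of how a training script could accept a per-process rank and pass it through. The `--rank` and `--dist_backend` flags and the `parse_args` / `init_distributed` helpers are hypothetical names used for illustration. Nothing in this post confirms that `scripts/train.py` accepts them or is structured this way.

```python
import argparse


def parse_args(argv=None):
    """Parse the distributed-training flags (hypothetical subset of the script's options)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--dist_url", required=True)
    parser.add_argument("--world_size", type=int, required=True)
    # Assumption: a per-process rank flag; the original script may not define one.
    parser.add_argument("--rank", type=int, required=True)
    parser.add_argument("--dist_backend", default="nccl")
    args = parser.parse_args(argv)
    # Each process needs a distinct rank in the range [0, world_size).
    if not 0 <= args.rank < args.world_size:
        parser.error("--rank must satisfy 0 <= rank < world_size")
    return args


def init_distributed(args):
    """Join the process group, passing the rank explicitly."""
    # Imported here so argument parsing works even where torch is not installed.
    import torch.distributed as dist

    dist.init_process_group(
        backend=args.dist_backend,
        init_method=args.dist_url,
        world_size=args.world_size,
        rank=args.rank,
    )
```

With something like this in place, each of the four commands would pass a different value (for example `--rank 0` through `--rank 3`), and all four would point at the same `--dist_url` file.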