Error when using slurm #1252

lx19930 · 2022-05-06T17:50:11Z


I tried to run train.py directly in a Slurm environment and got this error message:

  File "train2.py", line 852, in <module>
    main()
  File "train2.py", line 643, in main
    amp_autocast=amp_autocast, loss_scaler=loss_scaler, model_ema=model_ema, mixup_fn=mixup_fn)
  File "train2.py", line 711, in train_one_epoch
    output = model(input)
  File "/projects/academic/wjzheng/xliu79/anaconda3/envs/ptimm/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/projects/academic/wjzheng/xliu79/pytorch-image-models/timm/models/efficientnet.py", line 557, in forward
    x = self.forward_features(x)
  File "/projects/academic/wjzheng/xliu79/pytorch-image-models/timm/models/efficientnet.py", line 540, in forward_features
    x = self.conv_stem(x)
  File "/projects/academic/wjzheng/xliu79/anaconda3/envs/ptimm/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/projects/academic/wjzheng/xliu79/anaconda3/envs/ptimm/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 447, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/projects/academic/wjzheng/xliu79/anaconda3/envs/ptimm/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 444, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument weight in method wrapper___slow_conv2d_forward)

Can someone help?
Thanks!
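
For anyone hitting the same message: the RuntimeError says the convolution saw tensors on two devices (cpu and cuda:0), so one thing that may be worth checking is whether the input batch and the model weights are on the same device when `model(input)` runs. Below is a minimal sketch of such a check, not the actual train.py logic. The helpers `model_device` and `forward_on_model_device` are hypothetical names made up for illustration, and the small `nn.Conv2d` is only a stand-in for the `conv_stem` layer in the traceback.

```python
import torch
import torch.nn as nn


def model_device(model: nn.Module) -> torch.device:
    """Return the device of the model's first parameter (hypothetical helper)."""
    return next(model.parameters()).device


def forward_on_model_device(model: nn.Module, batch: torch.Tensor) -> torch.Tensor:
    """Move the batch to the model's device, then run the forward pass."""
    device = model_device(model)
    if batch.device != device:
        batch = batch.to(device, non_blocking=True)
    return model(batch)


# Stand-in for the conv_stem layer from the traceback (assumption, not timm code).
conv_stem = nn.Conv2d(3, 8, kernel_size=3, padding=1)
if torch.cuda.is_available():
    # Weights go to the GPU when one is visible to the job.
    conv_stem = conv_stem.cuda()

# A batch created on the CPU, as a plain DataLoader would yield it.
batch = torch.randn(2, 3, 32, 32)

# Printing both devices before the forward pass shows whether they differ.
print("model:", model_device(conv_stem), "batch:", batch.device)

output = forward_on_model_device(conv_stem, batch)
print("output:", output.device, tuple(output.shape))
```

On a node with a visible GPU, calling `conv_stem(batch)` directly in this sketch would mix a CPU input with CUDA weights, which is the kind of mismatch the error describes. Moving the batch first avoids it. Whether that is the actual cause in train2.py is an assumption.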

Replies: 0 comments
