Description
I use 8 GPUs to train one model. During training, each GPU uses about 19 GB of memory (out of 24 GB total), but during validation it needs more than 24 GB, runs out of memory, and training stops. In training the random crop size is 512×512, and in validation and testing the data is resized to 512×512 (keep_ratio=False). At first I thought it was due to the softmax layer in inference, because the number of classes is very large (194), so I removed it, but that did not fix the problem. Can you tell me about other possible reasons for this problem? Thanks a lot!
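
For context, here is a stripped-down sketch of the relevant pipeline settings (simplified OpenMMLab-style config dicts; the exact transform names and keys may differ slightly from my real config):

```python
# Training: random 512x512 crops; validation/test: whole image resized to 512x512.
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations'),
    dict(type='RandomCrop', crop_size=(512, 512)),  # training sees fixed 512x512 crops
]

test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='Resize', img_scale=(512, 512), keep_ratio=False),  # val/test images resized, not cropped
]
```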
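
And this is roughly the softmax change I tried (a sketch of the idea, not my actual code; `predict_labels` is just an illustrative name):

```python
import torch

def predict_labels(logits: torch.Tensor) -> torch.Tensor:
    # logits: (N, num_classes, H, W) raw network output with num_classes = 194.
    # Taking argmax over the raw logits gives the same label map as applying
    # softmax first, but never allocates the extra (N, 194, H, W) softmax tensor.
    return logits.argmax(dim=1)  # (N, H, W) predicted label map
```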