Batch size mismatch during loss computation #14812
Hello, I have a model that I pre-trained from scratch using a DenseNet architecture with an output size of 2048. I am fine-tuning it on a downstream task like this:

```python
class DownstreamTask(pl.LightningModule):
    def __init__(self, pre_model, lr=LR):
        super().__init__()
        self.network = pre_model                      # pre-trained DenseNet backbone
        self.fc = nn.Sequential(nn.Linear(2048, 22))  # downstream head
        self.learning_rate = lr

    def forward(self, x):
        features = self.network(x)
        features = features.view(features.size(0), -1)
        gaze = self.fc(features)
        return gaze

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = F.l1_loss(y_hat, y)
        self.log("my_loss", loss, on_step=True, on_epoch=True, prog_bar=True, logger=True)
        return {'loss': loss}

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.learning_rate)


# Load the pre-trained backbone.
model = DenseNet()
model.load_state_dict(torch.load(PATH))
model.eval()

train_loader = DataLoader(
    TrainLoader(data_dir, batch_size, num_workers),
    batch_size=batch_size,
    shuffle=True,
    num_workers=int(num_workers),
)

learner = DownstreamTask(model)
trainer = pl.Trainer(
    accelerator='gpu',
    devices=num_gpus,
    max_epochs=epochs,
    strategy='ddp',
    num_nodes=num_nodes,
)
trainer.fit(learner, train_loader)
```

Here is the error I got:
I have tried to resize/reshape but nothing seems to work. It looks like there is a batch size mismatch.
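A quick way to narrow this down is to pull a single batch outside the `Trainer` and print the shapes; a minimal sketch using the `train_loader` and `learner` defined above:

```python
# Inspect one batch outside the Trainer to see what the loader actually yields.
x, y = next(iter(train_loader))
print("x:", x.shape)      # e.g. [batch_size, C, H, W]
print("y:", y.shape)      # should be [batch_size, 22] to match nn.Linear(2048, 22)

with torch.no_grad():
    y_hat = learner(x)
print("y_hat:", y_hat.shape)  # expected: [batch_size, 22]
```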
@Estabi I believe this is irrelevant to PL. Have you checked that the sizes of `y_hat` and `y` match? See the PyTorch docs: https://pytorch.org/docs/1.12/generated/torch.nn.functional.l1_loss.html#torch.nn.functional.l1_loss
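The check the reply has in mind can be illustrated with a standalone sketch (illustrative sizes, not taken from the thread): `F.l1_loss` accepts matching shapes, broadcasts mismatched-but-compatible shapes with a `UserWarning` (and then computes a misleading loss), and raises a `RuntimeError` when the shapes cannot broadcast, which is the usual source of a size mismatch error here.

```python
import torch
import torch.nn.functional as F

pred = torch.randn(8, 22)                  # stand-in for y_hat: [batch, 22]

# Matching shapes: the intended case.
print(F.l1_loss(pred, torch.randn(8, 22)))

# Broadcastable but mismatched target: emits a UserWarning and computes a
# loss that is almost certainly not what you want.
print(F.l1_loss(pred, torch.randn(8, 1)))

# Non-broadcastable target: raises a RuntimeError about mismatched sizes.
try:
    F.l1_loss(pred, torch.randn(8))
except RuntimeError as err:
    print("shape error:", err)
```

If, for example, `y` comes out of the dataset as `[batch]` or `[batch, 22, 1]` while `y_hat` is `[batch, 22]`, reshaping the targets (or fixing the dataset's `__getitem__`) so both are `[batch, 22]` before calling `F.l1_loss` would resolve the mismatch.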