Propose fix perceptual loss sqrt nan #8414


Open

wants to merge 3 commits into base: dev

Conversation

@cvbourne cvbourne commented Apr 7, 2025

Fixes #8412

Description

This PR fixes a numerical stability issue in the PerceptualLoss implementation where the normalize_tensor function can produce NaN gradients when the input values are very small.

  • Moved the epsilon inside the square root calculation instead of adding it after (see the sketch below)
  • Increased the default epsilon from 1e-10 to 1e-8 for better stability
  • Added a test
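For context, a minimal sketch of the idea (illustrative only, not the literal diff; the helper names and the old 1e-10 default are written out here for comparison):

```python
import torch

# Old behaviour (sketch): eps is added only after the square root, so for an
# all-zero input the sqrt sees exactly 0 and its gradient is infinite, which
# turns into NaN during backpropagation.
def normalize_tensor_old(x: torch.Tensor, eps: float = 1e-10) -> torch.Tensor:
    norm_factor = torch.sqrt(torch.sum(x**2, dim=1, keepdim=True))
    return x / (norm_factor + eps)

# Proposed behaviour: eps inside the square root keeps its argument strictly
# positive, so the gradient stays finite even when x is identically zero.
def normalize_tensor_new(x: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    norm_factor = torch.sqrt(torch.sum(x**2, dim=1, keepdim=True) + eps)
    return x / norm_factor

x = torch.zeros(2, 3, 10, 10, requires_grad=True)
normalize_tensor_old(x).sum().backward()
print(torch.isnan(x.grad).any())  # tensor(True): NaN gradients
x.grad = None
normalize_tensor_new(x).sum().backward()
print(torch.isnan(x.grad).any())  # tensor(False): finite gradients
```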

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • New tests added to cover the changes.
  • Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
  • Quick tests passed locally by running ./runtests.sh --quick --unittests --disttests.

@cvbourne cvbourne mentioned this pull request Apr 7, 2025
@KumoLiu KumoLiu requested a review from marksgraham April 8, 2025 15:14
@KumoLiu (Contributor) commented Apr 8, 2025

Thanks for the update, the changes look fine to me.
Could you please help fix the failed checks so that I can trigger the blossom tests? Thanks.

@KumoLiu KumoLiu requested review from ericspod and virginiafdez April 8, 2025 15:16
Comment on lines +274 to 276
def normalize_tensor(x: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
norm_factor = torch.sqrt(torch.sum(x**2, dim=1, keepdim=True) + eps)
return x / (norm_factor + eps)
Member

Suggested change
def normalize_tensor(x: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
norm_factor = torch.sqrt(torch.sum(x**2, dim=1, keepdim=True) + eps)
return x / (norm_factor + eps)
def normalize_tensor(x: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
norm_factor = torch.sqrt(torch.sum(x**2, dim=1, keepdim=True) + eps)
return x / norm_factor

Do we want to remove eps from the denominator? As proposed, eps will contribute twice to the final result.
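A quick illustration of the double counting for an all-zero input (numbers are approximate, assuming eps = 1e-8):

```python
import torch

eps = 1e-8
x = torch.zeros(1, 3, 4, 4)
norm_factor = torch.sqrt(torch.sum(x**2, dim=1, keepdim=True) + eps)
print(norm_factor.flatten()[0])          # ~1e-4, i.e. sqrt(eps)
print((norm_factor + eps).flatten()[0])  # ~1e-4 + 1e-8: eps enters twice
```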

Author

Agreed. Will remove.

Member

This file should go into an appropriate subdirectory in the tests directory. We've changed the directory structure there recently, so probably tests/losses.

Author

Roger.

# Create tensor
x = torch.zeros(2, 3, 10, 10, requires_grad=True)

optimizer = optim.Adam([x], lr=0.01)
Member

I don't think the optimizer is needed for this test?

Author

Not needed, will remove.
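For illustration, the check could be written without the optimizer along these lines (a sketch only; the import path is an assumption, not the PR's actual test code):

```python
import torch

# Assumed import path for the helper under test; adjust to the actual module.
from monai.losses.perceptual import normalize_tensor


def test_normalize_tensor_zero_input_has_finite_grad():
    # A single backward pass is enough to check for NaN gradients;
    # no optimizer step is required.
    x = torch.zeros(2, 3, 10, 10, requires_grad=True)
    normalize_tensor(x).sum().backward()
    assert torch.isfinite(x.grad).all()
```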

x = torch.zeros(2, 3, 10, 10, requires_grad=True)

optimizer = optim.Adam([x], lr=0.01)
x_scaled = x * scale
Member

Since x is all 0, x_scaled is always going to be 0 unless you're expecting float imprecision to create values here. If so, I would add a comment to mention this.

Author

Will add a comment.

Contributor

I don't understand the point of this test with regard to the next one; instead of a zeros tensor, couldn't it be a random one that is then multiplied by a really small number?
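Something along the lines of the sketch below, for instance (illustrative only, not code from the PR; the helper is copied inline so the snippet is self-contained):

```python
import torch


def normalize_tensor(x: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # Copy of the fixed helper from the diff above, inlined for this sketch.
    return x / torch.sqrt(torch.sum(x**2, dim=1, keepdim=True) + eps)


# Random values scaled so small that their squares underflow towards zero in
# float32, exercising the near-zero regime without an identically-zero input.
x = torch.rand(2, 3, 10, 10, requires_grad=True)
x_scaled = x * 1e-30
normalize_tensor(x_scaled).sum().backward()
assert torch.isfinite(x.grad).all()
```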

@virginiafdez (Contributor) left a comment

The changes look good to me. I'd modify one of the tests, but the rest is fine.

@virginiafdez (Contributor) left a comment

Besides my comment about the point of one of the tests, I think this PR can be merged, as long as the errors in the automatic tests are fixed.
