How to stop a memory leak while using AdaHessian? #540

Open
Shubh-Goyal-07 opened this issue Aug 8, 2024 · 1 comment

Comments

@Shubh-Goyal-07

No description provided.


VadisettyRahul commented Oct 10, 2024

1. Use `torch.no_grad()` where applicable: make sure gradient tracking is disabled when it is not needed, such as during validation or inference, to save memory.

   ```python
   with torch.no_grad():
       # validation or inference code
       ...
   ```

2. Delete unused variables: remove intermediate tensors or variables that are no longer needed with Python's `del` statement, then clear the cache if you are on CUDA.

   ```python
   del variable_name          # drop the Python reference
   torch.cuda.empty_cache()   # release cached, unused GPU memory
   ```

3. Enable gradient checkpointing: this reduces memory consumption by recomputing parts of the graph during the backward pass instead of storing all intermediate activations. Useful when dealing with large models.

   ```python
   from torch.utils.checkpoint import checkpoint

   # Recompute activations during backward instead of caching them in forward;
   # model and input_data are placeholders for your own module and batch.
   output = checkpoint(model, input_data, use_reentrant=False)
   ```

4. Optimize batch size: large batch sizes consume more activation memory, so reducing the batch size helps prevent out-of-memory failures. If the smaller batch hurts training, gradient accumulation can recover the effective batch size (see the first sketch after this list).

5. Detach unnecessary tensors: use `detach()` so PyTorch does not retain the computation graph for tensors that no longer need gradient tracking, such as losses kept for logging.

   ```python
   tensor = tensor.detach()
   ```

6. Use `torch.cuda.empty_cache()` regularly: in GPU workloads, periodically clearing the cache returns unused cached blocks to the device. Note that this does not fix a true leak (tensors that are still referenced stay allocated), but it stops the allocator cache from hiding how much memory is actually in use.

   ```python
   torch.cuda.empty_cache()
   ```

7. Monitor memory usage: use PyTorch's memory profiling tools to track and profile memory usage during training.

   ```python
   import torch
   print(torch.cuda.memory_summary())
   ```

8. Check for redundant Hessian calculations: AdaHessian takes second-order derivatives, which requires backpropagating with `create_graph=True` and is memory-intensive. Make sure the Hessian estimate is computed once per step, not needlessly repeated, and that the retained graph is freed every iteration (see the second sketch after this list).
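
Expanding on point 4, here is a minimal gradient-accumulation sketch that keeps a large effective batch while lowering per-step memory. The model, loss, data, and `accumulation_steps` value are hypothetical placeholders, and a plain SGD optimizer is used for clarity; with AdaHessian the backward pass would additionally need `create_graph=True`, as in the next sketch.

```python
import torch
import torch.nn as nn

# Hypothetical toy setup; substitute your own model, loss, and data loader.
model = nn.Linear(16, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
criterion = nn.MSELoss()
loader = [(torch.randn(8, 16), torch.randn(8, 1)) for _ in range(8)]

accumulation_steps = 4  # effective batch of 32 from per-step batches of 8

optimizer.zero_grad(set_to_none=True)
for step, (inputs, targets) in enumerate(loader):
    # Scale the loss so the accumulated gradient matches the large-batch one.
    loss = criterion(model(inputs), targets) / accumulation_steps
    loss.backward()
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad(set_to_none=True)  # free .grad tensors promptly
```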
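
Expanding on point 8, the usual source of "leaks" with AdaHessian is the graph retained for second derivatives surviving across iterations. Below is a minimal sketch of Hutchinson's diagonal-Hessian estimate, the technique AdaHessian is built on, with one probe per step and the graph freed by the second `autograd.grad` call; it illustrates the pattern only and is not the optimizer's actual code.

```python
import torch
import torch.nn as nn

# Hypothetical toy setup, standing in for one AdaHessian-style training step.
model = nn.Linear(16, 1)
criterion = nn.MSELoss()
inputs, targets = torch.randn(8, 16), torch.randn(8, 1)
params = [p for p in model.parameters() if p.requires_grad]

loss = criterion(model(inputs), targets)

# create_graph=True keeps the graph alive so second derivatives can be taken.
grads = torch.autograd.grad(loss, params, create_graph=True)

# Hutchinson's estimator, once per step: z ~ Rademacher, diag(H) ≈ z * (Hz).
zs = [torch.randint_like(p, 2) * 2.0 - 1.0 for p in params]
hzs = torch.autograd.grad(grads, params, grad_outputs=zs)  # frees the graph

# Keep only detached results for bookkeeping; holding graph-connected tensors
# across iterations is what makes memory accumulate and look like a leak.
hessian_diag = [(z * hz).detach() for z, hz in zip(zs, hzs)]
del loss, grads, zs, hzs
```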
