
New OptimizerInBackward #2719


Open · wants to merge 25 commits into main

Conversation

@joecummings (Contributor) commented May 12, 2025

OptimizerInBackward

Motivation

Previously, we used a process very similar to the one outlined in this blog post. Although it worked, it required a lot of if/else switching in the recipe depending on which features composed well with an optimizer fused into the backward pass and which did not.

Goal

Simplify our recipes (729 LOC -> 677 LOC)! This PR provides a canonical OptimizerInBackward class that can be used as a drop-in replacement for any PyTorch optimizer. It integrates the class into the full_finetune_single_device.py recipe, adds tests, and updates the documentation.
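The canonical class added by this PR lives in torchtune/modules/optim.py. For readers unfamiliar with the technique, here is a minimal, hypothetical sketch of the general idea (not the PR's actual implementation): keep one optimizer per parameter and step it from torch.Tensor.register_post_accumulate_grad_hook, so each gradient can be applied and freed as soon as it is accumulated during backward. Class and attribute names below are made up for illustration.

```python
# Hypothetical sketch of the optimizer-in-backward technique; names and
# structure are illustrative only and do not reflect the code in this PR.
from typing import Any, Dict, Iterable, Type

import torch


class OptimizerInBackwardSketch:
    """Wraps any torch.optim.Optimizer class and steps it during backward."""

    def __init__(
        self,
        params: Iterable[torch.nn.Parameter],
        optimizer_cls: Type[torch.optim.Optimizer],
        **optimizer_kwargs: Any,
    ) -> None:
        self._optim_per_param: Dict[torch.nn.Parameter, torch.optim.Optimizer] = {}
        for p in params:
            if not p.requires_grad:
                continue
            # One single-parameter optimizer per parameter.
            self._optim_per_param[p] = optimizer_cls([p], **optimizer_kwargs)
            # Step and free the gradient as soon as it has been accumulated.
            p.register_post_accumulate_grad_hook(self._step_and_clear)

    def _step_and_clear(self, param: torch.nn.Parameter) -> None:
        opt = self._optim_per_param[param]
        opt.step()
        opt.zero_grad(set_to_none=True)
```

The memory savings come from never materializing all gradients at once, which is also why the technique composes poorly with gradient accumulation: gradients are consumed immediately instead of being accumulated across microbatches.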

Testing

[Screenshot: test run results, 2025-05-15 7:16 PM]

To-do

  • Integrate into full_finetune_distributed.py
  • Integrate into rest of recipes
  • Deprecate the utility functions previously used to create the optimizer-in-backward hooks


pytorch-bot bot commented May 12, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/2719

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit ba8a975 with merge base e5ee1b2:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 12, 2025
@joecummings joecummings changed the title [WIP] FusedOptimizerInBackward (redesign) New OptimizerInBackward May 14, 2025
@joecummings joecummings marked this pull request as ready for review May 14, 2025 22:37
@codecov-commenter

Codecov Report

Attention: Patch coverage is 74.10072% with 36 lines in your changes missing coverage. Please review.

Project coverage is 60.12%. Comparing base (c8e670b) to head (5674f9a).
Report is 4 commits behind head on main.

Files with missing lines                    Patch %    Lines
recipes/full_finetune_single_device.py      0.00%      35 Missing ⚠️
torchtune/modules/optim.py                  97.05%     1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2719      +/-   ##
==========================================
- Coverage   60.64%   60.12%   -0.52%     
==========================================
  Files         428      432       +4     
  Lines       26091    26520     +429     
==========================================
+ Hits        15823    15946     +123     
- Misses      10268    10574     +306     

☔ View full report in Codecov by Sentry.

Reviewed diff context (documentation, truncated):

    with a model with a lot of parameters, and when you don't need to use :ref:`gradient accumulation <glossary_grad_accm>`.

    .. code-block:: python

        OptimizerInBackward(
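The quoted code block is cut off above. Purely as a hedged illustration of how such drop-in usage might look (the constructor arguments below are assumptions; only the module path torchtune/modules/optim.py is taken from this PR's changed files):

```python
import torch
from torchtune.modules.optim import OptimizerInBackward  # path per this PR's changed files

model = torch.nn.Linear(4096, 4096)

# Assumed constructor: parameters, the optimizer class to wrap, and its kwargs.
optimizer = OptimizerInBackward(
    model.parameters(),
    torch.optim.AdamW,
    lr=2e-5,
)

loss = model(torch.randn(8, 4096)).sum()
loss.backward()  # with optimizer-in-backward, parameter updates happen during backward
```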
Collaborator:
I know this was already discussed, but looking at it here it's confusing that we have a class that's meant to be a drop-in optimizer, yet if you actually want to use it in the recipe we aren't using torchtune's regular instantiation-based configuration for this component; instead we rely on logic inside the recipe itself to construct the optimizer.

I don't have a neat solution that doesn't involve breaking all the configs, unfortunately. Maybe we don't advertise this as a separate modular component but rather as a core recipe feature? Power users can bear the brunt of implementing it in their own recipes.
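For context on the comment above: torchtune recipes normally build components straight from the YAML config via config.instantiate, whereas an optimizer-in-backward path tends to be constructed by branching inside the recipe. A hedged sketch of that contrast (the optimizer_in_bwd flag and the OptimizerInBackward arguments are assumptions, not code from this PR):

```python
import torch
from omegaconf import OmegaConf
from torchtune import config

cfg = OmegaConf.create(
    {
        "optimizer": {"_component_": "torch.optim.AdamW", "lr": 2e-5},
        "optimizer_in_bwd": True,  # assumed flag name, for illustration
    }
)
model = torch.nn.Linear(16, 16)

if cfg.optimizer_in_bwd:
    # Recipe-driven path: the recipe itself branches and builds the wrapper,
    # bypassing the usual instantiation-based configuration.
    from torchtune.modules.optim import OptimizerInBackward  # file added in this PR

    optimizer = OptimizerInBackward(model.parameters(), torch.optim.AdamW, lr=cfg.optimizer.lr)
else:
    # Config-driven path: the component is instantiated directly from YAML.
    optimizer = config.instantiate(cfg.optimizer, model.parameters())
```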

joecummings (Contributor, Author):

Fair, I'll work on the wording here.
