New OptimizerInBackward
#2719
Conversation
Helpful links: see artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/2719. Note: links to docs will display an error until the docs builds have completed.
✅ No failures as of commit ba8a975 with merge base e5ee1b2.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files:
@@            Coverage Diff             @@
##             main    #2719      +/-   ##
==========================================
- Coverage   60.64%   60.12%    -0.52%
==========================================
  Files         428      432        +4
  Lines       26091    26520      +429
==========================================
+ Hits        15823    15946      +123
- Misses      10268    10574      +306
View full report in Codecov by Sentry.
with a model with a lot of parameters, and when you don't need to use :ref:`gradient accumulation <glossary_grad_accm>`.

.. code-block:: python

    OptimizerInBackward(
I know this was already discussed, but looking at it here it's confusing that we have a class that's meant to be a drop-in optimizer, yet if you actually want to use it in a recipe we don't use torchtune's regular instantiation-based configuration for this component and instead rely on logic inside the recipe itself to construct the optimizer.
I don't have a neat solution that doesn't involve breaking all the configs, unfortunately. Maybe we don't advertise this as a separate modular component but as a core recipe feature? Power users can bear the brunt of implementing it in their own recipes.
Fair, I'll work on the wording here.
OptimizerInBackward
Motivation
Previously, we used a process very similar to the one outlined in this blog post. Although it worked, it required a lot of if/else switching in the recipe based on what composed well with running the optimizer in the backward pass and what did not.
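For context, the hook-based approach from that blog post looks roughly like the following: give every parameter its own small optimizer and step it from a post-accumulate-grad hook, so updates happen during backward(). This is a minimal sketch of the general technique (assuming PyTorch >= 2.1 for register_post_accumulate_grad_hook), not torchtune's previous implementation.

```python
# Minimal sketch of the hook-based "optimizer in backward" technique.
import torch
import torch.nn as nn

model = nn.Linear(16, 4)

# One tiny optimizer per parameter, stepped as soon as that parameter's
# gradient has finished accumulating during backward().
optim_dict = {p: torch.optim.AdamW([p], lr=1e-3) for p in model.parameters()}

def optimizer_hook(param: torch.Tensor) -> None:
    optim_dict[param].step()
    optim_dict[param].zero_grad()

for p in model.parameters():
    p.register_post_accumulate_grad_hook(optimizer_hook)

# The training loop no longer calls optimizer.step()/zero_grad() itself;
# backward() drives the per-parameter updates.
loss = model(torch.randn(8, 16)).sum()
loss.backward()
```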
Goal
Simplify our recipes (729 LOC -> 677 LOC)! This PR provides a canonical OptimizerInBackward class that can be used as a drop-in replacement for any PyTorch optimizer (see the illustrative sketch at the end of this description). It integrates the class into the full_finetune_single_device.py recipe, adds tests, and updates the documentation.
Testing
To-do
full_finetune_distributed.py
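To make the "drop-in replacement" idea concrete, here is a self-contained sketch of what such a wrapper can look like. This is not the class added by this PR (its import path, constructor signature, and features such as state-dict handling may differ); it only illustrates hiding the per-parameter hook mechanism behind a regular optimizer-style interface.

```python
# Illustrative sketch only; the real OptimizerInBackward may differ.
from typing import Iterable, Type

import torch
import torch.nn as nn


class OptimizerInBackwardSketch:
    """Runs one optimizer per parameter from a post-accumulate-grad hook,
    so parameter updates happen during backward()."""

    def __init__(
        self,
        params: Iterable[nn.Parameter],
        optimizer_cls: Type[torch.optim.Optimizer],
        **optimizer_kwargs,
    ) -> None:
        self._optim_map = {}
        for p in params:
            self._optim_map[p] = optimizer_cls([p], **optimizer_kwargs)
            p.register_post_accumulate_grad_hook(self._step_param)

    def _step_param(self, param: torch.Tensor) -> None:
        self._optim_map[param].step()
        self._optim_map[param].zero_grad()

    # Drop-in surface: recipes can keep calling step()/zero_grad(); both are
    # no-ops here because the work already happened inside backward().
    def step(self) -> None:
        pass

    def zero_grad(self, set_to_none: bool = True) -> None:
        pass


# From the recipe's point of view, usage looks like a regular optimizer.
model = nn.Linear(16, 4)
opt = OptimizerInBackwardSketch(model.parameters(), torch.optim.AdamW, lr=2e-5)
loss = model(torch.randn(8, 16)).sum()
loss.backward()   # parameter updates run here
opt.step()        # no-op, kept for interface compatibility
opt.zero_grad()   # no-op as well
```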