Implementation of lookahead #242
base: master
Conversation
Currently untested:
Tested multi-GPU using Azure and verified that it at least ran for >100 iterations and produced the expected outputs. That's about as much as I can validate for now.

Also tested resuming training after starting without lookahead, to confirm that works as well.

Friendly ping to @lucidrains 😄. My own testing with lookahead resulted in excellent improvements in outputs when training without attention; I'm interested to see whether others see similar improvements. My results weren't as good with attention applied, though I did all of my testing before the recent attention fix, so it might perform better now.

@lucidrains any chance of merging this? 😃
Hm, I've tried this and it doesn't actually seem to work:
It looks like something in PyTorch changed in the past year that makes the code not work. I promise it did work when I made the PR 😄. Unfortunately, I don't have a setup available right now to debug and fix the issue. @tannisroot If you're feeling adventurous, I think the issue might be worked around by calling […]. That said, I'm not very familiar with implementing a PyTorch optimizer. The code for […]
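For anyone picking this up: below is a minimal sketch of a lookahead wrapper around a PyTorch optimizer, written from the description in the lookahead paper rather than taken from this PR's code. The `Lookahead` class name and the `alpha=0.5, k=5` defaults are illustrative (they match the values commonly reported in the lookahead literature, not necessarily what this PR uses):

```python
import torch

class Lookahead:
    # Minimal lookahead wrapper: keeps a "slow" copy of every parameter
    # and, every k inner steps, pulls the slow weights toward the fast
    # (inner-optimizer) weights by a factor alpha, then resets the fast
    # weights to the slow weights.
    def __init__(self, base_optimizer, alpha=0.5, k=5):
        self.base = base_optimizer
        self.alpha = alpha
        self.k = k
        self.counter = 0
        # Detached copies of the initial parameters serve as slow weights.
        self.slow_weights = [
            [p.detach().clone() for p in group['params']]
            for group in base_optimizer.param_groups
        ]

    def zero_grad(self, set_to_none=True):
        self.base.zero_grad(set_to_none=set_to_none)

    @torch.no_grad()
    def step(self, closure=None):
        loss = self.base.step(closure)  # one "fast" step of the inner optimizer
        self.counter += 1
        if self.counter % self.k == 0:
            for group, slows in zip(self.base.param_groups, self.slow_weights):
                for fast, slow in zip(group['params'], slows):
                    slow.add_(fast - slow, alpha=self.alpha)  # slow += alpha * (fast - slow)
                    fast.copy_(slow)                          # fast <- slow
        return loss
```

Note that a complete implementation would also need to save and restore the slow weights (e.g. via `state_dict`/`load_state_dict`), otherwise the resume-after-checkpoint case tested earlier in this thread would silently drop the slow copy.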
From https://arxiv.org/abs/2006.14567 (taken from the Projects page for this repo)
Currently testing on my own, but so far it has shown very promising early results.
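As a usage illustration only (the `model`, `dataloader`, and `generator_loss` names are placeholders, and `Lookahead` refers to the sketch above, not this PR's code):

```python
base = torch.optim.Adam(model.parameters(), lr=2e-4)
opt = Lookahead(base, alpha=0.5, k=5)

for images in dataloader:
    opt.zero_grad()
    loss = generator_loss(model, images)  # hypothetical loss function
    loss.backward()
    opt.step()  # the slow-weight sync happens automatically every k steps
```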