XLoRA: training issues, Gradients will be None #2015
This sounds like the X-LoRA classifier layers don't have `requires_grad=True`. We're still working on a training example for X-LoRA, so it's possible that there are still some kinks that need to be ironed out.
@benjamin-marie thanks for the example. I'll take a look at this.
Hmm ok, thanks for reporting this, I'll see what could be causing it.
Here is my model (Llama 3.1 8B):
Sure, how do you do this? None of the params seem to have "requires_grad", but I'm not sure whether I did it right.
First of all, you can run:

```python
for name, param in model.named_parameters():
    if param.requires_grad:
        print(name)
```
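If nothing is printed, a possible stopgap is to mark the classifier parameters as trainable by hand before creating the trainer. This is a minimal sketch only, assuming the X-LoRA classifier parameters can be identified by "xlora" appearing in their names; that name filter is an assumption about PEFT internals, not a documented guarantee:

```python
# Hedged workaround sketch: force the (assumed) X-LoRA classifier
# parameters to be trainable before the trainer is created, so they end
# up in the optimizer. Verify the name filter against the output of
# model.named_parameters() first; "xlora" is an assumption.
for name, param in model.named_parameters():
    if "xlora" in name.lower():
        param.requires_grad = True
```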
I added the diagnostic code suggested above:
It prints:
And then, just after that, I run the SFTTrainer, which prints exactly:
Thanks @benjamin-marie. The
Yes, exactly. I'll try to reproduce and fix this!
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Not stale
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
I installed PEFT from source and use the latest versions of Transformers and TRL.
I passed the XLoRA model to TRL, but training doesn't seem to work: the training loss doesn't decrease and the validation loss remains constant. I get this warning:
```
UserWarning: None of the inputs have requires_grad=True. Gradients will be None
```
I load Llama 3.1 (without quantization) and then run this code:
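A minimal sketch of what such a setup typically looks like with PEFT's X-LoRA API; the model ID, adapter checkpoint paths, and `xlora_depth` value here are illustrative assumptions, not the exact code from this report:

```python
from transformers import AutoModelForCausalLM
from peft import XLoraConfig, get_peft_model

# Illustrative model ID; the report uses Llama 3.1 8B without quantization.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")

# Hypothetical LoRA checkpoint paths. Note the "0"/"1" keys: per the
# naming issue described below, non-numeric adapter names reportedly
# prevented training from starting.
config = XLoraConfig(
    task_type="CAUSAL_LM",
    hidden_size=model.config.hidden_size,
    xlora_depth=4,  # illustrative value
    adapters={
        "0": "./adapter_0/",
        "1": "./adapter_1/",
    },
)
model = get_peft_model(model, config)
```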
I also observed another bug: the adapters must be named "0", "1", etc. in the adapters dict; otherwise, training won't start and it complains that the adapters don't exist.
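Concretely, a mapping of the following shape reportedly works, while descriptive keys fail (the paths are hypothetical):

```python
# Works, per the report:
adapters = {"0": "./adapter_0/", "1": "./adapter_1/"}

# Reportedly fails at training start (adapters "don't exist"):
# adapters = {"math": "./adapter_0/", "code": "./adapter_1/"}
```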
Maybe @EricLBuehler can help with this?