Open
Description
Hi neuralmagic team!
Very nice work on AutoFP8! We are thinking of integrating AutoFP8 into transformers so that users can run your checkpoints directly with transformers. We would simply replace the linear layers with their quantized versions, so we would only support inference. Let us know if you agree with this! The goal would be to expose the quantized linear layer class in this repo (I see that you have several quantized linear classes) and import it in transformers.
I will be leading the integration, so any help is appreciated! Also, are there any big blockers that I might have missed?
Thanks in advance!