Replies: 2 comments
-
Thanks for sharing. I've only skimmed it so far; it looks quite interesting, and I'll need to read it more carefully later.
-
From a brief look at the code, it seems the authors are using a PyTorch reimplementation of FA, which I worry may not have the same numerical behavior as the FA implementation here. For example, here we keep the accumulator in fp32 and use fused multiply-add (FMA) instead of a separate mul and add. Those details all affect numerical error.
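To illustrate why those details matter, here is a toy sketch in plain Python. It is not code from this repo or from the paper. It emulates fp16 and fp32 rounding with the standard `struct` module and compares a dot product accumulated in fp16 (separate, individually rounded mul and add) against one accumulated in fp32 with a single rounding per step. The helper names (`round_to`, `dot_fp16_acc`, `dot_fp32_acc_fused`) and the test data are made up for illustration, and the "fused" step is only an approximation of a hardware FMA, since the sum passes through float64 before being rounded to fp32.

```python
import math
import struct


def round_to(fmt, x):
    """Round a Python float to fp16 ("e") or fp32 ("f") and back to float."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


def dot_fp16_acc(a, b):
    """Dot product with an fp16 accumulator and separate mul and add."""
    acc = 0.0
    for x, y in zip(a, b):
        prod = round_to("e", x * y)      # multiply, rounded to fp16
        acc = round_to("e", acc + prod)  # add, rounded to fp16 again
    return acc


def dot_fp32_acc_fused(a, b):
    """Dot product with an fp32 accumulator and one rounding per step.

    Approximates an FMA: the product of two fp16 values is exact in
    float64, and acc + x * y is rounded to fp32 once per iteration.
    """
    acc = 0.0
    for x, y in zip(a, b):
        acc = round_to("f", acc + x * y)
    return acc


# Hypothetical test data: 10,000 identical fp16 inputs.
n = 10_000
a = [round_to("e", 0.1)] * n
b = [round_to("e", 0.1)] * n

# float64 reference for the same fp16 inputs.
ref = math.fsum(x * y for x, y in zip(a, b))

err16 = abs(dot_fp16_acc(a, b) - ref)
err32 = abs(dot_fp32_acc_fused(a, b) - ref)

print(f"reference        : {ref:.6f}")
print(f"fp16 accumulator : abs error {err16:.6f}")
print(f"fp32 accumulator : abs error {err32:.6f}")
```

In this toy case the fp16 accumulator stops growing once each addend falls below half a unit in the last place of the running sum, while the fp32 accumulator stays close to the reference. A reimplementation that differs in either respect can therefore show quite different error behavior on the same inputs.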
-
I'm curious whether the authors have seen this paper, and whether a toggle could be added for the stability patch it proposes.
https://arxiv.org/pdf/2510.04212