Float8LinearConverter offers a convenient `filter_fqns` method for excluding modules from quantization ("opt out"). MXFP8LinearConverter, on the other hand, offers an `fqns` method that requires explicitly listing the modules to quantize ("opt in").
With MXFP8LinearConverter, it is common to want to quantize every linear layer except the router, the LM head, and attention's wk/wv projections (as the official blog also points out). For such usage, the `filter_fqns` ("opt out") approach is more natural and less error-prone: you list only the exceptions, instead of enumerating every target module and keeping that list in sync as the model changes.
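A minimal sketch of the difference, assuming a made-up toy model; only the `filter_fqns` / `fqns` parameter names come from the converters above, and everything else (module names, layout) is illustrative:

```python
import torch.nn as nn

class ToyBlock(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.wq = nn.Linear(dim, dim)
        self.wk = nn.Linear(dim, dim)   # typically not quantized
        self.wv = nn.Linear(dim, dim)   # typically not quantized
        self.wo = nn.Linear(dim, dim)
        self.router = nn.Linear(dim, 8)  # typically not quantized

class ToyModel(nn.Module):
    def __init__(self, dim=64, n_layers=2):
        super().__init__()
        self.layers = nn.ModuleList(ToyBlock(dim) for _ in range(n_layers))
        self.lm_head = nn.Linear(dim, 32000)  # typically not quantized

model = ToyModel()
skip = ("router", "lm_head", "wk", "wv")

# Opt out (Float8LinearConverter style): name only the exceptions.
filter_fqns = list(skip)

# Opt in (MXFP8LinearConverter style): enumerate every Linear you *do*
# want quantized, re-deriving the exclusion logic by hand.
fqns = [
    name
    for name, mod in model.named_modules()
    if isinstance(mod, nn.Linear) and not any(s in name for s in skip)
]
print(fqns)  # ['layers.0.wq', 'layers.0.wo', 'layers.1.wq', 'layers.1.wo']
```

Note how the opt-in list must be regenerated whenever a new linear layer is added to the model, whereas the opt-out list stays valid as long as the exceptions keep their names.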