
[Quantization] MXFP8LinearConverter should offer filter_fqns instead of fqns #3150

@aditvenk

Description

Float8LinearConverter offers a convenient filter_fqns option to filter out ("opt out") modules that should not be quantized.
MXFP8LinearConverter, on the other hand, offers an fqns option that requires "opting in" the modules to quantize.

For MXFP8LinearConverter, a common use case is quantizing all linear layers except the router, the LM head, and attention's wk/wv projections (as also pointed out in the official blog).

For such use cases, the filter_fqns ("opt out") approach is more natural and less error prone.
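
For reference, a minimal sketch of the "opt out" filtering this issue asks for, assuming substring matching on FQNs similar to the Float8 path; the filter_fqns entries and helper names below are illustrative, not the actual MXFP8LinearConverter API:

```python
# Illustrative sketch only; filter_fqns entries, should_convert, and
# convert_linears are hypothetical names, not the converter's real API.
import torch.nn as nn

# "Opt out" list: every nn.Linear is quantized unless its FQN matches one of these.
filter_fqns = ["router.gate", "output", "attention.wk", "attention.wv"]

def should_convert(fqn: str, module: nn.Module) -> bool:
    # Convert all linears except the opted-out ones (substring match, as an assumption).
    if not isinstance(module, nn.Linear):
        return False
    return not any(skip in fqn for skip in filter_fqns)

def convert_linears(model: nn.Module, convert_fn) -> None:
    # Walk the model once and hand every passing linear to the converter.
    for fqn, module in model.named_modules():
        if should_convert(fqn, module):
            convert_fn(fqn, module)
```

With an opt-out list, newly added linear layers get quantized by default, whereas an opt-in fqns list silently leaves them in high precision until someone remembers to extend it.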
