flux fp4 example (WIP) #3537
Conversation
There was a problem hiding this comment.
There are some changes that do not conform to Python style guidelines:
--- /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/conversion/impl/addmm.py 2025-05-28 16:10:39.268834+00:00
+++ /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/conversion/impl/addmm.py 2025-05-28 16:11:01.327800+00:00
@@ -6,10 +6,11 @@
from torch_tensorrt.dynamo._SourceIR import SourceIR
from torch_tensorrt.dynamo.conversion import impl
from torch_tensorrt.dynamo.conversion._ConversionContext import ConversionContext
from torch_tensorrt.fx.types import TRTTensor
import os
+
def addmm(
ctx: ConversionContext,
target: Target,
source_ir: Optional[SourceIR],
--- /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/conversion/_TRTInterpreter.py 2025-05-28 16:10:39.267834+00:00
+++ /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/conversion/_TRTInterpreter.py 2025-05-28 16:11:01.908540+00:00
@@ -272,17 +272,23 @@
builder_config.set_memory_pool_limit(
trt.MemoryPoolType.DLA_GLOBAL_DRAM,
self.compilation_settings.dla_global_dram_size,
)
- if not self.compilation_settings.use_explicit_typing and dtype.float16 in self.compilation_settings.enabled_precisions:
+ if (
+ not self.compilation_settings.use_explicit_typing
+ and dtype.float16 in self.compilation_settings.enabled_precisions
+ ):
builder_config.set_flag(trt.BuilderFlag.FP16)
if dtype.int8 in self.compilation_settings.enabled_precisions:
builder_config.set_flag(trt.BuilderFlag.INT8)
- if not self.compilation_settings.use_explicit_typing and dtype.fp8 in self.compilation_settings.enabled_precisions:
+ if (
+ not self.compilation_settings.use_explicit_typing
+ and dtype.fp8 in self.compilation_settings.enabled_precisions
+ ):
builder_config.set_flag(trt.BuilderFlag.FP8)
if dtype.bfloat16 in self.compilation_settings.enabled_precisions:
builder_config.set_flag(trt.BuilderFlag.BF16)
--- /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/conversion/impl/permutation.py 2025-05-28 16:10:39.269834+00:00
+++ /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/conversion/impl/permutation.py 2025-05-28 16:11:01.949586+00:00
@@ -13,10 +13,11 @@
)
from torch_tensorrt.dynamo.conversion.impl.shape import get_shape_with_dynamic_shape
from torch_tensorrt.fx.types import TRTTensor
import os
+
def permute(
ctx: ConversionContext,
target: Target,
source_ir: Optional[SourceIR],
name: str,
There was a problem hiding this comment.
There are some changes that do not conform to Python style guidelines:
--- /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/conversion/impl/quantize.py 2025-06-08 16:14:53.013799+00:00
+++ /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/conversion/impl/quantize.py 2025-06-08 16:15:15.490934+00:00
@@ -94,6 +94,6 @@
if axis is not None:
dequantize_layer.axis = axis
set_layer_name(dequantize_layer, target, name + "_dequantize", source_ir)
dq_output = dequantize_layer.get_output(0)
- return dq_output
\ No newline at end of file
+ return dq_output
There was a problem hiding this comment.
There are some changes that do not conform to Python style guidelines:
--- /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/conversion/impl/quantize.py 2025-06-09 16:52:54.851163+00:00
+++ /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/conversion/impl/quantize.py 2025-06-09 16:53:17.995276+00:00
@@ -94,6 +94,6 @@
if axis is not None:
dequantize_layer.axis = axis
set_layer_name(dequantize_layer, target, name + "_dequantize", source_ir)
dq_output = dequantize_layer.get_output(0)
- return dq_output
\ No newline at end of file
+ return dq_output
There was a problem hiding this comment.
There are some changes that do not conform to Python style guidelines:
--- /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/conversion/impl/quantize.py 2025-06-09 23:54:44.134169+00:00
+++ /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/conversion/impl/quantize.py 2025-06-09 23:55:10.573434+00:00
@@ -94,6 +94,6 @@
if axis is not None:
dequantize_layer.axis = axis
set_layer_name(dequantize_layer, target, name + "_dequantize", source_ir)
dq_output = dequantize_layer.get_output(0)
- return dq_output
\ No newline at end of file
+ return dq_output
There was a problem hiding this comment.
There are some changes that do not conform to Python style guidelines:
--- /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/conversion/impl/quantize.py 2025-06-12 14:38:26.461575+00:00
+++ /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/conversion/impl/quantize.py 2025-06-12 14:38:49.017416+00:00
@@ -94,6 +94,6 @@
if axis is not None:
dequantize_layer.axis = axis
set_layer_name(dequantize_layer, target, name + "_dequantize", source_ir)
dq_output = dequantize_layer.get_output(0)
- return dq_output
\ No newline at end of file
+ return dq_output
--- /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/lowering/passes/constant_folding.py 2025-06-12 14:38:26.463575+00:00
+++ /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/lowering/passes/constant_folding.py 2025-06-12 14:38:49.337824+00:00
@@ -98,16 +98,17 @@
class _TorchTensorRTConstantFolder(ConstantFolder): # type: ignore[misc]
def __init__(self, *args: Any, **kwargs: Any) -> None:
super().__init__(*args, **kwargs)
def is_impure(self, node: torch.fx.node.Node) -> bool:
- # Set of known quantization ops to be excluded from constant folding.
+ # Set of known quantization ops to be excluded from constant folding.
# Currently, we exclude all quantization ops coming from modelopt library.
quantization_ops = {}
try:
- # modelopt import ensures torch.ops.tensorrt.quantize_op.default is registered
+ # modelopt import ensures torch.ops.tensorrt.quantize_op.default is registered
import modelopt.torch.quantization as mtq
+
assert torch.ops.tensorrt.quantize_op.default
quantization_ops.add(torch.ops.tensorrt.quantize_op.default)
quantization_ops.add(torch.ops.tensorrt.dynamic_block_quantize_op.default)
except Exception as e:
pass
There was a problem hiding this comment.
There are some changes that do not conform to Python style guidelines:
--- /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/conversion/impl/quantize.py 2025-06-12 14:44:47.754478+00:00
+++ /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/conversion/impl/quantize.py 2025-06-12 14:45:12.051449+00:00
@@ -94,6 +94,6 @@
if axis is not None:
dequantize_layer.axis = axis
set_layer_name(dequantize_layer, target, name + "_dequantize", source_ir)
dq_output = dequantize_layer.get_output(0)
- return dq_output
\ No newline at end of file
+ return dq_output
--- /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/lowering/passes/constant_folding.py 2025-06-12 14:44:47.756479+00:00
+++ /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/lowering/passes/constant_folding.py 2025-06-12 14:45:12.430297+00:00
@@ -98,16 +98,17 @@
class _TorchTensorRTConstantFolder(ConstantFolder): # type: ignore[misc]
def __init__(self, *args: Any, **kwargs: Any) -> None:
super().__init__(*args, **kwargs)
def is_impure(self, node: torch.fx.node.Node) -> bool:
- # Set of known quantization ops to be excluded from constant folding.
+ # Set of known quantization ops to be excluded from constant folding.
# Currently, we exclude all quantization ops coming from modelopt library.
quantization_ops = {}
try:
- # modelopt import ensures torch.ops.tensorrt.quantize_op.default is registered
+ # modelopt import ensures torch.ops.tensorrt.quantize_op.default is registered
import modelopt.torch.quantization as mtq
+
assert torch.ops.tensorrt.quantize_op.default
quantization_ops.add(torch.ops.tensorrt.quantize_op.default)
quantization_ops.add(torch.ops.tensorrt.dynamic_block_quantize_op.default)
except Exception as e:
pass
There was a problem hiding this comment.
There are some changes that do not conform to Python style guidelines:
--- /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/conversion/impl/quantize.py 2025-06-12 14:46:31.608373+00:00
+++ /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/conversion/impl/quantize.py 2025-06-12 14:46:59.413246+00:00
@@ -94,6 +94,6 @@
if axis is not None:
dequantize_layer.axis = axis
set_layer_name(dequantize_layer, target, name + "_dequantize", source_ir)
dq_output = dequantize_layer.get_output(0)
- return dq_output
\ No newline at end of file
+ return dq_output
--- /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/lowering/passes/constant_folding.py 2025-06-12 14:46:31.610373+00:00
+++ /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/lowering/passes/constant_folding.py 2025-06-12 14:46:59.893008+00:00
@@ -98,16 +98,17 @@
class _TorchTensorRTConstantFolder(ConstantFolder): # type: ignore[misc]
def __init__(self, *args: Any, **kwargs: Any) -> None:
super().__init__(*args, **kwargs)
def is_impure(self, node: torch.fx.node.Node) -> bool:
- # Set of known quantization ops to be excluded from constant folding.
+ # Set of known quantization ops to be excluded from constant folding.
# Currently, we exclude all quantization ops coming from modelopt library.
quantization_ops = {}
try:
- # modelopt import ensures torch.ops.tensorrt.quantize_op.default is registered
+ # modelopt import ensures torch.ops.tensorrt.quantize_op.default is registered
import modelopt.torch.quantization as mtq
+
assert torch.ops.tensorrt.quantize_op.default
quantization_ops.add(torch.ops.tensorrt.quantize_op.default)
quantization_ops.add(torch.ops.tensorrt.dynamic_block_quantize_op.default)
except Exception as e:
pass
Description
Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.
Fixes # (issue)
Type of change
Please delete options that are not relevant and/or add your own.
Checklist: