
Pass "fuse_bn_into_conv" results in problematic graph and affects results. #178

Open

researcher-no-name commented Mar 26, 2025

When using `fuse_bn_into_conv`, the optimizer generates redundant nodes that result in problematic model behavior. I am aware of issue #133; however, (1) I observed non-crashing behavior where the optimized model's outputs differ from the original model's, (2) models with Constant nodes are also affected (not only initializers), and (3) that issue provides no details about which models are affected or to what extent. So I will give a thorough account of the bugs encountered here.
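For reference, here is a minimal sketch of how I apply the pass in isolation and compare outputs; the model path and the dynamic-dimension handling are placeholders:

```python
import numpy as np
import onnx
import onnxoptimizer
import onnxruntime as ort

# Load the original model and apply only the problematic pass.
model = onnx.load("model.onnx")  # placeholder path
optimized = onnxoptimizer.optimize(model, ["fuse_bn_into_conv"])
onnx.save(optimized, "model_opt.onnx")

# Run both models on the same input and compare outputs.
sess_orig = ort.InferenceSession("model.onnx")
sess_opt = ort.InferenceSession("model_opt.onnx")
inp = sess_orig.get_inputs()[0]
shape = [d if isinstance(d, int) else 1 for d in inp.shape]  # resolve dynamic dims
x = np.random.rand(*shape).astype(np.float32)
out_orig = sess_orig.run(None, {inp.name: x})
out_opt = sess_opt.run(None, {inp.name: x})
for a, b in zip(out_orig, out_opt):
    print("max abs diff:", np.abs(a - b).max())
```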

1. Extent of Models Affected:

Classification and text models seemed to work quite well; however, some object detection models were affected. I used the ILSVRC 2017 object detection dataset for this effort and compared top-K results for vision models, calculating F1 score, precision, and recall for object detection (using IoU thresholds from 0.5 to 0.9).
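For clarity on the metrics: detections from the original and optimized models are matched by IoU before computing precision, recall, and F1. A minimal sketch of the IoU computation, assuming `[x1, y1, x2, y2]` box format:

```python
def iou(box_a, box_b):
    # Boxes are [x1, y1, x2, y2]; returns intersection over union.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0
```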

For classification models, EfficientNet-Lite4 (opset=11) was very slightly affected, with a 1% top-10 label prediction difference between the original and the optimized model.

For object detection models, I observed up to a 5.5% difference in F1 score, precision, and recall for the SSD model (opset=12). In the same setting, YOLOv3 (opset=11) showed differences of up to 1.5%, and Tiny YOLOv3 (opset=11) also had small differences.

In addition, I found that SSD label accuracy is affected: top-1 label accuracy differs by 3.4% between the original and optimized models, and the gap grows above 5% for top-10 labels.
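To make the label comparison concrete, here is one hedged sketch of how such a top-K agreement check could be computed; the exact metric definition in my experiments may differ, and `labels_a`/`labels_b` are hypothetical arrays of per-image top-K labels:

```python
def topk_disagreement(labels_a, labels_b):
    # labels_a, labels_b: sequences of per-image top-k class-id lists.
    # Returns the fraction of images whose top-k label sets differ.
    mismatches = sum(set(a) != set(b) for a, b in zip(labels_a, labels_b))
    return mismatches / len(labels_a)
```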

2. Overall Behavior:

As mentioned in #133, this problem occurs quite commonly in the presence of initializers.
Specifically, it occurs when the Conv node has a weight initializer but no bias initializer.
When a bias initializer is present on the Conv node (i.e., both weight and bias initializers exist), the fusion is skipped.
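A quick way to see which case a given Conv falls into is to check its inputs against the graph initializers; a minimal sketch (model path is a placeholder):

```python
import onnx

model = onnx.load("model.onnx")  # placeholder path
inits = {init.name for init in model.graph.initializer}
for node in model.graph.node:
    if node.op_type == "Conv":
        # Conv inputs are (X, W, optional B).
        has_weight = len(node.input) > 1 and node.input[1] in inits
        has_bias = len(node.input) > 2 and node.input[2] in inits
        # The misbehavior described here appears when has_weight is True and
        # has_bias is False; fusion is skipped when both are initializers.
        print(node.name, "weight init:", has_weight, "bias init:", has_bias)
```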

For example, consider node conv2d_10 in Tiny YOLOv3 (opset=11).
This is the original node structure (the node is in the centre):

[image: original conv2d_10 node structure]

And this is how it is transformed using the pass:

[image: conv2d_10 node structure after the pass]

Notice how the node remains unaffected because the Conv contains a bias initializer, but it is still affected if it contains only a weight initializer. I observed the same behavior in other models (e.g., EfficientNet-Lite4). In particular, there I observed that a BatchNormalization node succeeding a Conv can also exhibit this behavior.

The original EfficientNet-Lite4:

[image: original EfficientNet-Lite4 graph around the Conv/BatchNormalization nodes]

And the optimized version:

[image: optimized EfficientNet-Lite4 graph]
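For context, fusing BN into Conv is expected to fold the normalization parameters into the convolution's weight and bias. A minimal sketch of the standard folding arithmetic (the textbook formula, not necessarily the optimizer's exact implementation):

```python
import numpy as np

def fold_bn_into_conv(W, b, gamma, beta, mean, var, eps=1e-5):
    # Standard BN folding: y = gamma * (conv(x) - mean) / sqrt(var + eps) + beta
    scale = gamma / np.sqrt(var + eps)           # per output channel
    W_folded = W * scale.reshape(-1, 1, 1, 1)    # scale each output-channel filter
    if b is None:
        b = np.zeros_like(mean)                  # the missing-bias case from this report
    b_folded = (b - mean) * scale + beta
    return W_folded, b_folded
```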

In addition, we discovered an issue related to duplicate initializers associated with this pass, which we reported separately (#174).

Important Note: This bug finding is part of a research project in which my collaborators and I searched extensively for issues related to the ONNX optimizer. We found the tool to be quite robust across a large majority of models of different types, and we will report the results soon. However, we considered it valuable to report all the bugs we found here. To the developers of the optimizer: keep up the fantastic work!
