Fix missing const's layout transformation and refactor NCHWtoNHWC passes #122
Conversation
📝 Walkthrough
Refactors NCHW↔NHWC lowering to node-centric passes: adds _singleNodePattern and layout-permutation helpers, centralizes prepend/append Transpose insertion and constant-weight layout transforms, introduces NonBranchingMatcher wiring, renames several passes, and updates Deployer imports to the new pass names.

Sequence Diagram(s)

```mermaid
sequenceDiagram
autonumber
participant G as Graph
participant P as NCHWtoNHWCPass
participant M as NonBranchingMatcher
participant U as LayoutUtils
G->>P: run pass
P->>M: match single-node pattern (Conv/Depthwise/Pool/Pad/Gemm)
alt node matched
M-->>P: matched node
P->>U: compute spatialDims & permutation
U-->>P: permutation & inverse
P->>U: transform consts/weights/biases
U-->>P: transformed tensors
P->>G: prepend/append Transpose nodes and rewire node I/O
else no match
P-->>G: no-op
end
```

```mermaid
sequenceDiagram
autonumber
participant G as Graph
participant P as TransposeMatmulInputsPass
participant M as NonBranchingMatcher
participant T as TransposeHelpers
G->>P: run pass
P->>M: match MatMul/Gemm node (_singleNodePattern)
alt node matched
M-->>P: node map
P->>P: ensure defaults (transA/transB/alpha/beta) on node
P->>T: prepend/append Transpose for inputs per trans flags
T-->>P: updated input connections
P->>G: commit updated wiring
else no match
P-->>G: no-op
end
```

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 1
🧹 Nitpick comments (6)
Deeploy/Targets/Neureka/Deployer.py (1)

35-36: 3x3 branch pass swap is correct; tiny style nit.
Replacing PULPNCHWtoNHWCPass with NCHWtoNHWCPass under enable3x3 is appropriate. Consider enumerate for clarity.

```diff
-        for idx in range(len(self.loweringOptimizer.passes)):
-            if isinstance(self.loweringOptimizer.passes[idx], PULPNCHWtoNHWCPass):
-                self.loweringOptimizer.passes[idx] = NCHWtoNHWCPass(self.default_channels_first)
+        for idx, p in enumerate(self.loweringOptimizer.passes):
+            if isinstance(p, PULPNCHWtoNHWCPass):
+                self.loweringOptimizer.passes[idx] = NCHWtoNHWCPass(self.default_channels_first)
```

Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (5)
102-113: Permutation helper: keep, but minor readability tweaks.
Use iterable unpacking to avoid list concatenation noise.

```diff
-    if targetChannelsFirst:
-        ch = dims - 1
-        nonBatchPerm = [
-            ch,
-        ] + list(range(batchDims, ch))
-    else:
-        ch = batchDims
-        nonBatchPerm = list(range(ch + 1, dims)) + [ch]
+    if targetChannelsFirst:
+        ch = dims - 1
+        nonBatchPerm = [ch, *range(batchDims, ch)]
+    else:
+        ch = batchDims
+        nonBatchPerm = [*range(ch + 1, dims), ch]
```
[Based on hints from static analysis]
133-157: Transpose helpers: ensure shapes are known to avoid crashes.
shape may be None/symbolic; _permute and _invertPermutation assume concrete sequences and will assert/raise. Guard or early-return when shapes are unknown.
Would you like a patch that skips shape-based preallocation when tensor.shape is None and relies on shape inference instead?
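A minimal sketch of such a guard, assuming a hypothetical _transposedShape helper (the offered patch itself was never posted):

```python
from typing import List, Optional, Sequence

def _transposedShape(shape: Optional[Sequence], perm: List[int]) -> Optional[List]:
    # Return the permuted shape, or None when the shape is unknown or symbolic,
    # so downstream shape inference can fill it in instead of crashing here.
    if shape is None or any(not isinstance(d, int) for d in shape):
        return None
    return [shape[p] for p in perm]

assert _transposedShape([1, 8, 16, 16], [0, 2, 3, 1]) == [1, 16, 16, 8]
assert _transposedShape(None, [0, 2, 3, 1]) is None
```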
167-173: Small lint fixes.
- Remove extraneous f-prefix.
- Prefer iterable unpacking as above.

```diff
-    assert not targetChannelsFirst, f"Target layout should be channels_last!"
+    assert not targetChannelsFirst, "Target layout should be channels_last!"
-    perm = list(range(1, dims)) + [0]
+    perm = [*range(1, dims), 0]
```
[Based on hints from static analysis]
176-176: Prefer iterator access over list slice for first map value.
Use next(iter(...)) to avoid building a list.

```diff
-    node = list(match.nodes_map.values())[0]
+    node = next(iter(match.nodes_map.values()))
```
[Based on hints from static analysis]
Also applies to: 212-212, 273-273, 315-315, 383-383, 506-506
219-227: Deriving spatialDims for Pad is brittle.
Hardcoding 2 risks breakage. Prefer deriving from input rank or pads attr.

```diff
-    elif node.op == "Pad":
-        spatialDims = 2  # Hack based on current status
+    elif node.op == "Pad":
+        in_shape = node.inputs[0].shape
+        spatialDims = (len(in_shape) - 2) if isinstance(in_shape, Sequence) and in_shape is not None else 2
```
Please confirm Pad cases in your models are covered by this derivation.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (12 hunks)
Deeploy/Targets/Neureka/Deployer.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
Deeploy/Targets/Neureka/Deployer.py (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (2)
NCHWtoNHWCPass (357-366)
PULPNCHWtoNHWCPass (370-379)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (2)
Deeploy/CommonExtensions/OptimizationPasses/PassClasses.py (3)
ReplaceSequentialPatternPass (265-282)
SequentialPass (201-210)
contextagnostic (285-298)
Deeploy/TilingExtension/TilingCodegen.py (1)
HyperRectangle (24-35)
🪛 Ruff (0.14.0)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
106-108: Consider [ch, *list(range(batchDims, ch))] instead of concatenation (RUF005)
111-111: Consider [*list(range(ch + 1, dims)), ch] instead of concatenation (RUF005)
168-168: f-string without any placeholders; remove extraneous f prefix (F541)
171-171: Consider [*list(range(1, dims)), 0] instead of concatenation (RUF005)
175-175: Unused function argument: name (ARG001)
176-176: Prefer next(iter(match.nodes_map.values())) over single element slice (RUF015)
211-211: Unused function argument: name (ARG001)
212-212: Prefer next(iter(match.nodes_map.values())) over single element slice (RUF015)
226-226: Avoid specifying long messages outside the exception class (TRY003)
272-272: Unused function argument: name (ARG001)
273-273: Prefer next(iter(match.nodes_map.values())) over single element slice (RUF015)
314-314: Unused function argument: name (ARG001)
315-315: Prefer next(iter(match.nodes_map.values())) over single element slice (RUF015)
331-331: Avoid specifying long messages outside the exception class (TRY003)
383-383: Prefer next(iter(match.nodes_map.values())) over single element slice (RUF015)
504-504: Unused function argument: name (ARG001)
506-506: Prefer next(iter(match.nodes_map.values())) over single element slice (RUF015)
🔇 Additional comments (5)
Deeploy/Targets/Neureka/Deployer.py (1)
11-11: Import rename aligns with new public API.
Switch to NCHWtoNHWCPass/PULPNCHWtoNHWCPass looks correct and consistent with refactored passes.
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (4)
17-23: Helper _singleNodePattern is a good simplification.
This reduces boilerplate and pairs well with NonBranchingMatcher(regex_op=True).
356-365: Pass sequencing/naming update looks good.
Sequencing Pad → MaxPool → DwConv → Conv under NCHW→NHWC and PULP variants is clear and consistent.
Also applies to: 369-378
402-418: RequantizedGemm defaults and B handling are sensible.
Defaulting trans/alpha/beta and in-place transpose of constant B is fine. If B can be shared across nodes, consider cloning to avoid unintended side effects.
Do any graphs share the same Constant B across multiple GEMMs? If yes, we should clone B before mutating.
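A minimal sketch of the cloning idea with onnx_graphsurgeon; the helper name is illustrative, not part of the PR:

```python
import onnx_graphsurgeon as gs

def _cloneConstant(const: gs.Constant, suffix: str = "_cloned") -> gs.Constant:
    # Copying the ndarray guarantees that transposing the clone in place
    # cannot alias an initializer still referenced by other nodes.
    return gs.Constant(name=const.name + suffix, values=const.values.copy())

# Hypothetical usage at the GEMM rewrite site:
#     node.inputs[1] = _cloneConstant(node.inputs[1])
#     node.inputs[1].values = node.inputs[1].values.transpose(perm)
```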
500-501: Pattern helper usage for RemoveGlobalOutputReshape/ConvBias passes LGTM.
Simplifies patterns without changing behavior.
📝 Walkthrough
The diff refactors NCHW↔NHWC layout transformation passes by introducing generic permutation and transpose helpers, ensuring constants (weights/biases) are transformed. Several pass implementations are updated to use single-node patterns and new helpers. A pass class is renamed, and the deployer is updated accordingly. The changelog records these updates.

Sequence Diagram(s)

```mermaid
sequenceDiagram
autonumber
actor User
participant Deployer as NeurekaDeployer
participant Pipeline as OptimizationPipeline
participant Pass as NCHWtoNHWCPass
participant Graph as ONNX Graph
participant Helper as LayoutHelpers
User->>Deployer: deploy(model)
Deployer->>Pipeline: build passes
Pipeline->>Pass: run(Graph)
activate Pass
Note over Pass,Helper: For each matched node (Conv/MatMul/...):
Pass->>Helper: compute permutation (dims, spatialDims, targetChannelsFirst)
Helper-->>Pass: perm, invPerm
alt input layout needs change
Pass->>Graph: prepend Transpose on inputs
Pass->>Helper: transform constants (weights/bias/bn/mul/add)
Helper-->>Pass: constants permuted
Pass->>Graph: adjust node attrs (e.g., transA/B)
Pass->>Graph: append Transpose on outputs
else no change
Pass-->>Graph: no-op
end
deactivate Pass
Pipeline-->>Deployer: optimized Graph
Deployer-->>User: deployment artifacts
```

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 0
🧹 Nitpick comments (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
102-172: LGTM! Well-designed layout transformation API.
The new transformation helpers (_transformLayoutPermutation, _invertPermutation, _prependTranspose, _appendTranspose, _transformLayoutConst, _transformLayoutDwWeightConst) provide a clean, reusable abstraction for layout transformations. The API is well-structured and improves code maintainability.
Optional: Consider using unpacking syntax for list concatenation as suggested by static analysis (lines 106-108, 111, 171):

```diff
-    nonBatchPerm = [
-        ch,
-    ] + list(range(batchDims, ch))
+    nonBatchPerm = [ch, *range(batchDims, ch)]
-    nonBatchPerm = list(range(ch + 1, dims)) + [ch]
+    nonBatchPerm = [*range(ch + 1, dims), ch]
-    perm = list(range(1, dims)) + [0]
+    perm = [*range(1, dims), 0]
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
CHANGELOG.md (3 hunks)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (12 hunks)
Deeploy/Targets/Neureka/Deployer.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (2)
Deeploy/CommonExtensions/OptimizationPasses/PassClasses.py (3)
ReplaceSequentialPatternPass (265-282)
SequentialPass (201-210)
contextagnostic (285-298)
Deeploy/TilingExtension/TilingCodegen.py (1)
HyperRectangle (24-35)
Deeploy/Targets/Neureka/Deployer.py (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (2)
NCHWtoNHWCPass (357-366)
PULPNCHWtoNHWCPass (370-379)
🪛 Ruff (0.14.0)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
106-108: Consider [ch, *list(range(batchDims, ch))] instead of concatenation (RUF005)
111-111: Consider [*list(range(ch + 1, dims)), ch] instead of concatenation (RUF005)
168-168: f-string without any placeholders; remove extraneous f prefix (F541)
171-171: Consider [*list(range(1, dims)), 0] instead of concatenation (RUF005)
175-175: Unused function argument: name (ARG001)
176-176: Prefer next(iter(match.nodes_map.values())) over single element slice (RUF015)
211-211: Unused function argument: name (ARG001)
212-212: Prefer next(iter(match.nodes_map.values())) over single element slice (RUF015)
226-226: Avoid specifying long messages outside the exception class (TRY003)
272-272: Unused function argument: name (ARG001)
273-273: Prefer next(iter(match.nodes_map.values())) over single element slice (RUF015)
314-314: Unused function argument: name (ARG001)
315-315: Prefer next(iter(match.nodes_map.values())) over single element slice (RUF015)
331-331: Avoid specifying long messages outside the exception class (TRY003)
383-383: Prefer next(iter(match.nodes_map.values())) over single element slice (RUF015)
504-504: Unused function argument: name (ARG001)
506-506: Prefer next(iter(match.nodes_map.values())) over single element slice (RUF015)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (40)
- GitHub Check: siracusa-models-tiled-singlebuffer-L3 (Attention, 60000, 10000, 5000, 2500, 8, L3) / test-runner-siracusa-tiled (repeated 40 times across runners 2500/5000/10000/60000)
🔇 Additional comments (9)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (7)
17-23: LGTM! Clean helper for single-node patterns.
The _singleNodePattern helper effectively consolidates repeated graph construction logic throughout the file, improving maintainability.
234-237: Critical fix: Const tensor layout transformation now applied.
This code correctly transforms all constant tensors (weights, bias, mul, add, shift) for Conv and RequantizedConv operations. Previously, these layout transformations were missing, which would cause incorrect behavior when switching between NCHW and NHWC layouts.
292-297: Critical fix: DW Conv const layout transformation now applied.
Correctly transforms all constant tensors for depthwise convolutions, including the specialized DW weight transformation (line 292) and additional constants like bias/mul/add/shift (lines 294-297). This fixes the missing layout transformations mentioned in the PR description.
338-340: Critical fix: PULP DW Conv const layout transformation now applied.
Correctly transforms all constant tensors for PULP-specific depthwise convolutions. This completes the const transformation fix across all Conv variants.
175-197: LGTM! Refactored to use new transpose helpers.
The function correctly uses .get() with defaults for GEMM/MatMul attributes and applies the new transpose helpers. The unused name parameter flagged by static analysis is part of the required callback signature for ReplaceSequentialPatternPass, so this is expected.
203-379: LGTM! Pass classes correctly updated.
All pass classes have been properly refactored to use:
- _singleNodePattern for cleaner pattern creation
- New transformation helpers for layout conversions
- NonBranchingMatcher(regex_op=True) where needed to match multiple operators (e.g., "Conv|RequantizedConv")
The class naming is now more consistent (e.g., NCHWtoNHWCDwConvPass, PULPNCHWtoNHWCDwConvPass).
382-525: LGTM! Remaining functions consistently refactored.
The remaining transformation functions (_requantized_gemm_to_pw_fun, _remove_global_output_reshape_fun, _remove_empty_conv_bias_fun) have been consistently updated to:
- Use .get() with defaults for attribute access
- Use new transpose helpers where applicable
- Use _singleNodePattern in their corresponding pass classes
Deeploy/Targets/Neureka/Deployer.py (1)
11-11: LGTM! Import and usage correctly updated.
The changes correctly reflect the class rename from the refactored LoweringOptimizationPasses module. The logic remains unchanged: when 3x3 mode is enabled, the PULPNCHWtoNHWCPass is replaced with the generic NCHWtoNHWCPass.
Also applies to: 35-36
CHANGELOG.md (1)
7-7: LGTM! Changelog accurately documents the changes.
The changelog entries correctly capture:
- The PR reference and summary (line 7)
- The refactoring of NCHWtoNHWC passes (line 76)
- The critical bug fix for missing const layout transformations (line 86)
Also applies to: 76-76, 86-86
Actionable comments posted: 2
♻️ Duplicate comments (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
276-277: [Duplicate] Use safe attribute access for 'group'.
This issue was previously flagged: accessing node.attrs["group"] directly can raise KeyError when the attribute is missing. ONNX Conv defaults group=1, so use .get("group", 1) for safe access.
Apply this diff:

```diff
-    if node.attrs["group"] == 1:
+    if node.attrs.get("group", 1) == 1:
         return graph
```

Also applies to line 317.
🧹 Nitpick comments (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
226-226: Consider extracting detailed error messages.
Static analysis flags the long message in ValueError. For better maintainability, consider defining a custom exception class or moving the message to a constant.
Also applies to line 331.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (11 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (2)
Deeploy/CommonExtensions/OptimizationPasses/PassClasses.py (4)
ReplaceSequentialPatternPass (265-282)
SequentialPass (201-210)
contextagnostic (285-298)
deleteNode (67-86)
Deeploy/TilingExtension/TilingCodegen.py (1)
HyperRectangle (24-35)
🪛 Ruff (0.14.0)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
175-175: Unused function argument: name (ARG001)
211-211: Unused function argument: name (ARG001)
226-226: Avoid specifying long messages outside the exception class (TRY003)
272-272: Unused function argument: name (ARG001)
314-314: Unused function argument: name (ARG001)
331-331: Avoid specifying long messages outside the exception class (TRY003)
484-484: Unused function argument: name (ARG001)
502-502: Unused function argument: name (ARG001)
🔇 Additional comments (3)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (3)
211-241: Well-structured refactoring of layout transformation logic.
The refactored code effectively centralizes the layout transformation using the new helper functions. The spatialDims determination is operator-specific, and the transformation correctly handles both data tensors (via transpose nodes) and constant tensors (via in-place modification).
382-473: Clean transformation from RequantizedGemm to RequantizedConv.
The refactored code correctly handles the transformation, including:
- Proper reordering of mul/add parameters (lines 457-460)
- Dimension expansion/squeeze for convolution compatibility (see the sketch after this list)
- Transpose handling with the new helper functions
- Attribute defaults for GEMM operations (lines 402-406)
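For intuition, a small self-contained numpy check of the expansion/squeeze step (shapes and names are illustrative, not the pass's actual code) showing why a GEMM is equivalent to a 1x1 pointwise convolution once the activations gain dummy spatial dims:

```python
import numpy as np

x = np.random.rand(4, 8)           # GEMM input  [N, K]
w = np.random.rand(8, 16)          # GEMM weight [K, M]

x_conv = x.reshape(4, 1, 1, 8)     # [N, H=1, W=1, C=K] (NHWC)
w_conv = w.T.reshape(16, 1, 1, 8)  # [M, kH=1, kW=1, C=K] pointwise kernel

# A 1x1 conv over a 1x1 feature map is exactly the matrix product:
y_conv = np.einsum("nhwc,mijc->nhwm", x_conv, w_conv).reshape(4, 16)
np.testing.assert_allclose(y_conv, x @ w)
```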
203-209: Consistent migration to single-node pattern.
The updated pass constructors consistently use _singleNodePattern and appropriate matchers, improving code clarity and maintainability.
Also applies to: 245-269, 305-311, 348-353, 357-379, 477-499, 515-520
Actionable comments posted: 1
♻️ Duplicate comments (4)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (4)
159-164: Avoid mutating shared gs.Constant values.
_transformLayoutConst reassigns const.values to a transposed view without copying. If a gs.Constant (e.g. an ONNX initializer) is shared by multiple nodes, this will cause unintended side effects.
Based on past review comments
272-301: Potential KeyError on missing 'group' attr.
Line 276: ONNX Conv defaults group=1; accessing via ["group"] can raise KeyError when the attribute is missing.
Apply this diff:

```diff
-    if node.attrs["group"] == 1:
+    if node.attrs.get("group", 1) == 1:
         return graph
```

Based on past review comments
314-338: Potential KeyError on missing 'group' attr.
Line 318: ONNX Conv defaults group=1; accessing via ["group"] can raise KeyError when the attribute is missing.
Apply this diff:

```diff
-    if node.attrs["group"] == 1:
+    if node.attrs.get("group", 1) == 1:
         return graph
```

Based on past review comments
167-172: Function only supports one direction but may receive opposite.
_transformLayoutDwWeightConst asserts not targetChannelsFirst (line 168), restricting it to converting TO channels_last. However, at line 292, this function is called with default_channels_first, which can be True (the constructor default at line 307), causing an assertion failure at runtime.
Apply this diff to support both directions or validate at the call site:

```diff
-def _transformLayoutDwWeightConst(const: gs.Constant, targetChannelsFirst: bool) -> None:
-    assert not targetChannelsFirst, "Target layout should be channels_last!"
+def _transformLayoutDwWeightConst(const: gs.Constant, spatialDims: int, targetChannelsFirst: bool) -> None:
     assert isinstance(const, gs.Constant)
     dims = len(const.shape)
-    perm = [*range(1, dims), 0]
+    if targetChannelsFirst:
+        # channels_last -> channels_first: move last dim to first
+        perm = [dims - 1, *range(dims - 1)]
+    else:
+        # channels_first -> channels_last: move first dim to last
+        perm = [*range(1, dims), 0]
     const.values = const.values.transpose(perm)
```

Then update the call site at line 292:

```diff
-    _transformLayoutDwWeightConst(node.inputs[1], default_channels_first)  # weights
+    _transformLayoutDwWeightConst(node.inputs[1], spatialDims, default_channels_first)  # weights
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (11 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
Deeploy/CommonExtensions/OptimizationPasses/PassClasses.py (4)
ReplaceSequentialPatternPass (265-282)
SequentialPass (201-210)
contextagnostic (285-298)
deleteNode (67-86)
🪛 Ruff (0.14.0)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
175-175: Unused function argument: name (ARG001)
211-211: Unused function argument: name (ARG001)
226-226: Avoid specifying long messages outside the exception class (TRY003)
272-272: Unused function argument: name (ARG001)
314-314: Unused function argument: name (ARG001)
478-478: Unused function argument: name (ARG001)
496-496: Unused function argument: name (ARG001)
🔇 Additional comments (5)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (5)
17-23: LGTM! Nice refactoring.
The _singleNodePattern helper centralizes single-node pattern construction and improves code reusability across multiple pass definitions. A minimal sketch is given below.
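For reference, a sketch of what such a pattern builder could look like with onnx_graphsurgeon; the actual helper in LoweringOptimizationPasses.py may differ in details:

```python
import onnx_graphsurgeon as gs

def _singleNodePattern(op: str) -> gs.Graph:
    # One placeholder input/output variable around a single node whose op
    # string the matcher compares (as a regex with NonBranchingMatcher(regex_op=True)).
    inp = gs.Variable(name="input")
    out = gs.Variable(name="output")
    node = gs.Node(op=op, name=f"{op}_pattern", inputs=[inp], outputs=[out])
    return gs.Graph(nodes=[node], inputs=[inp], outputs=[out])

convPattern = _singleNodePattern("Conv|RequantizedConv")
```

With regex_op=True, one such pattern covers both Conv and RequantizedConv, which is why the pass constructors no longer need per-operator boilerplate.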
102-117: LGTM! Layout helpers are well-designed.
The permutation helper functions correctly implement bidirectional NCHW ↔ NHWC transformations and permutation inversion.
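A worked example of those permutations for a 4-D tensor with one batch dimension (the function name and exact signature here are assumed from the diffs earlier in this review):

```python
from typing import List

def layoutPerm(dims: int, batchDims: int, targetChannelsFirst: bool) -> List[int]:
    if targetChannelsFirst:   # NHWC -> NCHW: channel is last, move it up front
        ch = dims - 1
        nonBatchPerm = [ch, *range(batchDims, ch)]
    else:                     # NCHW -> NHWC: channel follows batch, move it last
        ch = batchDims
        nonBatchPerm = [*range(ch + 1, dims), ch]
    return [*range(batchDims), *nonBatchPerm]

assert layoutPerm(4, 1, False) == [0, 2, 3, 1]  # NCHW -> NHWC
assert layoutPerm(4, 1, True) == [0, 3, 1, 2]   # NHWC -> NCHW
```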
133-156: LGTM! Transpose insertion helpers are correct.
Both _prependTranspose and _appendTranspose correctly handle tensor rewiring and shape calculations with proper permutation directions.
211-241: LGTM! Core fix implemented correctly.
The function now correctly transforms constant tensors (bias, mul, add, shift) for Conv and RequantizedConv operations, which was the primary objective of this PR. The refactored implementation using new helper functions improves maintainability.
474-514: LGTM! Consistent refactoring.
The pass classes are consistently updated to use the new _singleNodePattern helper, improving code maintainability and reducing duplication.
Commits: "…ation to conform to <type>Permutation", "…c PULP transformation function"
Force-pushed from 01a22b9 to c3d504c
Actionable comments posted: 3
♻️ Duplicate comments (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
276-277: Potential KeyError on missing 'group' attr. Default to 1.
ONNX Conv defaults group=1; accessing via ["group"] can raise KeyError.
Apply this diff:

```diff
-    if node.attrs["group"] == 1:
+    if node.attrs.get("group", 1) == 1:
         return graph
```

Also applies to: 317-318
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
CHANGELOG.md (3 hunks)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (11 hunks)
Deeploy/Targets/Neureka/Deployer.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- Deeploy/Targets/Neureka/Deployer.py
🧰 Additional context used
🧬 Code graph analysis (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (2)
Deeploy/CommonExtensions/OptimizationPasses/PassClasses.py (4)
ReplaceSequentialPatternPass (265-282)
SequentialPass (201-210)
contextagnostic (285-298)
deleteNode (67-86)
Deeploy/TilingExtension/TilingCodegen.py (1)
HyperRectangle (24-35)
🪛 Ruff (0.14.0)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
175-175: Unused function argument: name (ARG001)
211-211: Unused function argument: name (ARG001)
226-226: Avoid specifying long messages outside the exception class (TRY003)
272-272: Unused function argument: name (ARG001)
314-314: Unused function argument: name (ARG001)
478-478: Unused function argument: name (ARG001)
496-496: Unused function argument: name (ARG001)
🔇 Additional comments (5)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (5)
17-22: LGTM: Cleaner pattern creation.
The _singleNodePattern helper nicely centralizes single-node pattern creation, reducing boilerplate and improving maintainability across all the pass constructors.
175-197: LGTM: Proper defaults and node-centric design.
The function correctly derives the node from the match, sets default attributes safely with .get(), and uses the new transpose helpers appropriately.
234-237: LGTM: Fixes missing const layout transformations.
This correctly applies layout transformations to constant inputs (weights, bias, mul, add, shift) for Conv and RequantizedConv, addressing the core issue described in the PR objectives.
351-373: LGTM: Clean pass structure and naming.
The refactored passes use consistent naming, simplified pattern construction via _singleNodePattern, and properly compose the individual transformation passes into the main NCHWtoNHWCPass and PULPNCHWtoNHWCPass.
396-411: Approve with mutation caveat.
The function correctly sets default attributes and uses the new transpose helpers. However, note the in-place mutation issue at line 411 that was flagged separately.
Actionable comments posted: 0
♻️ Duplicate comments (3)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (3)
159-164: Critical: Avoid mutating shared constants.
Line 164's .transpose(perm) returns a view, not a copy. If const is shared (e.g., an ONNX initializer used by multiple nodes), this mutates all references.
Apply this diff:

```diff
-    const.values = const.values.transpose(perm)
+    const.values = const.values.transpose(perm).copy()
```

Based on learnings (past review comment on lines 159-164).
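A quick numpy demonstration of the view-vs-copy distinction this fix relies on:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
view = a.transpose(1, 0)           # a view: shares memory with `a`
assert np.shares_memory(a, view)

safe = a.transpose(1, 0).copy()    # an independent buffer, safe to mutate
assert not np.shares_memory(a, safe)
```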
167-172: Major: Assertion will fail for channels_first callers.
Line 168 asserts targetChannelsFirst must be False, but line 293 calls this with default_channels_first, which defaults to True (per line 308's constructor). This causes a runtime assertion failure.
Additionally, line 172 has the same shared-constant mutation issue as flagged above. Apply both fixes:

```diff
-def _transformLayoutDwWeightConst(const: gs.Constant, targetChannelsFirst: bool) -> None:
-    assert not targetChannelsFirst, "Target layout should be channels_last!"
+def _transformLayoutDwWeightConst(const: gs.Constant, targetChannelsFirst: bool) -> None:
     assert isinstance(const, gs.Constant)
     dims = len(const.shape)
     perm = [*range(1, dims), 0]
-    const.values = const.values.transpose(perm)
+    if not targetChannelsFirst:
+        # Transpose from channels_first to channels_last
+        const.values = const.values.transpose(perm).copy()
+    else:
+        # Transpose from channels_last to channels_first (inverse perm)
+        inv_perm = _invertPermutation(perm)
+        const.values = const.values.transpose(inv_perm).copy()
```

Or, if the function should only handle one direction, rename it and validate at call sites.
Based on learnings (past review comments on lines 167-172 and 159-164).
404-412: Critical: Avoid mutating shared constant at line 412.
Line 412 modifies matrixB.values in-place with .transpose(). If this constant is shared, this causes unintended side effects.
Apply this diff:

```diff
-    matrixB.values = matrixB.values.transpose(perm)
+    matrixB.values = matrixB.values.transpose(perm).copy()
```

Based on learnings (past review comment on lines 404-412).
🧹 Nitpick comments (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
211-241: LGTM! Comprehensive layout conversion.
The function correctly:
- Determines spatial dims per operator type
- Inserts input/output transposes
- Transforms constant layouts for Conv weights/bias and RequantizedConv constants
Minor: Consider extracting line 226's long error message to a constant or using a shorter inline message.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (11 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (2)
Deeploy/CommonExtensions/OptimizationPasses/PassClasses.py (3)
ReplaceSequentialPatternPass (265-282)
SequentialPass (201-210)
deleteNode (67-86)
Deeploy/TilingExtension/TilingCodegen.py (1)
HyperRectangle (24-35)
🪛 Ruff (0.14.0)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
175-175: Unused function argument: name (ARG001)
211-211: Unused function argument: name (ARG001)
226-226: Avoid specifying long messages outside the exception class (TRY003)
272-272: Unused function argument: name (ARG001)
315-315: Unused function argument: name (ARG001)
479-479: Unused function argument: name (ARG001)
497-497: Unused function argument: name (ARG001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: siracusa-neureka-kernels-tiled-singlebuffer-L2-wmem / test-runner-siracusa-neureka-tiled (repeated 3 times)
🔇 Additional comments (7)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (7)
17-22: LGTM! Clean pattern-builder helper.
This helper nicely centralizes single-node pattern creation, eliminating repeated boilerplate.
97-117: LGTM! Solid permutation utilities.
The layout permutation logic correctly handles batch dimensions and spatial dimensions for channels-first ↔ channels-last conversions.
133-156: LGTM! Transpose insertion correctly rewires graph.
The prepend/append transpose helpers properly create intermediate tensors, compute transposed shapes, and update node connections.
175-197: LGTM! Cleaner transpose insertion using helpers.
The refactor correctly uses _appendTranspose and attribute defaults, making the logic more maintainable.
Note: The unused name parameter is required by ReplaceSequentialPatternPass's signature.
272-302: LGTM! Safe attribute access.
Line 277 correctly uses .get("group", 1) to handle missing attributes (fixing the past KeyError concern).
Note: Line 293 calls _transformLayoutDwWeightConst, which has the assertion issue flagged separately in lines 167-172.
315-339: LGTM! PULP-specific depthwise handling.
Correctly handles PULP semantics (no input transpose) and uses safe attribute access.
203-515: LGTM! Well-structured pass refactoring.
The pass classes are well-organized:
- Consistent use of _singleNodePattern eliminates boilerplate
- NonBranchingMatcher with regex_op=True correctly handles multi-operator patterns
- Sequential pass composition in NCHWtoNHWCPass and PULPNCHWtoNHWCPass maintains proper ordering
```diff
-        if opNode.attrs['group'] == 1:
+        # Skip non-dw nodes
+        if node.attrs.get("group", 1) == 1:
```
I think that the complete condition for DW should be: groups == in_channels and out_channels % in_channels == 0, according to the Torch documentation. There can be situations where group is greater than 1 but we would not be dealing with DW.
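A minimal sketch of that proposed check (function name and signature assumed; the actual fix landed in 77dffdd):

```python
def _isDepthwise(group: int, in_channels: int, out_channels: int) -> bool:
    # Depthwise per the Torch definition: one group per input channel,
    # with an integer channel multiplier.
    return group == in_channels and out_channels % in_channels == 0

assert _isDepthwise(group=8, in_channels=8, out_channels=16)
assert not _isDepthwise(group=4, in_channels=8, out_channels=8)  # grouped conv, not DW
```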
addressed here 77dffdd
Actionable comments posted: 1
♻️ Duplicate comments (4)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (4)
178-179: Clarify unidirectional constraint or support bidirectional transformation.
The assertion at line 179 restricts this function to converting TO channels_last only (targetChannelsFirst=False). However, this constraint may not be obvious to callers and could cause runtime failures if called with targetChannelsFirst=True.
Consider:
- Renaming to _transformLayoutDwWeightConstToChannelsLast to make the constraint explicit, or
- Supporting bidirectional transformation like _transformLayoutConst does
Based on past review comments
: Critical: Avoid mutating shared constant values.Line 175 modifies
const.values
in-place with.transpose()
. If this constant is shared across multiple nodes (e.g., ONNX initializer), this will cause unintended side effects.Apply this diff to create a copy:
- const.values = const.values.transpose(perm) + const.values = const.values.transpose(perm).copy()Based on learnings (past review comment).
178-183: Critical: Avoid mutating shared constant values.
Line 183 performs an in-place transpose on const.values. If the constant is shared across nodes, this causes unintended side effects.
Apply this diff to create a copy:

```diff
-    const.values = const.values.transpose(perm)
+    const.values = const.values.transpose(perm).copy()
```

Based on learnings (past review comment).
411-419: Critical: Avoid mutating shared constant values.
Line 419 modifies matrixB.values in-place with .transpose(). If this constant is shared across multiple nodes, this will cause unintended side effects.
Apply this diff to create a copy:

```diff
-    matrixB.values = matrixB.values.transpose(perm)
+    matrixB.values = matrixB.values.transpose(perm).copy()
```

Based on learnings (past review comment).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (11 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (2)
Deeploy/CommonExtensions/OptimizationPasses/PassClasses.py (4)
ReplaceSequentialPatternPass (265-282)
SequentialPass (201-210)
contextagnostic (285-298)
deleteNode (67-86)
Deeploy/TilingExtension/TilingCodegen.py (1)
HyperRectangle (24-35)
🪛 Ruff (0.14.0)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
186-186: Unused function argument: name (ARG001)
222-222: Unused function argument: name (ARG001)
237-237: Avoid specifying long messages outside the exception class (TRY003)
283-283: Unused function argument: name (ARG001)
324-324: Unused function argument: name (ARG001)
330-330: Undefined name channels_first (F821)
337-337: Undefined name channels_first (F821)
486-486: Unused function argument: name (ARG001)
504-504: Unused function argument: name (ARG001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (100)
- GitHub Check: siracusa-neureka-models-tiled-singlebuffer-L3 (Attention, 2500, 8, L3) / test-runner-siracusa-neureka-tiled (2500)
- GitHub Check: siracusa-neureka-kernels-tiled-singlebuffer-L2 / test-runner-siracusa-neureka-tiled
- GitHub Check: generic-kernels / test-runner-generic
- GitHub Check: generic-models / test-runner-generic
- GitHub Check: siracusa-models-tiled-singlebuffer-L3 (Attention, 60000, 10000, 5000, 2500, 8, L3) / test-runner-siracusa-tiled (2500)
- GitHub Check: siracusa-models-tiled-singlebuffer-L3 (Attention, 60000, 10000, 5000, 2500, 8, L3) / test-runner-siracusa-tiled (10000)
- GitHub Check: siracusa-models-tiled-singlebuffer-L3 (Attention, 60000, 10000, 5000, 2500, 8, L3) / test-runner-siracusa-tiled (60000)
- GitHub Check: chimera-kernels / test-runner-chimera
- GitHub Check: siracusa-kernels-tiled-singlebuffer-L2 / test-runner-siracusa-tiled
- GitHub Check: siracusa-models / test-runner-siracusa
(the same ten checks listed ten times)
🔇 Additional comments (15)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (15)
17-23: LGTM!
The new _singleNodePattern helper effectively reduces duplication across pass definitions and provides a clean way to create single-node patterns for matching.
108-111: LGTM!
The simplified implementation using range unpacking is cleaner while maintaining correct logic.
113-124: LGTM!
The _transformLayoutPermutation helper correctly computes the permutation for channels_first ↔ channels_last transformations, properly handling arbitrary batch and spatial dimensions.
127-128: LGTM!
The permutation inversion logic is correct, using index() to find the inverse mapping.
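A small self-check of that index()-based inversion (reconstructed from the description; the module's exact code may differ):

```python
from typing import List

def _invertPermutation(perm: List[int]) -> List[int]:
    # inv[i] is the index j with perm[j] == i, so applying perm and then inv
    # restores the original axis order.
    return [perm.index(i) for i in range(len(perm))]

perm = [0, 2, 3, 1]                 # NCHW -> NHWC
inv = _invertPermutation(perm)      # [0, 3, 1, 2]
assert [perm[i] for i in inv] == [0, 1, 2, 3]
```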
144-168: LGTM!
The _prependTranspose and _appendTranspose helpers correctly insert transpose nodes and rewire graph connections, properly computing transposed shapes using the inverse permutation.
186-208: LGTM!
The refactored logic correctly uses the new transpose helpers and properly initializes default attributes for GEMM operations.
222-252: LGTM! Key fix for missing constant transformations.
The refactored logic now correctly transforms layout for all constant inputs (lines 245-248), including:
- Conv: bias constant
- RequantizedConv: mul, add, and shift constants
This addresses the missing transformations mentioned in the PR objectives.
256-271: LGTM!
The refactored passes correctly use the new pattern helper and NonBranchingMatcher with regex support for matching both Conv and RequantizedConv operations.
283-311: LGTM! Fixes missing DW constant transformations.
The refactored depthwise handling now correctly transforms:
- DW weight constants (line 302)
- Bias/mul/add/shift constants (lines 306-307)
This addresses the missing transformations for depthwise convolutions mentioned in the PR objectives.
315-321: LGTM!
The new NCHWtoNHWCDwConvPass provides proper separation of concerns for depthwise convolution handling.
350-355: LGTM structurally, but depends on fix for _PULP_NCHWtoNHWC_dw_fun.
The pass structure is correct, but the underlying function has a critical bug with undefined channels_first (see previous comment).
359-381: LGTM!
The sequential passes are properly wired to use the new depthwise-specific passes, with appropriate ordering for pass application.
421-475: LGTM!
The logic for converting GEMM to pointwise convolution is correct, properly handling dimension expansion and node cleanup.
478-522: LGTM!
The remaining passes are consistently refactored to use the new _singleNodePattern helper and node-centric match extraction.
25-34: No change needed for _isDepthwise. ONNX Conv semantics guarantee out_channels divisible by group, so checking group == in_channels suffices.
Likely an incorrect or invalid review comment.
Force-pushed from 449f607 to 77dffdd
Actionable comments posted: 0
🧹 Nitpick comments (3)
CHANGELOG.md (3)
`7-7`: Wording/grammar nit: use "constants" and pluralize "transformations". Improves clarity and style.
```diff
-- Fix missing const's layout transformation and refactor NCHWtoNHWC passes [#122](https://github.com/pulp-platform/Deeploy/pull/122)
+- Fix missing layout transformations for constants and refactor NCHWtoNHWC passes [#122](https://github.com/pulp-platform/Deeploy/pull/122)
```
`77-77`: Tighten phrasing for consistency with other "Changed" bullets.

```diff
-- Refactor of the NCHWtoNHWC passes
+- Refactor NCHWtoNHWC passes
```
`88-88`: Fix grammar and align phrasing with the "Fixed" section style.

```diff
-- Missing layout transformation of the const's (bias, mul, add, shift in Conv/RequantizedConv)
+- Fixed missing layout transformations for constants (bias in Conv; mul, add, shift in RequantizedConv)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
CHANGELOG.md (3 hunks)
Actionable comments posted: 1
♻️ Duplicate comments (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
`170-176`: Critical: Don't mutate a shared `gs.Constant` via a view; use `.copy()` after transpose. Transposing `const.values` and `matrixB.values` returns a view. If these constants are shared, downstream nodes see mutated data. Copy the array post-transpose. Apply:
```diff
@@ def _transformLayoutConst(const: gs.Constant, spatialDims: int, targetChannelsFirst: bool) -> None:
-    const.values = const.values.transpose(perm)
+    const.values = const.values.transpose(perm).copy()
@@ def _transformLayoutDwWeightConst(const: gs.Constant, targetChannelsFirst: bool) -> None:
-    const.values = const.values.transpose(perm)
+    const.values = const.values.transpose(perm).copy()
@@ def _requantized_gemm_to_pw_fun(graph: gs.Graph, match: Match, name: str):
-    matrixB.values = matrixB.values.transpose(perm)
+    matrixB.values = matrixB.values.transpose(perm).copy()
```

If some initializers are truly shared by design, consider cloning the `gs.Constant` node per consumer before modifying values. Based on past review comments.

Also applies to: 178-184, 418-421
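The hazard is standard NumPy behavior: `transpose` returns a view that shares memory with the original buffer (small demonstration, independent of Deeploy):

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
view = a.transpose(1, 0)         # no data copied: shares memory with `a`
view[0, 0] = 99
assert a[0, 0] == 99             # the original array changed too

safe = a.transpose(1, 0).copy()  # owns its data; later writes stay local
safe[0, 0] = 0
assert a[0, 0] == 99
```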
🧹 Nitpick comments (3)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (3)
`283-284`: Typo: rename `_NCWHtoNHWC_dw_fun` → `_NCHWtoNHWC_dw_fun`. Avoids confusion/readability issues.

```diff
-def _NCWHtoNHWC_dw_fun(graph: gs.Graph, match: Match, name: str, default_channels_first: bool) -> gs.Graph:
+def _NCHWtoNHWC_dw_fun(graph: gs.Graph, match: Match, name: str, default_channels_first: bool) -> gs.Graph:
@@ class NCHWtoNHWCDwConvPass(ReplaceSequentialPatternPass):
-        super().__init__(graph, partial(_NCWHtoNHWC_dw_fun, default_channels_first = default_channels_first), name,
+        super().__init__(graph, partial(_NCHWtoNHWC_dw_fun, default_channels_first = default_channels_first), name,
                          NonBranchingMatcher(regex_op = True))
```

Also applies to: 320-321
`186-186`: Silence the linter: unused parameter `name`. Either drop the parameter or rename it to `_name` to satisfy ARG001. Example:

```diff
-def _requantized_gemm_to_pw_fun(graph: gs.Graph, match: Match, name: str):
+def _requantized_gemm_to_pw_fun(graph: gs.Graph, match: Match, _name: str):
```

Apply similarly to the other helpers flagged by Ruff. Based on static analysis hints.

Also applies to: 222-222, 283-283, 324-324, 487-487, 505-505
`233-237`: Optional: avoid brittle direct attribute access and hardcoded Pad spatial dims.
- Use `.get('kernel_shape')` with a clearer error message.
- Replace the `Pad` spatial-dims hack with a TODO, or compute the spatial dims from the pads shape.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (11 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (2)
Deeploy/CommonExtensions/OptimizationPasses/PassClasses.py (3): `ReplaceSequentialPatternPass` (265-282), `SequentialPass` (201-210), `deleteNode` (67-86)
Deeploy/TilingExtension/TilingCodegen.py (1): `HyperRectangle` (24-35)
🪛 Ruff (0.14.0)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
- 186-186: Unused function argument: `name` (ARG001)
- 222-222: Unused function argument: `name` (ARG001)
- 237-237: Avoid specifying long messages outside the exception class (TRY003)
- 283-283: Unused function argument: `name` (ARG001)
- 324-324: Unused function argument: `name` (ARG001)
- 487-487: Unused function argument: `name` (ARG001)
- 505-505: Unused function argument: `name` (ARG001)
🔇 Additional comments (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
`447-458`: Guard required RequantizedGemm attributes to avoid KeyError. `node.attrs[...]` access for 'div', 'n_levels_out', 'shift', 'signed' assumes presence. Protect or validate. Minimal safe pattern:
```diff
-    pwAttrs = {
+    required = ['div', 'n_levels_out', 'shift', 'signed']
+    missing = [k for k in required if k not in node.attrs]
+    assert not missing, f"RequantizedGemm missing attrs: {missing}"
+    pwAttrs = {
         'channels_first': False,
         'dilations': [1, 1],
         'group': 1,
         'kernel_shape': [1, 1],
         'pads': [0, 0, 0, 0],
         'strides': [1, 1],
         'div': node.attrs['div'],
         'n_levels_out': node.attrs['n_levels_out'],
         'shift': node.attrs['shift'],
         'signed': node.attrs['signed'],
     }
```

If these are guaranteed upstream, confirm and dismiss. Based on static analysis hints.
The NCHWtoNHWC passes were missing the layout transformation of the bias (in the `Conv` case) or mul/add/shift (in the `RequantizedConv` case). In the process of adding that, I refactored the passes in general.
Changed
- Refactor of the NCHWtoNHWC passes

Fixed
- Missing layout transformation of the constants (bias, mul, add, shift in Conv/RequantizedConv)
PR Merge Checklist
- The PR is rebased on the latest `devel` commit and pointing to `devel`.
- The `CHANGELOG.md` file has been updated.