Fix missing const's layout transformation and refactor NCHWtoNHWC passes #122
Conversation
📝 Walkthrough
Refactors NCHW↔NHWC lowering to node-centric passes: adds _singleNodePattern and layout-permutation helpers, centralizes prepend/append Transpose insertion and constant-weight layout transforms, introduces NonBranchingMatcher wiring, renames several passes, and updates Deployer imports to the new pass names.

Sequence Diagram(s)

```mermaid
sequenceDiagram
autonumber
participant G as Graph
participant P as NCHWtoNHWCPass
participant M as NonBranchingMatcher
participant U as LayoutUtils
G->>P: run pass
P->>M: match single-node pattern (Conv/Depthwise/Pool/Pad/Gemm)
alt node matched
M-->>P: matched node
P->>U: compute spatialDims & permutation
U-->>P: permutation & inverse
P->>U: transform consts/weights/biases
U-->>P: transformed tensors
P->>G: prepend/append Transpose nodes and rewire node I/O
else no match
P-->>G: no-op
end
```

```mermaid
sequenceDiagram
autonumber
participant G as Graph
participant P as TransposeMatmulInputsPass
participant M as NonBranchingMatcher
participant T as TransposeHelpers
G->>P: run pass
P->>M: match MatMul/Gemm node (_singleNodePattern)
alt node matched
M-->>P: node map
P->>P: ensure defaults (transA/transB/alpha/beta) on node
P->>T: prepend/append Transpose for inputs per trans flags
T-->>P: updated input connections
P->>G: commit updated wiring
else no match
P-->>G: no-op
end
```

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 1
🧹 Nitpick comments (6)
Deeploy/Targets/Neureka/Deployer.py (1)

35-36: 3x3 branch pass swap is correct; tiny style nit.
Replacing PULPNCHWtoNHWCPass with NCHWtoNHWCPass under enable3x3 is appropriate. Consider enumerate for clarity.

```diff
-        for idx in range(len(self.loweringOptimizer.passes)):
-            if isinstance(self.loweringOptimizer.passes[idx], PULPNCHWtoNHWCPass):
-                self.loweringOptimizer.passes[idx] = NCHWtoNHWCPass(self.default_channels_first)
+        for idx, p in enumerate(self.loweringOptimizer.passes):
+            if isinstance(p, PULPNCHWtoNHWCPass):
+                self.loweringOptimizer.passes[idx] = NCHWtoNHWCPass(self.default_channels_first)
```

Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (5)
102-113: Permutation helper: keep, but minor readability tweaks.
Use iterable unpacking to avoid list concatenation noise.

```diff
-    if targetChannelsFirst:
-        ch = dims - 1
-        nonBatchPerm = [
-            ch,
-        ] + list(range(batchDims, ch))
-    else:
-        ch = batchDims
-        nonBatchPerm = list(range(ch + 1, dims)) + [ch]
+    if targetChannelsFirst:
+        ch = dims - 1
+        nonBatchPerm = [ch, *range(batchDims, ch)]
+    else:
+        ch = batchDims
+        nonBatchPerm = [*range(ch + 1, dims), ch]
```
[Based on hints from static analysis]
133-157: Transpose helpers: ensure shapes are known to avoid crashes.
shape may be None/symbolic; _permute and _invertPermutation assume concrete sequences and will assert/raise. Guard or early-return when shapes are unknown.
Would you like a patch that skips shape-based preallocation when tensor.shape is None and relies on shape inference instead?
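A minimal sketch of such a guard, assuming a hypothetical _transposedShape helper (the offered patch itself was never posted):

```python
from typing import List, Optional, Sequence

def _transposedShape(shape: Optional[Sequence], perm: List[int]) -> Optional[List]:
    # Return the permuted shape, or None when the shape is unknown or symbolic,
    # so downstream shape inference can fill it in instead of crashing here.
    if shape is None or any(not isinstance(d, int) for d in shape):
        return None
    return [shape[p] for p in perm]

assert _transposedShape([1, 8, 16, 16], [0, 2, 3, 1]) == [1, 16, 16, 8]
assert _transposedShape(None, [0, 2, 3, 1]) is None
```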
167-173: Small lint fixes.
- Remove extraneous f-prefix.
- Prefer iterable unpacking as above.

```diff
-    assert not targetChannelsFirst, f"Target layout should be channels_last!"
+    assert not targetChannelsFirst, "Target layout should be channels_last!"
-    perm = list(range(1, dims)) + [0]
+    perm = [*range(1, dims), 0]
```
[Based on hints from static analysis]
176-176: Prefer iterator access over list slice for first map value.
Use next(iter(...)) to avoid building a list.

```diff
-    node = list(match.nodes_map.values())[0]
+    node = next(iter(match.nodes_map.values()))
```
[Based on hints from static analysis]
Also applies to: 212-212, 273-273, 315-315, 383-383, 506-506
219-227: Deriving spatialDims for Pad is brittle.
Hardcoding 2 risks breakage. Prefer deriving from input rank or pads attr.

```diff
-    elif node.op == "Pad":
-        spatialDims = 2  # Hack based on current status
+    elif node.op == "Pad":
+        in_shape = node.inputs[0].shape
+        spatialDims = (len(in_shape) - 2) if isinstance(in_shape, Sequence) and in_shape is not None else 2
```
Please confirm Pad cases in your models are covered by this derivation.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (12 hunks)
Deeploy/Targets/Neureka/Deployer.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
Deeploy/Targets/Neureka/Deployer.py (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (2)
NCHWtoNHWCPass (357-366)
PULPNCHWtoNHWCPass (370-379)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (2)
Deeploy/CommonExtensions/OptimizationPasses/PassClasses.py (3)
ReplaceSequentialPatternPass (265-282)
SequentialPass (201-210)
contextagnostic (285-298)
Deeploy/TilingExtension/TilingCodegen.py (1)
HyperRectangle (24-35)
🪛 Ruff (0.14.0)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
106-108: Consider [ch, *list(range(batchDims, ch))] instead of concatenation (RUF005)
111-111: Consider [*list(range(ch + 1, dims)), ch] instead of concatenation (RUF005)
168-168: f-string without any placeholders; remove extraneous f prefix (F541)
171-171: Consider [*list(range(1, dims)), 0] instead of concatenation (RUF005)
175-175: Unused function argument: name (ARG001)
176-176: Prefer next(iter(match.nodes_map.values())) over single element slice (RUF015)
211-211: Unused function argument: name (ARG001)
212-212: Prefer next(iter(match.nodes_map.values())) over single element slice (RUF015)
226-226: Avoid specifying long messages outside the exception class (TRY003)
272-272: Unused function argument: name (ARG001)
273-273: Prefer next(iter(match.nodes_map.values())) over single element slice (RUF015)
314-314: Unused function argument: name (ARG001)
315-315: Prefer next(iter(match.nodes_map.values())) over single element slice (RUF015)
331-331: Avoid specifying long messages outside the exception class (TRY003)
383-383: Prefer next(iter(match.nodes_map.values())) over single element slice (RUF015)
504-504: Unused function argument: name (ARG001)
506-506: Prefer next(iter(match.nodes_map.values())) over single element slice (RUF015)
🔇 Additional comments (5)
Deeploy/Targets/Neureka/Deployer.py (1)
11-11: Import rename aligns with new public API.
Switch to NCHWtoNHWCPass/PULPNCHWtoNHWCPass looks correct and consistent with refactored passes.
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (4)
17-23: Helper _singleNodePattern is a good simplification.
This reduces boilerplate and pairs well with NonBranchingMatcher(regex_op=True).
356-365: Pass sequencing/naming update looks good.
Sequencing Pad → MaxPool → DwConv → Conv under NCHW→NHWC and PULP variants is clear and consistent.
Also applies to: 369-378
402-418: RequantizedGemm defaults and B handling are sensible.
Defaulting trans/alpha/beta and in-place transpose of constant B is fine. If B can be shared across nodes, consider cloning to avoid unintended side effects.
Do any graphs share the same Constant B across multiple GEMMs? If yes, we should clone B before mutating.
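A minimal sketch of the cloning idea with onnx_graphsurgeon; the helper name is illustrative, not part of the PR:

```python
import onnx_graphsurgeon as gs

def _cloneConstant(const: gs.Constant, suffix: str = "_cloned") -> gs.Constant:
    # Copying the ndarray guarantees that transposing the clone in place
    # cannot alias an initializer still referenced by other nodes.
    return gs.Constant(name=const.name + suffix, values=const.values.copy())

# Hypothetical usage at the GEMM rewrite site:
#     node.inputs[1] = _cloneConstant(node.inputs[1])
#     node.inputs[1].values = node.inputs[1].values.transpose(perm)
```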
500-501: Pattern helper usage for RemoveGlobalOutputReshape/ConvBias passes LGTM.
Simplifies patterns without changing behavior.
📝 Walkthrough
The diff refactors NCHW↔NHWC layout transformation passes by introducing generic permutation and transpose helpers, ensuring constants (weights/biases) are transformed. Several pass implementations are updated to use single-node patterns and new helpers. A pass class is renamed, and the deployer is updated accordingly. The changelog records these updates.

Sequence Diagram(s)

```mermaid
sequenceDiagram
autonumber
actor User
participant Deployer as NeurekaDeployer
participant Pipeline as OptimizationPipeline
participant Pass as NCHWtoNHWCPass
participant Graph as ONNX Graph
participant Helper as LayoutHelpers
User->>Deployer: deploy(model)
Deployer->>Pipeline: build passes
Pipeline->>Pass: run(Graph)
activate Pass
Note over Pass,Helper: For each matched node (Conv/MatMul/...):
Pass->>Helper: compute permutation (dims, spatialDims, targetChannelsFirst)
Helper-->>Pass: perm, invPerm
alt input layout needs change
Pass->>Graph: prepend Transpose on inputs
Pass->>Helper: transform constants (weights/bias/bn/mul/add)
Helper-->>Pass: constants permuted
Pass->>Graph: adjust node attrs (e.g., transA/B)
Pass->>Graph: append Transpose on outputs
else no change
Pass-->>Graph: no-op
end
deactivate Pass
Pipeline-->>Deployer: optimized Graph
Deployer-->>User: deployment artifacts
```

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 0
🧹 Nitpick comments (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
102-172: LGTM! Well-designed layout transformation API.
The new transformation helpers (_transformLayoutPermutation, _invertPermutation, _prependTranspose, _appendTranspose, _transformLayoutConst, _transformLayoutDwWeightConst) provide a clean, reusable abstraction for layout transformations. The API is well-structured and improves code maintainability.
Optional: Consider using unpacking syntax for list concatenation as suggested by static analysis (lines 106-108, 111, 171):

```diff
-    nonBatchPerm = [
-        ch,
-    ] + list(range(batchDims, ch))
+    nonBatchPerm = [ch, *range(batchDims, ch)]
-    nonBatchPerm = list(range(ch + 1, dims)) + [ch]
+    nonBatchPerm = [*range(ch + 1, dims), ch]
-    perm = list(range(1, dims)) + [0]
+    perm = [*range(1, dims), 0]
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
CHANGELOG.md (3 hunks)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (12 hunks)
Deeploy/Targets/Neureka/Deployer.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (2)
Deeploy/CommonExtensions/OptimizationPasses/PassClasses.py (3)
ReplaceSequentialPatternPass (265-282)
SequentialPass (201-210)
contextagnostic (285-298)
Deeploy/TilingExtension/TilingCodegen.py (1)
HyperRectangle (24-35)
Deeploy/Targets/Neureka/Deployer.py (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (2)
NCHWtoNHWCPass (357-366)
PULPNCHWtoNHWCPass (370-379)
🪛 Ruff (0.14.0)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
106-108: Consider [ch, *list(range(batchDims, ch))] instead of concatenation (RUF005)
111-111: Consider [*list(range(ch + 1, dims)), ch] instead of concatenation (RUF005)
168-168: f-string without any placeholders; remove extraneous f prefix (F541)
171-171: Consider [*list(range(1, dims)), 0] instead of concatenation (RUF005)
175-175: Unused function argument: name (ARG001)
176-176: Prefer next(iter(match.nodes_map.values())) over single element slice (RUF015)
211-211: Unused function argument: name (ARG001)
212-212: Prefer next(iter(match.nodes_map.values())) over single element slice (RUF015)
226-226: Avoid specifying long messages outside the exception class (TRY003)
272-272: Unused function argument: name (ARG001)
273-273: Prefer next(iter(match.nodes_map.values())) over single element slice (RUF015)
314-314: Unused function argument: name (ARG001)
315-315: Prefer next(iter(match.nodes_map.values())) over single element slice (RUF015)
331-331: Avoid specifying long messages outside the exception class (TRY003)
383-383: Prefer next(iter(match.nodes_map.values())) over single element slice (RUF015)
504-504: Unused function argument: name (ARG001)
506-506: Prefer next(iter(match.nodes_map.values())) over single element slice (RUF015)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (40)
- GitHub Check: siracusa-models-tiled-singlebuffer-L3 (Attention, 60000, 10000, 5000, 2500, 8, L3) / test-runner-siracusa-tiled (repeated 40 times across runners 2500/5000/10000/60000)
🔇 Additional comments (9)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (7)
17-23: LGTM! Clean helper for single-node patterns.
The _singleNodePattern helper effectively consolidates repeated graph construction logic throughout the file, improving maintainability.
234-237: Critical fix: Const tensor layout transformation now applied.
This code correctly transforms all constant tensors (weights, bias, mul, add, shift) for Conv and RequantizedConv operations. Previously, these layout transformations were missing, which would cause incorrect behavior when switching between NCHW and NHWC layouts.
292-297: Critical fix: DW Conv const layout transformation now applied.
Correctly transforms all constant tensors for depthwise convolutions, including the specialized DW weight transformation (line 292) and additional constants like bias/mul/add/shift (lines 294-297). This fixes the missing layout transformations mentioned in the PR description.
338-340: Critical fix: PULP DW Conv const layout transformation now applied.
Correctly transforms all constant tensors for PULP-specific depthwise convolutions. This completes the const transformation fix across all Conv variants.
175-197: LGTM! Refactored to use new transpose helpers.
The function correctly uses .get() with defaults for GEMM/MatMul attributes and applies the new transpose helpers. The unused name parameter flagged by static analysis is part of the required callback signature for ReplaceSequentialPatternPass, so this is expected.
203-379: LGTM! Pass classes correctly updated.
All pass classes have been properly refactored to use:
- _singleNodePattern for cleaner pattern creation
- New transformation helpers for layout conversions
- NonBranchingMatcher(regex_op=True) where needed to match multiple operators (e.g., "Conv|RequantizedConv")
The class naming is now more consistent (e.g., NCHWtoNHWCDwConvPass, PULPNCHWtoNHWCDwConvPass).
382-525: LGTM! Remaining functions consistently refactored.
The remaining transformation functions (_requantized_gemm_to_pw_fun, _remove_global_output_reshape_fun, _remove_empty_conv_bias_fun) have been consistently updated to:
- Use .get() with defaults for attribute access
- Use new transpose helpers where applicable
- Use _singleNodePattern in their corresponding pass classes
Deeploy/Targets/Neureka/Deployer.py (1)
11-11: LGTM! Import and usage correctly updated.
The changes correctly reflect the class rename from the refactored LoweringOptimizationPasses module. The logic remains unchanged: when 3x3 mode is enabled, the PULPNCHWtoNHWCPass is replaced with the generic NCHWtoNHWCPass.
Also applies to: 35-36
CHANGELOG.md (1)
7-7: LGTM! Changelog accurately documents the changes.
The changelog entries correctly capture:
- The PR reference and summary (line 7)
- The refactoring of NCHWtoNHWC passes (line 76)
- The critical bug fix for missing const layout transformations (line 86)
Also applies to: 76-76, 86-86
Actionable comments posted: 2
♻️ Duplicate comments (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
276-277: [Duplicate] Use safe attribute access for 'group'.
This issue was previously flagged: accessing node.attrs["group"] directly can raise KeyError when the attribute is missing. ONNX Conv defaults group=1, so use .get("group", 1) for safe access.
Apply this diff:

```diff
-    if node.attrs["group"] == 1:
+    if node.attrs.get("group", 1) == 1:
         return graph
```

Also applies to line 317.
🧹 Nitpick comments (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
226-226: Consider extracting detailed error messages.
Static analysis flags the long message in ValueError. For better maintainability, consider defining a custom exception class or moving the message to a constant.
Also applies to line 331.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (11 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (2)
Deeploy/CommonExtensions/OptimizationPasses/PassClasses.py (4)
ReplaceSequentialPatternPass (265-282)
SequentialPass (201-210)
contextagnostic (285-298)
deleteNode (67-86)
Deeploy/TilingExtension/TilingCodegen.py (1)
HyperRectangle (24-35)
🪛 Ruff (0.14.0)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
175-175: Unused function argument: name (ARG001)
211-211: Unused function argument: name (ARG001)
226-226: Avoid specifying long messages outside the exception class (TRY003)
272-272: Unused function argument: name (ARG001)
314-314: Unused function argument: name (ARG001)
331-331: Avoid specifying long messages outside the exception class (TRY003)
484-484: Unused function argument: name (ARG001)
502-502: Unused function argument: name (ARG001)
🔇 Additional comments (3)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (3)
211-241: Well-structured refactoring of layout transformation logic.
The refactored code effectively centralizes the layout transformation using the new helper functions. The spatialDims determination is operator-specific, and the transformation correctly handles both data tensors (via transpose nodes) and constant tensors (via in-place modification).
382-473: Clean transformation from RequantizedGemm to RequantizedConv.
The refactored code correctly handles the transformation, including:
- Proper reordering of mul/add parameters (lines 457-460)
- Dimension expansion/squeeze for convolution compatibility (see the sketch after this list)
- Transpose handling with the new helper functions
- Attribute defaults for GEMM operations (lines 402-406)
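For intuition, a small self-contained numpy check of the expansion/squeeze step (shapes and names are illustrative, not the pass's actual code) showing why a GEMM is equivalent to a 1x1 pointwise convolution once the activations gain dummy spatial dims:

```python
import numpy as np

x = np.random.rand(4, 8)           # GEMM input  [N, K]
w = np.random.rand(8, 16)          # GEMM weight [K, M]

x_conv = x.reshape(4, 1, 1, 8)     # [N, H=1, W=1, C=K] (NHWC)
w_conv = w.T.reshape(16, 1, 1, 8)  # [M, kH=1, kW=1, C=K] pointwise kernel

# A 1x1 conv over a 1x1 feature map is exactly the matrix product:
y_conv = np.einsum("nhwc,mijc->nhwm", x_conv, w_conv).reshape(4, 16)
np.testing.assert_allclose(y_conv, x @ w)
```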
203-209: Consistent migration to single-node pattern.
The updated pass constructors consistently use _singleNodePattern and appropriate matchers, improving code clarity and maintainability.
Also applies to: 245-269, 305-311, 348-353, 357-379, 477-499, 515-520
Actionable comments posted: 1
♻️ Duplicate comments (4)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (4)
159-164: Avoid mutating shared gs.Constant values.
_transformLayoutConst reassigns const.values to a transposed view without copying. If a gs.Constant (e.g. an ONNX initializer) is shared by multiple nodes, this will cause unintended side effects.
Based on past review comments
272-301: Potential KeyError on missing 'group' attr.
Line 276: ONNX Conv defaults group=1; accessing via ["group"] can raise KeyError when the attribute is missing.
Apply this diff:

```diff
-    if node.attrs["group"] == 1:
+    if node.attrs.get("group", 1) == 1:
         return graph
```

Based on past review comments
314-338: Potential KeyError on missing 'group' attr.
Line 318: ONNX Conv defaults group=1; accessing via ["group"] can raise KeyError when the attribute is missing.
Apply this diff:

```diff
-    if node.attrs["group"] == 1:
+    if node.attrs.get("group", 1) == 1:
         return graph
```

Based on past review comments
167-172: Function only supports one direction but may receive opposite.
_transformLayoutDwWeightConst asserts not targetChannelsFirst (line 168), restricting it to converting TO channels_last. However, at line 292, this function is called with default_channels_first, which can be True (the constructor default at line 307), causing an assertion failure at runtime.
Apply this diff to support both directions or validate at the call site:

```diff
-def _transformLayoutDwWeightConst(const: gs.Constant, targetChannelsFirst: bool) -> None:
-    assert not targetChannelsFirst, "Target layout should be channels_last!"
+def _transformLayoutDwWeightConst(const: gs.Constant, spatialDims: int, targetChannelsFirst: bool) -> None:
     assert isinstance(const, gs.Constant)
     dims = len(const.shape)
-    perm = [*range(1, dims), 0]
+    if targetChannelsFirst:
+        # channels_last -> channels_first: move last dim to first
+        perm = [dims - 1, *range(dims - 1)]
+    else:
+        # channels_first -> channels_last: move first dim to last
+        perm = [*range(1, dims), 0]
     const.values = const.values.transpose(perm)
```

Then update the call site at line 292:

```diff
-    _transformLayoutDwWeightConst(node.inputs[1], default_channels_first)  # weights
+    _transformLayoutDwWeightConst(node.inputs[1], spatialDims, default_channels_first)  # weights
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (11 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
Deeploy/CommonExtensions/OptimizationPasses/PassClasses.py (4)
ReplaceSequentialPatternPass (265-282)
SequentialPass (201-210)
contextagnostic (285-298)
deleteNode (67-86)
🪛 Ruff (0.14.0)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
175-175: Unused function argument: name (ARG001)
211-211: Unused function argument: name (ARG001)
226-226: Avoid specifying long messages outside the exception class (TRY003)
272-272: Unused function argument: name (ARG001)
314-314: Unused function argument: name (ARG001)
478-478: Unused function argument: name (ARG001)
496-496: Unused function argument: name (ARG001)
🔇 Additional comments (5)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (5)
17-23: LGTM! Nice refactoring.
The _singleNodePattern helper centralizes single-node pattern construction and improves code reusability across multiple pass definitions. A minimal sketch is given below.
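For reference, a sketch of what such a pattern builder could look like with onnx_graphsurgeon; the actual helper in LoweringOptimizationPasses.py may differ in details:

```python
import onnx_graphsurgeon as gs

def _singleNodePattern(op: str) -> gs.Graph:
    # One placeholder input/output variable around a single node whose op
    # string the matcher compares (as a regex with NonBranchingMatcher(regex_op=True)).
    inp = gs.Variable(name="input")
    out = gs.Variable(name="output")
    node = gs.Node(op=op, name=f"{op}_pattern", inputs=[inp], outputs=[out])
    return gs.Graph(nodes=[node], inputs=[inp], outputs=[out])

convPattern = _singleNodePattern("Conv|RequantizedConv")
```

With regex_op=True, one such pattern covers both Conv and RequantizedConv, which is why the pass constructors no longer need per-operator boilerplate.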
102-117: LGTM! Layout helpers are well-designed.
The permutation helper functions correctly implement bidirectional NCHW ↔ NHWC transformations and permutation inversion.
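A worked example of those permutations for a 4-D tensor with one batch dimension (the function name and exact signature here are assumed from the diffs earlier in this review):

```python
from typing import List

def layoutPerm(dims: int, batchDims: int, targetChannelsFirst: bool) -> List[int]:
    if targetChannelsFirst:   # NHWC -> NCHW: channel is last, move it up front
        ch = dims - 1
        nonBatchPerm = [ch, *range(batchDims, ch)]
    else:                     # NCHW -> NHWC: channel follows batch, move it last
        ch = batchDims
        nonBatchPerm = [*range(ch + 1, dims), ch]
    return [*range(batchDims), *nonBatchPerm]

assert layoutPerm(4, 1, False) == [0, 2, 3, 1]  # NCHW -> NHWC
assert layoutPerm(4, 1, True) == [0, 3, 1, 2]   # NHWC -> NCHW
```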
133-156: LGTM! Transpose insertion helpers are correct.
Both _prependTranspose and _appendTranspose correctly handle tensor rewiring and shape calculations with proper permutation directions.
211-241: LGTM! Core fix implemented correctly.
The function now correctly transforms constant tensors (bias, mul, add, shift) for Conv and RequantizedConv operations, which was the primary objective of this PR. The refactored implementation using new helper functions improves maintainability.
474-514: LGTM! Consistent refactoring.
The pass classes are consistently updated to use the new _singleNodePattern helper, improving code maintainability and reducing duplication.
Commits: "…ation to conform to <type>Permutation", "…c PULP transformation function"
Force-pushed from 01a22b9 to c3d504c
Actionable comments posted: 3
♻️ Duplicate comments (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
276-277: Potential KeyError on missing 'group' attr. Default to 1.
ONNX Conv defaults group=1; accessing via ["group"] can raise KeyError.
Apply this diff:

```diff
-    if node.attrs["group"] == 1:
+    if node.attrs.get("group", 1) == 1:
         return graph
```

Also applies to: 317-318
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
CHANGELOG.md (3 hunks)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (11 hunks)
Deeploy/Targets/Neureka/Deployer.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- Deeploy/Targets/Neureka/Deployer.py
🧰 Additional context used
🧬 Code graph analysis (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (2)
Deeploy/CommonExtensions/OptimizationPasses/PassClasses.py (4)
ReplaceSequentialPatternPass (265-282)
SequentialPass (201-210)
contextagnostic (285-298)
deleteNode (67-86)
Deeploy/TilingExtension/TilingCodegen.py (1)
HyperRectangle (24-35)
🪛 Ruff (0.14.0)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
175-175: Unused function argument: name (ARG001)
211-211: Unused function argument: name (ARG001)
226-226: Avoid specifying long messages outside the exception class (TRY003)
272-272: Unused function argument: name (ARG001)
314-314: Unused function argument: name (ARG001)
478-478: Unused function argument: name (ARG001)
496-496: Unused function argument: name (ARG001)
🔇 Additional comments (5)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (5)
17-22: LGTM: Cleaner pattern creation.
The _singleNodePattern helper nicely centralizes single-node pattern creation, reducing boilerplate and improving maintainability across all the pass constructors.
175-197: LGTM: Proper defaults and node-centric design.
The function correctly derives the node from the match, sets default attributes safely with .get(), and uses the new transpose helpers appropriately.
234-237: LGTM: Fixes missing const layout transformations.
This correctly applies layout transformations to constant inputs (weights, bias, mul, add, shift) for Conv and RequantizedConv, addressing the core issue described in the PR objectives.
351-373: LGTM: Clean pass structure and naming.
The refactored passes use consistent naming, simplified pattern construction via _singleNodePattern, and properly compose the individual transformation passes into the main NCHWtoNHWCPass and PULPNCHWtoNHWCPass.
396-411: Approve with mutation caveat.
The function correctly sets default attributes and uses the new transpose helpers. However, note the in-place mutation issue at line 411 that was flagged separately.
Actionable comments posted: 0
♻️ Duplicate comments (3)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (3)
159-164: Critical: Avoid mutating shared constants.
Line 164's .transpose(perm) returns a view, not a copy. If const is shared (e.g., an ONNX initializer used by multiple nodes), this mutates all references.
Apply this diff:

```diff
-    const.values = const.values.transpose(perm)
+    const.values = const.values.transpose(perm).copy()
```

Based on learnings (past review comment on lines 159-164).
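A quick numpy demonstration of the view-vs-copy distinction this fix relies on:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
view = a.transpose(1, 0)           # a view: shares memory with `a`
assert np.shares_memory(a, view)

safe = a.transpose(1, 0).copy()    # an independent buffer, safe to mutate
assert not np.shares_memory(a, safe)
```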
167-172: Major: Assertion will fail for channels_first callers.
Line 168 asserts targetChannelsFirst must be False, but line 293 calls this with default_channels_first, which defaults to True (per line 308's constructor). This causes a runtime assertion failure.
Additionally, line 172 has the same shared-constant mutation issue as flagged above. Apply both fixes:

```diff
-def _transformLayoutDwWeightConst(const: gs.Constant, targetChannelsFirst: bool) -> None:
-    assert not targetChannelsFirst, "Target layout should be channels_last!"
+def _transformLayoutDwWeightConst(const: gs.Constant, targetChannelsFirst: bool) -> None:
     assert isinstance(const, gs.Constant)
     dims = len(const.shape)
     perm = [*range(1, dims), 0]
-    const.values = const.values.transpose(perm)
+    if not targetChannelsFirst:
+        # Transpose from channels_first to channels_last
+        const.values = const.values.transpose(perm).copy()
+    else:
+        # Transpose from channels_last to channels_first (inverse perm)
+        inv_perm = _invertPermutation(perm)
+        const.values = const.values.transpose(inv_perm).copy()
```

Or, if the function should only handle one direction, rename it and validate at call sites.
Based on learnings (past review comments on lines 167-172 and 159-164).
404-412: Critical: Avoid mutating shared constant at line 412.
Line 412 modifies matrixB.values in-place with .transpose(). If this constant is shared, this causes unintended side effects.
Apply this diff:

```diff
-    matrixB.values = matrixB.values.transpose(perm)
+    matrixB.values = matrixB.values.transpose(perm).copy()
```

Based on learnings (past review comment on lines 404-412).
🧹 Nitpick comments (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
211-241: LGTM! Comprehensive layout conversion.
The function correctly:
- Determines spatial dims per operator type
- Inserts input/output transposes
- Transforms constant layouts for Conv weights/bias and RequantizedConv constants
Minor: Consider extracting line 226's long error message to a constant or using a shorter inline message.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (11 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (2)
Deeploy/CommonExtensions/OptimizationPasses/PassClasses.py (3)
ReplaceSequentialPatternPass (265-282)
SequentialPass (201-210)
deleteNode (67-86)
Deeploy/TilingExtension/TilingCodegen.py (1)
HyperRectangle (24-35)
🪛 Ruff (0.14.0)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
175-175: Unused function argument: name (ARG001)
211-211: Unused function argument: name (ARG001)
226-226: Avoid specifying long messages outside the exception class (TRY003)
272-272: Unused function argument: name (ARG001)
315-315: Unused function argument: name (ARG001)
479-479: Unused function argument: name (ARG001)
497-497: Unused function argument: name (ARG001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: siracusa-neureka-kernels-tiled-singlebuffer-L2-wmem / test-runner-siracusa-neureka-tiled (repeated 3 times)
🔇 Additional comments (7)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (7)
17-22: LGTM! Clean pattern-builder helper.
This helper nicely centralizes single-node pattern creation, eliminating repeated boilerplate.
97-117: LGTM! Solid permutation utilities.
The layout permutation logic correctly handles batch dimensions and spatial dimensions for channels-first ↔ channels-last conversions.
133-156: LGTM! Transpose insertion correctly rewires graph.
The prepend/append transpose helpers properly create intermediate tensors, compute transposed shapes, and update node connections.
175-197: LGTM! Cleaner transpose insertion using helpers.
The refactor correctly uses _appendTranspose and attribute defaults, making the logic more maintainable.
Note: The unused name parameter is required by ReplaceSequentialPatternPass's signature.
272-302: LGTM! Safe attribute access.
Line 277 correctly uses .get("group", 1) to handle missing attributes (fixing the past KeyError concern).
Note: Line 293 calls _transformLayoutDwWeightConst, which has the assertion issue flagged separately in lines 167-172.
315-339: LGTM! PULP-specific depthwise handling.
Correctly handles PULP semantics (no input transpose) and uses safe attribute access.
203-515: LGTM! Well-structured pass refactoring.
The pass classes are well-organized:
- Consistent use of _singleNodePattern eliminates boilerplate
- NonBranchingMatcher with regex_op=True correctly handles multi-operator patterns
- Sequential pass composition in NCHWtoNHWCPass and PULPNCHWtoNHWCPass maintains proper ordering
```diff
-        if opNode.attrs['group'] == 1:
+        # Skip non-dw nodes
+        if node.attrs.get("group", 1) == 1:
```
I think that the complete condition for DW should be: groups == in_channels and out_channels % in_channels == 0, according to the Torch documentation. There can be situations where group is greater than 1 but we would not be dealing with DW.
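A minimal sketch of that proposed check (function name and signature assumed; the actual fix landed in 77dffdd):

```python
def _isDepthwise(group: int, in_channels: int, out_channels: int) -> bool:
    # Depthwise per the Torch definition: one group per input channel,
    # with an integer channel multiplier.
    return group == in_channels and out_channels % in_channels == 0

assert _isDepthwise(group=8, in_channels=8, out_channels=16)
assert not _isDepthwise(group=4, in_channels=8, out_channels=8)  # grouped conv, not DW
```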
addressed here 77dffdd
Actionable comments posted: 1
♻️ Duplicate comments (4)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (4)
178-179: Clarify unidirectional constraint or support bidirectional transformation.
The assertion at line 179 restricts this function to converting TO channels_last only (targetChannelsFirst=False). However, this constraint may not be obvious to callers and could cause runtime failures if called with targetChannelsFirst=True.
Consider:
- Renaming to _transformLayoutDwWeightConstToChannelsLast to make the constraint explicit, or
- Supporting bidirectional transformation like _transformLayoutConst does
Based on past review comments
: Critical: Avoid mutating shared constant values.Line 175 modifies
const.values
in-place with.transpose()
. If this constant is shared across multiple nodes (e.g., ONNX initializer), this will cause unintended side effects.Apply this diff to create a copy:
- const.values = const.values.transpose(perm) + const.values = const.values.transpose(perm).copy()Based on learnings (past review comment).
178-183: Critical: Avoid mutating shared constant values.
Line 183 performs an in-place transpose on const.values. If the constant is shared across nodes, this causes unintended side effects.
Apply this diff to create a copy:

```diff
-    const.values = const.values.transpose(perm)
+    const.values = const.values.transpose(perm).copy()
```

Based on learnings (past review comment).
411-419: Critical: Avoid mutating shared constant values.
Line 419 modifies matrixB.values in-place with .transpose(). If this constant is shared across multiple nodes, this will cause unintended side effects.
Apply this diff to create a copy:

```diff
-    matrixB.values = matrixB.values.transpose(perm)
+    matrixB.values = matrixB.values.transpose(perm).copy()
```

Based on learnings (past review comment).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (11 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (2)
Deeploy/CommonExtensions/OptimizationPasses/PassClasses.py (4)
ReplaceSequentialPatternPass (265-282)
SequentialPass (201-210)
contextagnostic (285-298)
deleteNode (67-86)
Deeploy/TilingExtension/TilingCodegen.py (1)
HyperRectangle (24-35)
🪛 Ruff (0.14.0)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
186-186: Unused function argument: name (ARG001)
222-222: Unused function argument: name (ARG001)
237-237: Avoid specifying long messages outside the exception class (TRY003)
283-283: Unused function argument: name (ARG001)
324-324: Unused function argument: name (ARG001)
330-330: Undefined name channels_first (F821)
337-337: Undefined name channels_first (F821)
486-486: Unused function argument: name (ARG001)
504-504: Unused function argument: name (ARG001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (100)
- GitHub Check: siracusa-neureka-models-tiled-singlebuffer-L3 (Attention, 2500, 8, L3) / test-runner-siracusa-neureka-tiled (2500)
- GitHub Check: siracusa-neureka-kernels-tiled-singlebuffer-L2 / test-runner-siracusa-neureka-tiled
- GitHub Check: generic-kernels / test-runner-generic
- GitHub Check: generic-models / test-runner-generic
- GitHub Check: siracusa-models-tiled-singlebuffer-L3 (Attention, 60000, 10000, 5000, 2500, 8, L3) / test-runner-siracusa-tiled (2500)
- GitHub Check: siracusa-models-tiled-singlebuffer-L3 (Attention, 60000, 10000, 5000, 2500, 8, L3) / test-runner-siracusa-tiled (10000)
- GitHub Check: siracusa-models-tiled-singlebuffer-L3 (Attention, 60000, 10000, 5000, 2500, 8, L3) / test-runner-siracusa-tiled (60000)
- GitHub Check: chimera-kernels / test-runner-chimera
- GitHub Check: siracusa-kernels-tiled-singlebuffer-L2 / test-runner-siracusa-tiled
- GitHub Check: siracusa-models / test-runner-siracusa
(the same ten checks listed ten times)
🔇 Additional comments (15)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (15)
17-23: LGTM!
The new _singleNodePattern helper effectively reduces duplication across pass definitions and provides a clean way to create single-node patterns for matching.
108-111: LGTM!
The simplified implementation using range unpacking is cleaner while maintaining correct logic.
113-124: LGTM!
The _transformLayoutPermutation helper correctly computes the permutation for channels_first ↔ channels_last transformations, properly handling arbitrary batch and spatial dimensions.
127-128: LGTM!
The permutation inversion logic is correct, using index() to find the inverse mapping.
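A small self-check of that index()-based inversion (reconstructed from the description; the module's exact code may differ):

```python
from typing import List

def _invertPermutation(perm: List[int]) -> List[int]:
    # inv[i] is the index j with perm[j] == i, so applying perm and then inv
    # restores the original axis order.
    return [perm.index(i) for i in range(len(perm))]

perm = [0, 2, 3, 1]                 # NCHW -> NHWC
inv = _invertPermutation(perm)      # [0, 3, 1, 2]
assert [perm[i] for i in inv] == [0, 1, 2, 3]
```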
144-168: LGTM!
The _prependTranspose and _appendTranspose helpers correctly insert transpose nodes and rewire graph connections, properly computing transposed shapes using the inverse permutation.
186-208: LGTM!
The refactored logic correctly uses the new transpose helpers and properly initializes default attributes for GEMM operations.
222-252: LGTM! Key fix for missing constant transformations.
The refactored logic now correctly transforms layout for all constant inputs (lines 245-248), including:
- Conv: bias constant
- RequantizedConv: mul, add, and shift constants
This addresses the missing transformations mentioned in the PR objectives.
256-271: LGTM!
The refactored passes correctly use the new pattern helper and NonBranchingMatcher with regex support for matching both Conv and RequantizedConv operations.
283-311: LGTM! Fixes missing DW constant transformations.
The refactored depthwise handling now correctly transforms:
- DW weight constants (line 302)
- Bias/mul/add/shift constants (lines 306-307)
This addresses the missing transformations for depthwise convolutions mentioned in the PR objectives.
315-321: LGTM!
The new NCHWtoNHWCDwConvPass provides proper separation of concerns for depthwise convolution handling.
350-355: LGTM structurally, but depends on fix for _PULP_NCHWtoNHWC_dw_fun.
The pass structure is correct, but the underlying function has a critical bug with undefined channels_first (see previous comment).
359-381: LGTM!
The sequential passes are properly wired to use the new depthwise-specific passes, with appropriate ordering for pass application.
421-475: LGTM!
The logic for converting GEMM to pointwise convolution is correct, properly handling dimension expansion and node cleanup.
478-522: LGTM!
The remaining passes are consistently refactored to use the new _singleNodePattern helper and node-centric match extraction.
25-34: No change needed for _isDepthwise. ONNX Conv semantics guarantee out_channels divisible by group, so checking group == in_channels suffices.
Likely an incorrect or invalid review comment.
Force-pushed from 449f607 to 77dffdd
Actionable comments posted: 0
🧹 Nitpick comments (3)
CHANGELOG.md (3)
`7-7`: Wording/grammar nit: use "constants" and pluralize "transformations". Improves clarity and style.
```diff
-- Fix missing const's layout transformation and refactor NCHWtoNHWC passes [#122](https://github.com/pulp-platform/Deeploy/pull/122)
+- Fix missing layout transformations for constants and refactor NCHWtoNHWC passes [#122](https://github.com/pulp-platform/Deeploy/pull/122)
```
`77-77`: Tighten phrasing for consistency with other "Changed" bullets.

```diff
-- Refactor of the NCHWtoNHWC passes
+- Refactor NCHWtoNHWC passes
```
`88-88`: Fix grammar and align phrasing with the "Fixed" section style.

```diff
-- Missing layout transformation of the const's (bias, mul, add, shift in Conv/RequantizedConv)
+- Fixed missing layout transformations for constants (bias in Conv; mul, add, shift in RequantizedConv)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
CHANGELOG.md (3 hunks)
Actionable comments posted: 1
♻️ Duplicate comments (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
`170-176`: Critical: Don't mutate a shared `gs.Constant` via a view; use `.copy()` after transpose. Transposing `const.values` and `matrixB.values` returns a view. If these constants are shared, downstream nodes see mutated data. Copy the array post-transpose. Apply:
```diff
@@ def _transformLayoutConst(const: gs.Constant, spatialDims: int, targetChannelsFirst: bool) -> None:
-    const.values = const.values.transpose(perm)
+    const.values = const.values.transpose(perm).copy()
@@ def _transformLayoutDwWeightConst(const: gs.Constant, targetChannelsFirst: bool) -> None:
-    const.values = const.values.transpose(perm)
+    const.values = const.values.transpose(perm).copy()
@@ def _requantized_gemm_to_pw_fun(graph: gs.Graph, match: Match, name: str):
-    matrixB.values = matrixB.values.transpose(perm)
+    matrixB.values = matrixB.values.transpose(perm).copy()
```

If some initializers are truly shared by design, consider cloning the `gs.Constant` node per consumer before modifying values. Based on past review comments.

Also applies to: 178-184, 418-421
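The hazard is standard NumPy behavior: `transpose` returns a view that shares memory with the original buffer (small demonstration, independent of Deeploy):

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
view = a.transpose(1, 0)         # no data copied: shares memory with `a`
view[0, 0] = 99
assert a[0, 0] == 99             # the original array changed too

safe = a.transpose(1, 0).copy()  # owns its data; later writes stay local
safe[0, 0] = 0
assert a[0, 0] == 99
```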
🧹 Nitpick comments (3)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (3)
`283-284`: Typo: rename `_NCWHtoNHWC_dw_fun` → `_NCHWtoNHWC_dw_fun`. Avoids confusion/readability issues.

```diff
-def _NCWHtoNHWC_dw_fun(graph: gs.Graph, match: Match, name: str, default_channels_first: bool) -> gs.Graph:
+def _NCHWtoNHWC_dw_fun(graph: gs.Graph, match: Match, name: str, default_channels_first: bool) -> gs.Graph:
@@ class NCHWtoNHWCDwConvPass(ReplaceSequentialPatternPass):
-        super().__init__(graph, partial(_NCWHtoNHWC_dw_fun, default_channels_first = default_channels_first), name,
+        super().__init__(graph, partial(_NCHWtoNHWC_dw_fun, default_channels_first = default_channels_first), name,
                          NonBranchingMatcher(regex_op = True))
```

Also applies to: 320-321
`186-186`: Silence the linter: unused parameter `name`. Either drop the parameter or rename it to `_name` to satisfy ARG001. Example:

```diff
-def _requantized_gemm_to_pw_fun(graph: gs.Graph, match: Match, name: str):
+def _requantized_gemm_to_pw_fun(graph: gs.Graph, match: Match, _name: str):
```

Apply similarly to the other helpers flagged by Ruff. Based on static analysis hints.

Also applies to: 222-222, 283-283, 324-324, 487-487, 505-505
`233-237`: Optional: avoid brittle direct attribute access and hardcoded Pad spatial dims.
- Use `.get('kernel_shape')` with a clearer error message.
- Replace the `Pad` spatial-dims hack with a TODO, or compute the spatial dims from the pads shape.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (11 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (2)
Deeploy/CommonExtensions/OptimizationPasses/PassClasses.py (3): `ReplaceSequentialPatternPass` (265-282), `SequentialPass` (201-210), `deleteNode` (67-86)
Deeploy/TilingExtension/TilingCodegen.py (1): `HyperRectangle` (24-35)
🪛 Ruff (0.14.0)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
- 186-186: Unused function argument: `name` (ARG001)
- 222-222: Unused function argument: `name` (ARG001)
- 237-237: Avoid specifying long messages outside the exception class (TRY003)
- 283-283: Unused function argument: `name` (ARG001)
- 324-324: Unused function argument: `name` (ARG001)
- 487-487: Unused function argument: `name` (ARG001)
- 505-505: Unused function argument: `name` (ARG001)
🔇 Additional comments (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
`447-458`: Guard required RequantizedGemm attributes to avoid KeyError. `node.attrs[...]` access for 'div', 'n_levels_out', 'shift', 'signed' assumes presence. Protect or validate. Minimal safe pattern:
```diff
-    pwAttrs = {
+    required = ['div', 'n_levels_out', 'shift', 'signed']
+    missing = [k for k in required if k not in node.attrs]
+    assert not missing, f"RequantizedGemm missing attrs: {missing}"
+    pwAttrs = {
         'channels_first': False,
         'dilations': [1, 1],
         'group': 1,
         'kernel_shape': [1, 1],
         'pads': [0, 0, 0, 0],
         'strides': [1, 1],
         'div': node.attrs['div'],
         'n_levels_out': node.attrs['n_levels_out'],
         'shift': node.attrs['shift'],
         'signed': node.attrs['signed'],
     }
```

If these are guaranteed upstream, confirm and dismiss. Based on static analysis hints.
The NCHWtoNHWC passes were missing the layout transformation of the bias (in the `Conv` case) or mul/add/shift (in the `RequantizedConv` case). In the process of adding that, I refactored the passes in general.
Changed
- Refactor of the NCHWtoNHWC passes

Fixed
- Missing layout transformation of the constants (bias, mul, add, shift in Conv/RequantizedConv)
PR Merge Checklist
- The PR is rebased on the latest `devel` commit and pointing to `devel`.
- The `CHANGELOG.md` file has been updated.