
Conversation


@diaconuccalin diaconuccalin commented Sep 24, 2025

This PR brings the changes required for a minimal TinyViT to run on the Siracusa platform, without tiling.

Added

  • PULP 2D FP DW conv Im2Col template and kernel, with bias support.
  • Bias support for PULP 2D FP regular conv Im2Col in template & kernel.
  • PULP FP DW conv 2D parser.
  • FP conv 2D (simple & DW), reshape & skip connection, and TinyViT demo tests to the non-tiled Siracusa CI pipeline.
  • FP bindings and mappings for PULP slice, DW conv 2D, and reduce mean operations.
  • FP PULP DW conv lowering optimization pass, similar to the existing one for the integer version.
  • RemoveEmptyConvBiasPass to the PULP optimizer.

Changed

  • Reduced the size of the reshape & skip connection test for non-tiled Siracusa memory compatibility.

Fixed

  • Fixed a bug in the alias_of node parameter handling, which manages buffer lifetimes in skip-connection situations.
  • Fixed a bug with non-batched operands in the PULPOpen FP GEMM and matmul templates.
  • Prefixed closure names with an underscore to avoid naming issues when they would otherwise start with unsupported characters (like digits).
  • Fixed data types in the PULPOpen FP add and mul templates.

PR Merge Checklist

  1. The PR is rebased on the latest devel commit and pointing to devel.
  2. Your PR is reviewed and approved.
  3. All checks are passing.
  4. The CHANGELOG.md file has been updated.
  5. If the Docker image was modified, change its link back after review.


coderabbitai bot commented Sep 24, 2025

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Added floating-point depthwise convolution support with bias handling.
    • Extended slice operations for floating-point types with optimized memory-aware variants.
    • Enhanced floating-point reduce-mean operations with multi-type combinations.
    • Added configurable multi-core support for improved parallelization.
  • Bug Fixes

    • Fixed alias tracking for buffer lifetimes in skip connections.
    • Corrected data type handling in floating-point operations.
    • Fixed internal naming to prevent character-related issues.
  • Tests

    • Added new floating-point convolution variants and model tests to CI pipeline.

Walkthrough

Adds FP32 depthwise Conv2D (Im2Col) with optional bias across parsers, bindings, templates, and kernels; introduces DMA-based Slice path and float ReduceMean bindings; fixes alias_of propagation and closure naming; propagates n_cores through contexts/deployers/tests; and adds Siracusa CI FP test variants.

Changes

Cohort / File(s) Summary of changes
CI workflow updates
.github/workflows/ci-platform-siracusa.yml
Added new Siracusa test names for FP conv/DW conv variants, reshape w/ skip-connection, and TinyViT demo.
Closure naming
Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py
Prefix closure names with _ before the closure suffix; propagate to derived identifiers.
NHWC lowering & DW passes
Deeploy/CommonExtensions/OptimizationPasses/.../LoweringOptimizationPasses.py
Hardened group checks (use get('group',1)), added PULPDWConvPass and PULPFPDWConvPass, expanded DW input-transpose path for non-requantized convs, and wired FP DW pass into NHWC conversion sequence.
Alias tracking & n_cores in contexts
Deeploy/DeeployTypes.py
Store copies of alias_of, propagate alias lists across buffer creation/hoisting, allow TransientBuffer.size as int
Reshape alias symmetry
Deeploy/Targets/Generic/Parsers.py
Reshape parser now back-populates input node aliases with the output node alias to make alias relations symmetric.
PULPOpen bindings updates
Deeploy/Targets/PULPOpen/Bindings.py
Add FloatDataTypes; add PULPSliceBindings and PULPFloatDWConv2DBindings; switch DMA slice to DMASliceTemplate.referenceTemplate; extend ReduceMean bindings to include float/int combos.
PULPOpen parsers
Deeploy/Targets/PULPOpen/Parsers.py
FP Conv2D parser accepts 2–3 inputs and produces bias-aware context; added PULPFPDWConv2DParser for depthwise FP conv with bias detection.
PULPOpen platform wiring & passes
Deeploy/Targets/PULPOpen/Platform.py
Add RemoveEmptyConvBiasPass; wire FP DW conv mapper/bindings; add DMASliceMapper alongside SliceMapper; expose new bindings/parsers and update PULPMapping.
Float Conv templates & ctxt buffer
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
New PULP2DFloatDWConvIm2ColTemplate with dynamic per-core context buffer sizing/hoisting; templates updated to pass ctxtBuffer, pSrcBias, and has_bias; added DW Im2Col reference template.
Float Add / Mul templates (types & casts)
Deeploy/Targets/PULPOpen/Templates/FloatAddTemplate.py, Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py
Standardized core/index types to unsigned (uint32_t), added explicit casts in chunk/min calculations, and minor spacing/formatting adjustments.
Float GEMM / MatMul templates (batching)
Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py, Deeploy/Targets/PULPOpen/Templates/FloatMatMulTemplate.py
Add conditional handling for batched A/B with guarded updates and per-batch pointer selection.
Kernel headers: Conv prototypes (bias + DW)
TargetLibraries/PULPOpen/inc/kernel/Conv.h
Updated prototypes to accept const float32_t* pSrcBias and const bool has_bias; added biased DW Im2Col prototype with pContextBuffer.
FP conv implementations
TargetLibraries/PULPOpen/src/Convolution_fp32.c, TargetLibraries/PULPOpen/src/DWConvolution_fp32.c
Added bias-aware signatures and logic, per-core channel partitioning, and new PULP_DW_Conv2d_Im2Col_fp32 implementation (DW Im2Col FP32).
Deployer / n_cores propagation
Deeploy/Targets/PULPOpen/Deployer.py, Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py, Deeploy/TilingExtension/TilerExtension.py
Add n_cores parameter/field to deployers and sign-propagator; propagate n_cores into contexts and tiler wrapper (ctxt.n_cores assignment).
Test runner / CLI / mapping helpers
DeeployTest/testMVP.py, DeeployTest/testRunner_tiled_siracusa.py, DeeployTest/testUtils/testRunner.py, DeeployTest/testUtils/platformMapping.py
Expose -n_cores CLI option; thread n_cores through mapDeployer, TestRunner, and deployer creation; include core count in CMake args for Siracusa tests.
CHANGELOG
CHANGELOG.md
Documented new FP DW conv/template/kernel, bias support, DMA slice path, n_cores propagation, CI tests, and related fixes.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant ONNX as ONNX Graph
  participant Lower as NHWC Lowering
  participant Map as PULPOpen Mapping
  participant Parser as PULPFPDWConv2DParser
  participant Bind as PULPFloatDWConv2DBindings
  participant Tmpl as DW Im2Col Template
  participant Kern as PULP_DW_Conv2d_Im2Col_fp32

  ONNX->>Lower: detect Conv (inspect group attr)
  Lower-->>Map: lowered node (transpose weights/inputs if needed)
  Map->>Parser: parseNode / parseNodeCtxt
  Parser-->>Map: context (has_bias, inputs, depthwise)
  Map->>Bind: select DW conv bindings
  Bind-->>Tmpl: emit template call (include pSrcBias, has_bias, ctxtBuffer)
  Tmpl->>Tmpl: compute/hoist per-core ctxtBuffer using ctxt.n_cores
  Tmpl->>Kern: call kernel(..., pSrcBias, has_bias, pContextBuffer)
  Kern-->>Tmpl: perform per-core DW Im2Col convolution and write outputs
  Tmpl-->>Map: generated code/artifacts
sequenceDiagram
  autonumber
  participant ONNX as ONNX Graph
  participant Map as PULPOpen Mapping
  participant SliceStd as SliceMapper (PULPSliceBindings)
  participant SliceDMA as DMASliceMapper (PULPDMASliceBindings)

  ONNX->>Map: dispatch Slice node
  alt standard slice path
    Map->>SliceStd: select PULPSliceBindings
    SliceStd-->>Map: generate standard slice code
  else DMA-capable slice path
    Map->>SliceDMA: select PULPDMASliceBindings
    SliceDMA-->>Map: generate DMA-based slice code
  end

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Suggested reviewers

  • Victor-Jung
  • Xeratec
  • lukamac

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 0.00%, which is insufficient; the required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Title Check ✅ Passed: The title "TinyViT on non-tiled Siracusa" directly reflects the main objective of the changeset, which is to enable minimal TinyViT support on the Siracusa platform without tiling. The title is concise, clear, and specific enough that a teammate scanning commit history would immediately understand the PR's core purpose without reading the full description.
  • Description Check ✅ Passed: The description provides meaningful and comprehensive information about the changeset. It includes a clear summary statement, well-organized Added/Changed/Fixed sections, and covers new templates, parsers, bindings, CI tests, optimization passes, and bug fixes. It is directly related to the actual changes across 26 files and is neither vague nor generic.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7cba0cf and da162b0.

📒 Files selected for processing (1)
  • Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (4 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-24T11:56:35.781Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py:364-0
Timestamp: 2025-09-24T11:56:35.781Z
Learning: When reviewing pull requests, focus comments only on code that was actually changed (lines with + or - in diffs), not on unchanged context code that appears in the diff.

Applied to files:

  • Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
📚 Learning: 2025-09-24T12:49:17.889Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.

Applied to files:

  • Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
🧬 Code graph analysis (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
Deeploy/CommonExtensions/OptimizationPasses/PassClasses.py (2)
  • contextagnostic (285-298)
  • ReplaceSequentialPatternPass (265-282)
🔇 Additional comments (5)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (5)

250-250: Good improvement: using .get() with ONNX default.

This change correctly handles the missing 'group' attribute by defaulting to 1, which aligns with the ONNX specification. This is more concise and Pythonic than the previous explicit membership check.


344-344: Consistent improvement matching line 250.

Good consistency applying the same .get('group', 1) pattern here for proper default handling.


370-373: FP DW Conv handling via input transpose.

This else-branch correctly handles the floating-point DW convolution case by transposing the input rather than the weight (as done for RequantizedConv at lines 364-369). The different approach aligns with the FP implementation requirements mentioned in the PR objectives.


380-425: Clean refactoring with good code reuse.

The refactoring of PULPDWConvPass and addition of PULPFPDWConvPass follows the existing pattern established by other passes in this file. Both classes effectively reuse _PULPDWNCHWtoNHWC_fun via different pattern graphs, which is elegant and maintainable. The pattern-based approach ensures:

  • PULPDWConvPass matches RequantizedConv nodes → weight transpose path
  • PULPFPDWConvPass matches Conv nodes → input transpose path

505-505: Proper integration of FP DW Conv pass.

The placement of PULPFPDWConvPass in the pass sequence is correct, running after the integer DW pass and before the dense conv passes.




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 14

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)

344-346: Prevent KeyError when 'group' attr is absent

Depthwise detection assumes 'group' exists. If missing, this pass will crash. Use a safe default (1) as elsewhere in the file.

-    if opNode.attrs['group'] == 1:
-        return graph
+    group = opNode.attrs.get('group', 1)
+    if group == 1:
+        return graph
Deeploy/Targets/PULPOpen/Templates/FloatAddTemplate.py (1)

16-23: Loop index type should match buffer indexing

Use unsigned 32-bit for i to match array indexing and avoid UB on large sizes.

(Handled in the diff above.)

Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (1)

131-149: DW Im2Col arg order — swap X/Y for image dims, kernel dims and strides; bias placement is correct

  • Prototype (TargetLibraries/PULPOpen/inc/kernel/Conv.h): PULP_DW_Conv2d_Im2Col(..., uint32_t H, uint32_t W, uint32_t C, ..., uint32_t P, uint32_t Q, uint32_t SP, uint32_t SQ, const float32_t *pSrcBias, const bool has_bias, float32_t *pDstC, uint32_t pad_top, uint32_t pad_bottom, uint32_t pad_left, uint32_t pad_right, float32_t *pContextBuffer).
  • Template call (Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py lines 131–149) currently passes (dim_im_in_x, dim_im_in_y), (dim_kernel_x, dim_kernel_y), (stride_x, stride_y). These should be swapped to (dim_im_in_y, dim_im_in_x), (dim_kernel_y, dim_kernel_x), (stride_y, stride_x).
  • (bias, has_bias), padding order and context buffer position are already correct.
🧹 Nitpick comments (16)
Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py (1)

10-11: Avoid libm at runtime: use integer log2 for NUM_CORES

Replace double log2() with __builtin_ctz() to avoid libm dependency and ensure compile‑time integer computation. Also assert the power‑of‑two assumption that the chunking math relies on.

Apply this diff:

-uint32_t ${nodeName}_log2Core = (uint32_t) log2(NUM_CORES);
+// NUM_CORES must be a power of two; compute log2(NUM_CORES) without libm
+#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
+_Static_assert((NUM_CORES & (NUM_CORES - 1)) == 0, "NUM_CORES must be power of two");
+#endif
+uint32_t ${nodeName}_log2Core = (uint32_t)__builtin_ctz(NUM_CORES);
.github/workflows/ci-platform-siracusa.yml (1)

98-98: TinyViT demo test: validate runtime envelope on Siracusa

Great to see model‑level coverage. Consider job timeout safeguards if this grows. For now, please confirm the demo’s memory/latency fit the non‑tiled target on the selected runner.

Deeploy/Targets/PULPOpen/Templates/FloatMatMulTemplate.py (1)

11-21: LGTM: per-iteration A/B selection handles mixed batched/unbatched inputs.

This fixes the non-batched handling cleanly and mirrors expected strides (A: MN, B: NO). Consider applying the same per-iteration pointer style in GEMM for consistency, but not blocking.
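The per-iteration pointer selection can be modeled in isolation. The sketch below is a hypothetical stand-alone version of the idea (batched_matmul and its flags are illustration-only names, not the Deeploy template's actual interface): a non-batched operand keeps a fixed pointer across batch iterations instead of advancing by its batch stride.

```c
#include <stdbool.h>
#include <stdint.h>

typedef float float32_t;

/* Hypothetical model of per-batch pointer selection for mixed
   batched/unbatched matmul inputs. A is M x N, B is N x O, Y is
   always batched (batch x M x O). */
static void batched_matmul(const float32_t *A, const float32_t *B,
                           float32_t *Y, uint32_t batch, uint32_t M,
                           uint32_t N, uint32_t O, bool A_batched,
                           bool B_batched) {
  for (uint32_t b = 0; b < batch; b++) {
    /* Non-batched inputs reuse the same data for every batch. */
    const float32_t *a = A_batched ? A + b * M * N : A;
    const float32_t *bp = B_batched ? B + b * N * O : B;
    float32_t *y = Y + b * M * O;
    for (uint32_t m = 0; m < M; m++)
      for (uint32_t o = 0; o < O; o++) {
        float32_t s = 0.0f;
        for (uint32_t n = 0; n < N; n++)
          s += a[m * N + n] * bp[n * O + o];
        y[m * O + o] = s;
      }
  }
}
```

With A non-batched and B batched, every batch reads the same A block, which is exactly the case the unconditional-stride version would get wrong.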

Deeploy/DeeployTypes.py (3)

320-324: Prefer None over mutable default and let ctor initialize alias_of.

Passing alias_of = [] creates a new list here (safe), but use None consistently and centralize initialization in the constructor to avoid future in-place mutations leaking via shared references.

-        return (cls(
-            name = node.name,
-            shape = node.shape if not isinstance(node, gs.Constant) else node.values.shape,
-            alias_of = [],
-        ))
+        return cls(
+            name = node.name,
+            shape = node.shape if not isinstance(node, gs.Constant) else node.values.shape,
+            alias_of = None,
+        )

469-474: Avoid sharing alias list between buffers; copy it.

Passing buffer.alias_of by reference couples future mutations between the source and new ConstantBuffer.

-        ret = cls(
+        ret = cls(
             name = buffer.name,
             shape = buffer.shape,
             values = values,
-            alias_of = buffer.alias_of,
+            alias_of = list(buffer.alias_of),
         )

1176-1180: Pass alias_of=None and let the ctor initialize.

Keeps alias handling consistent and avoids accidental shared lists if ctor defaults are ever reintroduced.

-                nb = ctxt.VariableBuffer(
-                    name = name,
-                    shape = node.shape,
-                    alias_of = [],
-                )
+                nb = ctxt.VariableBuffer(
+                    name = name,
+                    shape = node.shape,
+                    alias_of = None,
+                )
TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (3)

137-141: Remove unnecessary modulo and precompute local weight pointer inside inner loop

im2col_idx % (P * Q) is redundant. Precompute a per-filter base pointer for tighter inner loops.

Apply this diff in both bias and non-bias paths:

-            for (uint32_t im2col_idx = 0; im2col_idx < P * Q; im2col_idx++) {
-              sum +=
-                  im2col_buffer[im2col_idx] *
-                  weight_ptr[(f - ch_out_start) * P * Q + im2col_idx % (P * Q)];
-            }
+            const float32_t *local_w = weight_ptr + (f - ch_out_start) * (P * Q);
+            for (uint32_t k = 0; k < P * Q; k++) {
+              sum += im2col_buffer[k] * local_w[k];
+            }

Also applies to: 236-240


47-48: Remove unused variable kernel_size

kernel_size is computed but never used.

-  uint32_t kernel_size = P * Q * F_total;

61-97: Consider simplifying padding handling with a single p/q nested loop

You can mirror the im2col fill used in the regular Im2Col kernel (check Convolution_fp32.c), avoiding four separate padding loops and the type-casting hazards.
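A minimal sketch of that suggestion, assuming HWC input layout and signed intermediate coordinates; the function name and signature here are hypothetical, not the kernel's actual interface. Out-of-range taps become zeros inside one p/q nested loop, replacing the four separate padding loops:

```c
#include <stdint.h>

typedef float float32_t;

/* Hypothetical single-loop im2col fill for one depthwise output position
   (h_out, w_out) of channel ch. Input is H x W x C in HWC layout;
   im2col_buffer receives P*Q taps, with zeros for padded positions. */
static void dw_im2col_fill(const float32_t *pSrc, uint32_t H, uint32_t W,
                           uint32_t C, uint32_t ch, uint32_t h_out,
                           uint32_t w_out, uint32_t P, uint32_t Q,
                           uint32_t stride_h, uint32_t stride_w,
                           uint32_t pad_top, uint32_t pad_left,
                           float32_t *im2col_buffer) {
  for (uint32_t p = 0; p < P; p++) {
    for (uint32_t q = 0; q < Q; q++) {
      /* Signed coordinates make the bounds check safe near the borders. */
      int32_t h = (int32_t)(h_out * stride_h + p) - (int32_t)pad_top;
      int32_t w = (int32_t)(w_out * stride_w + q) - (int32_t)pad_left;
      float32_t v = 0.0f;
      if (h >= 0 && h < (int32_t)H && w >= 0 && w < (int32_t)W)
        v = pSrc[((uint32_t)h * W + (uint32_t)w) * C + ch];
      im2col_buffer[p * Q + q] = v;
    }
  }
}
```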

TargetLibraries/PULPOpen/inc/kernel/Conv.h (1)

20-27: Document pContextBuffer size expectations and bias semantics

The new signatures look correct. Please add brief comments indicating required pContextBuffer sizes:

  • Conv2d_Im2Col: NUM_CORES * (C*P*Q) floats
  • DW_Conv2d_Im2Col: NUM_CORES * (P*Q) floats
    Also note that has_bias governs whether pSrcBias is read.

Also applies to: 29-36

Deeploy/Targets/PULPOpen/Bindings.py (2)

239-245: Constrain FP DWConv binding to supported float type(s).

FloatConvTemplate/referenceDW2DIm2ColTemplate likely only implements float32. Avoid advertising unsupported float types by narrowing the generator.

Apply this diff:

-        ForkTransformer) for float_type in FloatDataTypes
+        ForkTransformer) for float_type in [float32_t]

299-302: ReduceMean (float): restrict axes integer type to int32.

ONNX axes are int64/int32; templates and parsers typically operate on int32 here. Generating bindings for all signed integer widths adds binder churn without benefit.

Apply this diff:

-    for integer_type in SignedIntegerDataTypes
-    for float_type in FloatDataTypes
+    for integer_type in [int32_t]
+    for float_type in FloatDataTypes
Deeploy/Targets/PULPOpen/Platform.py (1)

108-108: Depthwise vs regular FP Conv: prevent mapper shadowing by ordering.

If PULPFPConv2DParser accepts DW cases, FPConv2DMapper may shadow FPDWConv2DMapper due to precedence.

Apply this diff to prioritize DWConv first:

-    'Conv': ConvLayer([FPConv2DMapper, FPDWConv2DMapper]),
+    'Conv': ConvLayer([FPDWConv2DMapper, FPConv2DMapper]),

Alternatively, ensure PULPFPConv2DParser explicitly rejects depthwise.

Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (3)

42-51: Fix return typing and silence ARG004 (unused ctxt)

  • Return type now returns a str for size; widen the annotation to reflect this.
  • Rename ctxt to _ctxt to address Ruff ARG004.

Apply:

-    def computeTransientBuffersSize(ctxt: NetworkContext,
-                                    operatorRepresentation: OperatorRepresentation) -> List[Tuple[str, str]]:
+    def computeTransientBuffersSize(_ctxt: NetworkContext,
+                                    operatorRepresentation: OperatorRepresentation) -> List[Tuple[str, Union[int, str, IntVar]]]:

93-93: Typo in comment: “ChannelOout” → “ChannelOut”

Minor doc polish.

-// 2D FP Conv HWC with Im2Col and ChannelOout parallelism (Name: ${nodeName}, Op: ${nodeOp})
+// 2D FP Conv HWC with Im2Col and ChannelOut parallelism (Name: ${nodeName}, Op: ${nodeOp})

124-126: Typo in comment: “ChannelOout” → “ChannelOut”

Same typo in DW template header.

-// 2D DW FP Conv HWC with Im2Col and ChannelOout parallelism (Name: ${nodeName}, Op: ${nodeOp})
+// 2D DW FP Conv HWC with Im2Col and ChannelOut parallelism (Name: ${nodeName}, Op: ${nodeOp})
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2fa3c31 and de2752b.

📒 Files selected for processing (17)
  • .github/workflows/ci-platform-siracusa.yml (3 hunks)
  • Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py (1 hunks)
  • Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (2 hunks)
  • Deeploy/DeeployTypes.py (4 hunks)
  • Deeploy/Targets/Generic/Parsers.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Bindings.py (5 hunks)
  • Deeploy/Targets/PULPOpen/Parsers.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Platform.py (7 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatAddTemplate.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (4 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatMatMulTemplate.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py (1 hunks)
  • TargetLibraries/PULPOpen/inc/DeeployPULPMath.h (1 hunks)
  • TargetLibraries/PULPOpen/inc/kernel/Conv.h (1 hunks)
  • TargetLibraries/PULPOpen/src/Convolution_fp32.c (4 hunks)
  • TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (6)
Deeploy/Targets/Generic/Parsers.py (1)
Deeploy/DeeployTypes.py (2)
  • add_aliases (326-345)
  • lookup (763-795)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (1)
Deeploy/DeeployTypes.py (6)
  • NodeTemplate (82-224)
  • computeTransientBuffersSize (137-157)
  • NetworkContext (554-1063)
  • hoistTransientBuffers (160-180)
  • hoistTransientBuffer (864-883)
  • lookup (763-795)
TargetLibraries/PULPOpen/inc/kernel/Conv.h (2)
TargetLibraries/PULPOpen/src/Convolution_fp32.c (2)
  • PULP_Conv2d_fp32_fp32_fp32_HWC (10-102)
  • PULP_Conv2d_Im2Col_fp32_fp32_fp32_HWC (104-219)
TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1)
  • PULP_DW_Conv2d_Im2Col_fp32_fp32_fp32_HWC (10-251)
Deeploy/Targets/PULPOpen/Bindings.py (3)
Deeploy/CommonExtensions/DataTypes.py (4)
  • float32_t (74-78)
  • int8_t (12-15)
  • int32_t (24-27)
  • uint8_t (36-39)
Deeploy/Targets/Generic/TypeCheckers.py (3)
  • SliceChecker (37-51)
  • ConvChecker (456-474)
  • ReduceMeanChecker (310-324)
Deeploy/AbstractDataTypes.py (1)
  • PointerClass (522-545)
Deeploy/Targets/PULPOpen/Platform.py (4)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
  • RemoveEmptyConvBiasPass (667-679)
Deeploy/Targets/PULPOpen/Parsers.py (1)
  • PULPFPDWConv2DParser (122-188)
Deeploy/DeeployTypes.py (1)
  • NodeMapper (1695-1838)
Deeploy/Targets/Generic/Parsers.py (1)
  • SliceParser (97-155)
Deeploy/Targets/PULPOpen/Parsers.py (2)
Deeploy/DeeployTypes.py (5)
  • inputs (2492-2509)
  • lookup (763-795)
  • parseNode (1076-1091)
  • parseNodeCtxt (1094-1119)
  • NetworkContext (554-1063)
Deeploy/Targets/Generic/Parsers.py (31)
  • Conv2DParser (1182-1248)
  • parseNode (19-27)
  • parseNode (49-58)
  • parseNode (81-94)
  • parseNode (102-112)
  • parseNode (163-169)
  • parseNode (193-207)
  • parseNode (229-251)
  • parseNode (291-304)
  • parseNode (326-338)
  • parseNode (375-387)
  • parseNode (422-426)
  • parseNode (449-460)
  • parseNode (484-495)
  • parseNode (531-535)
  • parseNode (551-554)
  • parseNodeCtxt (29-41)
  • parseNodeCtxt (60-76)
  • parseNodeCtxt (114-155)
  • parseNodeCtxt (171-185)
  • parseNodeCtxt (209-221)
  • parseNodeCtxt (253-283)
  • parseNodeCtxt (306-318)
  • parseNodeCtxt (340-367)
  • parseNodeCtxt (389-414)
  • parseNodeCtxt (428-441)
  • parseNodeCtxt (462-476)
  • parseNodeCtxt (497-523)
  • parseNodeCtxt (537-543)
  • parseNodeCtxt (556-573)
  • parseNodeCtxt (586-605)
🪛 Ruff (0.13.1)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py

43-43: Unused static method argument: ctxt

(ARG004)

Deeploy/DeeployTypes.py

433-433: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


433-433: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


433-433: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Test Type Inference (fail-input2) (repeated 10×)
🔇 Additional comments (23)
Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py (1)

13-14: LGTM: unsigned arithmetic for start/end bounds

Casting ${size} to uint32_t aligns types and prevents signed/unsigned min mixups. Indices and loop vars are consistently uint32_t.
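The all-unsigned chunking math can be sketched in isolation; NUM_CORES and chunk_bounds below are illustration-only assumptions modeling the template's per-core bounds computation, not Deeploy symbols:

```c
#include <stdint.h>

#define NUM_CORES 8 /* assumption for illustration; must be a power of two */

/* Stand-alone model of per-core chunking: core `core_id` processes
   elements [start, end) of a `size`-element buffer. All arithmetic is
   uint32_t, so the min never mixes signed and unsigned operands. */
static void chunk_bounds(uint32_t core_id, uint32_t size, uint32_t *start,
                         uint32_t *end) {
  uint32_t log2_core = (uint32_t)__builtin_ctz(NUM_CORES);
  /* Round the chunk up when size is not a multiple of NUM_CORES. */
  uint32_t chunk = (size >> log2_core) + ((size & (NUM_CORES - 1)) != 0);
  uint32_t s = chunk * core_id;
  uint32_t e = s + chunk;
  *start = s < size ? s : size;
  *end = e < size ? e : size;
}
```

Clamping both bounds to size means trailing cores simply get empty ranges when size is small, and the union of all ranges covers the buffer exactly once.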

.github/workflows/ci-platform-siracusa.yml (2)

75-75: Reshape + skip connection test: ensure alias_of fix is covered

Good addition to exercise the alias/buffer lifetime fix. Please confirm this test indeed hits the alias_of scenario (non‑tiled) and fails without the fix.


61-64: CI workflow lists new DW conv FP tests but those tests aren't registered in the repo

.github/workflows/ci-platform-siracusa.yml (lines 61–64) lists testFloat2DDWConvolution, testFloat2DDWConvolutionBias, testFloat2DDWConvolutionZeroBias but a repo search found no matches — add/register these tests or update the workflow. After they exist, verify the shapes fit non-tiled Siracusa memory and that the zero-bias test exercises the RemoveEmptyConvBias pass.

Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py (2)

27-37: Good: batch-gated A/B/C strides prevent OOB on non-batched inputs.

Looks correct and aligns with the matmul change. Please confirm the strides match the in-memory layout for transposed cases (i.e., still MN for A and NO for B even when transA/transB are set).


39-39: Unconditional out stride can desync when data_out aliases C but C is not batched.

If data_out may alias C while C is reused (non-batched), advancing only the out pointer will diverge the pair across iterations and yield incorrect reads/writes. Either:

  • enforce at mapping time that alias-of(C, data_out) implies identical batched flags, or
  • gate the out advance with an explicit data_out_batched flag.

Example if such a flag is available:

-    ref_${data_out}_${data_out} += ${M} * ${O};
+    % if data_out_batched:
+    ref_${data_out}_${data_out} += ${M} * ${O};
+    % endif
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (4)

370-374: Verify FP DW Conv layout: input-only transpose may be insufficient

For FP DW Conv, you transpose only the input and not the weights. Confirm kernel expectations: if weights are stored in channels-first layout, they likely also need DW-specific permutation (as above) to avoid semantic mismatch when setting channels_first to default.


385-403: New PULPDWConvPass wiring looks good

Pattern graph + partial replacement is clear and consistent with the pipeline.


404-426: New PULPFPDWConvPass wiring looks good

Extends NHWC conversion to FP DW conv as intended.


505-505: Pass ordering LGTM

Adding PULPFPDWConvPass after PULPDWConvPass in the sequence is reasonable and non-overlapping.

TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1)

41-43: Confirm context buffer sizing per core

This implementation expects pContextBuffer sized to NUM_CORES * (P*Q) floats. Please confirm templates/allocations provide that.

TargetLibraries/PULPOpen/src/Convolution_fp32.c (2)

10-16: LGTM on signature extension and core partitioning

Bias handling and per-core channel slicing are consistent. Weight and output indexing are correct.

Please ensure all call sites updated to pass pSrcBias, has_bias, and pContextBuffer (for the Im2Col variant).

Also applies to: 18-39


139-176: Good reuse of im2col buffer and contiguous weight blocks

The inner product uses a contiguous kernel_size slice, matching the buffer layout. Looks correct.

Also applies to: 179-218

Deeploy/Targets/PULPOpen/Parsers.py (3)

104-116: Bias handling looks correct; maps optional third input and flags has_bias

Parsing and context mapping are consistent with the new kernels.

Confirm downstream templates use operatorRepresentation["has_bias"] and ["bias"] as introduced here.
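As a sanity model of that contract, the flag can be exercised as below; accumulate_output is a hypothetical stand-in for the kernel's accumulation step, not the actual Deeploy kernel, but it shows why pSrcBias may be NULL whenever has_bias is false:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef float float32_t;

/* pSrcBias is only dereferenced when has_bias is true, so a 2-input
   Conv (no bias tensor) can safely pass NULL for pSrcBias. */
static float32_t accumulate_output(const float32_t *acc, uint32_t n,
                                   const float32_t *pSrcBias, bool has_bias,
                                   uint32_t f) {
  float32_t sum = has_bias ? pSrcBias[f] : 0.0f;
  for (uint32_t i = 0; i < n; i++)
    sum += acc[i];
  return sum;
}
```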


127-159: DW FP2D parser shape/padding checks look reasonable

Depthwise gating via inputs count and padding symmetry is fine.


185-186: Depthwise condition check aligns with ONNX semantics

group == ch_im_in is the expected DW condition.

Deeploy/Targets/PULPOpen/Bindings.py (2)

143-153: DMA Slice binding switch looks good.

Good move to DMASliceTemplate with MemoryAwareForkTransformer for integer tensors; aligns with the DMA path introduced in Platform.


155-165: Float Slice mapping: validate starts/ends/axes/steps typing.

Using uint8_t pointers for starts/ends/axes/steps is consistent with the DMA variant, but please verify SliceTemplate.referenceTemplate expects raw byte buffers for these (not int32_t). Also confirm this path covers only float inputs while integers fall back to DMA.

Deeploy/Targets/PULPOpen/Platform.py (4)

77-79: FP DWConv mapper wiring: LGTM.

Parser and bindings are correctly introduced and exposed.


92-95: Two Slice mappers ordering: LGTM; confirm fallback behavior.

Order ensures float Slice binds first; integers should fall back to DMA. Please confirm CI covers both float and int slice paths.


8-9: RemoveEmptyConvBiasPass import: LGTM.

Consistent with optimizer usage below.


235-236: Optimizer: adding RemoveEmptyConvBiasPass is sensible.

Placement at the end of the topology passes is fine as long as it runs pre-binding. If the intent is to normalize bias presence for FP convs (so templates always see 3 inputs), consider placing it before conv-related merges; otherwise current placement is acceptable.

Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (2)

60-62: Good: transient buffer typed to input element

Ensures the generated C sees the correct pointee type instead of void. This helps alloc/codegen correctness.


81-81: Bias args correct — signature matches PULP_Conv2d_fp32_fp32_fp32_HWC

TargetLibraries/PULPOpen/src/Convolution_fp32.c and TargetLibraries/PULPOpen/inc/kernel/Conv.h declare pSrcBias, has_bias in that exact position before pDstC/padding, so adding ${bias}, ${has_bias} is correct.

@Xeratec Xeratec added this to the Release 0.2.1 milestone Sep 29, 2025
@Xeratec Xeratec added the Bug Something isn't working label Sep 29, 2025
@Xeratec Xeratec moved this to Need Reviewer in Deeploy Sep 29, 2025
@Xeratec Xeratec moved this from Need Reviewer to In review in Deeploy Sep 29, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
Deeploy/DeeployTypes.py (1)

241-241: Fix remaining mutable default arguments.

VariableBuffer.__init__ still has mutable defaults (shape = [1], alias_of: Optional[List[str]] = []). These should use immutable defaults (tuples or None) to follow Python best practices and avoid shared mutable state across calls.

Apply this diff:

-    def __init__(self, name: str = '', shape = [1], alias_of: Optional[List[str]] = []):
+    def __init__(self, name: str = '', shape: Sequence[int] = (1,), alias_of: Optional[List[str]] = None):

The defensive copy on line 260 already handles the None case correctly, so no further changes are needed there.

Based on past review feedback and static analysis (Ruff B006).
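For context, the pitfall that Ruff B006 flags can be shown in isolation (illustrative names, not the Deeploy classes):

```python
def register_alias_bad(name, alias_of = []):
    # the default list is created ONCE at function definition time,
    # so every call without an explicit argument mutates the same object
    alias_of.append(name)
    return alias_of

def register_alias_good(name, alias_of = None):
    # immutable default; a fresh list is built per call
    alias_of = [] if alias_of is None else list(alias_of)
    alias_of.append(name)
    return alias_of

a = register_alias_bad("x")
b = register_alias_bad("y")   # surprise: a and b are the same ["x", "y"] list
c = register_alias_good("x")
d = register_alias_good("y")  # independent single-element lists
```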

🧹 Nitpick comments (2)
Deeploy/DeeployTypes.py (2)

454-454: Optional: Avoid redundant defensive copy.

Since VariableBuffer.__init__ already defensively copies alias_of (line 260), passing list(alias_of) here results in double copying. You can simplify by passing the list directly:

-        super().__init__(name, shape, list(alias_of) if alias_of is not None else None)
+        super().__init__(name, shape, alias_of)

This is a minor optimization and doesn't affect correctness.


490-495: Optional: Make alias propagation more explicit.

While correct (the constructor defensively copies alias_of), explicitly copying here would make the intent clearer:

     ret = cls(
         name = buffer.name,
         shape = buffer.shape,
         values = values,
-        alias_of = buffer.alias_of,
+        alias_of = list(buffer.alias_of),  # Explicit copy for clarity
     )

This is a minor clarity improvement and doesn't affect correctness.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ce7e4c8 and 4165983.

📒 Files selected for processing (2)
  • CHANGELOG.md (4 hunks)
  • Deeploy/DeeployTypes.py (9 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
Deeploy/DeeployTypes.py (1)
DeeployTest/testUtils/dmaUtils.py (1)
  • size (72-73)
🪛 Ruff (0.13.3)
Deeploy/DeeployTypes.py

452-452: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


452-452: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

🔇 Additional comments (4)
Deeploy/DeeployTypes.py (4)

325-329: LGTM!

Correctly initializes alias_of with an explicit empty list for each new instance, avoiding shared mutable state.


362-362: LGTM!

Correctly returns a defensive copy of the internal alias list, preventing external code from accidentally mutating the buffer's state.
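The pattern being approved looks roughly like this (a sketch, not the actual VariableBuffer implementation):

```python
class Buffer:

    def __init__(self):
        self._alias_of = ["input0"]

    @property
    def alias_of(self):
        # hand out a copy so callers cannot mutate internal state
        return list(self._alias_of)

buf = Buffer()
view = buf.alias_of
view.append("input1")  # mutates only the caller's copy
```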


404-406: LGTM!

The updates to support size as Union[int, str] are consistent across TransientBuffer.__init__, _bufferRepresentation, and hoistTransientBuffer. This enables symbolic size expressions, which aligns with the PR's tiling and optimization goals.

Also applies to: 431-431, 888-888


1200-1204: LGTM!

Correctly initializes non-global outputs with an explicit empty alias_of list, maintaining consistency with the alias tracking implementation.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

♻️ Duplicate comments (2)
Deeploy/DeeployTypes.py (2)

241-261: Fix mutable defaults in VariableBuffer.__init__ (shape/alias_of)

Defaulting shape to [1] and alias_of to [] risks shared state; use immutable defaults and copy incoming lists.

Apply:

-    def __init__(self, name: str = '', shape = [1], alias_of: Optional[List[str]] = []):
+    def __init__(self, name: str = '', shape: Sequence[int] = (1,), alias_of: Optional[List[str]] = None):
         self.name: str = name  #: str: Canonical name that this buffer is registered as in the NetworkContext
-        self.shape: Sequence[
-            int] = shape  #: Sequence[int]: Represents the dimensions of the underlying tensor as a sequence of dimension sizes
+        self.shape: Sequence[int] = shape  #: Sequence[int]: Represents the dimensions of the underlying tensor as a sequence of dimension sizes
@@
-        self.alias_of: List[str] = list(alias_of) if alias_of is not None else []
+        self.alias_of: List[str] = list(alias_of) if alias_of is not None else []

As per Ruff B006.


452-456: Fix mutable defaults in ConstantBuffer.__init__ (shape/values)

Use immutable defaults and forward alias_of safely.

Apply:

-    def __init__(self, name: str = '', shape = [1], values = [0], alias_of: Optional[List[str]] = None):
-        # Pass a copy of alias_of to avoid shared references
-        super().__init__(name, shape, list(alias_of) if alias_of is not None else None)
+    def __init__(self,
+                 name: str = '',
+                 shape: Sequence[int] = (1,),
+                 values: Union[Sequence, np.ndarray] = (0,),
+                 alias_of: Optional[List[str]] = None):
+        super().__init__(name, shape, alias_of)

(values processing via np.asarray below remains correct.)

🧹 Nitpick comments (4)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (3)

21-24: Im2Col buffer bytes should derive from input element width (not weight).

The im2col buffer stores input elements; computing bytes via weight_type may work today (fp32), but couples unrelated types and can break if they diverge. Prefer data_in_type width.

Apply:

-        im2col_dim = (operatorRepresentation["weight_type"].typeWidth // 8) * ctxt.n_cores * operatorRepresentation[
-            'ch_im_in'] * operatorRepresentation['dim_kernel_x'] * operatorRepresentation['dim_kernel_y']
+        im2col_dim = (operatorRepresentation["data_in_type"].referencedType.typeWidth // 8) \
+                     * ctxt.n_cores \
+                     * operatorRepresentation['ch_im_in'] \
+                     * operatorRepresentation['dim_kernel_x'] \
+                     * operatorRepresentation['dim_kernel_y']

43-51: Fix return type annotation and stale comment; ensure consistent sizing semantics.

  • Return type says List[Tuple[str, str]] but the code returns int; align with base signature (Union[int, IntVar]).
  • Comment says “value is only used as string”, but an int is produced. Update comment and/or value.

Also, consider returning element count (NUM_CORES * Kx * Ky) and relying on the buffer’s referencedType to derive bytes, for consistency with the established pattern.

-    def computeTransientBuffersSize(ctxt: NetworkContext,
-                                    operatorRepresentation: OperatorRepresentation) -> List[Tuple[str, str]]:
+    def computeTransientBuffersSize(ctxt: NetworkContext,
+                                    operatorRepresentation: OperatorRepresentation) -> List[Tuple[str, Union[int, IntVar]]]:
@@
-        # Memory allocation for the im2col buffer can be dynamic, based on the number of cores.
-        # WARNING: This works because value is only used as string, in the allocate template.
-        im2col_dim = (operatorRepresentation["weight_type"].typeWidth // 8
-                     ) * ctxt.n_cores * operatorRepresentation['dim_kernel_x'] * operatorRepresentation['dim_kernel_y']
+        # Im2Col buffer scales with number of cores and kernel area; size below is in ELEMENTS.
+        # Byte size is derived from the buffer's referenced type at allocation/codegen time.
+        im2col_dim = ctxt.n_cores * operatorRepresentation['dim_kernel_x'] * operatorRepresentation['dim_kernel_y']

Note: If you prefer to keep bytes here, update the comment accordingly and see the next comment on double-accounting risk.


124-154: DW Im2Col call wiring is consistent; minor docstring typo.

The DW Im2Col invocation mirrors the FP path with bias and ctxtBuffer. Typo: “ChannelOout” → “ChannelOut”.

-// 2D DW FP Conv HWC with Im2Col and ChannelOout parallelism (Name: ${nodeName}, Op: ${nodeOp})
+// 2D DW FP Conv HWC with Im2Col and ChannelOut parallelism (Name: ${nodeName}, Op: ${nodeOp})
Deeploy/DeeployTypes.py (1)

2708-2714: Preserve context name when recreating NetworkContext during parse

Recreating self.ctxt drops custom names; pass name=self.name to keep mangling consistent.

Apply:

-        self.ctxt = NetworkContext(
+        self.ctxt = NetworkContext(
             variableBuffer = self.Platform.VariableBuffer,
             constantBuffer = self.Platform.ConstantBuffer,
             structBuffer = self.Platform.StructBuffer,
             transientBuffer = self.Platform.TransientBuffer,
-            n_cores = self.ctxt.n_cores,
+            n_cores = self.ctxt.n_cores,
+            name = self.name,
         )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4165983 and af78d75.

📒 Files selected for processing (9)
  • Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py (2 hunks)
  • Deeploy/DeeployTypes.py (15 hunks)
  • Deeploy/Targets/PULPOpen/Deployer.py (2 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (5 hunks)
  • Deeploy/TilingExtension/TilerExtension.py (1 hunks)
  • DeeployTest/testMVP.py (4 hunks)
  • DeeployTest/testRunner_tiled_siracusa.py (1 hunks)
  • DeeployTest/testUtils/platformMapping.py (2 hunks)
  • DeeployTest/testUtils/testRunner.py (3 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-24T12:17:21.624Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:46-0
Timestamp: 2025-09-24T12:17:21.624Z
Learning: In Deeploy's PULP templates, transient buffer size calculation can return element counts as strings from computeTransientBuffersSize(), and then manually set the buffer type in hoistTransientBuffers() using ctxt.lookup(buffer_name)._type.referencedType = input_type. The allocation system automatically multiplies the element count by the element size when the buffer type is properly set, achieving correct byte allocation.

Applied to files:

  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
📚 Learning: 2025-09-24T12:49:17.889Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.

Applied to files:

  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
🧬 Code graph analysis (4)
DeeployTest/testMVP.py (3)
Deeploy/DeeployTypes.py (1)
  • NetworkDeployer (3256-3633)
DeeployTest/testUtils/platformMapping.py (1)
  • mapDeployer (92-261)
DeeployTest/testSchedulingExtension.py (1)
  • _mockScheduler (29-33)
DeeployTest/testRunner_tiled_siracusa.py (1)
DeeployTest/testUtils/testRunner.py (1)
  • TestRunner (279-450)
DeeployTest/testUtils/platformMapping.py (2)
Deeploy/DeeployTypes.py (3)
  • DeploymentPlatform (2451-2494)
  • TopologyOptimizer (2249-2278)
  • NetworkDeployer (3256-3633)
Deeploy/Targets/PULPOpen/Deployer.py (1)
  • PULPDeployer (29-128)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (1)
Deeploy/DeeployTypes.py (6)
  • NodeTemplate (87-229)
  • computeTransientBuffersSize (142-162)
  • NetworkContext (575-1090)
  • hoistTransientBuffers (165-185)
  • hoistTransientBuffer (891-910)
  • lookup (790-822)
🪛 GitHub Actions: CI • Snitch (Tiled)
DeeployTest/testUtils/testRunner.py

[error] 334-334: AttributeError: 'TestRunner' object has no attribute '_cores'. Did you mean: 'n_cores'?


[error] 334-334: Command failed with exit code 1. Failure occurred while executing testRunner_tiled_snitch.py with parameters: -t Tests/$testName --cores=9 --simulator=$simulator --l1 $L1_value --defaultMemLevel=L2 --memAllocStrategy=MiniMalloc --searchStrategy=random-max

🪛 Ruff (0.13.3)
DeeployTest/testMVP.py

58-58: Unused function argument: verbose

(ARG001)

Deeploy/Targets/PULPOpen/Deployer.py

40-40: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

Deeploy/DeeployTypes.py

452-452: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


452-452: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py

25-25: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (150)
  • GitHub Check: siracusa-models / test-runner-siracusa
  • GitHub Check: snitch-kernels / test-runner-snitch
  • GitHub Check: cortexm-kernels / test-runner-cortexm
  • GitHub Check: linting
  • GitHub Check: chimera-kernels / test-runner-chimera
  • GitHub Check: generic-kernels / test-runner-generic
  • GitHub Check: Test Type Inference (fail-input2)
  • GitHub Check: Test Type Inference (pass)
  • GitHub Check: Test Type Inference (fail-input0)
  • GitHub Check: siracusa-neureka-models-tiled-singlebuffer-L3 (miniMobileNet, 2000, 8, L3) / test-runner-siracusa-neureka-tiled (2000)
  • GitHub Check: siracusa-neureka-models-tiled-doublebuffer-L3-wmem (miniMobileNet, 2000, 8, true, L3, true) / test-runner-siracusa-neureka-tiled (2000)
  • GitHub Check: siracusa-neureka-kernels-tiled-singlebuffer-L2-wmem / test-runner-siracusa-neureka-tiled
  • GitHub Check: siracusa-neureka-kernels-tiled-singlebuffer-L2 / test-runner-siracusa-neureka-tiled
  • GitHub Check: mempool-models / test-runner-mempool
  • GitHub Check: mempool-kernels / test-runner-mempool
🔇 Additional comments (6)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (2)

81-84: Bias/has_bias plumbing for direct conv looks good.

Bias flags are correctly passed to the FP Conv2d HWC call.


100-117: Im2Col call argument order (X,Y) acknowledged as correct for this kernel.

The X,Y ordering for dims, kernel, and stride matches the intended prototype in this codebase.

If you changed any kernel prototypes elsewhere in this PR, ensure consistency across integer/float and DW/non-DW variants.

Deeploy/TilingExtension/TilerExtension.py (1)

987-989: LGTM: propagate core count to context

Assigning ctxt.n_cores ensures downstream kernels see the configured core count.

DeeployTest/testRunner_tiled_siracusa.py (1)

20-26: LGTM: cores threaded into TestRunner and CMake

Passing args.cores to TestRunner and exporting NUM_CORES is correct.

Also applies to: 28-28

DeeployTest/testUtils/platformMapping.py (1)

92-104: LGTM: add n_cores and forward to PULPDeployer

Signature and forwarding align with deployer changes.

Also applies to: 214-224

DeeployTest/testMVP.py (1)

57-60: LGTM: introduce and propagate n_cores

CLI option, setupDeployer signature, and forwarding to mapDeployer are consistent.

Also applies to: 84-93, 201-206, 261-263

Comment on lines +25 to +26
inputOffsets: Dict[str, int] = {},
n_cores: int = 8):
Copy link
Contributor


🛠️ Refactor suggestion | 🟠 Major

Avoid mutable default for inputOffsets; initialize from None

Using {} as a default shares state across calls. Accept None and initialize inside the function. Also simplify the empty-check.

Apply:

-                 inputOffsets: Dict[str, int] = {},
-                 n_cores: int = 8):
+                 inputOffsets: Optional[Dict[str, int]] = None,
+                 n_cores: int = 8):
@@
-        if inputOffsets == {}:
-            for key in inputTypes.keys():
-                inputOffsets[key] = 0
+        if inputOffsets is None:
+            inputOffsets = {key: 0 for key in inputTypes.keys()}

As per Ruff B006.

Also applies to: 30-35

🧰 Tools
🪛 Ruff (0.13.3)

25-25: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

🤖 Prompt for AI Agents
In Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py around lines
25-26 and 30-35, the function/class signature uses a mutable default
inputOffsets: Dict[str, int] = {} which shares state across calls; change the
parameter default to inputOffsets: Optional[Dict[str,int]] = None and inside the
body set inputOffsets = {} if inputOffsets is None, then use that local dict;
also simplify any subsequent checks for emptiness to use simple truthiness (if
not inputOffsets) or explicit length checks as appropriate.

Comment on lines 586 to 597
name: str = 'DeeployNetwork',
n_cores: int = 8):
self.globalObjects = OrderedDict()
self.localObjects = OrderedDict()
self.VariableBuffer = variableBuffer
self.ConstantBuffer = constantBuffer
self.StructBuffer = structBuffer
self.TransientBuffer = transientBuffer
self.n_cores = n_cores
self.name = name
self.n_cores = n_cores

Copy link
Contributor

@coderabbitai coderabbitai bot Oct 13, 2025


⚠️ Potential issue | 🟡 Minor

Duplicate assignment of n_cores in NetworkContext.__init__

self.n_cores is set twice. Remove one.

Apply:

-        self.n_cores = n_cores
         self.name = name
         self.n_cores = n_cores

Optional: also replace mutable defaults for globalObjects/localObjects with None and init inside to avoid shared dicts.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
name: str = 'DeeployNetwork',
n_cores: int = 8):
self.globalObjects = OrderedDict()
self.localObjects = OrderedDict()
self.VariableBuffer = variableBuffer
self.ConstantBuffer = constantBuffer
self.StructBuffer = structBuffer
self.TransientBuffer = transientBuffer
self.n_cores = n_cores
self.name = name
self.n_cores = n_cores
def __init__(self,
# … other parameters …
name: str = 'DeeployNetwork',
n_cores: int = 8):
self.globalObjects = OrderedDict()
self.localObjects = OrderedDict()
self.VariableBuffer = variableBuffer
self.ConstantBuffer = constantBuffer
self.StructBuffer = structBuffer
self.TransientBuffer = transientBuffer
- self.n_cores = n_cores
self.name = name
self.n_cores = n_cores
🤖 Prompt for AI Agents
In Deeploy/DeeployTypes.py around lines 586 to 597, NetworkContext.__init__
assigns self.n_cores twice; remove the redundant assignment so n_cores is only
set once. Also (optional) avoid mutable default shared dicts by changing
parameters defaulting to OrderedDict() to None and initializing
self.globalObjects = OrderedDict() and self.localObjects = OrderedDict() inside
__init__ when the corresponding arg is None.

Copy link
Collaborator


this

Copy link
Contributor


@lukamac, got it! The duplicate self.n_cores = n_cores assignment has been noted. 👍

Comment on lines +40 to +53
inputOffsets = {},
n_cores: int = 8):
super().__init__(
graph = graph,
deploymentPlatform = deploymentPlatform,
inputTypes = inputTypes,
loweringOptimizer = loweringOptimizer,
scheduler = scheduler,
name = name,
default_channels_first = default_channels_first,
deeployStateDir = deeployStateDir,
inputOffsets = inputOffsets,
n_cores = n_cores,
)
Copy link
Contributor


🛠️ Refactor suggestion | 🟠 Major

Avoid mutable default for inputOffsets; forward None to base

Use Optional and let the base handle initialization.

Apply:

-                 inputOffsets = {},
-                 n_cores: int = 8):
-        super().__init__(
+                 inputOffsets: Optional[Dict[str, int]] = None,
+                 n_cores: int = 8):
+        super().__init__(
@@
-            inputOffsets = inputOffsets,
+            inputOffsets = inputOffsets,
             n_cores = n_cores,
         )

(Requires the SignPropDeployer change to accept Optional.)

📝 Committable suggestion


Suggested change
inputOffsets = {},
n_cores: int = 8):
super().__init__(
graph = graph,
deploymentPlatform = deploymentPlatform,
inputTypes = inputTypes,
loweringOptimizer = loweringOptimizer,
scheduler = scheduler,
name = name,
default_channels_first = default_channels_first,
deeployStateDir = deeployStateDir,
inputOffsets = inputOffsets,
n_cores = n_cores,
)
inputOffsets: Optional[Dict[str, int]] = None,
n_cores: int = 8):
super().__init__(
graph = graph,
deploymentPlatform = deploymentPlatform,
inputTypes = inputTypes,
loweringOptimizer = loweringOptimizer,
scheduler = scheduler,
name = name,
default_channels_first = default_channels_first,
deeployStateDir = deeployStateDir,
inputOffsets = inputOffsets,
n_cores = n_cores,
)
🧰 Tools
🪛 Ruff (0.13.3)

40-40: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

🤖 Prompt for AI Agents
In Deeploy/Targets/PULPOpen/Deployer.py around lines 40 to 53, the constructor
uses a mutable default argument inputOffsets = {} which should be avoided;
change the parameter type to Optional[dict] and default it to None, update any
necessary typing imports, and forward that None to the base class (letting the
base initializer create the default dict) so you no longer use a shared mutable
default; ensure the SignPropDeployer/base constructor is updated to accept
Optional as noted.
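The B006 pitfall flagged above comes from Python evaluating default arguments once, at function definition time. A minimal sketch (class names here are illustrative stand-ins, not Deeploy's actual deployer classes):

```python
# Minimal sketch of the mutable-default pitfall (Ruff B006) and its fix.
# BadDeployer/GoodDeployer are hypothetical stand-ins for the real classes.

class BadDeployer:

    def __init__(self, inputOffsets = {}):  # one dict shared by ALL instances
        self.inputOffsets = inputOffsets


class GoodDeployer:

    def __init__(self, inputOffsets = None):  # safe: fresh dict per instance
        self.inputOffsets = {} if inputOffsets is None else inputOffsets


a, b = BadDeployer(), BadDeployer()
a.inputOffsets["x"] = 1
print(b.inputOffsets)  # {'x': 1} -- b observes a's mutation

c, d = GoodDeployer(), GoodDeployer()
c.inputOffsets["x"] = 1
print(d.inputOffsets)  # {} -- instances stay independent
```

This is why the suggestion forwards `None` and lets the base class materialize a new dict per instance.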

Comment on lines +48 to +51
im2col_dim = (operatorRepresentation["weight_type"].typeWidth // 8
) * ctxt.n_cores * operatorRepresentation['dim_kernel_x'] * operatorRepresentation['dim_kernel_y']
im2col_name = operatorRepresentation['nodeName'] + "_buffer"
return [(im2col_name, im2col_dim)]
Contributor


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Avoid potential double-accounting of bytes for DW Im2Col buffer.

Here you compute bytes (via typeWidth//8) and also set the buffer’s referencedType. If the allocator multiplies element count by sizeof(type) when a concrete type is set (as used elsewhere), this could oversize the buffer by a factor of element size.

  • Option A (recommended): Keep im2col_dim as element count (remove × bytes) and retain the referencedType assignment.
  • Option B: Keep im2col_dim in bytes and do not rely on per-type multiplication in the allocation path for this buffer.

Please verify allocator/codegen behavior. Run:

If multiplication by sizeof(type) is present for typed transient buffers, switch to element-count sizing here to avoid double-accounting. Based on learnings.

Also applies to: 59-65


🏁 Script executed:

#!/bin/bash
# Inspect how ctxtBufferSize and TransientBuffer.size are consumed.
rg -n -C3 -P 'ctxtBuffer(Size)?\b|TransientBuffer\(|hoistTransientBuffer\(' Deeploy
rg -n -C3 -P '(alloc|allocate|malloc|rt_alloc|pi_l2_malloc|sizeof)\\s*\\('
# Look for size computations mixing buffer.type and size
rg -n -C4 -P '(?:ctx|ctxt|buffer).*size|size.*type|referencedType.*size' Deeploy

Length of output: 119828


Remove the typeWidth//8 factor from im2col_dim
Drop the //8 multiplication in the im2col_dim calculation (FloatConvTemplate.py lines 48–51 and 59–65). PULP’s allocation templates already do sizeof(type) * size, so passing element counts avoids double‐sizing.

🤖 Prompt for AI Agents
In Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py around lines 48–51
and 59–65, the im2col_dim computation multiplies by
operatorRepresentation["weight_type"].typeWidth // 8 which double-sizes the
allocation; remove the typeWidth // 8 factor so im2col_dim represents element
count only (leave the remaining factors: ctxt.n_cores * dim_kernel_x *
dim_kernel_y and the name construction unchanged), updating both occurrences
accordingly.
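The double-accounting can be seen with plain arithmetic; the numbers below are hypothetical example values, not taken from the PR:

```python
# Illustration of the double-sizing issue, assuming the allocation template
# computes sizeof(type) * size. All values here are example numbers.
n_cores, kx, ky = 8, 3, 3   # cores and kernel dims
elem_bytes = 4              # sizeof(float32)

# Before: im2col_dim already in bytes, then the allocator multiplies by
# sizeof(type) again -> buffer is elem_bytes times too large.
dim_bytes = elem_bytes * n_cores * kx * ky
alloc_before = elem_bytes * dim_bytes   # 4 * 288 = 1152 bytes

# After: im2col_dim as an element count; sizeof(type) applied exactly once.
dim_elems = n_cores * kx * ky
alloc_after = elem_bytes * dim_elems    # 4 * 72 = 288 bytes, as intended

print(alloc_before, alloc_after)  # 1152 288
```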

Comment on lines 279 to +290
class TestRunner():

def __init__(self,
platform: str,
simulator: Literal['gvsoc', 'banshee', 'qemu', 'vsim', 'vsim.gui', 'host', 'none'],
tiling: bool,
argument_parser: TestRunnerArgumentParser,
gen_args: str = "",
cmake_args: str = ""):
def __init__(
self,
platform: str,
simulator: Literal['gvsoc', 'banshee', 'qemu', 'vsim', 'vsim.gui', 'host', 'none'],
tiling: bool,
argument_parser: TestRunnerArgumentParser,
gen_args: str = "",
cmake_args: str = "",
cores: int = 8,
):
Contributor


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Fix AttributeError and command construction in TestRunner.generate_test

  • Typo: use self.n_cores (not self._cores).
  • Remove redundant command reassignment.

Apply:

-        command = f"python {generation_script} -d {self._dir_gen} -t {self._dir_test} -p {self._platform} {self.gen_args}"
-
-        command = f"python {generation_script} -d {self._dir_gen} -t {self._dir_test} -p {self._platform}"
+        command = f"python {generation_script} -d {self._dir_gen} -t {self._dir_test} -p {self._platform}"
@@
-        if self._tiling is True:
-            command += f" -n_cores {self._cores}"
+        if self._tiling is True:
+            command += f" -n_cores {self.n_cores}"

This resolves the CI failure.

Run to confirm no remaining _cores typos:

Also applies to: 349-357


🏁 Script executed:

#!/bin/bash
rg -nP '_cores\b' -C2

Length of output: 21522


Fix duplicate command and typo in TestRunner.generate_test

In testUtils/testRunner.py (generate_test):

  • Remove the redundant initial command = … {self.gen_args} assignment.
  • Replace self._cores with self.n_cores when appending -n_cores.
-        command = f"python {generation_script} -d {self._dir_gen} -t {self._dir_test} -p {self._platform} {self.gen_args}"
-
-        command = f"python {generation_script} -d {self._dir_gen} -t {self._dir_test} -p {self._platform}"
+        command = f"python {generation_script} -d {self._dir_gen} -t {self._dir_test} -p {self._platform}"
@@
-        if self._tiling is True:
-            command += f" -n_cores {self._cores}"
+        if self._tiling is True:
+            command += f" -n_cores {self.n_cores}"
📝 Committable suggestion

Suggested change
class TestRunner():
def __init__(self,
platform: str,
simulator: Literal['gvsoc', 'banshee', 'qemu', 'vsim', 'vsim.gui', 'host', 'none'],
tiling: bool,
argument_parser: TestRunnerArgumentParser,
gen_args: str = "",
cmake_args: str = ""):
def __init__(
self,
platform: str,
simulator: Literal['gvsoc', 'banshee', 'qemu', 'vsim', 'vsim.gui', 'host', 'none'],
tiling: bool,
argument_parser: TestRunnerArgumentParser,
gen_args: str = "",
cmake_args: str = "",
cores: int = 8,
):
command = f"python {generation_script} -d {self._dir_gen} -t {self._dir_test} -p {self._platform}"
if self._tiling is True:
command += f" -n_cores {self.n_cores}"
🤖 Prompt for AI Agents
In DeeployTest/testUtils/testRunner.py around lines 279 to 290, the
generate_test builds the same command twice and uses the wrong attribute name
for cores; remove the redundant initial "command = … {self.gen_args}" assignment
so the command is constructed only once, and replace any use of self._cores with
self.n_cores when appending the "-n_cores" flag so the correct attribute is
used.

Collaborator

@lukamac lukamac left a comment


This is a first round of review. There are already a few comments that need your answers. I haven't managed to look at the actual templates/kernels yet; I'll do that tomorrow.
Good work :)

Comment on lines +370 to +373
else:
inputTransposeNode, inputTransposeOutput = _appendTransposeNode(inputNode, name + "_TransposeIn", inPermute)
opNode.inputs[0] = inputTransposeOutput
graph.nodes.append(inputTransposeNode)
Collaborator


Can you clarify why we are transposing the input on non-RequantizedConvs? which cases are that?

Collaborator

@lukamac lukamac Oct 14, 2025


I understood it from our private conversation, it's the floating-point implementation of the dw conv. As suggested somewhere else, I would separate it into a dedicated function NCHWtoNHWC_dw_fun for clarity, or just wait for my PR to land.

Contributor Author


I think we can wait for your PR.

deeployStateDir: str = "DeeployState",
inputOffsets: Dict[str, int] = {}):
inputOffsets: Dict[str, int] = {},
n_cores: int = 8):
Collaborator


Why do you add the n_cores argument? We have the N_CORES define passed through the cmake script.

Contributor Author


The number of cores is needed to dynamically compute the size of the im2col buffer for the regular and DW Conv2Ds. This was the method I found to pass it on to the network context (PULPDeployer inherits SignPropDeployer, this class, which in turn inherits NetworkDeployer). Let me know if you think we should proceed differently with this.


# Non DW-Type:
if opNode.attrs['group'] == 1:
if 'group' in opNode.attrs and opNode.attrs['group'] == 1:
Collaborator


This should better be covered by a get with a default because ONNX Conv spec defines the default for group to be 1:

if opNode.attrs.get('group', 1) == 1:

P.S. I have a PR refactoring this file so you might have some merge conflicts

Contributor Author


I will replace it with a get, but even if the default is group=1, this also handles the RequantizedConv, which is "non-ONNX-canonical". If I remember correctly, it was the RequantConv that was missing this attribute and raising an error in those cases.

Contributor Author

@diaconuccalin diaconuccalin Oct 17, 2025


Although, looking again at it, I can't implement a getter function, since the opNode is a Node object, implemented in the ONNX Graph Surgeon library. I may implement a private getter somewhere in this file, but to me it seems superfluous, since it would do pretty much the same operation as the current if, and it would only be used once. So let me know please if and how you'd like this to be modified.

Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need to implement a get function, it's already implemented for the Python dictionaries (which node.attrs is).
And yes, RequantizedConv is not a standard ONNX node, but it is kinda inheriting a lot of the stuff from the Conv node sooooo, imo, I would say it's ok to assume that default.
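Since `node.attrs` is a plain Python dictionary in onnx-graphsurgeon, the built-in `dict.get` with a default covers both the canonical Conv (missing attribute means `group == 1` per the ONNX spec) and the explicit case. A minimal sketch, with illustrative attribute dicts:

```python
# dict.get with a default, as suggested above: a missing 'group' attribute
# is treated as group == 1 (the ONNX Conv default). The attrs dicts below
# are hypothetical examples standing in for node.attrs.
conv_attrs = {"kernel_shape": [3, 3]}  # no 'group' -> defaults to 1
dw_conv_attrs = {"group": 32}          # explicit depthwise grouping

def is_non_dw(attrs):
    # Equivalent to `'group' in attrs and attrs['group'] == 1`, except a
    # missing attribute also counts as group == 1.
    return attrs.get("group", 1) == 1

print(is_non_dw(conv_attrs))     # True
print(is_non_dw(dw_conv_attrs))  # False
```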


# Initialize Pass
super().__init__(pattern = graph,
replacement_fn = partial(_PULPDWNCHWtoNHWC_fun,
Collaborator


I recommend writing another NCHWtoNHWC_dw function for the PULP conv kernels, and maybe even check that it's an fp kernel, just to make it even clearer that it differs from the integer one. The transposition of the input is quite a big difference that imo deserves a separate function.

Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As said in a comment above, if it's ok with you, I'll wait for your PR and make the changes afterwards.

Comment on lines 586 to 597
name: str = 'DeeployNetwork',
n_cores: int = 8):
self.globalObjects = OrderedDict()
self.localObjects = OrderedDict()
self.VariableBuffer = variableBuffer
self.ConstantBuffer = constantBuffer
self.StructBuffer = structBuffer
self.TransientBuffer = transientBuffer
self.n_cores = n_cores
self.name = name
self.n_cores = n_cores

Collaborator


this

return ctxt, False


class PULPFPDWConv2DParser(Conv2DParser):
Collaborator


if this parser does basically the same thing as the above FP Conv parser, you can inherit it, and just check for the additional group condition. You can immediately check in the parseNode if the group attribute is present and equal to the node.inputs[1].shape[<ch idx>]

Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are mostly the same, but with one key incompatibility: the self.operatorRepresentation['group'] == 1 condition in the regular Conv2D parser.

That condition does not really make sense in the context of the ONNX Conv definition, but grouping is not handled in the FP kernel of PULPOpen, so for now I think it's a good check to keep in place (which means we need to keep the two parsers, regular and DW, separate for now).

Let me know if you'd like me to proceed differently.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
Deeploy/Targets/PULPOpen/Deployer.py (1)

31-53: Avoid the shared mutable default for inputOffsets

Keeping inputOffsets = {} means every deployer instance shares the same dict, so one run can mutate another’s offsets. Since this signature just changed, please switch to a safe None default and materialize a new dict per instance before calling super().

-from typing import Callable, Dict, List, Type
+from typing import Callable, Dict, List, Optional, Type
@@
-                 inputOffsets = {},
+                 inputOffsets: Optional[Dict[str, int]] = None,
                  n_cores: int = 8):
+        if inputOffsets is None:
+            inputOffsets = {}
+
         super().__init__(
             graph = graph,
             deploymentPlatform = deploymentPlatform,
             inputTypes = inputTypes,
             loweringOptimizer = loweringOptimizer,
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4db4978 and 48b2ff5.

📒 Files selected for processing (5)
  • Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py (1 hunks)
  • Deeploy/DeeployTypes.py (15 hunks)
  • Deeploy/Targets/Generic/Parsers.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Deployer.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Parsers.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py
🧰 Additional context used
🧬 Code graph analysis (2)
Deeploy/Targets/Generic/Parsers.py (1)
Deeploy/DeeployTypes.py (3)
  • add_aliases (331-350)
  • get_aliases_of (352-364)
  • lookup (788-820)
Deeploy/Targets/PULPOpen/Parsers.py (2)
Deeploy/DeeployTypes.py (5)
  • inputs (2579-2596)
  • lookup (788-820)
  • parseNode (1101-1116)
  • parseNodeCtxt (1119-1144)
  • NetworkContext (574-1088)
Deeploy/Targets/Generic/Parsers.py (31)
  • Conv2DParser (1204-1270)
  • parseNode (19-27)
  • parseNode (49-58)
  • parseNode (81-94)
  • parseNode (102-112)
  • parseNode (163-169)
  • parseNode (193-207)
  • parseNode (229-251)
  • parseNode (291-304)
  • parseNode (326-338)
  • parseNode (375-387)
  • parseNode (422-426)
  • parseNode (449-460)
  • parseNode (484-495)
  • parseNode (531-535)
  • parseNode (551-554)
  • parseNodeCtxt (29-41)
  • parseNodeCtxt (60-76)
  • parseNodeCtxt (114-155)
  • parseNodeCtxt (171-185)
  • parseNodeCtxt (209-221)
  • parseNodeCtxt (253-283)
  • parseNodeCtxt (306-318)
  • parseNodeCtxt (340-367)
  • parseNodeCtxt (389-414)
  • parseNodeCtxt (428-441)
  • parseNodeCtxt (462-476)
  • parseNodeCtxt (497-523)
  • parseNodeCtxt (537-543)
  • parseNodeCtxt (556-573)
  • parseNodeCtxt (586-605)
🪛 Ruff (0.14.0)
Deeploy/Targets/PULPOpen/Deployer.py

40-40: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

Comment on lines 1044 to 1055
new_output_node_aliases = input_node.get_aliases_of()
new_output_node_aliases.append(input_node.name)

# Add new aliases
# Add new aliases to output node
output_node.add_aliases(aliases_to_add = new_output_node_aliases)

# Add output node as alias to its aliases (alias relationship is symmetric)
for alias in input_node.get_aliases_of():
alias_node = ctxt.lookup(alias)
alias_node.add_aliases(aliases_to_add = [
output_node.name,
])
Contributor


⚠️ Potential issue | 🟠 Major

Alias symmetry currently skips the source tensor

The second loop walks input_node.get_aliases_of(), which omits the source buffer itself. As a result the input buffer never learns about output_node, so aliasing remains asymmetric even though the new comment says otherwise. Include the input buffer when propagating aliases so both directions stay in sync.

A minimal fix is to iterate new_output_node_aliases (which already contains the input name) when updating the other buffers:

-        for alias in input_node.get_aliases_of():
+        for alias in new_output_node_aliases:
             alias_node = ctxt.lookup(alias)
             alias_node.add_aliases(aliases_to_add = [
                 output_node.name,
             ])
🤖 Prompt for AI Agents
In Deeploy/Targets/Generic/Parsers.py around lines 1044 to 1055, the loop that
propagates alias symmetry iterates input_node.get_aliases_of(), which omits the
input node itself and leaves the input buffer unaware of the new output alias;
change the propagation to iterate new_output_node_aliases (which already
includes input_node.name) so every alias, including the source, is updated to
add output_node.name via ctxt.lookup(...) and add_aliases(...). Ensure you keep
the existing add_aliases call and the symmetric update for each alias returned
by new_output_node_aliases.
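The asymmetry and its fix can be shown with a toy model; `Buffer` and the context dict below are simplified stand-ins for Deeploy's NetworkContext API, not the real classes:

```python
# Toy sketch of the symmetric alias update proposed above. Buffer and ctxt
# are hypothetical stand-ins for Deeploy's VariableBuffer/NetworkContext.

class Buffer:

    def __init__(self, name):
        self.name = name
        self._aliases = []

    def get_aliases_of(self):
        return list(self._aliases)

    def add_aliases(self, aliases_to_add):
        for a in aliases_to_add:
            if a != self.name and a not in self._aliases:
                self._aliases.append(a)


ctxt = {n: Buffer(n) for n in ("in", "prior", "out")}
ctxt["in"].add_aliases(["prior"])  # 'in' already aliases 'prior'

input_node, output_node = ctxt["in"], ctxt["out"]

# Aliases the output inherits: the input's aliases plus the input itself.
new_output_node_aliases = input_node.get_aliases_of()
new_output_node_aliases.append(input_node.name)
output_node.add_aliases(aliases_to_add = new_output_node_aliases)

# Iterating new_output_node_aliases (rather than input_node.get_aliases_of())
# also tells the source buffer about the output, keeping aliasing symmetric.
for alias in new_output_node_aliases:
    ctxt[alias].add_aliases(aliases_to_add = [output_node.name])

print(ctxt["in"].get_aliases_of())   # ['prior', 'out']
print(ctxt["out"].get_aliases_of())  # ['prior', 'in']
```

With the original loop over `input_node.get_aliases_of()`, the last print would show `'in'` never learning about `'out'`.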

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 48b2ff5 and 7cba0cf.

📒 Files selected for processing (2)
  • Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (4 hunks)
  • Deeploy/Targets/Generic/Parsers.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • Deeploy/Targets/Generic/Parsers.py
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-24T11:56:35.781Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py:364-0
Timestamp: 2025-09-24T11:56:35.781Z
Learning: When reviewing pull requests, focus comments only on code that was actually changed (lines with + or - in diffs), not on unchanged context code that appears in the diff.

Applied to files:

  • Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
📚 Learning: 2025-09-24T12:49:17.889Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.

Applied to files:

  • Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
🧬 Code graph analysis (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
Deeploy/CommonExtensions/OptimizationPasses/PassClasses.py (2)
  • contextagnostic (285-298)
  • ReplaceSequentialPatternPass (265-282)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (120)
  • GitHub Check: linting
  • GitHub Check: mempool-models / test-runner-mempool
  • GitHub Check: siracusa-kernels-tiled-doublebuffer-L2 / test-runner-siracusa-tiled
  • GitHub Check: softhier-kernels / test-runner-softhier
  • GitHub Check: snitch-kernels-tiled-singlebuffer-L2 / test-runner-snitch-tiled
  • GitHub Check: cortexm-models / test-runner-cortexm
  • GitHub Check: cortexm-kernels / test-runner-cortexm
  • GitHub Check: snitch-kernels / test-runner-snitch
  • GitHub Check: siracusa-neureka-models-tiled-doublebuffer-L3-wmem (Attention, 3500, 8, true, L3, true) / test-runner-siracusa-neureka-tiled (3500)
  • GitHub Check: siracusa-neureka-models-tiled-doublebuffer-L3-wmem (microLlama/microLlama1, 10000, 8, true, L3, t... / test-runner-siracusa-neureka-tiled (10000)
  • GitHub Check: siracusa-neureka-models-tiled-singlebuffer-L3 (miniMobileNet, 2000, 8, L3) / test-runner-siracusa-neureka-tiled (2000)
  • GitHub Check: siracusa-neureka-models-tiled-singlebuffer-L3 (Transformer, 15000, 8, L3) / test-runner-siracusa-neureka-tiled (15000)
🔇 Additional comments (4)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (4)

250-250: LGTM! Improved group attribute handling.

Using .get('group', 1) properly handles the ONNX Conv default and is more robust than explicit attribute checking.


370-373: LGTM! Correctly handles FP DW Conv input transpose.

The else block appropriately transposes the input for non-RequantizedConv DW convolutions (e.g., FP Conv), while the RequantizedConv path transposes weights. This distinction aligns with the requirements for adding FP DW Conv support.


380-426: LGTM! Well-structured pass classes with clear separation of concerns.

The refactoring of PULPDWConvPass improves readability with explicit comments and sections. The new PULPFPDWConvPass class properly mirrors the structure for FP DW Conv handling, using the 'Conv' op pattern and a distinct pass name while sharing the same replacement function via partial().


505-505: LGTM! Correctly integrates the FP DW Conv pass into the sequence.

The placement after PULPDWConvPass and before dense conv passes is logical, ensuring DW convolutions (both quantized and FP) are handled before dense convolutions.


Labels

Bug (Something isn't working), Feature

Projects

Status: In review

Development

Successfully merging this pull request may close these issues.

3 participants