
Conversation


@diaconuccalin diaconuccalin commented Sep 24, 2025

This PR brings the changes required for a minimal TinyViT to run on the Siracusa platform, without tiling.

Added

  • PULP 2D FP DW conv Im2Col template and kernel, with bias support.
  • Bias support for PULP 2D FP regular conv Im2Col in template & kernel.
  • PULP FP DW conv 2D parser.
  • FP conv 2D (simple & DW), reshape & skip connection, and TinyViT demo tests to the non-tiled Siracusa CI pipeline.
  • FP bindings and mappings for PULP slice, DW conv 2D, and reduce mean operations.
  • FP PULP DW conv lowering optimization pass, similar to the existing one for the integer version.
  • RemoveEmptyConvBiasPass to the PULP optimizer.

Changed

  • Reduced the size of the reshape & skip connection test for non-tiled Siracusa memory compatibility.

Fixed

  • Fixed a bug in the alias_of node parameter handling, which manages buffer lifetimes in skip-connection situations.
  • Fixed a bug with non-batched operands in the PULPOpen FP GEMM and matmul templates.
  • Prefixed closure names with an underscore to avoid naming issues when they would otherwise start with unsupported characters (like digits).
  • Fixed data types in the PULPOpen FP add and mul templates.

PR Merge Checklist

  1. The PR is rebased on the latest devel commit and pointing to devel.
  2. Your PR is reviewed and approved.
  3. All checks are passing.
  4. The CHANGELOG.md file has been updated.
  5. If the Docker image was modified, change its link back after review.


coderabbitai bot commented Sep 24, 2025

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Added floating-point depthwise convolution support with bias handling.
    • Extended slice operations for floating-point types with optimized memory-aware variants.
    • Enhanced floating-point reduce-mean operations with multi-type combinations.
    • Added configurable multi-core support for improved parallelization.
  • Bug Fixes

    • Fixed alias tracking for buffer lifetimes in skip connections.
    • Corrected data type handling in floating-point operations.
    • Fixed internal naming to prevent character-related issues.
  • Tests

    • Added new floating-point convolution variants and model tests to CI pipeline.

Walkthrough

Adds FP32 depthwise Conv2D (Im2Col) with optional bias across parsers, bindings, templates, and kernels; introduces DMA-based Slice path and float ReduceMean bindings; fixes alias_of propagation and closure naming; propagates n_cores through contexts/deployers/tests; and adds Siracusa CI FP test variants.

Changes

Cohort / File(s) Summary of changes
CI workflow updates
.github/workflows/ci-platform-siracusa.yml
Added new Siracusa test names for FP conv/DW conv variants, reshape w/ skip-connection, and TinyViT demo.
Closure naming
Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py
Prefix closure names with _ before the closure suffix; propagate to derived identifiers.
NHWC lowering & DW passes
Deeploy/CommonExtensions/OptimizationPasses/.../LoweringOptimizationPasses.py
Hardened group checks (use get('group',1)), added PULPDWConvPass and PULPFPDWConvPass, expanded DW input-transpose path for non-requantized convs, and wired FP DW pass into NHWC conversion sequence.
Alias tracking & n_cores in contexts
Deeploy/DeeployTypes.py
Store copies of alias_of, propagate alias lists across buffer creation/hoisting, allow TransientBuffer.size as int
Reshape alias symmetry
Deeploy/Targets/Generic/Parsers.py
Reshape parser now back-populates input node aliases with the output node alias to make alias relations symmetric.
PULPOpen bindings updates
Deeploy/Targets/PULPOpen/Bindings.py
Add FloatDataTypes; add PULPSliceBindings and PULPFloatDWConv2DBindings; switch DMA slice to DMASliceTemplate.referenceTemplate; extend ReduceMean bindings to include float/int combos.
PULPOpen parsers
Deeploy/Targets/PULPOpen/Parsers.py
FP Conv2D parser accepts 2–3 inputs and produces bias-aware context; added PULPFPDWConv2DParser for depthwise FP conv with bias detection.
PULPOpen platform wiring & passes
Deeploy/Targets/PULPOpen/Platform.py
Add RemoveEmptyConvBiasPass; wire FP DW conv mapper/bindings; add DMASliceMapper alongside SliceMapper; expose new bindings/parsers and update PULPMapping.
Float Conv templates & ctxt buffer
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
New PULP2DFloatDWConvIm2ColTemplate with dynamic per-core context buffer sizing/hoisting; templates updated to pass ctxtBuffer, pSrcBias, and has_bias; added DW Im2Col reference template.
Float Add / Mul templates (types & casts)
Deeploy/Targets/PULPOpen/Templates/FloatAddTemplate.py, Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py
Standardized core/index types to unsigned (uint32_t), added explicit casts in chunk/min calculations, and minor spacing/formatting adjustments.
Float GEMM / MatMul templates (batching)
Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py, Deeploy/Targets/PULPOpen/Templates/FloatMatMulTemplate.py
Add conditional handling for batched A/B with guarded updates and per-batch pointer selection.
Kernel headers: Conv prototypes (bias + DW)
TargetLibraries/PULPOpen/inc/kernel/Conv.h
Updated prototypes to accept const float32_t* pSrcBias and const bool has_bias; added biased DW Im2Col prototype with pContextBuffer.
FP conv implementations
TargetLibraries/PULPOpen/src/Convolution_fp32.c, TargetLibraries/PULPOpen/src/DWConvolution_fp32.c
Added bias-aware signatures and logic, per-core channel partitioning, and new PULP_DW_Conv2d_Im2Col_fp32 implementation (DW Im2Col FP32).
Deployer / n_cores propagation
Deeploy/Targets/PULPOpen/Deployer.py, Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py, Deeploy/TilingExtension/TilerExtension.py
Add n_cores parameter/field to deployers and sign-propagator; propagate n_cores into contexts and tiler wrapper (ctxt.n_cores assignment).
Test runner / CLI / mapping helpers
DeeployTest/testMVP.py, DeeployTest/testRunner_tiled_siracusa.py, DeeployTest/testUtils/testRunner.py, DeeployTest/testUtils/platformMapping.py
Expose -n_cores CLI option; thread n_cores through mapDeployer, TestRunner, and deployer creation; include core count in CMake args for Siracusa tests.
CHANGELOG
CHANGELOG.md
Documented new FP DW conv/template/kernel, bias support, DMA slice path, n_cores propagation, CI tests, and related fixes.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant ONNX as ONNX Graph
  participant Lower as NHWC Lowering
  participant Map as PULPOpen Mapping
  participant Parser as PULPFPDWConv2DParser
  participant Bind as PULPFloatDWConv2DBindings
  participant Tmpl as DW Im2Col Template
  participant Kern as PULP_DW_Conv2d_Im2Col_fp32

  ONNX->>Lower: detect Conv (inspect group attr)
  Lower-->>Map: lowered node (transpose weights/inputs if needed)
  Map->>Parser: parseNode / parseNodeCtxt
  Parser-->>Map: context (has_bias, inputs, depthwise)
  Map->>Bind: select DW conv bindings
  Bind-->>Tmpl: emit template call (include pSrcBias, has_bias, ctxtBuffer)
  Tmpl->>Tmpl: compute/hoist per-core ctxtBuffer using ctxt.n_cores
  Tmpl->>Kern: call kernel(..., pSrcBias, has_bias, pContextBuffer)
  Kern-->>Tmpl: perform per-core DW Im2Col convolution and write outputs
  Tmpl-->>Map: generated code/artifacts
sequenceDiagram
  autonumber
  participant ONNX as ONNX Graph
  participant Map as PULPOpen Mapping
  participant SliceStd as SliceMapper (PULPSliceBindings)
  participant SliceDMA as DMASliceMapper (PULPDMASliceBindings)

  ONNX->>Map: dispatch Slice node
  alt standard slice path
    Map->>SliceStd: select PULPSliceBindings
    SliceStd-->>Map: generate standard slice code
  else DMA-capable slice path
    Map->>SliceDMA: select PULPDMASliceBindings
    SliceDMA-->>Map: generate DMA-based slice code
  end

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Suggested reviewers

  • Victor-Jung
  • Xeratec
  • lukamac

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 0.00%, which is insufficient; the required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Title Check ✅ Passed: The title "TinyViT on non-tiled Siracusa" directly reflects the main objective of the changeset, which is to enable minimal TinyViT support on the Siracusa platform without tiling. The title is concise, clear, and specific enough that a teammate scanning commit history would immediately understand the PR's core purpose without reading the full description.
  • Description Check ✅ Passed: The description provides meaningful and comprehensive information about the changeset. It includes a clear summary statement, well-organized Added/Changed/Fixed sections, and covers new templates, parsers, bindings, CI tests, optimization passes, and bug fixes. It is directly related to the actual changes across 26 files and is neither vague nor generic.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7cba0cf and da162b0.

📒 Files selected for processing (1)
  • Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (4 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-24T11:56:35.781Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py:364-0
Timestamp: 2025-09-24T11:56:35.781Z
Learning: When reviewing pull requests, focus comments only on code that was actually changed (lines with + or - in diffs), not on unchanged context code that appears in the diff.

Applied to files:

  • Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
📚 Learning: 2025-09-24T12:49:17.889Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.

Applied to files:

  • Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
🧬 Code graph analysis (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
Deeploy/CommonExtensions/OptimizationPasses/PassClasses.py (2)
  • contextagnostic (285-298)
  • ReplaceSequentialPatternPass (265-282)
🔇 Additional comments (5)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (5)

250-250: Good improvement: using .get() with ONNX default.

This change correctly handles the missing 'group' attribute by defaulting to 1, which aligns with the ONNX specification. This is more concise and Pythonic than the previous explicit membership check.


344-344: Consistent improvement matching line 250.

Good consistency applying the same .get('group', 1) pattern here for proper default handling.


370-373: FP DW Conv handling via input transpose.

This else-branch correctly handles the floating-point DW convolution case by transposing the input rather than the weight (as done for RequantizedConv at lines 364-369). The different approach aligns with the FP implementation requirements mentioned in the PR objectives.


380-425: Clean refactoring with good code reuse.

The refactoring of PULPDWConvPass and addition of PULPFPDWConvPass follows the existing pattern established by other passes in this file. Both classes effectively reuse _PULPDWNCHWtoNHWC_fun via different pattern graphs, which is elegant and maintainable. The pattern-based approach ensures:

  • PULPDWConvPass matches RequantizedConv nodes → weight transpose path
  • PULPFPDWConvPass matches Conv nodes → input transpose path

505-505: Proper integration of FP DW Conv pass.

The placement of PULPFPDWConvPass in the pass sequence is correct, running after the integer DW pass and before the dense conv passes.




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 14

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)

344-346: Prevent KeyError when 'group' attr is absent

Depthwise detection assumes 'group' exists. If missing, this pass will crash. Use a safe default (1) as elsewhere in the file.

-    if opNode.attrs['group'] == 1:
-        return graph
+    group = opNode.attrs.get('group', 1)
+    if group == 1:
+        return graph
Deeploy/Targets/PULPOpen/Templates/FloatAddTemplate.py (1)

16-23: Loop index type should match buffer indexing

Use unsigned 32-bit for i to match array indexing and avoid UB on large sizes.

(Handled in the diff above.)

Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (1)

131-149: DW Im2Col arg order — swap X/Y for image dims, kernel dims and strides; bias placement is correct

  • Prototype (TargetLibraries/PULPOpen/inc/kernel/Conv.h): PULP_DW_Conv2d_Im2Col(..., uint32_t H, uint32_t W, uint32_t C, ..., uint32_t P, uint32_t Q, uint32_t SP, uint32_t SQ, const float32_t *pSrcBias, const bool has_bias, float32_t *pDstC, uint32_t pad_top, uint32_t pad_bottom, uint32_t pad_left, uint32_t pad_right, float32_t *pContextBuffer).
  • Template call (Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py lines 131–149) currently passes (dim_im_in_x, dim_im_in_y), (dim_kernel_x, dim_kernel_y), (stride_x, stride_y). These should be swapped to (dim_im_in_y, dim_im_in_x), (dim_kernel_y, dim_kernel_x), (stride_y, stride_x).
  • (bias, has_bias), padding order and context buffer position are already correct.
🧹 Nitpick comments (16)
Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py (1)

10-11: Avoid libm at runtime: use integer log2 for NUM_CORES

Replace double log2() with __builtin_ctz() to avoid libm dependency and ensure compile‑time integer computation. Also assert the power‑of‑two assumption that the chunking math relies on.

Apply this diff:

-uint32_t ${nodeName}_log2Core = (uint32_t) log2(NUM_CORES);
+// NUM_CORES must be a power of two; compute log2(NUM_CORES) without libm
+#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
+_Static_assert((NUM_CORES & (NUM_CORES - 1)) == 0, "NUM_CORES must be power of two");
+#endif
+uint32_t ${nodeName}_log2Core = (uint32_t)__builtin_ctz(NUM_CORES);
.github/workflows/ci-platform-siracusa.yml (1)

98-98: TinyViT demo test: validate runtime envelope on Siracusa

Great to see model‑level coverage. Consider job timeout safeguards if this grows. For now, please confirm the demo’s memory/latency fit the non‑tiled target on the selected runner.

Deeploy/Targets/PULPOpen/Templates/FloatMatMulTemplate.py (1)

11-21: LGTM: per-iteration A/B selection handles mixed batched/unbatched inputs.

This fixes the non-batched handling cleanly and mirrors expected strides (A: MN, B: NO). Consider applying the same per-iteration pointer style in GEMM for consistency, but not blocking.
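The per-iteration pointer selection can be modeled in isolation. The sketch below is a hypothetical stand-alone version of the idea (batched_matmul and its flags are illustration-only names, not the Deeploy template's actual interface): a non-batched operand keeps a fixed pointer across batch iterations instead of advancing by its batch stride.

```c
#include <stdbool.h>
#include <stdint.h>

typedef float float32_t;

/* Hypothetical model of per-batch pointer selection for mixed
   batched/unbatched matmul inputs. A is M x N, B is N x O, Y is
   always batched (batch x M x O). */
static void batched_matmul(const float32_t *A, const float32_t *B,
                           float32_t *Y, uint32_t batch, uint32_t M,
                           uint32_t N, uint32_t O, bool A_batched,
                           bool B_batched) {
  for (uint32_t b = 0; b < batch; b++) {
    /* Non-batched inputs reuse the same data for every batch. */
    const float32_t *a = A_batched ? A + b * M * N : A;
    const float32_t *bp = B_batched ? B + b * N * O : B;
    float32_t *y = Y + b * M * O;
    for (uint32_t m = 0; m < M; m++)
      for (uint32_t o = 0; o < O; o++) {
        float32_t s = 0.0f;
        for (uint32_t n = 0; n < N; n++)
          s += a[m * N + n] * bp[n * O + o];
        y[m * O + o] = s;
      }
  }
}
```

With A non-batched and B batched, every batch reads the same A block, which is exactly the case the unconditional-stride version would get wrong.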

Deeploy/DeeployTypes.py (3)

320-324: Prefer None over mutable default and let ctor initialize alias_of.

Passing alias_of = [] creates a new list here (safe), but use None consistently and centralize initialization in the constructor to avoid future in-place mutations leaking via shared references.

-        return (cls(
-            name = node.name,
-            shape = node.shape if not isinstance(node, gs.Constant) else node.values.shape,
-            alias_of = [],
-        ))
+        return cls(
+            name = node.name,
+            shape = node.shape if not isinstance(node, gs.Constant) else node.values.shape,
+            alias_of = None,
+        )

469-474: Avoid sharing alias list between buffers; copy it.

Passing buffer.alias_of by reference couples future mutations between the source and new ConstantBuffer.

-        ret = cls(
+        ret = cls(
             name = buffer.name,
             shape = buffer.shape,
             values = values,
-            alias_of = buffer.alias_of,
+            alias_of = list(buffer.alias_of),
         )

1176-1180: Pass alias_of=None and let the ctor initialize.

Keeps alias handling consistent and avoids accidental shared lists if ctor defaults are ever reintroduced.

-                nb = ctxt.VariableBuffer(
-                    name = name,
-                    shape = node.shape,
-                    alias_of = [],
-                )
+                nb = ctxt.VariableBuffer(
+                    name = name,
+                    shape = node.shape,
+                    alias_of = None,
+                )
TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (3)

137-141: Remove unnecessary modulo and precompute local weight pointer inside inner loop

im2col_idx % (P * Q) is redundant. Precompute a per-filter base pointer for tighter inner loops.

Apply this diff in both bias and non-bias paths:

-            for (uint32_t im2col_idx = 0; im2col_idx < P * Q; im2col_idx++) {
-              sum +=
-                  im2col_buffer[im2col_idx] *
-                  weight_ptr[(f - ch_out_start) * P * Q + im2col_idx % (P * Q)];
-            }
+            const float32_t *local_w = weight_ptr + (f - ch_out_start) * (P * Q);
+            for (uint32_t k = 0; k < P * Q; k++) {
+              sum += im2col_buffer[k] * local_w[k];
+            }

Also applies to: 236-240


47-48: Remove unused variable kernel_size

kernel_size is computed but never used.

-  uint32_t kernel_size = P * Q * F_total;

61-97: Consider simplifying padding handling with a single p/q nested loop

You can mirror the im2col fill used in the regular Im2Col kernel (check Convolution_fp32.c), avoiding four separate padding loops and the type-casting hazards.
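A minimal sketch of that suggestion, assuming HWC input layout and signed intermediate coordinates; the function name and signature here are hypothetical, not the kernel's actual interface. Out-of-range taps become zeros inside one p/q nested loop, replacing the four separate padding loops:

```c
#include <stdint.h>

typedef float float32_t;

/* Hypothetical single-loop im2col fill for one depthwise output position
   (h_out, w_out) of channel ch. Input is H x W x C in HWC layout;
   im2col_buffer receives P*Q taps, with zeros for padded positions. */
static void dw_im2col_fill(const float32_t *pSrc, uint32_t H, uint32_t W,
                           uint32_t C, uint32_t ch, uint32_t h_out,
                           uint32_t w_out, uint32_t P, uint32_t Q,
                           uint32_t stride_h, uint32_t stride_w,
                           uint32_t pad_top, uint32_t pad_left,
                           float32_t *im2col_buffer) {
  for (uint32_t p = 0; p < P; p++) {
    for (uint32_t q = 0; q < Q; q++) {
      /* Signed coordinates make the bounds check safe near the borders. */
      int32_t h = (int32_t)(h_out * stride_h + p) - (int32_t)pad_top;
      int32_t w = (int32_t)(w_out * stride_w + q) - (int32_t)pad_left;
      float32_t v = 0.0f;
      if (h >= 0 && h < (int32_t)H && w >= 0 && w < (int32_t)W)
        v = pSrc[((uint32_t)h * W + (uint32_t)w) * C + ch];
      im2col_buffer[p * Q + q] = v;
    }
  }
}
```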

TargetLibraries/PULPOpen/inc/kernel/Conv.h (1)

20-27: Document pContextBuffer size expectations and bias semantics

The new signatures look correct. Please add brief comments indicating required pContextBuffer sizes:

  • Conv2d_Im2Col: NUM_CORES * (C*P*Q) floats
  • DW_Conv2d_Im2Col: NUM_CORES * (P*Q) floats
    Also note that has_bias governs whether pSrcBias is read.

Also applies to: 29-36

Deeploy/Targets/PULPOpen/Bindings.py (2)

239-245: Constrain FP DWConv binding to supported float type(s).

FloatConvTemplate/referenceDW2DIm2ColTemplate likely only implements float32. Avoid advertising unsupported float types by narrowing the generator.

Apply this diff:

-        ForkTransformer) for float_type in FloatDataTypes
+        ForkTransformer) for float_type in [float32_t]

299-302: ReduceMean (float): restrict axes integer type to int32.

ONNX axes are int64/int32; templates and parsers typically operate on int32 here. Generating bindings for all signed integer widths adds binder churn without benefit.

Apply this diff:

-    for integer_type in SignedIntegerDataTypes
-    for float_type in FloatDataTypes
+    for integer_type in [int32_t]
+    for float_type in FloatDataTypes
Deeploy/Targets/PULPOpen/Platform.py (1)

108-108: Depthwise vs regular FP Conv: prevent mapper shadowing by ordering.

If PULPFPConv2DParser accepts DW cases, FPConv2DMapper may shadow FPDWConv2DMapper due to precedence.

Apply this diff to prioritize DWConv first:

-    'Conv': ConvLayer([FPConv2DMapper, FPDWConv2DMapper]),
+    'Conv': ConvLayer([FPDWConv2DMapper, FPConv2DMapper]),

Alternatively, ensure PULPFPConv2DParser explicitly rejects depthwise.

Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (3)

42-51: Fix return typing and silence ARG004 (unused ctxt)

  • Return type now returns a str for size; widen the annotation to reflect this.
  • Rename ctxt to _ctxt to address Ruff ARG004.

Apply:

-    def computeTransientBuffersSize(ctxt: NetworkContext,
-                                    operatorRepresentation: OperatorRepresentation) -> List[Tuple[str, str]]:
+    def computeTransientBuffersSize(_ctxt: NetworkContext,
+                                    operatorRepresentation: OperatorRepresentation) -> List[Tuple[str, Union[int, str, IntVar]]]:

93-93: Typo in comment: “ChannelOout” → “ChannelOut”

Minor doc polish.

-// 2D FP Conv HWC with Im2Col and ChannelOout parallelism (Name: ${nodeName}, Op: ${nodeOp})
+// 2D FP Conv HWC with Im2Col and ChannelOut parallelism (Name: ${nodeName}, Op: ${nodeOp})

124-126: Typo in comment: “ChannelOout” → “ChannelOut”

Same typo in DW template header.

-// 2D DW FP Conv HWC with Im2Col and ChannelOout parallelism (Name: ${nodeName}, Op: ${nodeOp})
+// 2D DW FP Conv HWC with Im2Col and ChannelOut parallelism (Name: ${nodeName}, Op: ${nodeOp})
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2fa3c31 and de2752b.

📒 Files selected for processing (17)
  • .github/workflows/ci-platform-siracusa.yml (3 hunks)
  • Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py (1 hunks)
  • Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (2 hunks)
  • Deeploy/DeeployTypes.py (4 hunks)
  • Deeploy/Targets/Generic/Parsers.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Bindings.py (5 hunks)
  • Deeploy/Targets/PULPOpen/Parsers.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Platform.py (7 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatAddTemplate.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (4 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatMatMulTemplate.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py (1 hunks)
  • TargetLibraries/PULPOpen/inc/DeeployPULPMath.h (1 hunks)
  • TargetLibraries/PULPOpen/inc/kernel/Conv.h (1 hunks)
  • TargetLibraries/PULPOpen/src/Convolution_fp32.c (4 hunks)
  • TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (6)
Deeploy/Targets/Generic/Parsers.py (1)
Deeploy/DeeployTypes.py (2)
  • add_aliases (326-345)
  • lookup (763-795)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (1)
Deeploy/DeeployTypes.py (6)
  • NodeTemplate (82-224)
  • computeTransientBuffersSize (137-157)
  • NetworkContext (554-1063)
  • hoistTransientBuffers (160-180)
  • hoistTransientBuffer (864-883)
  • lookup (763-795)
TargetLibraries/PULPOpen/inc/kernel/Conv.h (2)
TargetLibraries/PULPOpen/src/Convolution_fp32.c (2)
  • PULP_Conv2d_fp32_fp32_fp32_HWC (10-102)
  • PULP_Conv2d_Im2Col_fp32_fp32_fp32_HWC (104-219)
TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1)
  • PULP_DW_Conv2d_Im2Col_fp32_fp32_fp32_HWC (10-251)
Deeploy/Targets/PULPOpen/Bindings.py (3)
Deeploy/CommonExtensions/DataTypes.py (4)
  • float32_t (74-78)
  • int8_t (12-15)
  • int32_t (24-27)
  • uint8_t (36-39)
Deeploy/Targets/Generic/TypeCheckers.py (3)
  • SliceChecker (37-51)
  • ConvChecker (456-474)
  • ReduceMeanChecker (310-324)
Deeploy/AbstractDataTypes.py (1)
  • PointerClass (522-545)
Deeploy/Targets/PULPOpen/Platform.py (4)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
  • RemoveEmptyConvBiasPass (667-679)
Deeploy/Targets/PULPOpen/Parsers.py (1)
  • PULPFPDWConv2DParser (122-188)
Deeploy/DeeployTypes.py (1)
  • NodeMapper (1695-1838)
Deeploy/Targets/Generic/Parsers.py (1)
  • SliceParser (97-155)
Deeploy/Targets/PULPOpen/Parsers.py (2)
Deeploy/DeeployTypes.py (5)
  • inputs (2492-2509)
  • lookup (763-795)
  • parseNode (1076-1091)
  • parseNodeCtxt (1094-1119)
  • NetworkContext (554-1063)
Deeploy/Targets/Generic/Parsers.py (31)
  • Conv2DParser (1182-1248)
  • parseNode (19-27)
  • parseNode (49-58)
  • parseNode (81-94)
  • parseNode (102-112)
  • parseNode (163-169)
  • parseNode (193-207)
  • parseNode (229-251)
  • parseNode (291-304)
  • parseNode (326-338)
  • parseNode (375-387)
  • parseNode (422-426)
  • parseNode (449-460)
  • parseNode (484-495)
  • parseNode (531-535)
  • parseNode (551-554)
  • parseNodeCtxt (29-41)
  • parseNodeCtxt (60-76)
  • parseNodeCtxt (114-155)
  • parseNodeCtxt (171-185)
  • parseNodeCtxt (209-221)
  • parseNodeCtxt (253-283)
  • parseNodeCtxt (306-318)
  • parseNodeCtxt (340-367)
  • parseNodeCtxt (389-414)
  • parseNodeCtxt (428-441)
  • parseNodeCtxt (462-476)
  • parseNodeCtxt (497-523)
  • parseNodeCtxt (537-543)
  • parseNodeCtxt (556-573)
  • parseNodeCtxt (586-605)
🪛 Ruff (0.13.1)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py

43-43: Unused static method argument: ctxt

(ARG004)

Deeploy/DeeployTypes.py

433-433: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


433-433: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


433-433: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Test Type Inference (fail-input2) (repeated 10×)
🔇 Additional comments (23)
Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py (1)

13-14: LGTM: unsigned arithmetic for start/end bounds

Casting ${size} to uint32_t aligns types and prevents signed/unsigned min mixups. Indices and loop vars are consistently uint32_t.
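The all-unsigned chunking math can be sketched in isolation; NUM_CORES and chunk_bounds below are illustration-only assumptions modeling the template's per-core bounds computation, not Deeploy symbols:

```c
#include <stdint.h>

#define NUM_CORES 8 /* assumption for illustration; must be a power of two */

/* Stand-alone model of per-core chunking: core `core_id` processes
   elements [start, end) of a `size`-element buffer. All arithmetic is
   uint32_t, so the min never mixes signed and unsigned operands. */
static void chunk_bounds(uint32_t core_id, uint32_t size, uint32_t *start,
                         uint32_t *end) {
  uint32_t log2_core = (uint32_t)__builtin_ctz(NUM_CORES);
  /* Round the chunk up when size is not a multiple of NUM_CORES. */
  uint32_t chunk = (size >> log2_core) + ((size & (NUM_CORES - 1)) != 0);
  uint32_t s = chunk * core_id;
  uint32_t e = s + chunk;
  *start = s < size ? s : size;
  *end = e < size ? e : size;
}
```

Clamping both bounds to size means trailing cores simply get empty ranges when size is small, and the union of all ranges covers the buffer exactly once.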

.github/workflows/ci-platform-siracusa.yml (2)

75-75: Reshape + skip connection test: ensure alias_of fix is covered

Good addition to exercise the alias/buffer lifetime fix. Please confirm this test indeed hits the alias_of scenario (non‑tiled) and fails without the fix.


61-64: CI workflow lists new DW conv FP tests but those tests aren't registered in the repo

.github/workflows/ci-platform-siracusa.yml (lines 61–64) lists testFloat2DDWConvolution, testFloat2DDWConvolutionBias, testFloat2DDWConvolutionZeroBias but a repo search found no matches — add/register these tests or update the workflow. After they exist, verify the shapes fit non-tiled Siracusa memory and that the zero-bias test exercises the RemoveEmptyConvBias pass.

Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py (2)

27-37: Good: batch-gated A/B/C strides prevent OOB on non-batched inputs.

Looks correct and aligns with the matmul change. Please confirm the strides match the in-memory layout for transposed cases (i.e., still MN for A and NO for B even when transA/transB are set).


39-39: Unconditional out stride can desync when data_out aliases C but C is not batched.

If data_out may alias C while C is reused (non-batched), advancing only the out pointer will diverge the pair across iterations and yield incorrect reads/writes. Either:

  • enforce at mapping time that alias-of(C, data_out) implies identical batched flags, or
  • gate the out advance with an explicit data_out_batched flag.

Example if such a flag is available:

-    ref_${data_out}_${data_out} += ${M} * ${O};
+    % if data_out_batched:
+    ref_${data_out}_${data_out} += ${M} * ${O};
+    % endif
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (4)

370-374: Verify FP DW Conv layout: input-only transpose may be insufficient

For FP DW Conv, you transpose only the input and not the weights. Confirm kernel expectations: if weights are stored in channels-first layout, they likely also need DW-specific permutation (as above) to avoid semantic mismatch when setting channels_first to default.


385-403: New PULPDWConvPass wiring looks good

Pattern graph + partial replacement is clear and consistent with the pipeline.


404-426: New PULPFPDWConvPass wiring looks good

Extends NHWC conversion to FP DW conv as intended.


505-505: Pass ordering LGTM

Adding PULPFPDWConvPass after PULPDWConvPass in the sequence is reasonable and non-overlapping.

TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1)

41-43: Confirm context buffer sizing per core

This implementation expects pContextBuffer sized to NUM_CORES * (P*Q) floats. Please confirm templates/allocations provide that.

TargetLibraries/PULPOpen/src/Convolution_fp32.c (2)

10-16: LGTM on signature extension and core partitioning

Bias handling and per-core channel slicing are consistent. Weight and output indexing are correct.

Please ensure all call sites updated to pass pSrcBias, has_bias, and pContextBuffer (for the Im2Col variant).

Also applies to: 18-39


139-176: Good reuse of im2col buffer and contiguous weight blocks

The inner product uses a contiguous kernel_size slice, matching the buffer layout. Looks correct.

Also applies to: 179-218

Deeploy/Targets/PULPOpen/Parsers.py (3)

104-116: Bias handling looks correct; maps optional third input and flags has_bias

Parsing and context mapping are consistent with the new kernels.

Confirm downstream templates use operatorRepresentation["has_bias"] and ["bias"] as introduced here.
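As a sanity model of that contract, the flag can be exercised as below; accumulate_output is a hypothetical stand-in for the kernel's accumulation step, not the actual Deeploy kernel, but it shows why pSrcBias may be NULL whenever has_bias is false:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef float float32_t;

/* pSrcBias is only dereferenced when has_bias is true, so a 2-input
   Conv (no bias tensor) can safely pass NULL for pSrcBias. */
static float32_t accumulate_output(const float32_t *acc, uint32_t n,
                                   const float32_t *pSrcBias, bool has_bias,
                                   uint32_t f) {
  float32_t sum = has_bias ? pSrcBias[f] : 0.0f;
  for (uint32_t i = 0; i < n; i++)
    sum += acc[i];
  return sum;
}
```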


127-159: DW FP2D parser shape/padding checks look reasonable

Depthwise gating via inputs count and padding symmetry is fine.


185-186: Depthwise condition check aligns with ONNX semantics

group == ch_im_in is the expected DW condition.

Deeploy/Targets/PULPOpen/Bindings.py (2)

143-153: DMA Slice binding switch looks good.

Good move to DMASliceTemplate with MemoryAwareForkTransformer for integer tensors; aligns with the DMA path introduced in Platform.


155-165: Float Slice mapping: validate starts/ends/axes/steps typing.

Using uint8_t pointers for starts/ends/axes/steps is consistent with the DMA variant, but please verify SliceTemplate.referenceTemplate expects raw byte buffers for these (not int32_t). Also confirm this path covers only float inputs while integers fall back to DMA.

Deeploy/Targets/PULPOpen/Platform.py (4)

77-79: FP DWConv mapper wiring: LGTM.

Parser and bindings are correctly introduced and exposed.


92-95: Two Slice mappers ordering: LGTM; confirm fallback behavior.

Order ensures float Slice binds first; integers should fall back to DMA. Please confirm CI covers both float and int slice paths.


8-9: RemoveEmptyConvBiasPass import: LGTM.

Consistent with optimizer usage below.


235-236: Optimizer: adding RemoveEmptyConvBiasPass is sensible.

Placement at the end of the topology passes is fine as long as it runs pre-binding. If the intent is to normalize bias presence for FP convs (so templates always see 3 inputs), consider placing it before conv-related merges; otherwise current placement is acceptable.

Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (2)

60-62: Good: transient buffer typed to input element

Ensures the generated C sees the correct pointee type instead of void. This helps alloc/codegen correctness.


81-81: Bias args correct — signature matches PULP_Conv2d_fp32_fp32_fp32_HWC

TargetLibraries/PULPOpen/src/Convolution_fp32.c and TargetLibraries/PULPOpen/inc/kernel/Conv.h declare pSrcBias, has_bias in that exact position before pDstC/padding, so adding ${bias}, ${has_bias} is correct.

@Xeratec Xeratec added this to the Release 0.2.1 milestone Sep 29, 2025
@Xeratec Xeratec added the Bug Something isn't working label Sep 29, 2025
@Xeratec Xeratec moved this to Need Reviewer in Deeploy Sep 29, 2025
@Xeratec Xeratec moved this from Need Reviewer to In review in Deeploy Sep 29, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
Deeploy/DeeployTypes.py (1)

241-241: Fix remaining mutable default arguments.

VariableBuffer.__init__ still has mutable defaults (shape = [1], alias_of: Optional[List[str]] = []). These should use immutable defaults (tuples or None) to follow Python best practices and avoid shared mutable state across calls.

Apply this diff:

-    def __init__(self, name: str = '', shape = [1], alias_of: Optional[List[str]] = []):
+    def __init__(self, name: str = '', shape: Sequence[int] = (1,), alias_of: Optional[List[str]] = None):

The defensive copy on line 260 already handles the None case correctly, so no further changes are needed there.

Based on past review feedback and static analysis (Ruff B006).
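For context, the pitfall that Ruff B006 flags can be shown in isolation (illustrative names, not the Deeploy classes):

```python
def register_alias_bad(name, alias_of = []):
    # the default list is created ONCE at function definition time,
    # so every call without an explicit argument mutates the same object
    alias_of.append(name)
    return alias_of

def register_alias_good(name, alias_of = None):
    # immutable default; a fresh list is built per call
    alias_of = [] if alias_of is None else list(alias_of)
    alias_of.append(name)
    return alias_of

a = register_alias_bad("x")
b = register_alias_bad("y")   # surprise: a and b are the same ["x", "y"] list
c = register_alias_good("x")
d = register_alias_good("y")  # independent single-element lists
```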

🧹 Nitpick comments (2)
Deeploy/DeeployTypes.py (2)

454-454: Optional: Avoid redundant defensive copy.

Since VariableBuffer.__init__ already defensively copies alias_of (line 260), passing list(alias_of) here results in double copying. You can simplify by passing the list directly:

-        super().__init__(name, shape, list(alias_of) if alias_of is not None else None)
+        super().__init__(name, shape, alias_of)

This is a minor optimization and doesn't affect correctness.


490-495: Optional: Make alias propagation more explicit.

While correct (the constructor defensively copies alias_of), explicitly copying here would make the intent clearer:

     ret = cls(
         name = buffer.name,
         shape = buffer.shape,
         values = values,
-        alias_of = buffer.alias_of,
+        alias_of = list(buffer.alias_of),  # Explicit copy for clarity
     )

This is a minor clarity improvement and doesn't affect correctness.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ce7e4c8 and 4165983.

📒 Files selected for processing (2)
  • CHANGELOG.md (4 hunks)
  • Deeploy/DeeployTypes.py (9 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
Deeploy/DeeployTypes.py (1)
DeeployTest/testUtils/dmaUtils.py (1)
  • size (72-73)
🪛 Ruff (0.13.3)
Deeploy/DeeployTypes.py

452-452: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


452-452: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

🔇 Additional comments (4)
Deeploy/DeeployTypes.py (4)

325-329: LGTM!

Correctly initializes alias_of with an explicit empty list for each new instance, avoiding shared mutable state.


362-362: LGTM!

Correctly returns a defensive copy of the internal alias list, preventing external code from accidentally mutating the buffer's state.
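The pattern being approved looks roughly like this (a sketch, not the actual VariableBuffer implementation):

```python
class Buffer:

    def __init__(self):
        self._alias_of = ["input0"]

    @property
    def alias_of(self):
        # hand out a copy so callers cannot mutate internal state
        return list(self._alias_of)

buf = Buffer()
view = buf.alias_of
view.append("input1")  # mutates only the caller's copy
```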


404-406: LGTM!

The updates to support size as Union[int, str] are consistent across TransientBuffer.__init__, _bufferRepresentation, and hoistTransientBuffer. This enables symbolic size expressions, which aligns with the PR's tiling and optimization goals.

Also applies to: 431-431, 888-888


1200-1204: LGTM!

Correctly initializes non-global outputs with an explicit empty alias_of list, maintaining consistency with the alias tracking implementation.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

♻️ Duplicate comments (2)
Deeploy/DeeployTypes.py (2)

241-261: Fix mutable defaults in VariableBuffer.__init__ (shape/alias_of)

Defaulting shape to [1] and alias_of to [] risks shared state; use immutable defaults and copy incoming lists.

Apply:

-    def __init__(self, name: str = '', shape = [1], alias_of: Optional[List[str]] = []):
+    def __init__(self, name: str = '', shape: Sequence[int] = (1,), alias_of: Optional[List[str]] = None):
         self.name: str = name  #: str: Canonical name that this buffer is registered as in the NetworkContext
-        self.shape: Sequence[
-            int] = shape  #: Sequence[int]: Represents the dimensions of the underlying tensor as a sequence of dimension sizes
+        self.shape: Sequence[int] = shape  #: Sequence[int]: Represents the dimensions of the underlying tensor as a sequence of dimension sizes
@@
-        self.alias_of: List[str] = list(alias_of) if alias_of is not None else []
+        self.alias_of: List[str] = list(alias_of) if alias_of is not None else []

As per Ruff B006.


452-456: Fix mutable defaults in ConstantBuffer.__init__ (shape/values)

Use immutable defaults and forward alias_of safely.

Apply:

-    def __init__(self, name: str = '', shape = [1], values = [0], alias_of: Optional[List[str]] = None):
-        # Pass a copy of alias_of to avoid shared references
-        super().__init__(name, shape, list(alias_of) if alias_of is not None else None)
+    def __init__(self,
+                 name: str = '',
+                 shape: Sequence[int] = (1,),
+                 values: Union[Sequence, np.ndarray] = (0,),
+                 alias_of: Optional[List[str]] = None):
+        super().__init__(name, shape, alias_of)

(values processing via np.asarray below remains correct.)

🧹 Nitpick comments (4)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (3)

21-24: Im2Col buffer bytes should derive from input element width (not weight).

The im2col buffer stores input elements; computing bytes via weight_type may work today (fp32), but couples unrelated types and can break if they diverge. Prefer data_in_type width.

Apply:

-        im2col_dim = (operatorRepresentation["weight_type"].typeWidth // 8) * ctxt.n_cores * operatorRepresentation[
-            'ch_im_in'] * operatorRepresentation['dim_kernel_x'] * operatorRepresentation['dim_kernel_y']
+        im2col_dim = (operatorRepresentation["data_in_type"].referencedType.typeWidth // 8) \
+                     * ctxt.n_cores \
+                     * operatorRepresentation['ch_im_in'] \
+                     * operatorRepresentation['dim_kernel_x'] \
+                     * operatorRepresentation['dim_kernel_y']

43-51: Fix return type annotation and stale comment; ensure consistent sizing semantics.

  • Return type says List[Tuple[str, str]] but the code returns int; align with base signature (Union[int, IntVar]).
  • Comment says “value is only used as string”, but an int is produced. Update comment and/or value.

Also, consider returning element count (NUM_CORES * Kx * Ky) and relying on the buffer’s referencedType to derive bytes, for consistency with the established pattern.

-    def computeTransientBuffersSize(ctxt: NetworkContext,
-                                    operatorRepresentation: OperatorRepresentation) -> List[Tuple[str, str]]:
+    def computeTransientBuffersSize(ctxt: NetworkContext,
+                                    operatorRepresentation: OperatorRepresentation) -> List[Tuple[str, Union[int, IntVar]]]:
@@
-        # Memory allocation for the im2col buffer can be dynamic, based on the number of cores.
-        # WARNING: This works because value is only used as string, in the allocate template.
-        im2col_dim = (operatorRepresentation["weight_type"].typeWidth // 8
-                     ) * ctxt.n_cores * operatorRepresentation['dim_kernel_x'] * operatorRepresentation['dim_kernel_y']
+        # Im2Col buffer scales with number of cores and kernel area; size below is in ELEMENTS.
+        # Byte size is derived from the buffer's referenced type at allocation/codegen time.
+        im2col_dim = ctxt.n_cores * operatorRepresentation['dim_kernel_x'] * operatorRepresentation['dim_kernel_y']

Note: If you prefer to keep bytes here, update the comment accordingly and see the next comment on double-accounting risk.


124-154: DW Im2Col call wiring is consistent; minor docstring typo.

The DW Im2Col invocation mirrors the FP path with bias and ctxtBuffer. Typo: “ChannelOout” → “ChannelOut”.

-// 2D DW FP Conv HWC with Im2Col and ChannelOout parallelism (Name: ${nodeName}, Op: ${nodeOp})
+// 2D DW FP Conv HWC with Im2Col and ChannelOut parallelism (Name: ${nodeName}, Op: ${nodeOp})
Deeploy/DeeployTypes.py (1)

2708-2714: Preserve context name when recreating NetworkContext during parse

Recreating self.ctxt drops custom names; pass name=self.name to keep mangling consistent.

Apply:

-        self.ctxt = NetworkContext(
+        self.ctxt = NetworkContext(
             variableBuffer = self.Platform.VariableBuffer,
             constantBuffer = self.Platform.ConstantBuffer,
             structBuffer = self.Platform.StructBuffer,
             transientBuffer = self.Platform.TransientBuffer,
-            n_cores = self.ctxt.n_cores,
+            n_cores = self.ctxt.n_cores,
+            name = self.name,
         )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4165983 and af78d75.

📒 Files selected for processing (9)
  • Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py (2 hunks)
  • Deeploy/DeeployTypes.py (15 hunks)
  • Deeploy/Targets/PULPOpen/Deployer.py (2 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (5 hunks)
  • Deeploy/TilingExtension/TilerExtension.py (1 hunks)
  • DeeployTest/testMVP.py (4 hunks)
  • DeeployTest/testRunner_tiled_siracusa.py (1 hunks)
  • DeeployTest/testUtils/platformMapping.py (2 hunks)
  • DeeployTest/testUtils/testRunner.py (3 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-24T12:17:21.624Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:46-0
Timestamp: 2025-09-24T12:17:21.624Z
Learning: In Deeploy's PULP templates, transient buffer size calculation can return element counts as strings from computeTransientBuffersSize(), and then manually set the buffer type in hoistTransientBuffers() using ctxt.lookup(buffer_name)._type.referencedType = input_type. The allocation system automatically multiplies the element count by the element size when the buffer type is properly set, achieving correct byte allocation.

Applied to files:

  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
📚 Learning: 2025-09-24T12:49:17.889Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.

Applied to files:

  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
🧬 Code graph analysis (4)
DeeployTest/testMVP.py (3)
Deeploy/DeeployTypes.py (1)
  • NetworkDeployer (3256-3633)
DeeployTest/testUtils/platformMapping.py (1)
  • mapDeployer (92-261)
DeeployTest/testSchedulingExtension.py (1)
  • _mockScheduler (29-33)
DeeployTest/testRunner_tiled_siracusa.py (1)
DeeployTest/testUtils/testRunner.py (1)
  • TestRunner (279-450)
DeeployTest/testUtils/platformMapping.py (2)
Deeploy/DeeployTypes.py (3)
  • DeploymentPlatform (2451-2494)
  • TopologyOptimizer (2249-2278)
  • NetworkDeployer (3256-3633)
Deeploy/Targets/PULPOpen/Deployer.py (1)
  • PULPDeployer (29-128)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (1)
Deeploy/DeeployTypes.py (6)
  • NodeTemplate (87-229)
  • computeTransientBuffersSize (142-162)
  • NetworkContext (575-1090)
  • hoistTransientBuffers (165-185)
  • hoistTransientBuffer (891-910)
  • lookup (790-822)
🪛 GitHub Actions: CI • Snitch (Tiled)
DeeployTest/testUtils/testRunner.py

[error] 334-334: AttributeError: 'TestRunner' object has no attribute '_cores'. Did you mean: 'n_cores'?


[error] 334-334: Command failed with exit code 1. Failure occurred while executing testRunner_tiled_snitch.py with parameters: -t Tests/$testName --cores=9 --simulator=$simulator --l1 $L1_value --defaultMemLevel=L2 --memAllocStrategy=MiniMalloc --searchStrategy=random-max

🪛 Ruff (0.13.3)
DeeployTest/testMVP.py

58-58: Unused function argument: verbose

(ARG001)

Deeploy/Targets/PULPOpen/Deployer.py

40-40: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

Deeploy/DeeployTypes.py

452-452: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


452-452: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py

25-25: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (150)
  • GitHub Check: siracusa-models / test-runner-siracusa
  • GitHub Check: snitch-kernels / test-runner-snitch
  • GitHub Check: cortexm-kernels / test-runner-cortexm
  • GitHub Check: linting
  • GitHub Check: chimera-kernels / test-runner-chimera
  • GitHub Check: generic-kernels / test-runner-generic
  • GitHub Check: Test Type Inference (fail-input2)
  • GitHub Check: Test Type Inference (pass)
  • GitHub Check: Test Type Inference (fail-input0)
  • GitHub Check: siracusa-neureka-models-tiled-singlebuffer-L3 (miniMobileNet, 2000, 8, L3) / test-runner-siracusa-neureka-tiled (2000)
  • GitHub Check: siracusa-neureka-models-tiled-doublebuffer-L3-wmem (miniMobileNet, 2000, 8, true, L3, true) / test-runner-siracusa-neureka-tiled (2000)
  • GitHub Check: siracusa-neureka-kernels-tiled-singlebuffer-L2-wmem / test-runner-siracusa-neureka-tiled
  • GitHub Check: siracusa-neureka-kernels-tiled-singlebuffer-L2 / test-runner-siracusa-neureka-tiled
  • GitHub Check: mempool-models / test-runner-mempool
  • GitHub Check: mempool-kernels / test-runner-mempool
🔇 Additional comments (6)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (2)

81-84: Bias/has_bias plumbing for direct conv looks good.

Bias flags are correctly passed to the FP Conv2d HWC call.


100-117: Im2Col call argument order (X,Y) acknowledged as correct for this kernel.

The X,Y ordering for dims, kernel, and stride matches the intended prototype in this codebase.

If you changed any kernel prototypes elsewhere in this PR, ensure consistency across integer/float and DW/non-DW variants.

Deeploy/TilingExtension/TilerExtension.py (1)

987-989: LGTM: propagate core count to context

Assigning ctxt.n_cores ensures downstream kernels see the configured core count.

DeeployTest/testRunner_tiled_siracusa.py (1)

20-26: LGTM: cores threaded into TestRunner and CMake

Passing args.cores to TestRunner and exporting NUM_CORES is correct.

Also applies to: 28-28

DeeployTest/testUtils/platformMapping.py (1)

92-104: LGTM: add n_cores and forward to PULPDeployer

Signature and forwarding align with deployer changes.

Also applies to: 214-224

DeeployTest/testMVP.py (1)

57-60: LGTM: introduce and propagate n_cores

CLI option, setupDeployer signature, and forwarding to mapDeployer are consistent.

Also applies to: 84-93, 201-206, 261-263

Comment on lines +25 to +26
inputOffsets: Dict[str, int] = {},
n_cores: int = 8):
Copy link
Contributor


🛠️ Refactor suggestion | 🟠 Major

Avoid mutable default for inputOffsets; initialize from None

Using {} as a default shares state across calls. Accept None and initialize inside the function. Also simplify the empty-check.

Apply:

-                 inputOffsets: Dict[str, int] = {},
-                 n_cores: int = 8):
+                 inputOffsets: Optional[Dict[str, int]] = None,
+                 n_cores: int = 8):
@@
-        if inputOffsets == {}:
-            for key in inputTypes.keys():
-                inputOffsets[key] = 0
+        if inputOffsets is None:
+            inputOffsets = {key: 0 for key in inputTypes.keys()}

As per Ruff B006.

Also applies to: 30-35

🧰 Tools
🪛 Ruff (0.13.3)

25-25: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

🤖 Prompt for AI Agents
In Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py around lines
25-26 and 30-35, the function/class signature uses a mutable default
inputOffsets: Dict[str, int] = {} which shares state across calls; change the
parameter default to inputOffsets: Optional[Dict[str,int]] = None and inside the
body set inputOffsets = {} if inputOffsets is None, then use that local dict;
also simplify any subsequent checks for emptiness to use simple truthiness (if
not inputOffsets) or explicit length checks as appropriate.

Comment on lines 586 to 597
name: str = 'DeeployNetwork',
n_cores: int = 8):
self.globalObjects = OrderedDict()
self.localObjects = OrderedDict()
self.VariableBuffer = variableBuffer
self.ConstantBuffer = constantBuffer
self.StructBuffer = structBuffer
self.TransientBuffer = transientBuffer
self.n_cores = n_cores
self.name = name
self.n_cores = n_cores

Copy link
Contributor

@coderabbitai coderabbitai bot Oct 13, 2025


⚠️ Potential issue | 🟡 Minor

Duplicate assignment of n_cores in NetworkContext.__init__

self.n_cores is set twice. Remove one.

Apply:

-        self.n_cores = n_cores
         self.name = name
         self.n_cores = n_cores

Optional: also replace mutable defaults for globalObjects/localObjects with None and init inside to avoid shared dicts.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
name: str = 'DeeployNetwork',
n_cores: int = 8):
self.globalObjects = OrderedDict()
self.localObjects = OrderedDict()
self.VariableBuffer = variableBuffer
self.ConstantBuffer = constantBuffer
self.StructBuffer = structBuffer
self.TransientBuffer = transientBuffer
self.n_cores = n_cores
self.name = name
self.n_cores = n_cores
def __init__(self,
# … other parameters …
name: str = 'DeeployNetwork',
n_cores: int = 8):
self.globalObjects = OrderedDict()
self.localObjects = OrderedDict()
self.VariableBuffer = variableBuffer
self.ConstantBuffer = constantBuffer
self.StructBuffer = structBuffer
self.TransientBuffer = transientBuffer
- self.n_cores = n_cores
self.name = name
self.n_cores = n_cores
🤖 Prompt for AI Agents
In Deeploy/DeeployTypes.py around lines 586 to 597, NetworkContext.__init__
assigns self.n_cores twice; remove the redundant assignment so n_cores is only
set once. Also (optional) avoid mutable default shared dicts by changing
parameters defaulting to OrderedDict() to None and initializing
self.globalObjects = OrderedDict() and self.localObjects = OrderedDict() inside
__init__ when the corresponding arg is None.

Copy link
Collaborator


this

Copy link
Contributor


@lukamac, got it! The duplicate self.n_cores = n_cores assignment has been noted. 👍

Comment on lines +40 to +53
inputOffsets = {},
n_cores: int = 8):
super().__init__(
graph = graph,
deploymentPlatform = deploymentPlatform,
inputTypes = inputTypes,
loweringOptimizer = loweringOptimizer,
scheduler = scheduler,
name = name,
default_channels_first = default_channels_first,
deeployStateDir = deeployStateDir,
inputOffsets = inputOffsets,
n_cores = n_cores,
)
Copy link
Contributor


🛠️ Refactor suggestion | 🟠 Major

Avoid mutable default for inputOffsets; forward None to base

Use Optional and let the base handle initialization.

Apply:

-                 inputOffsets = {},
-                 n_cores: int = 8):
-        super().__init__(
+                 inputOffsets: Optional[Dict[str, int]] = None,
+                 n_cores: int = 8):
+        super().__init__(
@@
-            inputOffsets = inputOffsets,
+            inputOffsets = inputOffsets,
             n_cores = n_cores,
         )

(Requires the SignPropDeployer change to accept Optional.)

📝 Committable suggestion


Suggested change
inputOffsets = {},
n_cores: int = 8):
super().__init__(
graph = graph,
deploymentPlatform = deploymentPlatform,
inputTypes = inputTypes,
loweringOptimizer = loweringOptimizer,
scheduler = scheduler,
name = name,
default_channels_first = default_channels_first,
deeployStateDir = deeployStateDir,
inputOffsets = inputOffsets,
n_cores = n_cores,
)
inputOffsets: Optional[Dict[str, int]] = None,
n_cores: int = 8):
super().__init__(
graph = graph,
deploymentPlatform = deploymentPlatform,
inputTypes = inputTypes,
loweringOptimizer = loweringOptimizer,
scheduler = scheduler,
name = name,
default_channels_first = default_channels_first,
deeployStateDir = deeployStateDir,
inputOffsets = inputOffsets,
n_cores = n_cores,
)
🧰 Tools
🪛 Ruff (0.13.3)

40-40: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

🤖 Prompt for AI Agents
In Deeploy/Targets/PULPOpen/Deployer.py around lines 40 to 53, the constructor
uses a mutable default argument inputOffsets = {} which should be avoided;
change the parameter type to Optional[dict] and default it to None, update any
necessary typing imports, and forward that None to the base class (letting the
base initializer create the default dict) so you no longer use a shared mutable
default; ensure the SignPropDeployer/base constructor is updated to accept
Optional as noted.
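The B006 pitfall flagged above comes from Python evaluating default arguments once, at function definition time. A minimal sketch (class names here are illustrative stand-ins, not Deeploy's actual deployer classes):

```python
# Minimal sketch of the mutable-default pitfall (Ruff B006) and its fix.
# BadDeployer/GoodDeployer are hypothetical stand-ins for the real classes.

class BadDeployer:

    def __init__(self, inputOffsets = {}):  # one dict shared by ALL instances
        self.inputOffsets = inputOffsets


class GoodDeployer:

    def __init__(self, inputOffsets = None):  # safe: fresh dict per instance
        self.inputOffsets = {} if inputOffsets is None else inputOffsets


a, b = BadDeployer(), BadDeployer()
a.inputOffsets["x"] = 1
print(b.inputOffsets)  # {'x': 1} -- b observes a's mutation

c, d = GoodDeployer(), GoodDeployer()
c.inputOffsets["x"] = 1
print(d.inputOffsets)  # {} -- instances stay independent
```

This is why the suggestion forwards `None` and lets the base class materialize a new dict per instance.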

Comment on lines +48 to +51
im2col_dim = (operatorRepresentation["weight_type"].typeWidth // 8
) * ctxt.n_cores * operatorRepresentation['dim_kernel_x'] * operatorRepresentation['dim_kernel_y']
im2col_name = operatorRepresentation['nodeName'] + "_buffer"
return [(im2col_name, im2col_dim)]
Contributor


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Avoid potential double-accounting of bytes for DW Im2Col buffer.

Here you compute bytes (via typeWidth//8) and also set the buffer’s referencedType. If the allocator multiplies element count by sizeof(type) when a concrete type is set (as used elsewhere), this could oversize the buffer by a factor of element size.

  • Option A (recommended): Keep im2col_dim as element count (remove × bytes) and retain the referencedType assignment.
  • Option B: Keep im2col_dim in bytes and do not rely on per-type multiplication in the allocation path for this buffer.

Please verify allocator/codegen behavior. Run:

If multiplication by sizeof(type) is present for typed transient buffers, switch to element-count sizing here to avoid double-accounting. Based on learnings.

Also applies to: 59-65


🏁 Script executed:

#!/bin/bash
# Inspect how ctxtBufferSize and TransientBuffer.size are consumed.
rg -n -C3 -P 'ctxtBuffer(Size)?\b|TransientBuffer\(|hoistTransientBuffer\(' Deeploy
rg -n -C3 -P '(alloc|allocate|malloc|rt_alloc|pi_l2_malloc|sizeof)\\s*\\('
# Look for size computations mixing buffer.type and size
rg -n -C4 -P '(?:ctx|ctxt|buffer).*size|size.*type|referencedType.*size' Deeploy

Length of output: 119828


Remove the typeWidth//8 factor from im2col_dim
Drop the //8 multiplication in the im2col_dim calculation (FloatConvTemplate.py lines 48–51 and 59–65). PULP’s allocation templates already do sizeof(type) * size, so passing element counts avoids double‐sizing.

🤖 Prompt for AI Agents
In Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py around lines 48–51
and 59–65, the im2col_dim computation multiplies by
operatorRepresentation["weight_type"].typeWidth // 8 which double-sizes the
allocation; remove the typeWidth // 8 factor so im2col_dim represents element
count only (leave the remaining factors: ctxt.n_cores * dim_kernel_x *
dim_kernel_y and the name construction unchanged), updating both occurrences
accordingly.
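The double-accounting can be seen with plain arithmetic; the numbers below are hypothetical example values, not taken from the PR:

```python
# Illustration of the double-sizing issue, assuming the allocation template
# computes sizeof(type) * size. All values here are example numbers.
n_cores, kx, ky = 8, 3, 3   # cores and kernel dims
elem_bytes = 4              # sizeof(float32)

# Before: im2col_dim already in bytes, then the allocator multiplies by
# sizeof(type) again -> buffer is elem_bytes times too large.
dim_bytes = elem_bytes * n_cores * kx * ky
alloc_before = elem_bytes * dim_bytes   # 4 * 288 = 1152 bytes

# After: im2col_dim as an element count; sizeof(type) applied exactly once.
dim_elems = n_cores * kx * ky
alloc_after = elem_bytes * dim_elems    # 4 * 72 = 288 bytes, as intended

print(alloc_before, alloc_after)  # 1152 288
```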

Comment on lines 279 to +290
class TestRunner():

def __init__(self,
platform: str,
simulator: Literal['gvsoc', 'banshee', 'qemu', 'vsim', 'vsim.gui', 'host', 'none'],
tiling: bool,
argument_parser: TestRunnerArgumentParser,
gen_args: str = "",
cmake_args: str = ""):
def __init__(
self,
platform: str,
simulator: Literal['gvsoc', 'banshee', 'qemu', 'vsim', 'vsim.gui', 'host', 'none'],
tiling: bool,
argument_parser: TestRunnerArgumentParser,
gen_args: str = "",
cmake_args: str = "",
cores: int = 8,
):
Contributor


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Fix AttributeError and command construction in TestRunner.generate_test

  • Typo: use self.n_cores (not self._cores).
  • Remove redundant command reassignment.

Apply:

-        command = f"python {generation_script} -d {self._dir_gen} -t {self._dir_test} -p {self._platform} {self.gen_args}"
-
-        command = f"python {generation_script} -d {self._dir_gen} -t {self._dir_test} -p {self._platform}"
+        command = f"python {generation_script} -d {self._dir_gen} -t {self._dir_test} -p {self._platform}"
@@
-        if self._tiling is True:
-            command += f" -n_cores {self._cores}"
+        if self._tiling is True:
+            command += f" -n_cores {self.n_cores}"

This resolves the CI failure.

Run to confirm no remaining _cores typos:

Also applies to: 349-357


🏁 Script executed:

#!/bin/bash
rg -nP '_cores\b' -C2

Length of output: 21522


Fix duplicate command and typo in TestRunner.generate_test

In testUtils/testRunner.py (generate_test):

  • Remove the redundant initial command = … {self.gen_args} assignment.
  • Replace self._cores with self.n_cores when appending -n_cores.
-        command = f"python {generation_script} -d {self._dir_gen} -t {self._dir_test} -p {self._platform} {self.gen_args}"
-
-        command = f"python {generation_script} -d {self._dir_gen} -t {self._dir_test} -p {self._platform}"
+        command = f"python {generation_script} -d {self._dir_gen} -t {self._dir_test} -p {self._platform}"
@@
-        if self._tiling is True:
-            command += f" -n_cores {self._cores}"
+        if self._tiling is True:
+            command += f" -n_cores {self.n_cores}"
📝 Committable suggestion

Suggested change
class TestRunner():
def __init__(self,
platform: str,
simulator: Literal['gvsoc', 'banshee', 'qemu', 'vsim', 'vsim.gui', 'host', 'none'],
tiling: bool,
argument_parser: TestRunnerArgumentParser,
gen_args: str = "",
cmake_args: str = ""):
def __init__(
self,
platform: str,
simulator: Literal['gvsoc', 'banshee', 'qemu', 'vsim', 'vsim.gui', 'host', 'none'],
tiling: bool,
argument_parser: TestRunnerArgumentParser,
gen_args: str = "",
cmake_args: str = "",
cores: int = 8,
):
command = f"python {generation_script} -d {self._dir_gen} -t {self._dir_test} -p {self._platform}"
if self._tiling is True:
command += f" -n_cores {self.n_cores}"
🤖 Prompt for AI Agents
In DeeployTest/testUtils/testRunner.py around lines 279 to 290, the
generate_test builds the same command twice and uses the wrong attribute name
for cores; remove the redundant initial "command = … {self.gen_args}" assignment
so the command is constructed only once, and replace any use of self._cores with
self.n_cores when appending the "-n_cores" flag so the correct attribute is
used.

Collaborator

@lukamac lukamac left a comment


This is a first round of review. There are already a few comments that need your answers. I haven't managed to look at the actual templates/kernels yet; I'll do that tomorrow.
Good work :)

Comment on lines +370 to +373
else:
inputTransposeNode, inputTransposeOutput = _appendTransposeNode(inputNode, name + "_TransposeIn", inPermute)
opNode.inputs[0] = inputTransposeOutput
graph.nodes.append(inputTransposeNode)
Collaborator


Can you clarify why we are transposing the input on non-RequantizedConvs? which cases are that?

Collaborator

@lukamac lukamac Oct 14, 2025


I understood it from our private conversation, it's the floating-point implementation of the dw conv. As suggested somewhere else, I would separate it into a dedicated function NCHWtoNHWC_dw_fun for clarity, or just wait for my PR to land.

Contributor Author


I think we can wait for your PR.

deeployStateDir: str = "DeeployState",
inputOffsets: Dict[str, int] = {}):
inputOffsets: Dict[str, int] = {},
n_cores: int = 8):
Collaborator


Why do you add the n_cores argument? We have the N_CORES define passed through the cmake script.

Contributor Author


The number of cores is needed to dynamically compute the size of the im2col buffer for the regular and DW Conv2Ds. This was the method I found to pass it on to the network context (PULPDeployer inherits SignPropDeployer, this class, which in turn inherits NetworkDeployer). Let me know if you think we should proceed differently with this.


# Non DW-Type:
if opNode.attrs['group'] == 1:
if 'group' in opNode.attrs and opNode.attrs['group'] == 1:
Collaborator


This should better be covered by a get with a default because ONNX Conv spec defines the default for group to be 1:

if opNode.attrs.get('group', 1) == 1:

P.S. I have a PR refactoring this file so you might have some merge conflicts

Contributor Author


I will replace it with a get, but even if the default is group=1, this also handles the RequantizedConv, which is "non-ONNX-canonical". If I remember correctly, it was the RequantConv that was missing this attribute and raising an error in those cases.

Contributor Author

@diaconuccalin diaconuccalin Oct 17, 2025


Although, looking again at it, I can't implement a getter function, since the opNode is a Node object, implemented in the ONNX Graph Surgeon library. I may implement a private getter somewhere in this file, but to me it seems superfluous, since it would do pretty much the same operation as the current if, and it would only be used once. So let me know please if and how you'd like this to be modified.

Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need to implement a get function, it's already implemented for the Python dictionaries (which node.attrs is).
And yes, RequantizedConv is not a standard ONNX node, but it is kinda inheriting a lot of the stuff from the Conv node sooooo, imo, I would say it's ok to assume that default.
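Since `node.attrs` is a plain Python dictionary in onnx-graphsurgeon, the built-in `dict.get` with a default covers both the canonical Conv (missing attribute means `group == 1` per the ONNX spec) and the explicit case. A minimal sketch, with illustrative attribute dicts:

```python
# dict.get with a default, as suggested above: a missing 'group' attribute
# is treated as group == 1 (the ONNX Conv default). The attrs dicts below
# are hypothetical examples standing in for node.attrs.
conv_attrs = {"kernel_shape": [3, 3]}  # no 'group' -> defaults to 1
dw_conv_attrs = {"group": 32}          # explicit depthwise grouping

def is_non_dw(attrs):
    # Equivalent to `'group' in attrs and attrs['group'] == 1`, except a
    # missing attribute also counts as group == 1.
    return attrs.get("group", 1) == 1

print(is_non_dw(conv_attrs))     # True
print(is_non_dw(dw_conv_attrs))  # False
```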


# Initialize Pass
super().__init__(pattern = graph,
replacement_fn = partial(_PULPDWNCHWtoNHWC_fun,
Collaborator


I recommend writing another NCHWtoNHWC_dw function for the PULP conv kernels, and maybe even check that it's an fp kernel, just to make it even clearer that it differs from the integer one. The transposition of the input is quite a big difference that imo deserves a separate function.

Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As said in a comment above, if it's ok with you, I'll wait for your PR and make the changes afterwards.

Comment on lines 586 to 597
name: str = 'DeeployNetwork',
n_cores: int = 8):
self.globalObjects = OrderedDict()
self.localObjects = OrderedDict()
self.VariableBuffer = variableBuffer
self.ConstantBuffer = constantBuffer
self.StructBuffer = structBuffer
self.TransientBuffer = transientBuffer
self.n_cores = n_cores
self.name = name
self.n_cores = n_cores

Collaborator


this

return ctxt, False


class PULPFPDWConv2DParser(Conv2DParser):
Collaborator


if this parser does basically the same thing as the above FP Conv parser, you can inherit it, and just check for the additional group condition. You can immediately check in the parseNode if the group attribute is present and equal to the node.inputs[1].shape[<ch idx>]

Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are mostly the same, but with one key incompatibility: the self.operatorRepresentation['group'] == 1 condition in the regular Conv2D parser.

That condition does not really make sense in the context of the ONNX Conv definition, but grouping is not handled in the FP kernel of PULPOpen, so for now I think it's a good check to keep in place (which means we need to keep the two parsers, regular and DW, separate for now).

Let me know if you'd like me to proceed differently.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
Deeploy/Targets/PULPOpen/Deployer.py (1)

31-53: Avoid the shared mutable default for inputOffsets

Keeping inputOffsets = {} means every deployer instance shares the same dict, so one run can mutate another’s offsets. Since this signature just changed, please switch to a safe None default and materialize a new dict per instance before calling super().

-from typing import Callable, Dict, List, Type
+from typing import Callable, Dict, List, Optional, Type
@@
-                 inputOffsets = {},
+                 inputOffsets: Optional[Dict[str, int]] = None,
                  n_cores: int = 8):
+        if inputOffsets is None:
+            inputOffsets = {}
+
         super().__init__(
             graph = graph,
             deploymentPlatform = deploymentPlatform,
             inputTypes = inputTypes,
             loweringOptimizer = loweringOptimizer,
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4db4978 and 48b2ff5.

📒 Files selected for processing (5)
  • Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py (1 hunks)
  • Deeploy/DeeployTypes.py (15 hunks)
  • Deeploy/Targets/Generic/Parsers.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Deployer.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Parsers.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py
🧰 Additional context used
🧬 Code graph analysis (2)
Deeploy/Targets/Generic/Parsers.py (1)
Deeploy/DeeployTypes.py (3)
  • add_aliases (331-350)
  • get_aliases_of (352-364)
  • lookup (788-820)
Deeploy/Targets/PULPOpen/Parsers.py (2)
Deeploy/DeeployTypes.py (5)
  • inputs (2579-2596)
  • lookup (788-820)
  • parseNode (1101-1116)
  • parseNodeCtxt (1119-1144)
  • NetworkContext (574-1088)
Deeploy/Targets/Generic/Parsers.py (31)
  • Conv2DParser (1204-1270)
  • parseNode (19-27)
  • parseNode (49-58)
  • parseNode (81-94)
  • parseNode (102-112)
  • parseNode (163-169)
  • parseNode (193-207)
  • parseNode (229-251)
  • parseNode (291-304)
  • parseNode (326-338)
  • parseNode (375-387)
  • parseNode (422-426)
  • parseNode (449-460)
  • parseNode (484-495)
  • parseNode (531-535)
  • parseNode (551-554)
  • parseNodeCtxt (29-41)
  • parseNodeCtxt (60-76)
  • parseNodeCtxt (114-155)
  • parseNodeCtxt (171-185)
  • parseNodeCtxt (209-221)
  • parseNodeCtxt (253-283)
  • parseNodeCtxt (306-318)
  • parseNodeCtxt (340-367)
  • parseNodeCtxt (389-414)
  • parseNodeCtxt (428-441)
  • parseNodeCtxt (462-476)
  • parseNodeCtxt (497-523)
  • parseNodeCtxt (537-543)
  • parseNodeCtxt (556-573)
  • parseNodeCtxt (586-605)
🪛 Ruff (0.14.0)
Deeploy/Targets/PULPOpen/Deployer.py

40-40: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

Comment on lines 1044 to 1055
new_output_node_aliases = input_node.get_aliases_of()
new_output_node_aliases.append(input_node.name)

# Add new aliases
# Add new aliases to output node
output_node.add_aliases(aliases_to_add = new_output_node_aliases)

# Add output node as alias to its aliases (alias relationship is symmetric)
for alias in input_node.get_aliases_of():
alias_node = ctxt.lookup(alias)
alias_node.add_aliases(aliases_to_add = [
output_node.name,
])
Contributor


⚠️ Potential issue | 🟠 Major

Alias symmetry currently skips the source tensor

The second loop walks input_node.get_aliases_of(), which omits the source buffer itself. As a result the input buffer never learns about output_node, so aliasing remains asymmetric even though the new comment says otherwise. Include the input buffer when propagating aliases so both directions stay in sync.

A minimal fix is to iterate new_output_node_aliases (which already contains the input name) when updating the other buffers:

-        for alias in input_node.get_aliases_of():
+        for alias in new_output_node_aliases:
             alias_node = ctxt.lookup(alias)
             alias_node.add_aliases(aliases_to_add = [
                 output_node.name,
             ])
🤖 Prompt for AI Agents
In Deeploy/Targets/Generic/Parsers.py around lines 1044 to 1055, the loop that
propagates alias symmetry iterates input_node.get_aliases_of(), which omits the
input node itself and leaves the input buffer unaware of the new output alias;
change the propagation to iterate new_output_node_aliases (which already
includes input_node.name) so every alias, including the source, is updated to
add output_node.name via ctxt.lookup(...) and add_aliases(...). Ensure you keep
the existing add_aliases call and the symmetric update for each alias returned
by new_output_node_aliases.
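The asymmetry and its fix can be shown with a toy model; `Buffer` and the context dict below are simplified stand-ins for Deeploy's NetworkContext API, not the real classes:

```python
# Toy sketch of the symmetric alias update proposed above. Buffer and ctxt
# are hypothetical stand-ins for Deeploy's VariableBuffer/NetworkContext.

class Buffer:

    def __init__(self, name):
        self.name = name
        self._aliases = []

    def get_aliases_of(self):
        return list(self._aliases)

    def add_aliases(self, aliases_to_add):
        for a in aliases_to_add:
            if a != self.name and a not in self._aliases:
                self._aliases.append(a)


ctxt = {n: Buffer(n) for n in ("in", "prior", "out")}
ctxt["in"].add_aliases(["prior"])  # 'in' already aliases 'prior'

input_node, output_node = ctxt["in"], ctxt["out"]

# Aliases the output inherits: the input's aliases plus the input itself.
new_output_node_aliases = input_node.get_aliases_of()
new_output_node_aliases.append(input_node.name)
output_node.add_aliases(aliases_to_add = new_output_node_aliases)

# Iterating new_output_node_aliases (rather than input_node.get_aliases_of())
# also tells the source buffer about the output, keeping aliasing symmetric.
for alias in new_output_node_aliases:
    ctxt[alias].add_aliases(aliases_to_add = [output_node.name])

print(ctxt["in"].get_aliases_of())   # ['prior', 'out']
print(ctxt["out"].get_aliases_of())  # ['prior', 'in']
```

With the original loop over `input_node.get_aliases_of()`, the last print would show `'in'` never learning about `'out'`.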

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 48b2ff5 and 7cba0cf.

📒 Files selected for processing (2)
  • Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (4 hunks)
  • Deeploy/Targets/Generic/Parsers.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • Deeploy/Targets/Generic/Parsers.py
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-24T11:56:35.781Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py:364-0
Timestamp: 2025-09-24T11:56:35.781Z
Learning: When reviewing pull requests, focus comments only on code that was actually changed (lines with + or - in diffs), not on unchanged context code that appears in the diff.

Applied to files:

  • Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
📚 Learning: 2025-09-24T12:49:17.889Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.

Applied to files:

  • Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
🧬 Code graph analysis (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
Deeploy/CommonExtensions/OptimizationPasses/PassClasses.py (2)
  • contextagnostic (285-298)
  • ReplaceSequentialPatternPass (265-282)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (120)
  • GitHub Check: linting
  • GitHub Check: mempool-models / test-runner-mempool
  • GitHub Check: siracusa-kernels-tiled-doublebuffer-L2 / test-runner-siracusa-tiled
  • GitHub Check: softhier-kernels / test-runner-softhier
  • GitHub Check: snitch-kernels-tiled-singlebuffer-L2 / test-runner-snitch-tiled
  • GitHub Check: cortexm-models / test-runner-cortexm
  • GitHub Check: cortexm-kernels / test-runner-cortexm
  • GitHub Check: snitch-kernels / test-runner-snitch
  • GitHub Check: siracusa-neureka-models-tiled-doublebuffer-L3-wmem (Attention, 3500, 8, true, L3, true) / test-runner-siracusa-neureka-tiled (3500)
  • GitHub Check: siracusa-neureka-models-tiled-doublebuffer-L3-wmem (microLlama/microLlama1, 10000, 8, true, L3, t... / test-runner-siracusa-neureka-tiled (10000)
  • GitHub Check: siracusa-neureka-models-tiled-singlebuffer-L3 (miniMobileNet, 2000, 8, L3) / test-runner-siracusa-neureka-tiled (2000)
  • GitHub Check: siracusa-neureka-models-tiled-singlebuffer-L3 (Transformer, 15000, 8, L3) / test-runner-siracusa-neureka-tiled (15000)
🔇 Additional comments (4)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (4)

250-250: LGTM! Improved group attribute handling.

Using .get('group', 1) properly handles the ONNX Conv default and is more robust than explicit attribute checking.


370-373: LGTM! Correctly handles FP DW Conv input transpose.

The else block appropriately transposes the input for non-RequantizedConv DW convolutions (e.g., FP Conv), while the RequantizedConv path transposes weights. This distinction aligns with the requirements for adding FP DW Conv support.


380-426: LGTM! Well-structured pass classes with clear separation of concerns.

The refactoring of PULPDWConvPass improves readability with explicit comments and sections. The new PULPFPDWConvPass class properly mirrors the structure for FP DW Conv handling, using the 'Conv' op pattern and a distinct pass name while sharing the same replacement function via partial().


505-505: LGTM! Correctly integrates the FP DW Conv pass into the sequence.

The placement after PULPDWConvPass and before dense conv passes is logical, ensuring DW convolutions (both quantized and FP) are handled before dense convolutions.


Labels

Bug (Something isn't working), Feature

Projects

Status: In review

Development

Successfully merging this pull request may close these issues.

3 participants