@diaconuccalin diaconuccalin commented Sep 24, 2025

This PR brings the changes required for a working minimal TinyViT on the Siracusa platform, without tiling.

Added

  • PULP 2D FP DW conv Im2Col template and kernel, with bias support.
  • Bias support for PULP 2D FP regular conv Im2Col in template & kernel.
  • PULP FP DW conv 2D parser.
  • FP conv 2D (simple & DW), reshape & skip connection, and TinyViT demo tests to the non-tiled Siracusa CI pipeline.
  • FP bindings and mappings for PULP slice, DW conv 2D, and reduce mean operations.
  • FP PULP DW conv lowering optimization pass, similar to the existing one for the integer version.
  • RemoveEmptyConvBiasPass to the PULP optimizer.

Changed

  • Reduced the size of the reshape & skip connection test for non-tiled Siracusa memory compatibility.

Fixed

  • Fixed a bug for non-batched elements in the PULPOpen FP GEMM and matmul templates.
  • Prepended an underscore to closure names to avoid naming issues when they would otherwise start with characters that are invalid at the beginning of an identifier (such as digits).
  • Fixed data types in the PULPOpen FP add and mul templates.
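The closure-name fix above can be sketched as a small sanitizer. This is a hypothetical helper for illustration, not the actual Deeploy implementation; the real change simply prefixes the generated closure name with an underscore.

```python
def sanitize_closure_name(node_name: str) -> str:
    """Prefix an underscore so a generated C closure name never starts
    with a character that is invalid at the start of a C identifier
    (e.g. a digit). Hypothetical sketch of the fix, not Deeploy code."""
    return "_" + node_name + "_closure"

# A node named "0_conv" would otherwise yield the invalid C symbol
# "0_conv_closure"; the underscore prefix makes it a legal identifier.
print(sanitize_closure_name("0_conv"))  # _0_conv_closure
```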

PR Merge Checklist

  1. The PR is rebased on the latest devel commit and pointing to devel.
  2. Your PR is reviewed and approved.
  3. All checks are passing.
  4. The CHANGELOG.md file has been updated.
  5. If the Docker image was modified, change its link back after review.

@diaconuccalin diaconuccalin self-assigned this Sep 24, 2025
@diaconuccalin diaconuccalin added the Feature Addition of new features label Sep 24, 2025
coderabbitai bot commented Sep 24, 2025

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Added depthwise convolution support for floating-point operations
    • Added bias parameter support for convolution operations
    • Added slice operation support for floating-point types
    • Added reduce mean operation support
    • Introduced configurable multi-core processing capability
  • Bug Fixes

    • Fixed batched element handling in floating-point GEMM and matmul templates
    • Corrected data type precision in floating-point arithmetic operations
  • Tests

    • Expanded CI pipeline with depthwise convolution, reshape, and neural network model tests

Walkthrough

Adds FP depthwise 2D conv (Im2Col) and bias support across templates, kernels, parsers, bindings, and platform mappings; propagates configurable core counts to PULP cluster engine and deployer; refactors engine-selection helpers (_mapNode -> _selectEngine); fixes datatype/templating issues; and extends Siracusa CI with new tests.

Changes

Cohort / File(s) Summary
CI workflow
.github/workflows/ci-platform-siracusa.yml
Added Siracusa test variants for FP Conv (regular & DW), bias/no-bias cases, reshape-with-skip, and TinyViT model.
Engine selection refactor
Deeploy/DeeployTypes.py, Deeploy/CommonExtensions/NetworkDeployers/NetworkDeployerWrapper.py, Deeploy/EngineExtension/NetworkDeployers/EngineColoringDeployer.py
Renamed _mapNode -> _selectEngine, switched typing to DeploymentEngine, adjusted call sites to return the engine object.
Closure naming
Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py
Prefix closureName with underscore to avoid invalid starting characters.
DW Conv templates & kernels
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py, Deeploy/Targets/PULPOpen/Templates/*, TargetLibraries/PULPOpen/inc/kernel/Conv.h, TargetLibraries/PULPOpen/src/Convolution_fp32.c, TargetLibraries/PULPOpen/src/DWConvolution_fp32.c
Added PULP2DFloatDWConvIm2ColTemplate, dynamic transient buffer sizing, extended Conv APIs with pSrcBias/has_bias, implemented PULP_DW_Conv2d_Im2Col_fp32_... kernel and bias-aware per-core logic.
Parsers, bindings, platform mappings
Deeploy/Targets/PULPOpen/Parsers.py, Deeploy/Targets/PULPOpen/Bindings.py, Deeploy/Targets/PULPOpen/Platform.py
Added PULPFPDWConv2DParser, PULPFloatDWConv2DBindings, PULPSliceBindings; introduced DMASliceMapper/SliceMapper/FPDWConv2DMapper; updated PULPMapping entries and added RemoveEmptyConvBiasPass.
Slice & ReduceMean extensions
Deeploy/Targets/PULPOpen/Bindings.py, Deeploy/Targets/PULPOpen/Platform.py
Added PULPSliceBindings; switched DMASlice binding to DMASliceTemplate; extended PULPReduceMeanBindings for float+integer combos.
Reshape template changes
Deeploy/Targets/PULPOpen/Templates/ReshapeTemplate.py
Now inherits from _GenericReshapeTemplate; uses VariableBuffer lookup and aliasing (bufferOut._alias) instead of dead-marking indices.
Template type & logic fixes
Deeploy/Targets/PULPOpen/Templates/FloatAddTemplate.py, FloatMulTemplate.py, FloatGemmTemplate.py, FloatMatMulTemplate.py
Adjusted integer widths/casts to unsigned types, fixed MIN() casts, added batched guards for GEMM pointer updates, and minor formatting.
Data types tuple fix
Deeploy/CommonExtensions/DataTypes.py
Fixed IntegerDataTypes to be a flat tuple(sorted(...)) rather than a nested one-element tuple.
Deployer & cluster engine core propagation
Deeploy/Targets/PULPOpen/Deployer.py, DeeployTest/testMVP.py, DeeployTest/generateNetwork.py, DeeployTest/testUtils/testRunner.py
Added annotateNCores() and n_cores propagation to operator contexts; added --cores CLI flag and wiring; TestRunner appends --cores for Siracusa platforms.
CHANGELOG
CHANGELOG.md
Added entries describing DW conv/im2col, bias support, bindings/mappers, tests, type fixes, and other changes.
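The core-count propagation summarized above can be sketched roughly as follows. The names annotateNCores and n_cores come from the change summary; the surrounding classes are simplified stand-ins for illustration, not the real Deeploy API.

```python
from typing import Dict, List

# Simplified stand-in for Deeploy's operator representation mapping.
OperatorRepresentation = Dict[str, object]

class PULPClusterEngine:
    """Stand-in engine carrying a configurable core count."""

    def __init__(self, n_cores: int = 8):
        self.n_cores = n_cores

class PULPDeployer:
    """Stand-in deployer that annotates operator contexts."""

    def __init__(self, engine: PULPClusterEngine,
                 op_reps: List[OperatorRepresentation]):
        self.engine = engine
        self.op_reps = op_reps

    def annotateNCores(self) -> None:
        # Propagate the engine's configured core count into every
        # operator context so templates can emit core-aware code.
        for rep in self.op_reps:
            rep["n_cores"] = self.engine.n_cores

deployer = PULPDeployer(PULPClusterEngine(n_cores = 4), [{}, {}])
deployer.annotateNCores()
print(deployer.op_reps)  # [{'n_cores': 4}, {'n_cores': 4}]
```

In the actual PR, the value originates from a --cores CLI flag and flows through the test runner into the deployer before binding.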

Sequence Diagram(s)

sequenceDiagram
    participant CLI as Test Runner / Caller
    participant Deployer as PULPDeployer
    participant Engine as PULPClusterEngine
    participant OpRep as OperatorRepresentation

    CLI->>Deployer: invoke bind / generate with --cores
    Deployer->>Engine: query/annotate n_cores
    Engine-->>Deployer: n_cores value
    Deployer->>OpRep: set n_cores in operator contexts
    Deployer->>Deployer: super().bind()  -- uses annotated contexts
sequenceDiagram
    participant Caller as Inference/Host
    participant DWKernel as PULP_DW_Conv2d_Im2Col_fp32
    participant Context as pContextBuffer / im2col
    participant Core as per-core compute

    Caller->>DWKernel: call(pSrcA, pSrcB, pSrcBias, has_bias, pDstC, pContextBuffer, pads...)
    DWKernel->>Core: partition outputs by core (ch_out_start/ch_out_stop)
    Core->>Context: allocate/use per-core im2col buffer
    Core->>Core: compute im2col, multiply-accumulate
    alt has_bias
        Core->>Core: add bias to outputs
    end
    Core->>Caller: write pDstC results

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Areas to focus on:

  • DW convolution kernel and per-core partitioning (TargetLibraries/.../DWConvolution_fp32.c, Convolution_fp32.c).
  • Engine-selection refactor consistency and call-site updates (DeeployTypes.py, NetworkDeployerWrapper.py, EngineColoringDeployer.py).
  • Template buffer hoisting and transient sizing propagation (FloatConvTemplate.py).
  • Bindings/parsers alignment with kernel signatures (Bindings.py, Parsers.py) and RemoveEmptyConvBiasPass integration.
  • CLI/core propagation and deployer annotateNCores correctness (Deployer.py, testMVP.py, generateNetwork.py, testRunner.py).

Suggested reviewers

  • Victor-Jung
  • Xeratec

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 3.08% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The title "TinyViT on non-tiled Siracusa" clearly captures the primary objective and outcome of this PR. The raw_summary confirms that the changeset comprises infrastructure changes (depthwise convolution support, bias handling, parsers, bindings, templates, bug fixes, and CI pipeline updates) all designed to enable running TinyViT on the Siracusa platform without tiling. The title is specific and informative—it identifies the model (TinyViT), platform (Siracusa), and deployment variant (non-tiled)—allowing a teammate scanning git history to quickly understand the PR's purpose without needing additional detail.
Description Check ✅ Passed The PR description is directly related to the changeset. It clearly states the objective ("changes required for a working minimal TinyViT on the Siracusa platform, without tiling") and then provides organized sections detailing what was added, changed, and fixed. The raw_summary confirms that all items listed in the description (DW convolution support, bias handling, new parsers and bindings, bug fixes in templates and closures, CI pipeline updates) are accurately represented in the actual code changes. The description is sufficiently detailed and purposeful.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 49cd3b1 and dd40418.

📒 Files selected for processing (1)
  • DeeployTest/testUtils/testRunner.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • DeeployTest/testUtils/testRunner.py



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 14

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)

344-346: Prevent KeyError when 'group' attr is absent

Depthwise detection assumes 'group' exists. If missing, this pass will crash. Use a safe default (1) as elsewhere in the file.

-    if opNode.attrs['group'] == 1:
-        return graph
+    group = opNode.attrs.get('group', 1)
+    if group == 1:
+        return graph
Deeploy/Targets/PULPOpen/Templates/FloatAddTemplate.py (1)

16-23: Loop index type should match buffer indexing

Use unsigned 32-bit for i to match array indexing and avoid UB on large sizes.

(Handled in the diff above.)

Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (1)

131-149: DW Im2Col arg order — swap X/Y for image dims, kernel dims and strides; bias placement is correct

  • Prototype (TargetLibraries/PULPOpen/inc/kernel/Conv.h): PULP_DW_Conv2d_Im2Col(..., uint32_t H, uint32_t W, uint32_t C, ..., uint32_t P, uint32_t Q, uint32_t SP, uint32_t SQ, const float32_t *pSrcBias, const bool has_bias, float32_t *pDstC, uint32_t pad_top, uint32_t pad_bottom, uint32_t pad_left, uint32_t pad_right, float32_t *pContextBuffer).
  • Template call (Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py lines 131–149) currently passes (dim_im_in_x, dim_im_in_y), (dim_kernel_x, dim_kernel_y), (stride_x, stride_y). These should be swapped to (dim_im_in_y, dim_im_in_x), (dim_kernel_y, dim_kernel_x), (stride_y, stride_x).
  • (bias, has_bias), padding order and context buffer position are already correct.
🧹 Nitpick comments (16)
Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py (1)

10-11: Avoid libm at runtime: use integer log2 for NUM_CORES

Replace double log2() with __builtin_ctz() to avoid libm dependency and ensure compile‑time integer computation. Also assert the power‑of‑two assumption that the chunking math relies on.

Apply this diff:

-uint32_t ${nodeName}_log2Core = (uint32_t) log2(NUM_CORES);
+// NUM_CORES must be a power of two; compute log2(NUM_CORES) without libm
+#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
+_Static_assert((NUM_CORES & (NUM_CORES - 1)) == 0, "NUM_CORES must be power of two");
+#endif
+uint32_t ${nodeName}_log2Core = (uint32_t)__builtin_ctz(NUM_CORES);
.github/workflows/ci-platform-siracusa.yml (1)

98-98: TinyViT demo test: validate runtime envelope on Siracusa

Great to see model‑level coverage. Consider job timeout safeguards if this grows. For now, please confirm the demo’s memory/latency fit the non‑tiled target on the selected runner.

Deeploy/Targets/PULPOpen/Templates/FloatMatMulTemplate.py (1)

11-21: LGTM: per-iteration A/B selection handles mixed batched/unbatched inputs.

This fixes the non-batched handling cleanly and mirrors expected strides (A: MN, B: NO). Consider applying the same per-iteration pointer style in GEMM for consistency, but not blocking.

Deeploy/DeeployTypes.py (3)

320-324: Prefer None over mutable default and let ctor initialize alias_of.

Passing alias_of = [] creates a new list here (safe), but use None consistently and centralize initialization in the constructor to avoid future in-place mutations leaking via shared references.

-        return (cls(
-            name = node.name,
-            shape = node.shape if not isinstance(node, gs.Constant) else node.values.shape,
-            alias_of = [],
-        ))
+        return cls(
+            name = node.name,
+            shape = node.shape if not isinstance(node, gs.Constant) else node.values.shape,
+            alias_of = None,
+        )

469-474: Avoid sharing alias list between buffers; copy it.

Passing buffer.alias_of by reference couples future mutations between the source and new ConstantBuffer.

-        ret = cls(
+        ret = cls(
             name = buffer.name,
             shape = buffer.shape,
             values = values,
-            alias_of = buffer.alias_of,
+            alias_of = list(buffer.alias_of),
         )

1176-1180: Pass alias_of=None and let the ctor initialize.

Keeps alias handling consistent and avoids accidental shared lists if ctor defaults are ever reintroduced.

-                nb = ctxt.VariableBuffer(
-                    name = name,
-                    shape = node.shape,
-                    alias_of = [],
-                )
+                nb = ctxt.VariableBuffer(
+                    name = name,
+                    shape = node.shape,
+                    alias_of = None,
+                )
TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (3)

137-141: Remove unnecessary modulo and precompute local weight pointer inside inner loop

im2col_idx % (P * Q) is redundant. Precompute a per-filter base pointer for tighter inner loops.

Apply this diff in both bias and non-bias paths:

-            for (uint32_t im2col_idx = 0; im2col_idx < P * Q; im2col_idx++) {
-              sum +=
-                  im2col_buffer[im2col_idx] *
-                  weight_ptr[(f - ch_out_start) * P * Q + im2col_idx % (P * Q)];
-            }
+            const float32_t *local_w = weight_ptr + (f - ch_out_start) * (P * Q);
+            for (uint32_t k = 0; k < P * Q; k++) {
+              sum += im2col_buffer[k] * local_w[k];
+            }

Also applies to: 236-240


47-48: Remove unused variable kernel_size

kernel_size is computed but never used.

-  uint32_t kernel_size = P * Q * F_total;

61-97: Consider simplifying padding handling with a single p/q nested loop

You can mirror the im2col fill used in the regular Im2Col kernel (check Convolution_fp32.c), avoiding four separate padding loops and the type-casting hazards.

TargetLibraries/PULPOpen/inc/kernel/Conv.h (1)

20-27: Document pContextBuffer size expectations and bias semantics

The new signatures look correct. Please add brief comments indicating required pContextBuffer sizes:

  • Conv2d_Im2Col: NUM_CORES * (CPQ) floats
  • DW_Conv2d_Im2Col: NUM_CORES * (P*Q) floats
    Also note that has_bias governs whether pSrcBias is read.

Also applies to: 29-36
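The per-core context-buffer sizing rules stated above can be captured in a small helper. This is a hypothetical sketch based on the sizes quoted in the comment (NUM_CORES * C*P*Q floats for the regular Im2Col kernel, NUM_CORES * P*Q for the depthwise variant), not code from the PR.

```python
def conv2d_im2col_ctx_floats(num_cores: int, C: int, P: int, Q: int) -> int:
    # Regular Im2Col: each core needs its own C * P * Q patch buffer.
    return num_cores * C * P * Q

def dw_conv2d_im2col_ctx_floats(num_cores: int, P: int, Q: int) -> int:
    # Depthwise Im2Col: one channel per patch, so only P * Q per core.
    return num_cores * P * Q

# 8 cores, 3x3 kernel, 16 input channels:
print(conv2d_im2col_ctx_floats(8, 16, 3, 3))  # 1152
print(dw_conv2d_im2col_ctx_floats(8, 3, 3))   # 72
```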

Deeploy/Targets/PULPOpen/Bindings.py (2)

239-245: Constrain FP DWConv binding to supported float type(s).

FloatConvTemplate/referenceDW2DIm2ColTemplate likely only implements float32. Avoid advertising unsupported float types by narrowing the generator.

Apply this diff:

-        ForkTransformer) for float_type in FloatDataTypes
+        ForkTransformer) for float_type in [float32_t]

299-302: ReduceMean (float): restrict axes integer type to int32.

ONNX axes are int64/int32; templates and parsers typically operate on int32 here. Generating bindings for all signed integer widths adds binder churn without benefit.

Apply this diff:

-    for integer_type in SignedIntegerDataTypes
-    for float_type in FloatDataTypes
+    for integer_type in [int32_t]
+    for float_type in FloatDataTypes
Deeploy/Targets/PULPOpen/Platform.py (1)

108-108: Depthwise vs regular FP Conv: prevent mapper shadowing by ordering.

If PULPFPConv2DParser accepts DW cases, FPConv2DMapper may shadow FPDWConv2DMapper due to precedence.

Apply this diff to prioritize DWConv first:

-    'Conv': ConvLayer([FPConv2DMapper, FPDWConv2DMapper]),
+    'Conv': ConvLayer([FPDWConv2DMapper, FPConv2DMapper]),

Alternatively, ensure PULPFPConv2DParser explicitly rejects depthwise.

Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (3)

42-51: Fix return typing and silence ARG004 (unused ctxt)

  • Return type now returns a str for size; widen the annotation to reflect this.
  • Rename ctxt to _ctxt to address Ruff ARG004.

Apply:

-    def computeTransientBuffersSize(ctxt: NetworkContext,
-                                    operatorRepresentation: OperatorRepresentation) -> List[Tuple[str, str]]:
+    def computeTransientBuffersSize(_ctxt: NetworkContext,
+                                    operatorRepresentation: OperatorRepresentation) -> List[Tuple[str, Union[int, str, IntVar]]]:

93-93: Typo in comment: “ChannelOout” → “ChannelOut”

Minor doc polish.

-// 2D FP Conv HWC with Im2Col and ChannelOout parallelism (Name: ${nodeName}, Op: ${nodeOp})
+// 2D FP Conv HWC with Im2Col and ChannelOut parallelism (Name: ${nodeName}, Op: ${nodeOp})

124-126: Typo in comment: “ChannelOout” → “ChannelOut”

Same typo in DW template header.

-// 2D DW FP Conv HWC with Im2Col and ChannelOout parallelism (Name: ${nodeName}, Op: ${nodeOp})
+// 2D DW FP Conv HWC with Im2Col and ChannelOut parallelism (Name: ${nodeName}, Op: ${nodeOp})
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2fa3c31 and de2752b.

📒 Files selected for processing (17)
  • .github/workflows/ci-platform-siracusa.yml (3 hunks)
  • Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py (1 hunks)
  • Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (2 hunks)
  • Deeploy/DeeployTypes.py (4 hunks)
  • Deeploy/Targets/Generic/Parsers.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Bindings.py (5 hunks)
  • Deeploy/Targets/PULPOpen/Parsers.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Platform.py (7 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatAddTemplate.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (4 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatMatMulTemplate.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py (1 hunks)
  • TargetLibraries/PULPOpen/inc/DeeployPULPMath.h (1 hunks)
  • TargetLibraries/PULPOpen/inc/kernel/Conv.h (1 hunks)
  • TargetLibraries/PULPOpen/src/Convolution_fp32.c (4 hunks)
  • TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (6)
Deeploy/Targets/Generic/Parsers.py (1)
Deeploy/DeeployTypes.py (2)
  • add_aliases (326-345)
  • lookup (763-795)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (1)
Deeploy/DeeployTypes.py (6)
  • NodeTemplate (82-224)
  • computeTransientBuffersSize (137-157)
  • NetworkContext (554-1063)
  • hoistTransientBuffers (160-180)
  • hoistTransientBuffer (864-883)
  • lookup (763-795)
TargetLibraries/PULPOpen/inc/kernel/Conv.h (2)
TargetLibraries/PULPOpen/src/Convolution_fp32.c (2)
  • PULP_Conv2d_fp32_fp32_fp32_HWC (10-102)
  • PULP_Conv2d_Im2Col_fp32_fp32_fp32_HWC (104-219)
TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1)
  • PULP_DW_Conv2d_Im2Col_fp32_fp32_fp32_HWC (10-251)
Deeploy/Targets/PULPOpen/Bindings.py (3)
Deeploy/CommonExtensions/DataTypes.py (4)
  • float32_t (74-78)
  • int8_t (12-15)
  • int32_t (24-27)
  • uint8_t (36-39)
Deeploy/Targets/Generic/TypeCheckers.py (3)
  • SliceChecker (37-51)
  • ConvChecker (456-474)
  • ReduceMeanChecker (310-324)
Deeploy/AbstractDataTypes.py (1)
  • PointerClass (522-545)
Deeploy/Targets/PULPOpen/Platform.py (4)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
  • RemoveEmptyConvBiasPass (667-679)
Deeploy/Targets/PULPOpen/Parsers.py (1)
  • PULPFPDWConv2DParser (122-188)
Deeploy/DeeployTypes.py (1)
  • NodeMapper (1695-1838)
Deeploy/Targets/Generic/Parsers.py (1)
  • SliceParser (97-155)
Deeploy/Targets/PULPOpen/Parsers.py (2)
Deeploy/DeeployTypes.py (5)
  • inputs (2492-2509)
  • lookup (763-795)
  • parseNode (1076-1091)
  • parseNodeCtxt (1094-1119)
  • NetworkContext (554-1063)
Deeploy/Targets/Generic/Parsers.py (31)
  • Conv2DParser (1182-1248)
  • parseNode (19-27)
  • parseNode (49-58)
  • parseNode (81-94)
  • parseNode (102-112)
  • parseNode (163-169)
  • parseNode (193-207)
  • parseNode (229-251)
  • parseNode (291-304)
  • parseNode (326-338)
  • parseNode (375-387)
  • parseNode (422-426)
  • parseNode (449-460)
  • parseNode (484-495)
  • parseNode (531-535)
  • parseNode (551-554)
  • parseNodeCtxt (29-41)
  • parseNodeCtxt (60-76)
  • parseNodeCtxt (114-155)
  • parseNodeCtxt (171-185)
  • parseNodeCtxt (209-221)
  • parseNodeCtxt (253-283)
  • parseNodeCtxt (306-318)
  • parseNodeCtxt (340-367)
  • parseNodeCtxt (389-414)
  • parseNodeCtxt (428-441)
  • parseNodeCtxt (462-476)
  • parseNodeCtxt (497-523)
  • parseNodeCtxt (537-543)
  • parseNodeCtxt (556-573)
  • parseNodeCtxt (586-605)
🪛 Ruff (0.13.1)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py

43-43: Unused static method argument: ctxt

(ARG004)

Deeploy/DeeployTypes.py

433-433: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


433-433: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


433-433: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Test Type Inference (fail-input2)
  • GitHub Check: Test Type Inference (fail-input2)
  • GitHub Check: Test Type Inference (fail-input2)
  • GitHub Check: Test Type Inference (fail-input2)
  • GitHub Check: Test Type Inference (fail-input2)
  • GitHub Check: Test Type Inference (fail-input2)
  • GitHub Check: Test Type Inference (fail-input2)
  • GitHub Check: Test Type Inference (fail-input2)
  • GitHub Check: Test Type Inference (fail-input2)
  • GitHub Check: Test Type Inference (fail-input2)
🔇 Additional comments (23)
Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py (1)

13-14: LGTM: unsigned arithmetic for start/end bounds

Casting ${size} to uint32_t aligns types and prevents signed/unsigned min mixups. Indices and loop vars are consistently uint32_t.

.github/workflows/ci-platform-siracusa.yml (2)

75-75: Reshape + skip connection test: ensure alias_of fix is covered

Good addition to exercise the alias/buffer lifetime fix. Please confirm this test indeed hits the alias_of scenario (non‑tiled) and fails without the fix.


61-64: CI workflow lists new DW conv FP tests but those tests aren't registered in the repo

.github/workflows/ci-platform-siracusa.yml (lines 61–64) lists testFloat2DDWConvolution, testFloat2DDWConvolutionBias, testFloat2DDWConvolutionZeroBias but a repo search found no matches — add/register these tests or update the workflow. After they exist, verify the shapes fit non-tiled Siracusa memory and that the zero-bias test exercises the RemoveEmptyConvBias pass.

Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py (2)

27-37: Good: batch-gated A/B/C strides prevent OOB on non-batched inputs.

Looks correct and aligns with the matmul change. Please confirm the strides match the in-memory layout for transposed cases (i.e., still MN for A and NO for B even when transA/transB are set).


39-39: Unconditional out stride can desync when data_out aliases C but C is not batched.

If data_out may alias C while C is reused (non-batched), advancing only the out pointer will diverge the pair across iterations and yield incorrect reads/writes. Either:

  • enforce at mapping time that alias-of(C, data_out) implies identical batched flags, or
  • gate the out advance with an explicit data_out_batched flag.

Example if such a flag is available:

-    ref_${data_out}_${data_out} += ${M} * ${O};
+    % if data_out_batched:
+    ref_${data_out}_${data_out} += ${M} * ${O};
+    % endif
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (4)

370-374: Verify FP DW Conv layout: input-only transpose may be insufficient

For FP DW Conv, you transpose only the input and not the weights. Confirm kernel expectations: if weights are stored in channels-first layout, they likely also need DW-specific permutation (as above) to avoid semantic mismatch when setting channels_first to default.


385-403: New PULPDWConvPass wiring looks good

Pattern graph + partial replacement is clear and consistent with the pipeline.


404-426: New PULPFPDWConvPass wiring looks good

Extends NHWC conversion to FP DW conv as intended.


505-505: Pass ordering LGTM

Adding PULPFPDWConvPass after PULPDWConvPass in the sequence is reasonable and non-overlapping.

TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1)

41-43: Confirm context buffer sizing per core

This implementation expects pContextBuffer sized to NUM_CORES * (P*Q) floats. Please confirm templates/allocations provide that.

TargetLibraries/PULPOpen/src/Convolution_fp32.c (2)

10-16: LGTM on signature extension and core partitioning

Bias handling and per-core channel slicing are consistent. Weight and output indexing are correct.

Please ensure all call sites updated to pass pSrcBias, has_bias, and pContextBuffer (for the Im2Col variant).

Also applies to: 18-39


139-176: Good reuse of im2col buffer and contiguous weight blocks

The inner product uses a contiguous kernel_size slice, matching the buffer layout. Looks correct.

Also applies to: 179-218

Deeploy/Targets/PULPOpen/Parsers.py (3)

104-116: Bias handling looks correct; maps optional third input and flags has_bias

Parsing and context mapping are consistent with the new kernels.

Confirm downstream templates use operatorRepresentation["has_bias"] and ["bias"] as introduced here.


127-159: DW FP2D parser shape/padding checks look reasonable

Depthwise gating via inputs count and padding symmetry is fine.


185-186: Depthwise condition check aligns with ONNX semantics

group == ch_im_in is the expected DW condition.

Deeploy/Targets/PULPOpen/Bindings.py (2)

143-153: DMA Slice binding switch looks good.

Good move to DMASliceTemplate with MemoryAwareForkTransformer for integer tensors; aligns with the DMA path introduced in Platform.


155-165: Float Slice mapping: validate starts/ends/axes/steps typing.

Using uint8_t pointers for starts/ends/axes/steps is consistent with the DMA variant, but please verify SliceTemplate.referenceTemplate expects raw byte buffers for these (not int32_t). Also confirm this path covers only float inputs while integers fall back to DMA.

Deeploy/Targets/PULPOpen/Platform.py (4)

77-79: FP DWConv mapper wiring: LGTM.

Parser and bindings are correctly introduced and exposed.


92-95: Two Slice mappers ordering: LGTM; confirm fallback behavior.

Order ensures float Slice binds first; integers should fall back to DMA. Please confirm CI covers both float and int slice paths.


8-9: RemoveEmptyConvBiasPass import: LGTM.

Consistent with optimizer usage below.


235-236: Optimizer: adding RemoveEmptyConvBiasPass is sensible.

Placement at the end of the topology passes is fine as long as it runs pre-binding. If the intent is to normalize bias presence for FP convs (so templates always see 3 inputs), consider placing it before conv-related merges; otherwise current placement is acceptable.

Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (2)

60-62: Good: transient buffer typed to input element

Ensures the generated C sees the correct pointee type instead of void. This helps alloc/codegen correctness.


81-81: Bias args correct — signature matches PULP_Conv2d_fp32_fp32_fp32_HWC

TargetLibraries/PULPOpen/src/Convolution_fp32.c and TargetLibraries/PULPOpen/inc/kernel/Conv.h declare pSrcBias, has_bias in that exact position before pDstC/padding, so adding ${bias}, ${has_bias} is correct.

@Xeratec Xeratec added this to the Release 0.2.1 milestone Sep 29, 2025
@Xeratec Xeratec added the Bug Something isn't working label Sep 29, 2025
@Xeratec Xeratec moved this to Need Reviewer in Deeploy Sep 29, 2025
@Xeratec Xeratec moved this from Need Reviewer to In review in Deeploy Sep 29, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
Deeploy/DeeployTypes.py (1)

241-241: Fix remaining mutable default arguments.

VariableBuffer.__init__ still has mutable defaults (shape = [1], alias_of: Optional[List[str]] = []). These should use immutable defaults (tuples or None) to follow Python best practices and avoid shared mutable state across calls.

Apply this diff:

-    def __init__(self, name: str = '', shape = [1], alias_of: Optional[List[str]] = []):
+    def __init__(self, name: str = '', shape: Sequence[int] = (1,), alias_of: Optional[List[str]] = None):

The defensive copy on line 260 already handles the None case correctly, so no further changes are needed there.

Based on past review feedback and static analysis (Ruff B006).

🧹 Nitpick comments (2)
Deeploy/DeeployTypes.py (2)

454-454: Optional: Avoid redundant defensive copy.

Since VariableBuffer.__init__ already defensively copies alias_of (line 260), passing list(alias_of) here results in double copying. You can simplify by passing the list directly:

-        super().__init__(name, shape, list(alias_of) if alias_of is not None else None)
+        super().__init__(name, shape, alias_of)

This is a minor optimization and doesn't affect correctness.


490-495: Optional: Make alias propagation more explicit.

While correct (the constructor defensively copies alias_of), explicitly copying here would make the intent clearer:

     ret = cls(
         name = buffer.name,
         shape = buffer.shape,
         values = values,
-        alias_of = buffer.alias_of,
+        alias_of = list(buffer.alias_of),  # Explicit copy for clarity
     )

This is a minor clarity improvement and doesn't affect correctness.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ce7e4c8 and 4165983.

📒 Files selected for processing (2)
  • CHANGELOG.md (4 hunks)
  • Deeploy/DeeployTypes.py (9 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
Deeploy/DeeployTypes.py (1)
DeeployTest/testUtils/dmaUtils.py (1)
  • size (72-73)
🪛 Ruff (0.13.3)
Deeploy/DeeployTypes.py

452-452: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


452-452: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

🔇 Additional comments (4)
Deeploy/DeeployTypes.py (4)

325-329: LGTM!

Correctly initializes alias_of with an explicit empty list for each new instance, avoiding shared mutable state.


362-362: LGTM!

Correctly returns a defensive copy of the internal alias list, preventing external code from accidentally mutating the buffer's state.


404-406: LGTM!

The updates to support size as Union[int, str] are consistent across TransientBuffer.__init__, _bufferRepresentation, and hoistTransientBuffer. This enables symbolic size expressions, which aligns with the PR's tiling and optimization goals.

Also applies to: 431-431, 888-888


1200-1204: LGTM!

Correctly initializes non-global outputs with an explicit empty alias_of list, maintaining consistency with the alias tracking implementation.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

♻️ Duplicate comments (2)
Deeploy/DeeployTypes.py (2)

241-261: Fix mutable defaults in VariableBuffer.init (shape/alias_of)

Defaulting shape to [1] and alias_of to [] risks shared state; use immutable defaults and copy incoming lists.

Apply:

-    def __init__(self, name: str = '', shape = [1], alias_of: Optional[List[str]] = []):
+    def __init__(self, name: str = '', shape: Sequence[int] = (1,), alias_of: Optional[List[str]] = None):
         self.name: str = name  #: str: Canonical name that this buffer is registered as in the NetworkContext
-        self.shape: Sequence[
-            int] = shape  #: Sequence[int]: Represents the dimensions of the underlying tensor as a sequence of dimension sizes
+        self.shape: Sequence[int] = shape  #: Sequence[int]: Represents the dimensions of the underlying tensor as a sequence of dimension sizes
@@
-        self.alias_of: List[str] = list(alias_of) if alias_of is not None else []
+        self.alias_of: List[str] = list(alias_of) if alias_of is not None else []

As per Ruff B006.


452-456: Fix mutable defaults in ConstantBuffer.init (shape/values)

Use immutable defaults and forward alias_of safely.

Apply:

-    def __init__(self, name: str = '', shape = [1], values = [0], alias_of: Optional[List[str]] = None):
-        # Pass a copy of alias_of to avoid shared references
-        super().__init__(name, shape, list(alias_of) if alias_of is not None else None)
+    def __init__(self,
+                 name: str = '',
+                 shape: Sequence[int] = (1,),
+                 values: Union[Sequence, np.ndarray] = (0,),
+                 alias_of: Optional[List[str]] = None):
+        super().__init__(name, shape, alias_of)

(values processing via np.asarray below remains correct.)

🧹 Nitpick comments (4)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (3)

21-24: Im2Col buffer bytes should derive from input element width (not weight).

The im2col buffer stores input elements; computing bytes via weight_type may work today (fp32), but couples unrelated types and can break if they diverge. Prefer data_in_type width.

Apply:

-        im2col_dim = (operatorRepresentation["weight_type"].typeWidth // 8) * ctxt.n_cores * operatorRepresentation[
-            'ch_im_in'] * operatorRepresentation['dim_kernel_x'] * operatorRepresentation['dim_kernel_y']
+        im2col_dim = (operatorRepresentation["data_in_type"].referencedType.typeWidth // 8) \
+                     * ctxt.n_cores \
+                     * operatorRepresentation['ch_im_in'] \
+                     * operatorRepresentation['dim_kernel_x'] \
+                     * operatorRepresentation['dim_kernel_y']

43-51: Fix return type annotation and stale comment; ensure consistent sizing semantics.

  • Return type says List[Tuple[str, str]] but the code returns int; align with base signature (Union[int, IntVar]).
  • Comment says “value is only used as string”, but an int is produced. Update comment and/or value.

Also, consider returning element count (NUM_CORES * Kx * Ky) and relying on the buffer’s referencedType to derive bytes, for consistency with the established pattern.

-    def computeTransientBuffersSize(ctxt: NetworkContext,
-                                    operatorRepresentation: OperatorRepresentation) -> List[Tuple[str, str]]:
+    def computeTransientBuffersSize(ctxt: NetworkContext,
+                                    operatorRepresentation: OperatorRepresentation) -> List[Tuple[str, Union[int, IntVar]]]:
@@
-        # Memory allocation for the im2col buffer can be dynamic, based on the number of cores.
-        # WARNING: This works because value is only used as string, in the allocate template.
-        im2col_dim = (operatorRepresentation["weight_type"].typeWidth // 8
-                     ) * ctxt.n_cores * operatorRepresentation['dim_kernel_x'] * operatorRepresentation['dim_kernel_y']
+        # Im2Col buffer scales with number of cores and kernel area; size below is in ELEMENTS.
+        # Byte size is derived from the buffer's referenced type at allocation/codegen time.
+        im2col_dim = ctxt.n_cores * operatorRepresentation['dim_kernel_x'] * operatorRepresentation['dim_kernel_y']

Note: If you prefer to keep bytes here, update the comment accordingly and see the next comment on double-accounting risk.
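The element-count convention described here can be sketched as follows. `RefType`, `compute_transient_elements`, and `allocate_bytes` are illustrative stand-ins, not the real Deeploy API:

```python
# Hedged sketch: the transient-size hook returns an ELEMENT count, and the
# allocator derives bytes by multiplying with the element width of the
# buffer's referenced type (set later in hoistTransientBuffers).
class RefType:
    typeWidth = 32  # bits, e.g. fp32

def compute_transient_elements(n_cores, kx, ky):
    return n_cores * kx * ky  # element count only, no byte math here

def allocate_bytes(element_count, ref_type):
    return element_count * (ref_type.typeWidth // 8)

elems = compute_transient_elements(8, 3, 3)   # 72 elements
nbytes = allocate_bytes(elems, RefType)       # 288 bytes for fp32
```

Keeping byte math in one place avoids the double-accounting risk of multiplying by the element size both in the template and in the allocator.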


124-154: DW Im2Col call wiring is consistent; minor docstring typo.

The DW Im2Col invocation mirrors the FP path with bias and ctxtBuffer. Typo: “ChannelOout” → “ChannelOut”.

-// 2D DW FP Conv HWC with Im2Col and ChannelOout parallelism (Name: ${nodeName}, Op: ${nodeOp})
+// 2D DW FP Conv HWC with Im2Col and ChannelOut parallelism (Name: ${nodeName}, Op: ${nodeOp})
Deeploy/DeeployTypes.py (1)

2708-2714: Preserve context name when recreating NetworkContext during parse

Recreating self.ctxt drops custom names; pass name=self.name to keep mangling consistent.

Apply:

-        self.ctxt = NetworkContext(
+        self.ctxt = NetworkContext(
             variableBuffer = self.Platform.VariableBuffer,
             constantBuffer = self.Platform.ConstantBuffer,
             structBuffer = self.Platform.StructBuffer,
             transientBuffer = self.Platform.TransientBuffer,
-            n_cores = self.ctxt.n_cores,
+            n_cores = self.ctxt.n_cores,
+            name = self.name,
         )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4165983 and af78d75.

📒 Files selected for processing (9)
  • Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py (2 hunks)
  • Deeploy/DeeployTypes.py (15 hunks)
  • Deeploy/Targets/PULPOpen/Deployer.py (2 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (5 hunks)
  • Deeploy/TilingExtension/TilerExtension.py (1 hunks)
  • DeeployTest/testMVP.py (4 hunks)
  • DeeployTest/testRunner_tiled_siracusa.py (1 hunks)
  • DeeployTest/testUtils/platformMapping.py (2 hunks)
  • DeeployTest/testUtils/testRunner.py (3 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-24T12:17:21.624Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:46-0
Timestamp: 2025-09-24T12:17:21.624Z
Learning: In Deeploy's PULP templates, transient buffer size calculation can return element counts as strings from computeTransientBuffersSize(), and then manually set the buffer type in hoistTransientBuffers() using ctxt.lookup(buffer_name)._type.referencedType = input_type. The allocation system automatically multiplies the element count by the element size when the buffer type is properly set, achieving correct byte allocation.

Applied to files:

  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
📚 Learning: 2025-09-24T12:49:17.889Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.

Applied to files:

  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
🧬 Code graph analysis (4)
DeeployTest/testMVP.py (3)
Deeploy/DeeployTypes.py (1)
  • NetworkDeployer (3256-3633)
DeeployTest/testUtils/platformMapping.py (1)
  • mapDeployer (92-261)
DeeployTest/testSchedulingExtension.py (1)
  • _mockScheduler (29-33)
DeeployTest/testRunner_tiled_siracusa.py (1)
DeeployTest/testUtils/testRunner.py (1)
  • TestRunner (279-450)
DeeployTest/testUtils/platformMapping.py (2)
Deeploy/DeeployTypes.py (3)
  • DeploymentPlatform (2451-2494)
  • TopologyOptimizer (2249-2278)
  • NetworkDeployer (3256-3633)
Deeploy/Targets/PULPOpen/Deployer.py (1)
  • PULPDeployer (29-128)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (1)
Deeploy/DeeployTypes.py (6)
  • NodeTemplate (87-229)
  • computeTransientBuffersSize (142-162)
  • NetworkContext (575-1090)
  • hoistTransientBuffers (165-185)
  • hoistTransientBuffer (891-910)
  • lookup (790-822)
🪛 GitHub Actions: CI • Snitch (Tiled)
DeeployTest/testUtils/testRunner.py

[error] 334-334: AttributeError: 'TestRunner' object has no attribute '_cores'. Did you mean: 'n_cores'?


[error] 334-334: Command failed with exit code 1. Failure occurred while executing testRunner_tiled_snitch.py with parameters: -t Tests/$testName --cores=9 --simulator=$simulator --l1 $L1_value --defaultMemLevel=L2 --memAllocStrategy=MiniMalloc --searchStrategy=random-max

🪛 Ruff (0.13.3)
DeeployTest/testMVP.py

58-58: Unused function argument: verbose

(ARG001)

Deeploy/Targets/PULPOpen/Deployer.py

40-40: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

Deeploy/DeeployTypes.py

452-452: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


452-452: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py

25-25: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (150)
  • GitHub Check: siracusa-models / test-runner-siracusa
  • GitHub Check: snitch-kernels / test-runner-snitch
  • GitHub Check: cortexm-kernels / test-runner-cortexm
  • GitHub Check: linting
  • GitHub Check: chimera-kernels / test-runner-chimera
  • GitHub Check: generic-kernels / test-runner-generic
  • GitHub Check: Test Type Inference (fail-input2)
  • GitHub Check: Test Type Inference (pass)
  • GitHub Check: Test Type Inference (fail-input0)
  • GitHub Check: siracusa-neureka-models-tiled-singlebuffer-L3 (miniMobileNet, 2000, 8, L3) / test-runner-siracusa-neureka-tiled (2000)
  • GitHub Check: siracusa-neureka-models-tiled-doublebuffer-L3-wmem (miniMobileNet, 2000, 8, true, L3, true) / test-runner-siracusa-neureka-tiled (2000)
  • GitHub Check: siracusa-neureka-kernels-tiled-singlebuffer-L2-wmem / test-runner-siracusa-neureka-tiled
  • GitHub Check: siracusa-neureka-kernels-tiled-singlebuffer-L2 / test-runner-siracusa-neureka-tiled
  • GitHub Check: mempool-models / test-runner-mempool
  • GitHub Check: mempool-kernels / test-runner-mempool
🔇 Additional comments (6)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (2)

81-84: Bias/has_bias plumbing for direct conv looks good.

Bias flags are correctly passed to the FP Conv2d HWC call.


100-117: Im2Col call argument order (X,Y) acknowledged as correct for this kernel.

The X,Y ordering for dims, kernel, and stride matches the intended prototype in this codebase.

If you changed any kernel prototypes elsewhere in this PR, ensure consistency across integer/float and DW/non-DW variants.

Deeploy/TilingExtension/TilerExtension.py (1)

987-989: LGTM: propagate core count to context

Assigning ctxt.n_cores ensures downstream kernels see the configured core count.

DeeployTest/testRunner_tiled_siracusa.py (1)

20-26: LGTM: cores threaded into TestRunner and CMake

Passing args.cores to TestRunner and exporting NUM_CORES is correct.

Also applies to: 28-28

DeeployTest/testUtils/platformMapping.py (1)

92-104: LGTM: add n_cores and forward to PULPDeployer

Signature and forwarding align with deployer changes.

Also applies to: 214-224

DeeployTest/testMVP.py (1)

57-60: LGTM: introduce and propagate n_cores

CLI option, setupDeployer signature, and forwarding to mapDeployer are consistent.

Also applies to: 84-93, 201-206, 261-263

Collaborator

@lukamac lukamac left a comment

This is a first round of review. There are already a few comments that need your answers. I haven't managed to look at the actual templates/kernels yet; I'll do that tomorrow.
Good work :)

     deeployStateDir: str = "DeeployState",
-    inputOffsets: Dict[str, int] = {}):
+    inputOffsets: Dict[str, int] = {},
+    n_cores: int = 8):
Collaborator

Why do you add the n_cores argument? We have the N_CORES define passed through the cmake script.

Contributor Author

The number of cores is needed to dynamically compute the size of the im2col buffer for the regular and DW Conv2Ds. This was the method I found to pass it on to the network context (PULPDeployer inherits from SignPropDeployer, this class, which in turn inherits from NetworkDeployer). Let me know if you think we should proceed differently with this.

Collaborator

@lukamac lukamac Oct 28, 2025

This is not a universal feature, meaning some targets don't have multiple cores, for example CortexM or Generic. You should add this just to the specific target's deployer, i.e., PULPDeployer
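A minimal sketch of the suggested layering, with placeholder classes that only mirror the names from this PR (the bodies are not the real Deeploy implementations):

```python
# Hypothetical sketch: n_cores lives only in the target-specific deployer;
# the generic base class stays core-agnostic, so single-core targets like
# CortexM or Generic never see the parameter.
class NetworkDeployer:
    def __init__(self, graph=None):
        self.graph = graph  # no notion of cores here

class PULPDeployer(NetworkDeployer):
    def __init__(self, graph=None, n_cores=8):
        super().__init__(graph)
        self.n_cores = n_cores  # only multi-core targets carry this

deployer = PULPDeployer(n_cores=4)
```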

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
Deeploy/Targets/PULPOpen/Deployer.py (1)

31-53: Avoid the shared mutable default for inputOffsets

Keeping inputOffsets = {} means every deployer instance shares the same dict, so one run can mutate another’s offsets. Since this signature just changed, please switch to a safe None default and materialize a new dict per instance before calling super().

-from typing import Callable, Dict, List, Type
+from typing import Callable, Dict, List, Optional, Type
@@
-                 inputOffsets = {},
+                 inputOffsets: Optional[Dict[str, int]] = None,
                  n_cores: int = 8):
+        if inputOffsets is None:
+            inputOffsets = {}
+
         super().__init__(
             graph = graph,
             deploymentPlatform = deploymentPlatform,
             inputTypes = inputTypes,
             loweringOptimizer = loweringOptimizer,
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4db4978 and 48b2ff5.

📒 Files selected for processing (5)
  • Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py (1 hunks)
  • Deeploy/DeeployTypes.py (15 hunks)
  • Deeploy/Targets/Generic/Parsers.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Deployer.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Parsers.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py
🧰 Additional context used
🧬 Code graph analysis (2)
Deeploy/Targets/Generic/Parsers.py (1)
Deeploy/DeeployTypes.py (3)
  • add_aliases (331-350)
  • get_aliases_of (352-364)
  • lookup (788-820)
Deeploy/Targets/PULPOpen/Parsers.py (2)
Deeploy/DeeployTypes.py (5)
  • inputs (2579-2596)
  • lookup (788-820)
  • parseNode (1101-1116)
  • parseNodeCtxt (1119-1144)
  • NetworkContext (574-1088)
Deeploy/Targets/Generic/Parsers.py (31)
  • Conv2DParser (1204-1270)
  • parseNode (19-27)
  • parseNode (49-58)
  • parseNode (81-94)
  • parseNode (102-112)
  • parseNode (163-169)
  • parseNode (193-207)
  • parseNode (229-251)
  • parseNode (291-304)
  • parseNode (326-338)
  • parseNode (375-387)
  • parseNode (422-426)
  • parseNode (449-460)
  • parseNode (484-495)
  • parseNode (531-535)
  • parseNode (551-554)
  • parseNodeCtxt (29-41)
  • parseNodeCtxt (60-76)
  • parseNodeCtxt (114-155)
  • parseNodeCtxt (171-185)
  • parseNodeCtxt (209-221)
  • parseNodeCtxt (253-283)
  • parseNodeCtxt (306-318)
  • parseNodeCtxt (340-367)
  • parseNodeCtxt (389-414)
  • parseNodeCtxt (428-441)
  • parseNodeCtxt (462-476)
  • parseNodeCtxt (497-523)
  • parseNodeCtxt (537-543)
  • parseNodeCtxt (556-573)
  • parseNodeCtxt (586-605)
🪛 Ruff (0.14.0)
Deeploy/Targets/PULPOpen/Deployer.py

40-40: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 48b2ff5 and 7cba0cf.

📒 Files selected for processing (2)
  • Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (4 hunks)
  • Deeploy/Targets/Generic/Parsers.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • Deeploy/Targets/Generic/Parsers.py
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-24T11:56:35.781Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py:364-0
Timestamp: 2025-09-24T11:56:35.781Z
Learning: When reviewing pull requests, focus comments only on code that was actually changed (lines with + or - in diffs), not on unchanged context code that appears in the diff.

Applied to files:

  • Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
📚 Learning: 2025-09-24T12:49:17.889Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.

Applied to files:

  • Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
🧬 Code graph analysis (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
Deeploy/CommonExtensions/OptimizationPasses/PassClasses.py (2)
  • contextagnostic (285-298)
  • ReplaceSequentialPatternPass (265-282)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (120)
  • GitHub Check: linting
  • GitHub Check: mempool-models / test-runner-mempool
  • GitHub Check: siracusa-kernels-tiled-doublebuffer-L2 / test-runner-siracusa-tiled
  • GitHub Check: softhier-kernels / test-runner-softhier
  • GitHub Check: snitch-kernels-tiled-singlebuffer-L2 / test-runner-snitch-tiled
  • GitHub Check: cortexm-models / test-runner-cortexm
  • GitHub Check: cortexm-kernels / test-runner-cortexm
  • GitHub Check: snitch-kernels / test-runner-snitch
  • GitHub Check: siracusa-neureka-models-tiled-doublebuffer-L3-wmem (Attention, 3500, 8, true, L3, true) / test-runner-siracusa-neureka-tiled (3500)
  • GitHub Check: siracusa-neureka-models-tiled-doublebuffer-L3-wmem (microLlama/microLlama1, 10000, 8, true, L3, t... / test-runner-siracusa-neureka-tiled (10000)
  • GitHub Check: siracusa-neureka-models-tiled-singlebuffer-L3 (miniMobileNet, 2000, 8, L3) / test-runner-siracusa-neureka-tiled (2000)
  • GitHub Check: siracusa-neureka-models-tiled-singlebuffer-L3 (Transformer, 15000, 8, L3) / test-runner-siracusa-neureka-tiled (15000)
🔇 Additional comments (4)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (4)

250-250: LGTM! Improved group attribute handling.

Using .get('group', 1) properly handles the ONNX Conv default and is more robust than explicit attribute checking.


370-373: LGTM! Correctly handles FP DW Conv input transpose.

The else block appropriately transposes the input for non-RequantizedConv DW convolutions (e.g., FP Conv), while the RequantizedConv path transposes weights. This distinction aligns with the requirements for adding FP DW Conv support.


380-426: LGTM! Well-structured pass classes with clear separation of concerns.

The refactoring of PULPDWConvPass improves readability with explicit comments and sections. The new PULPFPDWConvPass class properly mirrors the structure for FP DW Conv handling, using the 'Conv' op pattern and a distinct pass name while sharing the same replacement function via partial().


505-505: LGTM! Correctly integrates the FP DW Conv pass into the sequence.

The placement after PULPDWConvPass and before dense conv passes is logical, ensuring DW convolutions (both quantized and FP) are handled before dense convolutions.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 20

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
Deeploy/Targets/PULPOpen/TileConstraints/GEMMTileConstraint.py (2)

105-107: Compute NSize based on transA.

Current code assumes no transpose; wrong K/N leads to bad tiles.

-        NSize = buffA.shape[-1]
+        NSize = buffA.shape[-1] if transA == 0 else buffA.shape[-2]

215-223: Critical undefined identifiers remain in IntGEMMTileConstraint.serializeTilingSolution.

The verification reveals multiple undefined variables that will cause runtime errors:

  • Line 132: BSize used but never defined (should be BatchSize)
  • Lines 151, 155, 167, 171, 370, 374, 386, 390, 405, 407: BatchOffset used but never defined
  • Lines 156, 172, 375, 391, 407: BOffset used but never defined

In contrast, FloatGEMMTileConstraint.serializeTilingSolution correctly uses BatchSize at line 341, suggesting the IntGEMMTileConstraint version requires similar fixes. The undefined BatchOffset and BOffset offsets need to be extracted from cube.offset similar to how MOffset and OOffset are extracted at line 119.

TargetLibraries/PULPOpen/src/Convolution_fp32.c (1)

22-28: Avoid 16-bit overflow in channel partitioning.

Use 32-bit for ch_out_* to support large F_total; identical change recommended in both functions.

Apply:

-  uint16_t ch_out_chunk =
+  uint32_t ch_out_chunk =
       (F_total >> log2Core) + ((F_total & (NUM_CORES - 1)) != 0);
-  uint16_t ch_out_start = MIN(ch_out_chunk * core_id, F_total);
-  uint16_t ch_out_stop = MIN(ch_out_start + ch_out_chunk, F_total);
-  uint16_t ch_out_count = ch_out_stop - ch_out_start;
+  uint32_t ch_out_start = MIN(ch_out_chunk * core_id, F_total);
+  uint32_t ch_out_stop = MIN(ch_out_start + ch_out_chunk, F_total);
+  uint32_t ch_out_count = ch_out_stop - ch_out_start;

Repeat the same replacement in PULP_Conv2d_Im2Col_fp32_fp32_fp32_HWC.
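The overflow the comment warns about is easy to reproduce by emulating C's `uint16_t` semantics in Python (masking each intermediate to 16 bits). The values below are illustrative, not taken from a real model:

```python
U16 = 0xFFFF


def partition_u16(f_total: int, num_cores: int, core_id: int):
    """Emulate the C partitioning with uint16_t ch_out_* variables."""
    log2_core = num_cores.bit_length() - 1
    chunk = ((f_total >> log2_core) + int(f_total & (num_cores - 1) != 0)) & U16
    start = min(chunk * core_id, f_total) & U16
    stop = min(start + chunk, f_total) & U16
    return start, stop


# Fine while F_total stays well below 2**16:
assert partition_u16(512, 8, 7) == (448, 512)

# A large channel count wraps: ch_out_stop truncates below ch_out_start,
# so the per-core loop body never runs for the last cores.
start, stop = partition_u16(70000, 8, 7)
assert stop == 70000 & U16 == 4464
assert stop < start
```

With 32-bit `ch_out_*` variables the same inputs yield `stop == 70000`, which is why the suggested diff widens the types.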

Deeploy/MemoryLevelExtension/NetworkDeployers/MemoryLevelDeployer.py (1)

119-125: Mutable default list for passes.

Use None and initialize to avoid shared state across instances.

Apply:

-                 memoryLevelAnnotationPasses: List[NetworkOptimizationPass] = []):
+                 memoryLevelAnnotationPasses: List[NetworkOptimizationPass] | None = None):
@@
-        if len(memoryLevelAnnotationPasses) == 0:
+        if not memoryLevelAnnotationPasses:
             memoryLevelAnnotationPasses.append(AnnotateDefaultMemoryLevel(self.Platform.memoryHierarchy))
         self.memoryLevelAnnotationOptimizer = NetworkOptimizer(memoryLevelAnnotationPasses)

Repeat the same pattern in MemoryLevelAwareSignPropDeployer.__init__ and MemoryDeployerWrapper.__init__.
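The hazard behind this suggestion is that a mutable default is created once, at function-definition time, and shared by every call that relies on it. A minimal illustration with a hypothetical class (not the actual deployer):

```python
class Deployer:
    # Anti-pattern: the default list object is created once and shared.
    def __init__(self, passes=[]):
        if len(passes) == 0:
            passes.append("AnnotateDefaultMemoryLevel")  # mutates the shared default!
        self.passes = passes


a = Deployer()
b = Deployer()                      # silently reuses the very same list as `a`
assert a.passes is b.passes
assert b.passes == ["AnnotateDefaultMemoryLevel"]


class FixedDeployer:
    # Fix: use None as a sentinel and build a fresh list per call.
    def __init__(self, passes=None):
        self.passes = list(passes) if passes else ["AnnotateDefaultMemoryLevel"]


c = FixedDeployer()
d = FixedDeployer()
assert c.passes is not d.passes     # each instance owns its own list
```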

♻️ Duplicate comments (5)
Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py (1)

25-26: n_cores addition looks reasonable; mutable default still present.

The n_cores parameter addition aligns with the PR's goal of supporting dynamic im2col buffer sizing. However, two observations:

  1. The mutable default inputOffsets: Dict[str, int] = {} remains unfixed from the previous review.
  2. The n_cores parameter lacks documentation and validation (e.g., ensuring it's positive).

Consider adding a docstring or inline comment explaining the n_cores parameter's purpose and optionally validate it:

                 inputOffsets: Dict[str, int] = {},
-                 n_cores: int = 8):
+                 n_cores: int = 8):  # Number of cores for im2col buffer sizing
+        if n_cores <= 0:
+            raise ValueError(f"n_cores must be positive, got {n_cores}")

For the mutable default issue, refer to the previous review comment suggesting inputOffsets: Optional[Dict[str, int]] = None.

Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (1)

43-51: Fix DW Im2Col buffer sizing: remove typeWidth multiplier and correct return type annotation

In PULP2DFloatDWConvIm2ColTemplate.computeTransientBuffersSize() (lines 43–51), the size computation multiplies by typeWidth // 8 to produce byte count, but hoistTransientBuffers() (lines 59–61) then assigns the buffer a concrete type. The allocator subsequently multiplies by sizeof(type), causing double-accounting on tight L2 budgets. Additionally, the return type annotation incorrectly declares str instead of Union[int, IntVar].

Align with the existing PULP2DFloatConvIm2ColTemplate pattern (lines 18–24): compute element count (no typeWidth division) and let type-setting handle the byte multiplication.

-    def computeTransientBuffersSize(ctxt: NetworkContext,
-                                    operatorRepresentation: OperatorRepresentation) -> List[Tuple[str, str]]:
+    def computeTransientBuffersSize(ctxt: NetworkContext,
+                                    operatorRepresentation: OperatorRepresentation) -> List[Tuple[str, Union[int, IntVar]]]:

-        im2col_dim = (operatorRepresentation["weight_type"].typeWidth // 8
-                     ) * ctxt.n_cores * operatorRepresentation['dim_kernel_x'] * operatorRepresentation['dim_kernel_y']
+        im2col_dim = ctxt.n_cores * operatorRepresentation['dim_kernel_x'] * operatorRepresentation['dim_kernel_y']
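The double-accounting is pure arithmetic, so it can be checked in isolation. A sketch with illustrative numbers (8 cores, 3x3 kernel, fp32 so `typeWidth = 32`), assuming the allocator multiplies the reported count by `sizeof(type)` once the buffer type is set:

```python
n_cores, kx, ky, type_width = 8, 3, 3, 32
sizeof_elem = type_width // 8          # 4 bytes per fp32 element

elements = n_cores * kx * ky           # what the template should report: 72
buggy = (type_width // 8) * elements   # bytes reported as if they were elements

# The allocator multiplies the reported count by sizeof(type) once more:
assert elements * sizeof_elem == 288   # intended allocation: 288 B
assert buggy * sizeof_elem == 1152     # actual allocation: 4x too large
```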
TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1)

61-97: Critical: signed/unsigned mix in im2col padding loops leads to out‑of‑bounds indexing

h_in/w_in are uint32_t while h_in_start/w_in_start are int32_t; expressions like (h_in - h_in_start) and (w_in - w_in_start) perform unsigned arithmetic when mixed, producing incorrect large indices. Use int32_t consistently and cast bounds.

Apply (pattern applies similarly to LEFT/RIGHT blocks):

-        for (uint32_t h_in = MAX(H, h_in_start); h_in < h_in_start + P; h_in++) {
+        for (int32_t h_in = MAX((int32_t)H, h_in_start); h_in < (int32_t)(h_in_start + (int32_t)P); h_in++) {
           for (int32_t w_in = (int32_t)w_in_start;
                w_in < (int32_t)(w_in_start + Q); w_in++) {
             im2col_buffer[(h_in - h_in_start) * Q + (w_in - w_in_start)] = 0.0f;
           }
         }
-        for (uint32_t h_in = MAX(0, h_in_start); h_in < MIN(H, h_in_start + P); h_in++) {
+        for (int32_t h_in = MAX(0, h_in_start); h_in < MIN((int32_t)H, h_in_start + (int32_t)P); h_in++) {
           for (int32_t w_in = (int32_t)w_in_start;
                w_in < MIN(0, (int32_t)(w_in_start + Q)); w_in++) {
             im2col_buffer[(h_in - h_in_start) * Q + (w_in - w_in_start)] = 0.0f;
           }
         }
-        for (uint32_t h_in = MAX(0, h_in_start); h_in < MIN(H, h_in_start + P); h_in++) {
+        for (int32_t h_in = MAX(0, h_in_start); h_in < MIN((int32_t)H, h_in_start + (int32_t)P); h_in++) {
-          for (uint32_t w_in = MAX(W, w_in_start); w_in < w_in_start + Q; w_in++) {
+          for (int32_t w_in = MAX((int32_t)W, w_in_start); w_in < (int32_t)(w_in_start + (int32_t)Q); w_in++) {
             im2col_buffer[(h_in - h_in_start) * Q + (w_in - w_in_start)] = 0.0f;
           }
         }

Repeat the same signed fixes in the no‑bias branch (lines 170‑195).
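The failure mode comes from C's usual arithmetic conversions: when an `int32_t` meets a `uint32_t`, the signed side converts to unsigned first, so a negative padding bound becomes a huge positive one before any comparison happens. Emulating the conversion in Python (values illustrative):

```python
U32 = 0xFFFFFFFF


def to_u32(x: int) -> int:
    """Value of x after C's conversion to uint32_t (two's-complement wrap)."""
    return x & U32


h_in_start = -1                        # e.g. padding pushes the window above the image
assert to_u32(h_in_start) == 4294967295

# Intended (signed) comparison vs. C's mixed signed/unsigned comparison:
assert not (0 < h_in_start)            # -1 is below 0, as the kernel expects
assert 0 < to_u32(h_in_start)          # mixed comparison: the bound explodes

# MAX(H, h_in_start) with an unsigned H therefore picks the wrapped value,
# and the padding loop indexes far outside the im2col buffer:
H = 16
assert max(H, to_u32(h_in_start)) == 4294967295   # instead of the intended 16
```

Casting both sides to `int32_t`, as in the suggested diff, keeps the comparisons in signed arithmetic.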

Deeploy/Targets/PULPOpen/Deployer.py (1)

40-53: Avoid mutable default for inputOffsets.

Switch to Optional and init when calling super.

Apply:

-from typing import Callable, Dict, List, Type
+from typing import Callable, Dict, List, Optional, Type
@@
-                 deeployStateDir: str = "DeeployStateDir",
-                 inputOffsets = {},
+                 deeployStateDir: str = "DeeployStateDir",
+                 inputOffsets: Optional[Dict[str, int]] = None,
                  n_cores: int = 8):
         super().__init__(
@@
-            inputOffsets = inputOffsets,
+            inputOffsets = inputOffsets if inputOffsets is not None else {},
             n_cores = n_cores,
         )

If SignPropDeployer already accepts Optional[Dict[str, int]], you can pass it through directly and drop the or {}.

DeeployTest/testUtils/testRunner.py (1)

349-356: Remove duplicate command construction; keep single source of truth.

The first assignment is redundant; append -n_cores via self.n_cores once.

Apply:

-        command = f"python {generation_script} -d {self._dir_gen} -t {self._dir_test} -p {self._platform} {self.gen_args}"
-
-        command = f"python {generation_script} -d {self._dir_gen} -t {self._dir_test} -p {self._platform}"
+        command = f"python {generation_script} -d {self._dir_gen} -t {self._dir_test} -p {self._platform}"
         if self._tiling is True:
             command += f" -n_cores {self.n_cores}"
         command += f" {self.gen_args}"
🧹 Nitpick comments (15)
CHANGELOG.md (1)

26-32: Vary sentence structure in Added section to improve readability.

The static analysis tool flagged repetitive sentence beginnings ("FP PULP DW conv", "PULP FP DW conv", "FP bindings"). While the entries are accurate and comprehensive, consider rephrasing for better variety. For example:

- PULP 2D FP DW conv Im2Col template and kernel, with bias support.
- Bias support for PULP 2D FP regular conv Im2Col in template & kernel.
- PULP FP DW conv 2D parser.
- FP conv 2D (simple & DW), reshape & skip connection, and TinyViT demo tests to the non-tiled Siracusa CI pipeline.
- FP bindings and mappings for PULP slice, DW conv 2D, and reduce mean operations.
- FP PULP DW conv lowering optimization pass similar to the existent one for integer version.
+ 2D FP depthwise convolution Im2Col template, kernel, and parser for PULP, with bias support.
+ Bias support for PULP 2D FP regular convolution Im2Col template and kernel.
+ FP convolution 2D (regular and depthwise), reshape, skip connection, and TinyViT demo tests in the non-tiled Siracusa CI pipeline.
+ FP bindings and mappings: PULP slice, depthwise convolution 2D, and reduce mean operations.
+ Depthwise convolution lowering optimization pass (FP variant) for PULP, analogous to the integer version.

This maintains factual accuracy while improving stylistic variety.

Deeploy/Targets/Neureka/TileConstraints/NeurekaDenseConstraint.py (1)

519-526: Tighten schedule pairing with zip(strict=True)

Ensure 1:1 correspondence between base input loads and requant cubes.

-        } for schedule, requant in zip(tilingSchedule.inputLoadSchedule, inputRequantCubes)]
+        } for schedule, requant in zip(tilingSchedule.inputLoadSchedule, inputRequantCubes, strict=True)]
Deeploy/Targets/Neureka/TileConstraints/NeurekaDepthwiseConstraint.py (1)

515-517: Add zip(strict=True) to guard schedule alignment

Match counts between input and requant loads for WMEM DW RQS.

-        } for schedule, requant in zip(tilingSchedule.inputLoadSchedule, inputRequantCubes)]
+        } for schedule, requant in zip(tilingSchedule.inputLoadSchedule, inputRequantCubes, strict=True)]
Deeploy/Targets/Neureka/TileConstraints/NeurekaPointwiseConstraint.py (1)

295-299: Strengthen pairing with zip(strict=True) in PW RQS

Guarantee equal lengths when combining base and requant schedules.

-        } for load, rqLoad in zip(tilingSchedule.inputLoadSchedule, requantSchedule)]
+        } for load, rqLoad in zip(tilingSchedule.inputLoadSchedule, requantSchedule, strict=True)]
Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py (1)

10-14: Avoid FP log2; use bit ops and assert power-of-two NUM_CORES

log2() pulls in libm and risks FP rounding; use CTZ and add a compile-time check. Also removes any need for <math.h>/-lm.

-uint32_t ${nodeName}_core_id = pi_core_id();
-uint32_t ${nodeName}_log2Core = (uint32_t) log2(NUM_CORES);
+uint32_t ${nodeName}_core_id = pi_core_id();
+_Static_assert((NUM_CORES & (NUM_CORES - 1)) == 0, "NUM_CORES must be a power of two");
+const uint32_t ${nodeName}_log2Core = __builtin_ctz(NUM_CORES);
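The bit-twiddling replacement is easy to sanity-check. In Python, `n.bit_length() - 1` plays the role of `__builtin_ctz(n)` for a power-of-two `n`, and `n & (n - 1) == 0` is the standard power-of-two test the `_Static_assert` relies on:

```python
import math


def log2_core(num_cores: int) -> int:
    """Integer log2 via bit ops; mirrors __builtin_ctz for powers of two."""
    assert num_cores > 0 and num_cores & (num_cores - 1) == 0, \
        "NUM_CORES must be a power of two"
    return num_cores.bit_length() - 1


# Matches floating-point log2 exactly on every power of two:
for n in (1, 2, 4, 8, 16):
    assert log2_core(n) == int(math.log2(n))

# The power-of-two guard rejects core counts where a shift-based split is wrong:
rejected = False
try:
    log2_core(6)
except AssertionError:
    rejected = True
assert rejected
```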
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (3)

93-93: Fix typo in comment: “ChannelOout” → “ChannelOut”

-// 2D FP Conv HWC with Im2Col and ChannelOout parallelism (Name: ${nodeName}, Op: ${nodeOp})
+// 2D FP Conv HWC with Im2Col and ChannelOut parallelism (Name: ${nodeName}, Op: ${nodeOp})

125-154: DW Im2Col call: argument order LGTM; minor comment typo

-// 2D DW FP Conv HWC with Im2Col and ChannelOout parallelism (Name: ${nodeName}, Op: ${nodeOp})
+// 2D DW FP Conv HWC with Im2Col and ChannelOut parallelism (Name: ${nodeName}, Op: ${nodeOp})

Based on learnings.


46-51: Clarify misleading comment about size-as-string

The value is currently computed as int, not string; please update the comment to avoid confusion.

Deeploy/Targets/PULPOpen/TileConstraints/MatMulTileConstraint.py (1)

181-185: Use wider unsigned types for tiling sizes for consistency and headroom.

Dims rarely fit in int8; GEMM uses uint16_t/uint8_t. Align MatMul for consistency and to avoid overflow.

Apply:

-from Deeploy.CommonExtensions.DataTypes import int8_t
+from Deeploy.CommonExtensions.DataTypes import uint8_t, uint16_t
...
-        replacementTypes = {
-            "M": PointerClass(int8_t),
-            "N": PointerClass(int8_t),
-            "O": PointerClass(int8_t),
-            "batch": PointerClass(int8_t)
-        }
+        replacementTypes = {
+            "M": PointerClass(uint16_t),
+            "N": PointerClass(uint16_t),
+            "O": PointerClass(uint16_t),
+            "batch": PointerClass(uint8_t),
+        }

If other MatMul code relies on int8_t here, confirm and we can gate this by target or keep uint16_t only for sizes ≥ 256.

Deeploy/Targets/PULPOpen/TileConstraints/GEMMTileConstraint.py (1)

83-87: Avoid assigning unused variable.

modulus is unused. Either drop assignment or use _ =.

-            modulus = tilerModel.addTileSizeDivisibleConstraint(parseDict, 'O', BSecondDimVar, 16, prefix = "16_")
+            tilerModel.addTileSizeDivisibleConstraint(parseDict, 'O', BSecondDimVar, 16, prefix = "16_")
Deeploy/Targets/Generic/Parsers.py (1)

1071-1101: Reshape aliasing looks good; mark constant shape as non-deploy and use cached alias list

  • Alias flow is correct and symmetric; iterate the cached list to avoid relying on a second lookup.
  • If the shape input is a constant, mark it non‑live/non‑deploy to avoid useless allocations (mirrors Unsqueeze).

Apply near this block:

-        new_output_node_aliases = input_node.get_aliases_of()
+        new_output_node_aliases = list(input_node.get_aliases_of())
         new_output_node_aliases.append(input_node.name)
@@
-        for alias in output_node.get_aliases_of():
+        for alias in new_output_node_aliases:
             alias_node = ctxt.lookup(alias)
             alias_node.add_aliases(aliases_to_add = [
                 output_node.name,
             ])
+
+        # If shape is a constant tensor, exclude it from deployment
+        shape_buf = ctxt.lookup(node.inputs[inputs.index("shape")].name)
+        if hasattr(shape_buf, "values"):
+            shape_buf._live = False
+            shape_buf._deploy = False
TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (2)

137-145: Minor: remove redundant modulo and consider precomputing weight base

im2col_idx runs [0, PQ), so % (PQ) is redundant. Also, precompute (f - ch_out_start)PQ outside the inner loop.

-            for (uint32_t im2col_idx = 0; im2col_idx < P * Q; im2col_idx++) {
-              sum += im2col_buffer[im2col_idx] *
-                  weight_ptr[(f - ch_out_start) * P * Q + im2col_idx % (P * Q)];
-            }
+            uint32_t w_base = (f - ch_out_start) * P * Q;
+            for (uint32_t im2col_idx = 0; im2col_idx < P * Q; im2col_idx++) {
+              sum += im2col_buffer[im2col_idx] * weight_ptr[w_base + im2col_idx];
+            }

47-51: Nit: kernel_size is unused

kernel_size is computed but never used; remove to avoid confusion.

Deeploy/DeeployTypes.py (1)

3262-3317: Redundant n_cores field in child; base already stores it.

NetworkDeployer saves n_cores; removing the extra self.n_cores avoids duplication.

Apply:

-        self.prepared = False
-        self.n_cores = n_cores
+        self.prepared = False
DeeployTest/testMVP.py (1)

57-59: Consider removing the unused verbose parameter.

The verbose parameter is declared but never used in the function body. While the n_cores parameter addition looks good and aligns with the PR objectives, consider removing verbose to clean up the signature.

Apply this diff:

 def setupDeployer(graph: gs.Graph, memoryHierarchy: MemoryHierarchy, defaultTargetMemoryLevel: MemoryLevel,
-                  defaultIoMemoryLevel: MemoryLevel, verbose: CodeGenVerbosity, args: argparse.Namespace,
-                  n_cores: int) -> Tuple[NetworkDeployer, bool]:
+                  defaultIoMemoryLevel: MemoryLevel, args: argparse.Namespace,
+                  n_cores: int) -> Tuple[NetworkDeployer, bool]:

Note: Ensure to update the call site on line 256 as well if you apply this change.

Based on learnings

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between da162b0 and 85edf76.

📒 Files selected for processing (30)
  • .github/workflows/ci-platform-siracusa.yml (3 hunks)
  • CHANGELOG.md (4 hunks)
  • Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py (1 hunks)
  • Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py (2 hunks)
  • Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (5 hunks)
  • Deeploy/DeeployTypes.py (14 hunks)
  • Deeploy/MemoryLevelExtension/NetworkDeployers/MemoryLevelDeployer.py (4 hunks)
  • Deeploy/Targets/Generic/Parsers.py (1 hunks)
  • Deeploy/Targets/Neureka/TileConstraints/NeurekaDenseConstraint.py (1 hunks)
  • Deeploy/Targets/Neureka/TileConstraints/NeurekaDepthwiseConstraint.py (1 hunks)
  • Deeploy/Targets/Neureka/TileConstraints/NeurekaPointwiseConstraint.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Bindings.py (6 hunks)
  • Deeploy/Targets/PULPOpen/Deployer.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Parsers.py (2 hunks)
  • Deeploy/Targets/PULPOpen/Platform.py (7 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatAddTemplate.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (5 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatMatMulTemplate.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py (1 hunks)
  • Deeploy/Targets/PULPOpen/TileConstraints/GEMMTileConstraint.py (3 hunks)
  • Deeploy/Targets/PULPOpen/TileConstraints/MatMulTileConstraint.py (1 hunks)
  • Deeploy/TilingExtension/TilerExtension.py (1 hunks)
  • DeeployTest/testMVP.py (4 hunks)
  • DeeployTest/testRunner_tiled_siracusa.py (1 hunks)
  • DeeployTest/testUtils/platformMapping.py (2 hunks)
  • DeeployTest/testUtils/testRunner.py (3 hunks)
  • TargetLibraries/PULPOpen/inc/kernel/Conv.h (1 hunks)
  • TargetLibraries/PULPOpen/src/Convolution_fp32.c (4 hunks)
  • TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
  • Deeploy/Targets/PULPOpen/Templates/FloatMatMulTemplate.py
  • Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py
  • Deeploy/Targets/PULPOpen/Templates/FloatAddTemplate.py
  • Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py
  • DeeployTest/testUtils/platformMapping.py
  • DeeployTest/testRunner_tiled_siracusa.py
  • TargetLibraries/PULPOpen/inc/kernel/Conv.h
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-09-24T12:53:00.168Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: TargetLibraries/PULPOpen/src/DWConvolution_fp32.c:0-0
Timestamp: 2025-09-24T12:53:00.168Z
Learning: In PULP platform code, log2 is defined in pulp_nn_utils and can be used safely without requiring math.h or causing floating-point build/runtime issues, unlike the standard library log2 function.

Applied to files:

  • TargetLibraries/PULPOpen/src/DWConvolution_fp32.c
📚 Learning: 2025-09-24T12:49:17.889Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.

Applied to files:

  • TargetLibraries/PULPOpen/src/DWConvolution_fp32.c
  • TargetLibraries/PULPOpen/src/Convolution_fp32.c
  • Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
📚 Learning: 2025-09-24T11:56:35.781Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py:364-0
Timestamp: 2025-09-24T11:56:35.781Z
Learning: When reviewing pull requests, focus comments only on code that was actually changed (lines with + or - in diffs), not on unchanged context code that appears in the diff.

Applied to files:

  • Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
📚 Learning: 2025-09-24T11:43:47.236Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: .github/workflows/ci-platform-siracusa.yml:57-60
Timestamp: 2025-09-24T11:43:47.236Z
Learning: In the Deeploy test system, test names in CI workflows correspond to directory names under DeeployTest/Tests/, not Python function names. The TestRunner class executes tests by passing directory paths via the `-t` argument, where each directory contains test configurations and definitions.

Applied to files:

  • .github/workflows/ci-platform-siracusa.yml
📚 Learning: 2025-09-24T12:17:21.624Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:46-0
Timestamp: 2025-09-24T12:17:21.624Z
Learning: In Deeploy's PULP templates, transient buffer size calculation can return element counts as strings from computeTransientBuffersSize(), and then manually set the buffer type in hoistTransientBuffers() using ctxt.lookup(buffer_name)._type.referencedType = input_type. The allocation system automatically multiplies the element count by the element size when the buffer type is properly set, achieving correct byte allocation.

Applied to files:

  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
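The buffer-typing pattern captured in this learning can be sketched in plain Python. All names below (RefType, Buffer, the helper functions) are simplified stand-ins for the real Deeploy classes, not its actual API:

```python
# Hedged sketch of the learning above; RefType, Buffer and the helpers are
# illustrative stand-ins, not the actual Deeploy API.

class RefType:
    def __init__(self, type_width_bits):
        self.typeWidth = type_width_bits

class BufferType:
    def __init__(self):
        self.referencedType = None

class Buffer:
    def __init__(self, name, element_count_expr):
        self.name = name
        self.element_count_expr = element_count_expr  # element count, as a string
        self._type = BufferType()

def compute_transient_buffers_size(n_cores, ch_im_in, kx, ky):
    # Returns an element-count *expression*, not a byte size.
    return [("im2col_buffer", f"{n_cores} * {ch_im_in} * {kx} * {ky}")]

def hoist_transient_buffers(ctxt, sizes, input_type):
    for name, expr in sizes:
        buf = Buffer(name, expr)
        buf._type.referencedType = input_type  # set manually, per the learning
        ctxt[name] = buf

def allocated_bytes(buf):
    # The allocator multiplies element count by element size once the type is set.
    elements = eval(buf.element_count_expr)
    return elements * (buf._type.referencedType.typeWidth // 8)

ctxt = {}
hoist_transient_buffers(ctxt, compute_transient_buffers_size(8, 16, 3, 3), RefType(32))
print(allocated_bytes(ctxt["im2col_buffer"]))  # 8 * 16 * 3 * 3 * 4 = 4608
```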
🧬 Code graph analysis (14)
Deeploy/Targets/PULPOpen/Platform.py (4)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
  • RemoveEmptyConvBiasPass (587-592)
Deeploy/Targets/PULPOpen/Parsers.py (2)
  • PULPFPConv2DParser (65-120)
  • PULPFPDWConv2DParser (123-189)
Deeploy/DeeployTypes.py (1)
  • NodeMapper (1736-1892)
Deeploy/Targets/Generic/Parsers.py (1)
  • SliceParser (97-155)
Deeploy/Targets/PULPOpen/Bindings.py (5)
Deeploy/CommonExtensions/DataTypes.py (4)
  • float32_t (74-78)
  • int8_t (12-15)
  • int32_t (24-27)
  • uint8_t (36-39)
Deeploy/DeeployTypes.py (3)
  • CodeTransformation (2366-2400)
  • NodeBinding (1588-1733)
  • NodeTemplate (87-229)
Deeploy/FutureExtension/Bindings/AutoFutureBinding.py (1)
  • AutoFutureBinding (13-62)
Deeploy/Targets/Generic/TypeCheckers.py (3)
  • SliceChecker (37-51)
  • ConvChecker (456-474)
  • ReduceMeanChecker (310-324)
Deeploy/AbstractDataTypes.py (1)
  • PointerClass (536-559)
Deeploy/Targets/PULPOpen/Parsers.py (2)
Deeploy/DeeployTypes.py (5)
  • inputs (2583-2600)
  • lookup (792-824)
  • parseNode (1105-1120)
  • parseNodeCtxt (1123-1148)
  • NetworkContext (578-1092)
Deeploy/Targets/Generic/Parsers.py (31)
  • Conv2DParser (1246-1312)
  • parseNode (19-27)
  • parseNode (49-58)
  • parseNode (81-94)
  • parseNode (102-112)
  • parseNode (163-169)
  • parseNode (193-207)
  • parseNode (229-243)
  • parseNode (271-293)
  • parseNode (333-351)
  • parseNode (373-385)
  • parseNode (422-434)
  • parseNode (469-473)
  • parseNode (496-507)
  • parseNode (531-542)
  • parseNode (578-582)
  • parseNodeCtxt (29-41)
  • parseNodeCtxt (60-76)
  • parseNodeCtxt (114-155)
  • parseNodeCtxt (171-185)
  • parseNodeCtxt (209-221)
  • parseNodeCtxt (245-263)
  • parseNodeCtxt (295-325)
  • parseNodeCtxt (353-365)
  • parseNodeCtxt (387-414)
  • parseNodeCtxt (436-461)
  • parseNodeCtxt (475-488)
  • parseNodeCtxt (509-523)
  • parseNodeCtxt (544-570)
  • parseNodeCtxt (584-590)
  • parseNodeCtxt (603-620)
Deeploy/Targets/Neureka/TileConstraints/NeurekaPointwiseConstraint.py (2)
Deeploy/DeeployTypes.py (1)
  • lookup (792-824)
Deeploy/TilingExtension/TilingCodegen.py (2)
  • HyperRectangle (24-35)
  • calculateFlatOffsetInBytes (239-242)
Deeploy/DeeployTypes.py (3)
Deeploy/TilingExtension/CodeTransformationPasses/TilingCodeGeneration.py (1)
  • sizeInBytes (181-182)
Deeploy/AbstractDataTypes.py (2)
  • typeWidth (399-400)
  • Pointer (303-388)
DeeployTest/testUtils/dmaUtils.py (1)
  • size (72-73)
Deeploy/Targets/Generic/Parsers.py (1)
Deeploy/DeeployTypes.py (3)
  • lookup (792-824)
  • get_aliases_of (352-364)
  • add_aliases (331-350)
Deeploy/Targets/Neureka/TileConstraints/NeurekaDenseConstraint.py (6)
Deeploy/TilingExtension/TilingCodegen.py (5)
  • TilingSchedule (53-122)
  • AbsoluteHyperRectangle (39-49)
  • VariableReplacementScheme (126-158)
  • HyperRectangle (24-35)
  • calculateFlatOffsetInBytes (239-242)
Deeploy/TilingExtension/TileConstraint.py (3)
  • TileConstraint (19-221)
  • serializeTilingSolution (184-221)
  • extractBaseAddr (56-74)
Deeploy/DeeployTypes.py (1)
  • lookup (792-824)
Deeploy/AbstractDataTypes.py (1)
  • PointerClass (536-559)
Deeploy/CommonExtensions/DataTypes.py (3)
  • uint8_t (36-39)
  • uint32_t (48-51)
  • uint16_t (42-45)
Deeploy/Targets/PULPOpen/TileConstraints/ConvTileConstraint.py (2)
  • Conv2DTileConstraint (225-435)
  • computeInputCube (320-347)
Deeploy/Targets/PULPOpen/TileConstraints/MatMulTileConstraint.py (1)
Deeploy/TilingExtension/TilingCodegen.py (1)
  • HyperRectangle (24-35)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
Deeploy/CommonExtensions/OptimizationPasses/PassClasses.py (3)
  • contextagnostic (285-298)
  • SequentialPass (201-210)
  • ReplaceSequentialPatternPass (265-282)
Deeploy/Targets/PULPOpen/TileConstraints/GEMMTileConstraint.py (1)
Deeploy/TilingExtension/TilingCodegen.py (1)
  • HyperRectangle (24-35)
Deeploy/Targets/Neureka/TileConstraints/NeurekaDepthwiseConstraint.py (6)
Deeploy/TilingExtension/TilingCodegen.py (5)
  • TilingSchedule (53-122)
  • AbsoluteHyperRectangle (39-49)
  • VariableReplacementScheme (126-158)
  • HyperRectangle (24-35)
  • calculateFlatOffsetInBytes (239-242)
Deeploy/TilingExtension/TileConstraint.py (2)
  • TileConstraint (19-221)
  • extractBaseAddr (56-74)
Deeploy/TilingExtension/TilerModel.py (3)
  • TilerModel (34-402)
  • addTensorDimToModel (143-157)
  • getTensorDimVar (131-135)
Deeploy/AbstractDataTypes.py (1)
  • PointerClass (536-559)
Deeploy/CommonExtensions/DataTypes.py (3)
  • uint8_t (36-39)
  • uint32_t (48-51)
  • uint16_t (42-45)
Deeploy/Targets/PULPOpen/TileConstraints/ConvTileConstraint.py (2)
  • Conv2DTileConstraint (225-435)
  • computeInputCube (320-347)
Deeploy/MemoryLevelExtension/NetworkDeployers/MemoryLevelDeployer.py (1)
Deeploy/DeeployTypes.py (5)
  • _parseNode (2670-2677)
  • ONNXLayer (1895-2223)
  • NetworkContext (578-1092)
  • optimize (2260-2280)
  • optimize (2310-2329)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (1)
Deeploy/DeeployTypes.py (6)
  • NodeTemplate (87-229)
  • computeTransientBuffersSize (142-162)
  • NetworkContext (578-1092)
  • hoistTransientBuffers (165-185)
  • hoistTransientBuffer (893-912)
  • lookup (792-824)
DeeployTest/testMVP.py (3)
Deeploy/DeeployTypes.py (1)
  • NetworkDeployer (3258-3635)
DeeployTest/testUtils/platformMapping.py (1)
  • mapDeployer (92-261)
DeeployTest/testSchedulingExtension.py (1)
  • _mockScheduler (29-33)
🪛 LanguageTool
CHANGELOG.md

[style] ~31-~31: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... conv 2D, and reduce mean operations. - FP PULP DW conv lowering optimization pass...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

🪛 Ruff (0.14.1)
Deeploy/Targets/Neureka/TileConstraints/NeurekaPointwiseConstraint.py

233-233: Undefined name absoluteCube

(F821)

Deeploy/DeeployTypes.py

401-401: Redefinition of unused sizeInBytes from line 390

(F811)


447-447: Redefinition of unused sizeInBytes from line 444

(F811)


456-456: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


456-456: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

Deeploy/Targets/Neureka/TileConstraints/NeurekaDenseConstraint.py

263-263: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


263-263: Undefined name inputRequantCubes

(F821)


281-281: Local variable dilation is assigned to but never used

Remove assignment to unused variable dilation

(F841)


289-289: Local variable inputChannelVar is assigned to but never used

Remove assignment to unused variable inputChannelVar

(F841)


294-294: Local variable outputChannelVar is assigned to but never used

Remove assignment to unused variable outputChannelVar

(F841)


317-317: Local variable weightBuffer is assigned to but never used

Remove assignment to unused variable weightBuffer

(F841)


320-320: Local variable inputBatchVar is assigned to but never used

Remove assignment to unused variable inputBatchVar

(F841)


325-325: Local variable outputBatchVar is assigned to but never used

Remove assignment to unused variable outputBatchVar

(F841)


326-326: Local variable outputHeightVar is assigned to but never used

Remove assignment to unused variable outputHeightVar

(F841)


327-327: Local variable outputWidthVar is assigned to but never used

Remove assignment to unused variable outputWidthVar

(F841)


328-328: Local variable outputChannelVar is assigned to but never used

Remove assignment to unused variable outputChannelVar

(F841)


331-331: Local variable padding is assigned to but never used

Remove assignment to unused variable padding

(F841)


413-413: Unpacked variable BatchOffset is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


413-413: Unpacked variable HOffset is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


413-413: Unpacked variable WOffset is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


413-413: Unpacked variable COffset is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


414-414: Unpacked variable BatchSize is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


426-426: Unpacked variable inBSize is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


426-426: Unpacked variable inHSize is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


522-522: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

Deeploy/Targets/PULPOpen/TileConstraints/MatMulTileConstraint.py

122-122: Undefined name BSize

(F821)


133-133: Consider (BatchOffset, *AMatrixOffsets) instead of concatenation

Replace with (BatchOffset, *AMatrixOffsets)

(RUF005)


133-133: Undefined name BatchOffset

(F821)


133-133: Consider (BatchSize, *AMatrixShape) instead of concatenation

Replace with (BatchSize, *AMatrixShape)

(RUF005)


136-139: Consider (BatchOffset, BOffset, *AMatrixOffsets) instead of concatenation

Replace with (BatchOffset, BOffset, *AMatrixOffsets)

(RUF005)


137-137: Undefined name BatchOffset

(F821)


138-138: Undefined name BOffset

(F821)


140-143: Consider (BatchSize, BSize, *AMatrixShape) instead of concatenation

Replace with (BatchSize, BSize, *AMatrixShape)

(RUF005)


142-142: Undefined name BSize

(F821)


149-149: Consider (BatchOffset, *BMatrixOffsets) instead of concatenation

Replace with (BatchOffset, *BMatrixOffsets)

(RUF005)


149-149: Undefined name BatchOffset

(F821)


149-149: Consider (BatchSize, *BMatrixShape) instead of concatenation

Replace with (BatchSize, *BMatrixShape)

(RUF005)


152-155: Consider (BatchOffset, BOffset, *BMatrixOffsets) instead of concatenation

Replace with (BatchOffset, BOffset, *BMatrixOffsets)

(RUF005)


153-153: Undefined name BatchOffset

(F821)


154-154: Undefined name BOffset

(F821)


156-159: Consider (BatchSize, BSize, *BMatrixShape) instead of concatenation

Replace with (BatchSize, BSize, *BMatrixShape)

(RUF005)


158-158: Undefined name BSize

(F821)

Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py

305-306: Expected an indented block after if statement

(invalid-syntax)


310-310: Expected an indented block after if statement

(invalid-syntax)

Deeploy/Targets/PULPOpen/TileConstraints/GEMMTileConstraint.py

132-132: Undefined name BSize

(F821)


151-151: Consider (BatchOffset, *AMatrixOffsets) instead of concatenation

Replace with (BatchOffset, *AMatrixOffsets)

(RUF005)


151-151: Undefined name BatchOffset

(F821)


151-151: Consider (BatchSize, *AMatrixShape) instead of concatenation

Replace with (BatchSize, *AMatrixShape)

(RUF005)


154-157: Consider (BatchOffset, BOffset, *AMatrixOffsets) instead of concatenation

Replace with (BatchOffset, BOffset, *AMatrixOffsets)

(RUF005)


155-155: Undefined name BatchOffset

(F821)


156-156: Undefined name BOffset

(F821)


158-161: Consider (BatchSize, BSize, *AMatrixShape) instead of concatenation

Replace with (BatchSize, BSize, *AMatrixShape)

(RUF005)


160-160: Undefined name BSize

(F821)


167-167: Consider (BatchOffset, *BMatrixOffsets) instead of concatenation

Replace with (BatchOffset, *BMatrixOffsets)

(RUF005)


167-167: Undefined name BatchOffset

(F821)


167-167: Consider (BatchSize, *BMatrixShape) instead of concatenation

Replace with (BatchSize, *BMatrixShape)

(RUF005)


170-173: Consider (BatchOffset, BOffset, *BMatrixOffsets) instead of concatenation

Replace with (BatchOffset, BOffset, *BMatrixOffsets)

(RUF005)


171-171: Undefined name BatchOffset

(F821)


172-172: Undefined name BOffset

(F821)


174-177: Consider (BatchSize, BSize, *BMatrixShape) instead of concatenation

Replace with (BatchSize, BSize, *BMatrixShape)

(RUF005)


176-176: Undefined name BSize

(F821)


370-370: Consider (BatchOffset, *AMatrixOffsets) instead of concatenation

Replace with (BatchOffset, *AMatrixOffsets)

(RUF005)


370-370: Undefined name BatchOffset

(F821)


370-370: Consider (BatchSize, *AMatrixShape) instead of concatenation

Replace with (BatchSize, *AMatrixShape)

(RUF005)


373-376: Consider (BatchOffset, BOffset, *AMatrixOffsets) instead of concatenation

Replace with (BatchOffset, BOffset, *AMatrixOffsets)

(RUF005)


374-374: Undefined name BatchOffset

(F821)


375-375: Undefined name BOffset

(F821)


377-380: Consider (BatchSize, BSize, *AMatrixShape) instead of concatenation

Replace with (BatchSize, BSize, *AMatrixShape)

(RUF005)


379-379: Undefined name BSize

(F821)


386-386: Consider (BatchOffset, *BMatrixOffsets) instead of concatenation

Replace with (BatchOffset, *BMatrixOffsets)

(RUF005)


386-386: Undefined name BatchOffset

(F821)


386-386: Consider (BatchSize, *BMatrixShape) instead of concatenation

Replace with (BatchSize, *BMatrixShape)

(RUF005)


389-392: Consider (BatchOffset, BOffset, *BMatrixOffsets) instead of concatenation

Replace with (BatchOffset, BOffset, *BMatrixOffsets)

(RUF005)


390-390: Undefined name BatchOffset

(F821)


391-391: Undefined name BOffset

(F821)


393-396: Consider (BatchSize, BSize, *BMatrixShape) instead of concatenation

Replace with (BatchSize, BSize, *BMatrixShape)

(RUF005)


395-395: Undefined name BSize

(F821)


405-405: Consider (BatchOffset, *CMatrixOffsets) instead of concatenation

Replace with (BatchOffset, *CMatrixOffsets)

(RUF005)


405-405: Undefined name BatchOffset

(F821)


405-405: Consider (BatchSize, *CMatrixShape) instead of concatenation

Replace with (BatchSize, *CMatrixShape)

(RUF005)


407-407: Consider (BatchOffset, BOffset, *CMatrixOffsets) instead of concatenation

Replace with (BatchOffset, BOffset, *CMatrixOffsets)

(RUF005)


407-407: Undefined name BatchOffset

(F821)


407-407: Undefined name BOffset

(F821)


407-407: Consider (BatchSize, BSize, *CMatrixShape) instead of concatenation

Replace with (BatchSize, BSize, *CMatrixShape)

(RUF005)


407-407: Undefined name BSize

(F821)

Deeploy/Targets/Neureka/TileConstraints/NeurekaDepthwiseConstraint.py

259-259: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


259-259: Undefined name inputRequantCubes

(F821)


277-277: Local variable dilation is assigned to but never used

Remove assignment to unused variable dilation

(F841)


314-314: Local variable weightBuffer is assigned to but never used

Remove assignment to unused variable weightBuffer

(F841)


317-317: Local variable inputBatchVar is assigned to but never used

Remove assignment to unused variable inputBatchVar

(F841)


320-320: Local variable inputChannelVar is assigned to but never used

Remove assignment to unused variable inputChannelVar

(F841)


322-322: Local variable outputBatchVar is assigned to but never used

Remove assignment to unused variable outputBatchVar

(F841)


323-323: Local variable outputHeightVar is assigned to but never used

Remove assignment to unused variable outputHeightVar

(F841)


324-324: Local variable outputWidthVar is assigned to but never used

Remove assignment to unused variable outputWidthVar

(F841)


325-325: Local variable outputChannelVar is assigned to but never used

Remove assignment to unused variable outputChannelVar

(F841)


328-328: Local variable padding is assigned to but never used

Remove assignment to unused variable padding

(F841)


408-408: Unpacked variable BatchOffset is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


408-408: Unpacked variable HOffset is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


408-408: Unpacked variable WOffset is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


408-408: Unpacked variable COffset is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


409-409: Unpacked variable BatchSize is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


421-421: Unpacked variable inBSize is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


421-421: Unpacked variable inHSize is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


517-517: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

Deeploy/MemoryLevelExtension/NetworkDeployers/MemoryLevelDeployer.py

131-131: Undefined name ONNXLayer

(F821)

Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py

25-25: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

Deeploy/Targets/PULPOpen/Deployer.py

40-40: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

DeeployTest/testMVP.py

58-58: Unused function argument: verbose

(ARG001)

🔇 Additional comments (29)
.github/workflows/ci-platform-siracusa.yml (1)

57-75: Test additions well-integrated and properly organized.

The new kernel tests (lines 57–63 for Float2D convolutions with and without DW variants, line 75 for reshape with skip connection) and the model test (line 99 for TinyViT demo) follow established naming conventions, are logically grouped by operation type, and align with the PR's stated objectives to add FP test coverage.

Based on the established pattern (documented in learnings), these test names map to directories in DeeployTest/Tests/, and the prior review confirmed these directories exist and tests pass successfully.

Also applies to: 99-99

CHANGELOG.md (1)

100-103: Approve changelog entries for bug fixes.

The Fixed section comprehensively documents the four key bug fixes (alias_of parameter handling, non-batched FP GEMM/matmul, closure naming, and FP add/mul data types) in alignment with the PR objectives. The descriptions are clear and concise.


Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py (1)

35-35: LGTM!

The storage of n_cores is straightforward and correct.

Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (2)

127-131: Nice performance improvement.

The refactored implementation using iterative construction is O(n) instead of O(n²) with repeated .index() calls, and the logic is clearer.
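A generic illustration (not the pass's actual code) of the complexity difference: repeated list.index() calls scan the list once per element, while a precomputed position table needs a single pass:

```python
# Generic illustration of the O(n^2) vs O(n) point above; not the actual
# Deeploy pass code.

def perm_quadratic(src, order):
    # .index() scans the list on every call -> O(n) per element, O(n^2) total
    return [src.index(x) for x in order]

def perm_linear(src, order):
    pos = {x: i for i, x in enumerate(src)}  # one O(n) pass builds the lookup
    return [pos[x] for x in order]

src = ['N', 'C', 'H', 'W']
order = ['N', 'H', 'W', 'C']
assert perm_quadratic(src, order) == perm_linear(src, order) == [0, 2, 3, 1]
```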


338-383: New DW convolution passes look well-structured.

The NCHWtoNHWCDwConvPass and PULPFPDWConvPass classes follow the established pattern-based pass architecture correctly. However, they depend on _PULPDWNCHWtoNHWC_fun, which has a syntax error that must be fixed first.

Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py (1)

13-14: LGTM: MIN casts prevent signed/unsigned mixups

Casting ${size} to uint32_t here is correct and avoids UB on MIN comparisons.

Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (2)

100-117: Im2Col call argument order looks correct

X/Y, kernel, and stride ordering matches the expected implementation for these templates.

Based on learnings.


21-24: Conv2D Im2Col scratch sizing formula is correct

The formula matches the C kernel's pContextBuffer requirement. The buffer is correctly sized as (typeWidth // 8) * n_cores * ch_im_in * kx * ky bytes, which corresponds to the kernel logic where each core gets ch_im_in * kernel_x * kernel_y float elements, split across n_cores with pointer arithmetic advancing by element count. For fp32 (typeWidth=32), this yields the exact buffer needed: n_cores * ch_im_in * kx * ky * 4 bytes.
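The sizing formula can be checked numerically. The helper name below is illustrative; it just restates (typeWidth // 8) * n_cores * ch_im_in * kx * ky:

```python
# Hedged sketch of the scratch-buffer sizing discussed above; the function
# name is illustrative, the formula is the one quoted in the comment.

def im2col_scratch_bytes(type_width_bits, n_cores, ch_im_in, kx, ky):
    # Each core gets ch_im_in * kx * ky elements of the input type.
    return (type_width_bits // 8) * n_cores * ch_im_in * kx * ky

# fp32 example: 32-bit elements, 8 cores, 16 input channels, 3x3 kernel
print(im2col_scratch_bytes(32, 8, 16, 3, 3))  # 4 * 8 * 16 * 3 * 3 = 4608
```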

Deeploy/Targets/PULPOpen/TileConstraints/MatMulTileConstraint.py (1)

164-173: Review comment is incorrect: prior cleanup was not completed.

The first BCube construction block (lines 146-160) still exists as dead code. It is never appended to inputBCubes—only the second BCube constructed at line 172 is appended (line 173). The review comment incorrectly asserts that "earlier deleted branch already tried to build BCube," but this branch was never deleted. While lines 164-173 correctly represent the single active path, the orphaned BCube construction at lines 146-160 contradicts the cleanup claim.

Likely an incorrect or invalid review comment.

Deeploy/Targets/PULPOpen/Parsers.py (3)

80-82: FP Conv2D: bias arity check LGTM

Allowing 2 or 3 inputs at parse time is correct and aligns with bias support.


105-117: FP Conv2D: bias flags wiring is consistent

Setting has_bias ("true"/"false") and mapping bias when present matches existing templates.

Please confirm FloatConvTemplate.reference2DIm2ColTemplate expects:

  • keys: data_in, weight, data_out, has_bias, bias.

123-190: Remove suggestion to validate DW early in parseNode; current design is sound.

The review comment suggests early depthwise validation in parseNode, but this is architecturally infeasible. The parseNode method receives only the node (without context), so ch_im_in—needed to verify group == ch_im_in—is unavailable. Input channel dimensions are only accessible in parseNodeCtxt via ctxt.lookup(), which is where the DW check correctly occurs (line ~167).

The mention of "F_total % C == 1" references an undefined formula not present anywhere in the codebase.

The current two-stage design (attribute validation in parseNode, context validation in parseNodeCtxt) correctly reflects the available information at each stage and is the proper approach.

Likely an incorrect or invalid review comment.
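The two-stage design described above can be sketched as follows; the class and method names are simplified stand-ins for the real Deeploy parsers:

```python
# Illustrative two-stage parser skeleton (stand-in names, not the real
# Deeploy classes): node-only checks live in parse_node, while checks that
# need tensor shapes (e.g. depthwise: group == ch_im_in) wait for the context.

class DWConvParserSketch:
    def parse_node(self, node_attrs):
        # Only attributes carried by the node itself are available here.
        self.rep = dict(node_attrs)
        return "group" in self.rep

    def parse_node_ctxt(self, ctxt, input_name):
        # Input channel count comes from the context lookup, so the
        # depthwise validation can only happen at this stage.
        ch_im_in = ctxt[input_name]["shape"][-1]
        self.rep["ch_im_in"] = ch_im_in
        return self.rep["group"] == ch_im_in

p = DWConvParserSketch()
assert p.parse_node({"group": 16})
assert p.parse_node_ctxt({"x": {"shape": [1, 8, 8, 16]}}, "x")
```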

Deeploy/Targets/PULPOpen/Bindings.py (5)

12-19: Import expansion LGTM

Float data types and new templates are correctly imported for the added FP bindings.


151-152: DMA Slice: template swap LGTM

Switch to DMASliceTemplate.referenceTemplate is consistent with DMA path.


163-172: Slice bindings for FP types: OK

SliceChecker + SliceTemplate with ForkTransformer over FloatDataTypes matches the Float path.


299-303: Float ReduceMean bindings: OK

Mixed float + integer axes signature bound to FloatReduceMeanTemplate with ClusterTransformer is consistent with parser behavior for FP opset.


239-245: FP DW Conv2D binding confirmed: template keys match parser

Verified that PULPFPDWConv2DParser.parseNodeCtxt() produces both has_bias and bias keys in operatorRepresentation (Parsers.py:175–176, 179), which are correctly expected by FloatConvTemplate.referenceDW2DIm2ColTemplate (FloatConvTemplate.py:81, 110, 142). The binding is correct.

Deeploy/TilingExtension/TilerExtension.py (1)

988-989: Review comment is incorrect — no bug exists

NetworkContext.copy() uses copy.copy(self), which is a shallow copy that preserves instance attributes like n_cores (they remain in the object's __dict__). _convertCtxtToStaticSchedule returns the same ctxt instance modified in-place, so n_cores naturally survives. Line 988's reassignment is defensive programming but not required for correctness.
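The shallow-copy behavior invoked above is easy to demonstrate with a minimal class (Ctxt here is a stand-in, not the real NetworkContext):

```python
import copy

# Demonstrates the point above: copy.copy() is shallow, so plain instance
# attributes such as n_cores survive on the copied object.

class Ctxt:
    def __init__(self):
        self.n_cores = 8

original = Ctxt()
clone = copy.copy(original)
assert clone.n_cores == 8      # attribute carried over by the shallow copy
clone.n_cores = 1
assert original.n_cores == 8   # the copies have distinct __dict__ entries
```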

TargetLibraries/PULPOpen/src/Convolution_fp32.c (1)

36-39: Check argument order consistency at call sites.

Kernels accept (H, W, P, Q, SP, SQ); FloatConvTemplate uses X,Y ordering per prior notes. Please confirm bindings pass the correct order to avoid transposed dims.

Based on learnings

Also applies to: 134-138

Deeploy/DeeployTypes.py (1)

1205-1210: Constructor argument mismatch in parseOutputs.

Use aliases=[] instead of alias_of=[] to match VariableBuffer.__init__.

Apply:

-                nb = ctxt.VariableBuffer(
-                    name = name,
-                    shape = node.shape,
-                    alias_of = [],
-                )
+                nb = ctxt.VariableBuffer(
+                    name = name,
+                    shape = node.shape,
+                    aliases = [],
+                )

Likely an incorrect or invalid review comment.

DeeployTest/testMVP.py (3)

84-92: LGTM!

The n_cores parameter is correctly propagated to mapDeployer, and the formatting is clean and consistent.


201-205: LGTM!

The CLI argument is well-defined with a sensible default and clear help text. The parameter correctly flows through to the deployment pipeline.


256-262: LGTM!

The n_cores parameter is correctly passed from CLI arguments to setupDeployer, completing the propagation chain through the deployment pipeline.

Deeploy/Targets/PULPOpen/Platform.py (6)

8-9: LGTM!

All new imports are properly structured and used within the file. The additions support FP depthwise convolution, slice operations, and the new optimization pass.

Also applies to: 32-32, 35-36


77-77: LGTM!

The FPDWConv2DMapper definition follows the established pattern and correctly pairs the FP depthwise conv parser with its bindings.


92-94: LGTM!

Both slice mappers correctly use SliceParser with different bindings to support DMA-based and software-based slice implementations. The mapper defined first in the mapping list (line 134) will have precedence during binding selection.


234-235: LGTM!

The RemoveEmptyConvBiasPass is correctly positioned near the end of the optimization pipeline, after merge and requantization passes. This placement allows the pass to clean up any empty bias parameters that may have been introduced or remain after earlier transformations.


134-134: No action needed; original review comment is incorrect.

The mapper order does not affect precedence. PULPDMASliceBindings operates on IntegerDataTypes while PULPSliceBindings operates on FloatDataTypes—these are mutually exclusive sets. The SliceChecker type constraints determine which binding applies, not the array order. The current ordering [SliceMapper, DMASliceMapper] is safe.

Likely an incorrect or invalid review comment.


108-108: Based on my verification of the parser constraint logic:

FPConv2DMapper matches regular convolutions exclusively:

  • PULPFPConv2DParser.parseNode() explicitly requires self.operatorRepresentation['group'] == 1
  • This constraint rejects any depthwise convolution (where group == ch_im_in)

FPDWConv2DMapper matches depthwise convolutions:

  • PULPFPDWConv2DParser.parseNode() does not have a group==1 constraint
  • Depthwise validation occurs in parseNodeCtxt() at line 186: if self.operatorRepresentation['group'] == self.operatorRepresentation['ch_im_in']

The mapper ordering [FPConv2DMapper, FPDWConv2DMapper] is correct: the more constrained parser (requiring group==1) is tried first, ensuring regular convolutions are matched before attempting depthwise matching. Any regular convolution caught by FPConv2DMapper will not reach FPDWConv2DMapper.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
Deeploy/DeeployTypes.py (2)

241-245: Replace mutable default (shape=[1]) with immutable and type-annotate.

Avoid shared mutable defaults.

-    def __init__(self, name: str = '', shape = [1], aliases: Optional[List[str]] = None):
+    def __init__(self, name: str = '', shape: Sequence[int] = (1,), aliases: Optional[List[str]] = None):

379-388: Fix visited set creation in has_live_ancestors (bug).

set(self.name) splits into characters; use a singleton set.

-        visited = set(self.name)
+        visited = {self.name}
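The one-character difference matters because set(...) iterates its argument, so a string is split into its characters, while a set display creates a singleton:

```python
# Why the fix above matters: set("name") splits the string into characters,
# {"name"} builds a singleton set containing the whole name.
assert set("node_3") == {'n', 'o', 'd', 'e', '_', '3'}
assert {"node_3"} == {"node_3"}
# With set(self.name), any buffer sharing a letter with "node_3" could
# wrongly count as already visited.
```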
♻️ Duplicate comments (6)
CHANGELOG.md (1)

23-23: Remove duplicate PR entry (#105).

The “Refactor tiling code generation [#105]” line is duplicated. Remove the second occurrence to avoid confusion.

- - Refactor tiling code generation [#105](https://github.com/pulp-platform/Deeploy/pull/105)
Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py (1)

25-26: Avoid mutable default for inputOffsets.

Use Optional and initialize inside to prevent shared state.

-                 inputOffsets: Dict[str, int] = {},
+                 inputOffsets: Optional[Dict[str, int]] = None,
                  n_cores: int = 8):
@@
-        if inputOffsets == {}:
-            for key in inputTypes.keys():
-                inputOffsets[key] = 0
+        if inputOffsets is None:
+            inputOffsets = {key: 0 for key in inputTypes.keys()}
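The Optional/None pattern suggested above avoids the classic shared-mutable-default pitfall, which a minimal standalone example makes visible:

```python
# Minimal demonstration of the pitfall the suggested fix avoids; bad/good are
# generic examples, not the Deeploy deployer code.

def bad(offsets={}):          # mutable default: one dict shared across calls
    offsets["x"] = offsets.get("x", 0) + 1
    return offsets

def good(offsets=None):       # fresh dict on every call
    if offsets is None:
        offsets = {}
    offsets["x"] = offsets.get("x", 0) + 1
    return offsets

first = bad()
second = bad()
assert first is second          # both calls returned the same shared dict
assert second == {"x": 2}       # mutated twice across calls
assert good() == {"x": 1}
assert good() == {"x": 1}       # independent dicts each call
```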
Deeploy/Targets/PULPOpen/Deployer.py (1)

40-53: Avoid mutable default for inputOffsets; forward Optional to base.

Align with SignPropDeployer’s fix.

-from typing import Callable, Dict, List, Type
+from typing import Callable, Dict, List, Optional, Type
@@
-                 inputOffsets = {},
+                 inputOffsets: Optional[Dict[str, int]] = None,
                  n_cores: int = 8):
@@
             inputOffsets = inputOffsets,
             n_cores = n_cores,
Deeploy/DeeployTypes.py (3)

325-329: Constructor arg mismatch: use ‘aliases’, not ‘alias_of’.

This raises a TypeError at runtime.

-        return (cls(
-            name = node.name,
-            shape = node.shape if not isinstance(node, gs.Constant) else node.values.shape,
-            alias_of = [],
-        ))
+        return cls(
+            name = node.name,
+            shape = node.shape if not isinstance(node, gs.Constant) else node.values.shape,
+            aliases = [],
+        )

Avoid mutable defaults in ConstantBuffer.__init__.

Use immutable defaults and convert.

-    def __init__(self, name: str = '', shape = [1], values = [0]):
+    def __init__(self, name: str = '', shape: Sequence[int] = (1,), values: Union[Sequence, np.ndarray] = (0,)):
         # Pass a copy of alias_of to avoid shared references
         super().__init__(name, shape)
 
         values = np.asarray(values)

331-365: Align alias helpers with self.aliases (set) and remove alias_of usage.

Current code references a non-existent attribute and never updates aliases.

-    def add_aliases(self, aliases_to_add: List[str]):
+    def add_aliases(self, aliases_to_add: List[str]):
@@
-        if not hasattr(self, "alias_of"):
-            return None
-
-        for alias in aliases_to_add:
-            if alias not in self.alias_of:
-                self.alias_of.append(alias)
+        if not hasattr(self, "aliases") or self.aliases is None:
+            self.aliases = set()
+        for alias in aliases_to_add:
+            self.aliases.add(alias)
         return None
@@
-    def get_aliases_of(self):
+    def get_aliases_of(self) -> List[str]:
@@
-        if hasattr(self, "alias_of"):
-            return list(self.alias_of)
-        else:
-            return list()
+        return list(self.aliases) if hasattr(self, "aliases") and self.aliases is not None else []
🧹 Nitpick comments (14)
Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py (1)

158-159: Consider defensive handling for edge cases in name prefixing.

The underscore prefix addresses names starting with numbers as intended. However, the implementation doesn't guard against:

  • Empty or None name values
  • Names already starting with _ (which would produce __, a reserved identifier)

Consider adding a check:

-        # Prepend underscore to avoid name issues when beginning with problematic characters (like numbers)
-        self.closureName = "_" + name + self.closureSuffix
+        # Prepend underscore to avoid name issues when beginning with problematic characters (like numbers)
+        # Avoid creating double-underscore if name already starts with underscore
+        prefix = "" if name and name[0] == "_" else "_"
+        self.closureName = prefix + name + self.closureSuffix
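The guarded prefix suggested above can be exercised in isolation; closure_name is a hypothetical helper mirroring the diff, not the real method:

```python
# Hypothetical helper mirroring the suggested diff: prefix an underscore only
# when the name does not already start with one, so no identifier begins with
# a digit and no double underscore is produced.

def closure_name(name, suffix="_closure"):
    prefix = "" if name and name[0] == "_" else "_"
    return prefix + name + suffix

assert closure_name("3x3_conv") == "_3x3_conv_closure"  # digit-leading name fixed
assert closure_name("_node") == "_node_closure"         # no double underscore
```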
Deeploy/Targets/PULPOpen/Templates/FloatMatMulTemplate.py (1)

11-21: LGTM! Correct fix for non-batched input handling.

The conditional logic properly addresses the bug by only advancing input pointers when the corresponding input is actually batched. When A_batched or B_batched is false, the base pointer is reused across all batch iterations, which correctly implements broadcasting semantics.

The pointer arithmetic for the batched case is correct:

  • A offset: b * M * N for shape [batch, M, N]
  • B offset: b * N * O for shape [batch, N, O]

Minor: Line 17 has trailing whitespace after the colon that could be cleaned up.

-    % if B_batched:                  
+    % if B_batched:
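For intuition, the broadcast behavior the conditional implements can be modeled in plain Python over flat row-major buffers (names here are illustrative, not the template's Mako variables):

```python
# Model of the batch-broadcast logic: advance an operand's offset per
# batch only when that operand is batched; otherwise reuse its base data.
def batched_matmul(A, B, batch, M, N, O, A_batched, B_batched):
    out = []
    for b in range(batch):
        a = A[b * M * N:(b + 1) * M * N] if A_batched else A[:M * N]
        bmat = B[b * N * O:(b + 1) * N * O] if B_batched else B[:N * O]
        for m in range(M):
            for o in range(O):
                out.append(sum(a[m * N + n] * bmat[n * O + o] for n in range(N)))
    return out
```

The unbatched slice is the analogue of leaving the base pointer untouched across batch iterations.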
Deeploy/Targets/Generic/Parsers.py (1)

1071-1074: Consider adding strict=True to zip() calls for defensive programming.

The zip() calls iterate over inputs and outputs with expected fixed-length lists. While parseNode already validates these lengths, adding strict=True (Python 3.10+) provides an explicit safety guarantee and catches bugs if the validation logic changes.

Apply this diff if Python 3.10+ is used:

-    for tensor, symName in zip(node.inputs, ['data_in', 'shape']):
+    for tensor, symName in zip(node.inputs, ['data_in', 'shape'], strict=True):
         self.operatorRepresentation[symName] = ctxt.lookup(tensor.name).name
-    for tensor, symName in zip(node.outputs, ['data_out']):
+    for tensor, symName in zip(node.outputs, ['data_out'], strict=True):
         self.operatorRepresentation[symName] = ctxt.lookup(tensor.name).name

Based on static analysis hints.

Deeploy/TilingExtension/TilerExtension.py (1)

988-989: Propagate n_cores earlier (and guard against missing attribute).

Set ctxt.n_cores when the context is created (or before tiling) to avoid surprises, and guard for attribute presence to prevent AttributeError if wrappers differ.

Example:

-        self.ctxt = self.tiler._convertCtxtToStaticSchedule(self.ctxt, memoryMap)
-        self.ctxt.n_cores = self.n_cores
+        self.ctxt = self.tiler._convertCtxtToStaticSchedule(self.ctxt, memoryMap)
+        if hasattr(self, "n_cores"):
+            self.ctxt.n_cores = self.n_cores
Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py (1)

35-35: Drop redundant self.n_cores assignment.

NetworkDeployer already stores n_cores; remove this duplicate to keep a single source of truth.

Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (4)

43-51: Fix return type annotation to match base (Union[int, IntVar]).

The function returns an int, but is annotated as str. Align with NodeTemplate’s signature.

-    def computeTransientBuffersSize(ctxt: NetworkContext,
-                                    operatorRepresentation: OperatorRepresentation) -> List[Tuple[str, str]]:
+    def computeTransientBuffersSize(
+        ctxt: NetworkContext,
+        operatorRepresentation: OperatorRepresentation,
+    ) -> List[Tuple[str, Union[int, IntVar]]]:

46-51: Update stale comment about using size as string.

The returned size is numeric and also passed to hoistTransientBuffer; the “used as string” warning is misleading. Remove or clarify it.


59-65: Prefer annotateType over mutating referencedType.

Setting _type.referencedType mutates the class and can have side effects. Use annotateType to set a precise Pointer type and instance.

-        # Manually set the type of the im2col buffer to match the input type, since it defaults to void for transient buffers
-        ctxt.lookup(im2col_name)._type.referencedType = ctxt.lookup(
-            operatorRepresentation['data_in'])._type.referencedType
+        # Set the buffer pointer type to match data_in
+        ctxt.annotateType(im2col_name, ctxt.lookup(operatorRepresentation['data_in'])._type)

93-101: Nit: fix comment typos (“ChannelOout” → “ChannelOut”).

Purely cosmetic; improves readability.

Also applies to: 124-131

Deeploy/DeeployTypes.py (1)

573-576: Avoid mutable dict defaults in NetworkContext.__init__.

Initialize dicts inside to prevent shared state.

-                 globalObjects = {},
-                 localObjects = {},
+                 globalObjects: Optional[OrderedDict] = None,
+                 localObjects: Optional[OrderedDict] = None,
@@
-        self.globalObjects = OrderedDict()
-        self.localObjects = OrderedDict()
+        self.globalObjects = OrderedDict() if globalObjects is None else globalObjects
+        self.localObjects = OrderedDict() if localObjects is None else localObjects
DeeployTest/testUtils/platformMapping.py (2)

92-104: Use int for n_cores (not Optional) since default is numeric.

-    n_cores: Optional[int] = 8,
+    n_cores: int = 8,

114-188: Propagate n_cores to all SignProp-based deployers for consistency.

CMSIS, MemPool, SoftHier, Generic deployers inherit SignPropDeployer and accept n_cores. Pass it through so CLI behavior is uniform.

Example for each block:

-        deployer = CMSISDeployer(...,
-                                 deeployStateDir = deeployStateDir,
-                                 inputOffsets = inputOffsets)
+        deployer = CMSISDeployer(...,
+                                 deeployStateDir = deeployStateDir,
+                                 inputOffsets = inputOffsets,
+                                 n_cores = n_cores)

Apply similarly to MemPoolDeployer, SoftHierDeployer, GenericDeployer.

Also applies to: 140-188

TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1)

50-148: Code duplication acceptable for performance.

The has_bias and no-bias paths are duplicated (lines 50-148 vs 149-248), which increases maintainability burden but avoids branching inside tight loops. This trade-off is reasonable for performance-critical embedded kernels.

Consider adding a comment noting this is intentional for performance if future maintainers question the duplication.

TargetLibraries/PULPOpen/src/Convolution_fp32.c (1)

41-101: Code duplication acceptable for performance.

The has_bias and no-bias code paths are duplicated to avoid conditional branching inside the innermost loops (lines 47-64). This is a common optimization in performance-critical embedded code where avoiding branches can significantly improve execution speed.

Similar duplication exists in the im2col path (lines 140-218). Consider adding a brief comment explaining this trade-off for future maintainers.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 85edf76 and 945b610.

📒 Files selected for processing (24)
  • .github/workflows/ci-platform-siracusa.yml (3 hunks)
  • CHANGELOG.md (4 hunks)
  • Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py (1 hunks)
  • Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py (2 hunks)
  • Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1 hunks)
  • Deeploy/DeeployTypes.py (12 hunks)
  • Deeploy/Targets/Generic/Parsers.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Bindings.py (6 hunks)
  • Deeploy/Targets/PULPOpen/Deployer.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Parsers.py (2 hunks)
  • Deeploy/Targets/PULPOpen/Platform.py (7 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatAddTemplate.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (5 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatMatMulTemplate.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py (1 hunks)
  • Deeploy/TilingExtension/TilerExtension.py (1 hunks)
  • DeeployTest/testMVP.py (4 hunks)
  • DeeployTest/testRunner_tiled_siracusa.py (1 hunks)
  • DeeployTest/testUtils/platformMapping.py (2 hunks)
  • DeeployTest/testUtils/testRunner.py (3 hunks)
  • TargetLibraries/PULPOpen/inc/kernel/Conv.h (1 hunks)
  • TargetLibraries/PULPOpen/src/Convolution_fp32.c (4 hunks)
  • TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py
  • Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
  • Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py
  • DeeployTest/testUtils/testRunner.py
  • Deeploy/Targets/PULPOpen/Templates/FloatAddTemplate.py
  • DeeployTest/testRunner_tiled_siracusa.py
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-09-24T12:53:00.168Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: TargetLibraries/PULPOpen/src/DWConvolution_fp32.c:0-0
Timestamp: 2025-09-24T12:53:00.168Z
Learning: In PULP platform code, log2 is defined in pulp_nn_utils and can be used safely without requiring math.h or causing floating-point build/runtime issues, unlike the standard library log2 function.

Applied to files:

  • TargetLibraries/PULPOpen/src/DWConvolution_fp32.c
📚 Learning: 2025-09-24T12:49:17.889Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.

Applied to files:

  • TargetLibraries/PULPOpen/src/DWConvolution_fp32.c
  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
  • TargetLibraries/PULPOpen/inc/kernel/Conv.h
  • TargetLibraries/PULPOpen/src/Convolution_fp32.c
📚 Learning: 2025-09-24T11:43:47.236Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: .github/workflows/ci-platform-siracusa.yml:57-60
Timestamp: 2025-09-24T11:43:47.236Z
Learning: In the Deeploy test system, test names in CI workflows correspond to directory names under DeeployTest/Tests/, not Python function names. The TestRunner class executes tests by passing directory paths via the `-t` argument, where each directory contains test configurations and definitions.

Applied to files:

  • .github/workflows/ci-platform-siracusa.yml
📚 Learning: 2025-09-24T12:17:21.624Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:46-0
Timestamp: 2025-09-24T12:17:21.624Z
Learning: In Deeploy's PULP templates, transient buffer size calculation can return element counts as strings from computeTransientBuffersSize(), and then manually set the buffer type in hoistTransientBuffers() using ctxt.lookup(buffer_name)._type.referencedType = input_type. The allocation system automatically multiplies the element count by the element size when the buffer type is properly set, achieving correct byte allocation.

Applied to files:

  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
🧬 Code graph analysis (9)
Deeploy/Targets/Generic/Parsers.py (1)
Deeploy/DeeployTypes.py (2)
  • inputs (2569-2586)
  • lookup (778-810)
Deeploy/Targets/PULPOpen/Parsers.py (2)
Deeploy/DeeployTypes.py (5)
  • inputs (2569-2586)
  • lookup (778-810)
  • parseNode (1091-1106)
  • parseNodeCtxt (1109-1134)
  • NetworkContext (564-1078)
Deeploy/Targets/Generic/Parsers.py (31)
  • Conv2DParser (1220-1286)
  • parseNode (19-27)
  • parseNode (49-58)
  • parseNode (81-94)
  • parseNode (102-112)
  • parseNode (163-169)
  • parseNode (193-207)
  • parseNode (229-243)
  • parseNode (271-293)
  • parseNode (333-351)
  • parseNode (373-385)
  • parseNode (422-434)
  • parseNode (469-473)
  • parseNode (496-507)
  • parseNode (531-542)
  • parseNode (578-582)
  • parseNodeCtxt (29-41)
  • parseNodeCtxt (60-76)
  • parseNodeCtxt (114-155)
  • parseNodeCtxt (171-185)
  • parseNodeCtxt (209-221)
  • parseNodeCtxt (245-263)
  • parseNodeCtxt (295-325)
  • parseNodeCtxt (353-365)
  • parseNodeCtxt (387-414)
  • parseNodeCtxt (436-461)
  • parseNodeCtxt (475-488)
  • parseNodeCtxt (509-523)
  • parseNodeCtxt (544-570)
  • parseNodeCtxt (584-590)
  • parseNodeCtxt (603-620)
DeeployTest/testMVP.py (3)
Deeploy/DeeployTypes.py (1)
  • NetworkDeployer (3244-3629)
DeeployTest/testUtils/platformMapping.py (1)
  • mapDeployer (92-261)
DeeployTest/testSchedulingExtension.py (1)
  • _mockScheduler (29-33)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (1)
Deeploy/DeeployTypes.py (6)
  • NodeTemplate (87-229)
  • computeTransientBuffersSize (142-162)
  • NetworkContext (564-1078)
  • hoistTransientBuffers (165-185)
  • hoistTransientBuffer (879-898)
  • lookup (778-810)
Deeploy/Targets/PULPOpen/Bindings.py (4)
Deeploy/CommonExtensions/DataTypes.py (4)
  • float32_t (74-78)
  • int8_t (12-15)
  • int32_t (24-27)
  • uint8_t (36-39)
Deeploy/DeeployTypes.py (3)
  • CodeTransformation (2352-2386)
  • NodeBinding (1574-1719)
  • NodeTemplate (87-229)
Deeploy/Targets/Generic/TypeCheckers.py (2)
  • SliceChecker (37-51)
  • ReduceMeanChecker (310-324)
Deeploy/AbstractDataTypes.py (1)
  • PointerClass (536-559)
DeeployTest/testUtils/platformMapping.py (2)
Deeploy/DeeployTypes.py (3)
  • DeploymentPlatform (2439-2482)
  • TopologyOptimizer (2237-2266)
  • NetworkDeployer (3244-3629)
Deeploy/Targets/PULPOpen/Deployer.py (1)
  • PULPDeployer (29-127)
Deeploy/DeeployTypes.py (2)
DeeployTest/testUtils/dmaUtils.py (1)
  • size (72-73)
Deeploy/AbstractDataTypes.py (1)
  • Pointer (303-388)
TargetLibraries/PULPOpen/inc/kernel/Conv.h (2)
TargetLibraries/PULPOpen/src/Convolution_fp32.c (2)
  • PULP_Conv2d_fp32_fp32_fp32_HWC (10-102)
  • PULP_Conv2d_Im2Col_fp32_fp32_fp32_HWC (104-219)
TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1)
  • PULP_DW_Conv2d_Im2Col_fp32_fp32_fp32_HWC (10-251)
Deeploy/Targets/PULPOpen/Platform.py (4)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
  • RemoveEmptyConvBiasPass (519-524)
Deeploy/Targets/PULPOpen/Parsers.py (1)
  • PULPFPDWConv2DParser (124-190)
Deeploy/DeeployTypes.py (1)
  • NodeMapper (1722-1878)
Deeploy/Targets/Generic/Parsers.py (1)
  • SliceParser (97-155)
🪛 LanguageTool
CHANGELOG.md

[style] ~32-~32: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... conv 2D, and reduce mean operations. - FP PULP DW conv lowering optimization pass...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

🪛 Ruff (0.14.1)
Deeploy/Targets/Generic/Parsers.py

1071-1071: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


1073-1073: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

DeeployTest/testMVP.py

58-58: Unused function argument: verbose

(ARG001)

Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py

25-25: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

Deeploy/Targets/PULPOpen/Deployer.py

40-40: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

🔇 Additional comments (18)
.github/workflows/ci-platform-siracusa.yml (1)

57-64: CI test coverage additions align well with PR features.

The new tests for FP 2D convolution (standard and depthwise variants with/without bias), reshape with skip connection, and TinyViT demo align cleanly with the PR's feature additions. Test naming is consistent with existing patterns, and per the earlier review resolution, these tests exist as directories in DeeployTest/Tests/. The additions integrate smoothly into the existing kernel and model test lists without logic changes or structural issues.

Also applies to: 75-75, 99-99

Deeploy/Targets/Generic/Parsers.py (1)

1070-1076: LGTM! Clear and Pythonic mapping implementation.

The explicit mapping of inputs to ['data_in', 'shape'] and outputs to ['data_out'] improves clarity and follows the pattern established by other parsers in the codebase (e.g., iRMSNormParser lines 65-71). The use of zip() is more Pythonic than index-based access.

Deeploy/DeeployTypes.py (2)

879-899: Ensure hoistTransientBuffer size is always int.

If any templates pass symbolic strings, they’ll break here. Confirm all computeTransientBuffersSize implementations return int/IntVar as per signature.


2696-2702: Good: preserve n_cores on context reinit.

Reconstructing NetworkContext with n_cores=self.ctxt.n_cores avoids losing state on parse.

Deeploy/Targets/PULPOpen/Parsers.py (3)

80-81: LGTM! Bias support correctly integrated.

The relaxed input count check now allows optional bias (2 inputs without bias, 3 inputs with bias), which aligns with the bias handling added in parseNodeCtxt.


106-118: LGTM! Clean bias handling implementation.

The conditional bias handling correctly maps inputs based on the node's input count, setting appropriate flags for downstream code generation.


124-191: LGTM! Well-structured depthwise convolution parser.

The parser correctly validates depthwise convolution constraints:

  • Square padding requirement
  • Optional bias support (2 or 3 inputs)
  • Depthwise verification via group == ch_im_in check (line 187)

The structure mirrors the regular FP Conv2D parser while adding depthwise-specific validation.

DeeployTest/testMVP.py (1)

57-59: LGTM! Clean n_cores parameter propagation.

The n_cores parameter is consistently threaded through the deployment setup:

  1. Added to setupDeployer signature
  2. Passed to mapDeployer
  3. Exposed via CLI argument (default: 8)
  4. Propagated from parsed arguments

This enables proper sizing of per-core buffers (e.g., im2col buffers) in multi-core PULP deployments.

Also applies to: 84-92, 201-205, 261-262

TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (2)

10-36: LGTM! Proper per-core work distribution.

The function correctly partitions output channels across cores:

  • Computes per-core channel chunks with remainder handling
  • Early-exits when no channels assigned to a core
  • Adjusts weight and im2col buffer pointers per core

This enables efficient parallel execution on multi-core PULP clusters.
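For reference, the remainder-aware split can be modeled as follows (a minimal sketch of the chunking scheme, not the kernel's exact code):

```python
def core_range(total_channels, n_cores, core_id):
    # Each core gets floor(total/n) channels; the first (total % n)
    # cores take one extra, so all channels are covered contiguously.
    base = total_channels // n_cores
    rem = total_channels % n_cores
    start = core_id * base + min(core_id, rem)
    count = base + (1 if core_id < rem else 0)
    return start, start + count  # half-open [start, stop)
```

A core whose range comes back empty (start == stop) corresponds to the early-exit path in the kernel.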


103-145: LGTM! Correct depthwise convolution logic.

The input channel iteration (lines 103-104) correctly computes the range based on output channels assigned to the core, accounting for the depthwise constraint where each input channel produces F_total/C output channels (the "group" parameter in ONNX Conv).

The output channel range computation (lines 118-130) properly clamps to the core's assigned range, ensuring correct work distribution.
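The channel mapping described can be sketched with an illustrative helper (m = F_total // C is the channel multiplier):

```python
def dw_output_channels(c_in, F_total, C):
    # Depthwise conv: input channel c_in feeds output channels
    # [c_in * m, (c_in + 1) * m), with m = F_total // C.
    m = F_total // C
    return range(c_in * m, (c_in + 1) * m)
```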

Deeploy/Targets/PULPOpen/Bindings.py (3)

151-151: LGTM! Clear separation of DMA and non-DMA slice operations.

The update clarifies the distinction:

  • PULPDMASliceBindings (line 151): Uses DMASliceTemplate for asynchronous DMA-based slicing returning a Future
  • PULPSliceBindings (lines 163-172): Uses SliceTemplate for synchronous float slicing with ForkTransformer

This separation enables appropriate code transformation for each use case.

Also applies to: 163-172


239-245: LGTM! Depthwise convolution bindings correctly configured.

The new PULPFloatDWConv2DBindings:

  • Support all float types via FloatDataTypes iteration
  • Use FloatConvTemplate.referenceDW2DIm2ColTemplate (the new DW im2col template)
  • Apply ForkTransformer for multi-core execution

This aligns with the new PULPFPDWConv2DParser and enables FP depthwise convolution support.


298-302: LGTM! Extended ReduceMean support for mixed float/integer inputs.

The addition supports ReduceMean operations with float data and integer axes/keepdims parameters, using FloatReduceMeanTemplate. This is a natural extension of the existing integer-only ReduceMean bindings.

Deeploy/Targets/PULPOpen/Platform.py (2)

8-9: LGTM! Proper wiring of new depthwise and slice capabilities.

The platform changes correctly integrate the new functionality:

  1. FPDWConv2DMapper (line 77) combines the new parser and bindings for FP depthwise convolution
  2. Conv layer (line 108) now supports both FPConv2DMapper and FPDWConv2DMapper
  3. Slice operation (line 134) supports both regular and DMA-based mappers
  4. All necessary imports added for new parsers, bindings, and optimization passes

This completes the integration stack for FP depthwise convolution on PULP.

Also applies to: 32-32, 35-36, 77-77, 92-94, 108-108, 134-134


234-235: LGTM! RemoveEmptyConvBiasPass addition is appropriate.

Adding RemoveEmptyConvBiasPass to the PULPOptimizer makes sense when introducing bias support. This pass likely removes unnecessary bias parameters when the bias tensor is empty or zero, optimizing the generated code.

The placement after PULPAddRequantMergePass is reasonable in the optimization pass sequence.

TargetLibraries/PULPOpen/inc/kernel/Conv.h (1)

12-27: LGTM! Header declarations consistent with implementations.

The function signatures correctly extend the existing FP32 convolution API:

  1. PULP_Conv2d_fp32_fp32_fp32_HWC (lines 12-18): Added pSrcBias and has_bias parameters
  2. PULP_Conv2d_Im2Col_fp32_fp32_fp32_HWC (lines 20-27): Added bias parameters and pContextBuffer
  3. PULP_DW_Conv2d_Im2Col_fp32_fp32_fp32_HWC (lines 29-36): New entry point for depthwise im2col convolution

All declarations match the implementations in Convolution_fp32.c and DWConvolution_fp32.c. Based on learnings.

Also applies to: 29-36

TargetLibraries/PULPOpen/src/Convolution_fp32.c (2)

10-34: LGTM! Proper multi-core parallelization added.

The updated function now includes:

  1. Bias parameter support (pSrcBias, has_bias)
  2. Per-core output channel distribution (lines 18-31)
  3. Core-specific weight pointer adjustment (line 34)
  4. Early exit for cores with no assigned channels (lines 29-31)

This enables efficient parallel execution while maintaining correctness. Based on learnings.


66-67: LGTM! Correct per-core output indexing and bias application.

The output indexing (lines 66-67, 96-97, 173, 212-214) correctly accounts for:

  1. Global output index calculation: (h_out * W_out + w_out) * F_total + ...
  2. Core-specific channel offset: ch_out_start + f or just f (depending on loop variable)
  3. Bias application using the global channel index (line 67: pSrcBias[f + ch_out_start], line 175: pSrcBias[f])

This ensures correct results when work is distributed across multiple cores.

Also applies to: 175-175
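The HWC index computation can be modeled with an illustrative helper (not the kernel's exact expression):

```python
def hwc_index(h_out, w_out, f, W_out, F_total, ch_out_start = 0):
    # HWC layout: channels are innermost, so the core's channel offset
    # (ch_out_start + f) is added after the flattened spatial offset.
    return (h_out * W_out + w_out) * F_total + ch_out_start + f
```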

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
Deeploy/DeeployTypes.py (2)

241-241: Complete the mutable default fix for shape parameter.

The aliases parameter correctly uses None, but shape = [1] is still a mutable default. This was flagged in past reviews and by static analysis.

Apply this diff:

-    def __init__(self, name: str = '', shape = [1], aliases: Optional[List[str]] = None):
+    def __init__(self, name: str = '', shape: Optional[Sequence[int]] = None, aliases: Optional[List[str]] = None):
         self.name: str = name
-        self.shape: Sequence[int] = shape
+        self.shape: Sequence[int] = shape if shape is not None else (1,)

As per past review feedback and static analysis (Ruff B006).


437-439: Fix mutable default arguments in ConstantBuffer.__init__.

Both shape = [1] and values = [0] are mutable defaults, flagged by past reviews and static analysis (Ruff B006).

Apply this diff:

-    def __init__(self, name: str = '', shape = [1], values = [0]):
-        super().__init__(name, shape)
+    def __init__(self, name: str = '', shape: Optional[Sequence[int]] = None, values: Union[Sequence, np.ndarray, None] = None):
+        super().__init__(name, shape if shape is not None else (1,))
-        values = np.asarray(values)
+        values = np.asarray(values if values is not None else [0])

As per past review feedback and static analysis.

♻️ Duplicate comments (2)
CHANGELOG.md (1)

23-23: Remove duplicate PR reference.

Line 23 duplicates the PR #105 entry already present at line 18. Remove the redundant entry to maintain changelog clarity.

- Use Docker digests instead of arch-specific tags [#106](https://github.com/pulp-platform/Deeploy/pull/106)
- Refactor tiling code generation [#105](https://github.com/pulp-platform/Deeploy/pull/105)
- Fix `Unsqueeze` Op. when using ONNX opset 13 or higher (from attribute to input) [#119](https://github.com/pulp-platform/Deeploy/pull/119)
Deeploy/DeeployTypes.py (1)

331-350: Critical bug: cannot append to a set.

Line 260 initializes self.aliases as a set, but line 348 calls .append() which doesn't exist on sets. This will raise an AttributeError at runtime.

Apply this diff:

     def add_aliases(self, aliases_to_add: List[str]):
         """
         Adds list of aliases to the aliases attribute.
         Parameters
         ----------
         alias_to_add : List[str]
             List of names of aliases to add to the aliases attribute.
         Returns
         -------
         None
         """
 
         if not hasattr(self, "aliases"):
             return None
 
         for alias in aliases_to_add:
-            if alias not in self.aliases:
-                self.aliases.append(alias)
+            self.aliases.add(alias)
 
         return None
🧹 Nitpick comments (1)
CHANGELOG.md (1)

30-32: Reduce successive FP prefix repetition.

Three consecutive entries (lines 30–32) begin with "FP", which reduces readability. Consider rewording one or more entries to vary the opening phrase.

Example refactors:

  • Line 30: "Conv 2D (simple & DW), reshape & skip connection, and TinyViT demo tests..."
  • Line 31: "Bindings and mappings for PULP slice, DW conv 2D, and reduce mean operations."
  • Line 32: "DW conv lowering optimization pass (FP variant) similar to the existing one for the integer version."
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ba1e15a and ffe7a27.

📒 Files selected for processing (2)
  • CHANGELOG.md (4 hunks)
  • Deeploy/DeeployTypes.py (9 hunks)
🧰 Additional context used
🪛 LanguageTool
CHANGELOG.md

[style] ~32-~32: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... conv 2D, and reduce mean operations. - FP PULP DW conv lowering optimization pass...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (114)
  • GitHub Check: deeploy-memory-allocation
  • GitHub Check: generic-kernels / test-runner-generic
  • GitHub Check: siracusa-models-tiled-singlebuffer-L2 (microLlama/microLlama8, 60000, 10000, 5000, 8) / test-runner-siracusa-tiled (60000)
  • GitHub Check: siracusa-models-tiled-singlebuffer-L2 (microLlama/microLlama8, 60000, 10000, 5000, 8) / test-runner-siracusa-tiled (5000)
  • GitHub Check: siracusa-models-tiled-singlebuffer-L2 (microLlama/microLlama8, 60000, 10000, 5000, 8) / test-runner-siracusa-tiled (10000)
  • GitHub Check: siracusa-kernels-tiled-singlebuffer-L2 / test-runner-siracusa-tiled
  • GitHub Check: siracusa-kernels-tiled-doublebuffer-L2 / test-runner-siracusa-tiled
  • GitHub Check: snitch-kernels / test-runner-snitch
  • GitHub Check: siracusa-models / test-runner-siracusa
  • GitHub Check: siracusa-kernels / test-runner-siracusa
  • GitHub Check: deeploy-memory-allocation
  • GitHub Check: generic-kernels / test-runner-generic
  • GitHub Check: siracusa-models-tiled-singlebuffer-L2 (microLlama/microLlama8, 60000, 10000, 5000, 8) / test-runner-siracusa-tiled (60000)
  • GitHub Check: siracusa-models-tiled-singlebuffer-L2 (microLlama/microLlama8_parallel, 60000, 10000, 5000, 8) / test-runner-siracusa-tiled (60000)
  • GitHub Check: siracusa-models-tiled-singlebuffer-L2 (microLlama/microLlama8, 60000, 10000, 5000, 8) / test-runner-siracusa-tiled (5000)
  • GitHub Check: siracusa-models-tiled-singlebuffer-L2 (microLlama/microLlama8, 60000, 10000, 5000, 8) / test-runner-siracusa-tiled (10000)
  • GitHub Check: siracusa-kernels-tiled-singlebuffer-L2 / test-runner-siracusa-tiled
  • GitHub Check: siracusa-kernels-tiled-doublebuffer-L2 / test-runner-siracusa-tiled
  • GitHub Check: snitch-kernels / test-runner-snitch
  • GitHub Check: siracusa-models / test-runner-siracusa
  • GitHub Check: siracusa-kernels / test-runner-siracusa
  • GitHub Check: deeploy-memory-allocation
  • GitHub Check: generic-kernels / test-runner-generic
  • GitHub Check: siracusa-models-tiled-singlebuffer-L2 (microLlama/microLlama8, 60000, 10000, 5000, 8) / test-runner-siracusa-tiled (60000)
  • GitHub Check: siracusa-models-tiled-singlebuffer-L2 (microLlama/microLlama8_parallel, 60000, 10000, 5000, 8) / test-runner-siracusa-tiled (60000)
  • GitHub Check: siracusa-models-tiled-singlebuffer-L2 (microLlama/microLlama8, 60000, 10000, 5000, 8) / test-runner-siracusa-tiled (5000)
  • GitHub Check: siracusa-models-tiled-singlebuffer-L2 (microLlama/microLlama8, 60000, 10000, 5000, 8) / test-runner-siracusa-tiled (10000)
  • GitHub Check: siracusa-kernels-tiled-singlebuffer-L2 / test-runner-siracusa-tiled
  • GitHub Check: siracusa-kernels-tiled-doublebuffer-L2 / test-runner-siracusa-tiled
  • GitHub Check: snitch-kernels / test-runner-snitch
  • GitHub Check: siracusa-models / test-runner-siracusa
  • GitHub Check: siracusa-kernels / test-runner-siracusa
  • GitHub Check: deeploy-memory-allocation
  • GitHub Check: generic-kernels / test-runner-generic
  • GitHub Check: siracusa-models-tiled-singlebuffer-L2 (microLlama/microLlama8, 60000, 10000, 5000, 8) / test-runner-siracusa-tiled (60000)
  • GitHub Check: siracusa-models-tiled-singlebuffer-L2 (microLlama/microLlama8_parallel, 60000, 10000, 5000, 8) / test-runner-siracusa-tiled (60000)
  • GitHub Check: siracusa-models-tiled-singlebuffer-L2 (microLlama/microLlama8, 60000, 10000, 5000, 8) / test-runner-siracusa-tiled (5000)
  • GitHub Check: siracusa-models-tiled-singlebuffer-L2 (microLlama/microLlama8, 60000, 10000, 5000, 8) / test-runner-siracusa-tiled (10000)
  • GitHub Check: siracusa-kernels-tiled-singlebuffer-L2 / test-runner-siracusa-tiled
  • GitHub Check: siracusa-kernels-tiled-doublebuffer-L2 / test-runner-siracusa-tiled
  • GitHub Check: snitch-kernels / test-runner-snitch
  • GitHub Check: siracusa-models / test-runner-siracusa
  • GitHub Check: siracusa-kernels / test-runner-siracusa
  • GitHub Check: deeploy-memory-allocation
  • GitHub Check: generic-kernels / test-runner-generic
  • GitHub Check: siracusa-models-tiled-singlebuffer-L2 (microLlama/microLlama8, 60000, 10000, 5000, 8) / test-runner-siracusa-tiled (60000)
  • GitHub Check: siracusa-models-tiled-singlebuffer-L2 (microLlama/microLlama8_parallel, 60000, 10000, 5000, 8) / test-runner-siracusa-tiled (60000)
  • GitHub Check: siracusa-models-tiled-singlebuffer-L2 (microLlama/microLlama8, 60000, 10000, 5000, 8) / test-runner-siracusa-tiled (5000)
  • GitHub Check: siracusa-models-tiled-singlebuffer-L2 (microLlama/microLlama8, 60000, 10000, 5000, 8) / test-runner-siracusa-tiled (10000)
  • GitHub Check: siracusa-kernels-tiled-singlebuffer-L2 / test-runner-siracusa-tiled
  • GitHub Check: siracusa-kernels-tiled-doublebuffer-L2 / test-runner-siracusa-tiled
  • GitHub Check: snitch-kernels / test-runner-snitch
  • GitHub Check: siracusa-models / test-runner-siracusa
  • GitHub Check: siracusa-kernels / test-runner-siracusa
  • GitHub Check: deeploy-memory-allocation
  • GitHub Check: generic-models / test-runner-generic
🔇 Additional comments (4)
Deeploy/DeeployTypes.py (4)

558-567: LGTM: n_cores parameter added correctly.

The n_cores parameter is properly added to NetworkContext.__init__ with a default value of 8 and stored as an instance variable. No duplicate assignment.


2479-2519: LGTM: n_cores propagated correctly to NetworkContext.

The n_cores parameter is properly added to NetworkContainer.__init__ and correctly passed through to the NetworkContext constructor at lines 2518-2519.


2679-2685: LGTM: NetworkContext recreation preserves n_cores.

The new NetworkContext is correctly created with n_cores = self.ctxt.n_cores, preserving the core count from the existing context.


3231-3285: LGTM: n_cores propagated through NetworkDeployer.

The n_cores parameter is properly added to NetworkDeployer.__init__, passed to the parent NetworkContainer via super().__init__(), and stored as an instance variable at line 3285.

@diaconuccalin diaconuccalin requested a review from lukamac October 27, 2025 18:58
Collaborator

@lukamac lukamac left a comment

Hi, I reviewed the whole PR. There are still some changes I would like to see, specifically around n_cores, as well as some remnant changes around reshape aliasing. I also left some comments in the older threads, so check those too. Thanks!

deeployStateDir: str = "DeeployState",
inputOffsets: Dict[str, int] = {}):
inputOffsets: Dict[str, int] = {},
n_cores: int = 8):
Collaborator

@lukamac lukamac Oct 28, 2025

This is not a universal feature, meaning some targets don't have multiple cores, for example CortexM or Generic. You should add this just to the specific target's deployer, i.e., PULPDeployer

localObjects = {},
name: str = 'DeeployNetwork'):
name: str = 'DeeployNetwork',
n_cores: int = 8):
Collaborator

same comment as above, add this to the target that uses it, not everyone

name: str = 'DeeployNetwork',
deeployStateDir: str = "DeeployState"):
deeployStateDir: str = "DeeployState",
n_cores: int = 8):
Collaborator

same as above

Collaborator

It seems to me that you will have to revert 99% of your changes to DeeployTypes.py

operatorRepresentation: OperatorRepresentation) -> List[Tuple[str, Union[int, IntVar]]]:
im2col_dim = 4 * 8 * (operatorRepresentation['ch_im_in'] * operatorRepresentation['dim_kernel_x'] *
operatorRepresentation['dim_kernel_y'])
im2col_dim = (operatorRepresentation["weight_type"].typeWidth // 8) * ctxt.n_cores * operatorRepresentation[
Collaborator

Another way of getting the n_cores information might be through the operatorRepresentation which you add in the parse step of the PULP deployer. This might be interesting for the future if we ever want to have the number of cores variable per layer. I'm kinda leaning more towards that approach. Let me know what you think

Collaborator

@Xeratec any opinions on this?

Member

I believe this information should be added at the operator level. In a future multi-cluster application, the number of cores will only be relevant for executing a kernel, not for the entire network. Therefore, this information should be associated with the engine and reflected in the operator representation.

I’m fine with merging a temporary solution for now. However, the change should only affect the platform currently under consideration and be linked to the DeploymentEngine, where we ultimately want it to reside anyway.

Contributor Author

@Xeratec @lukamac I couldn't find a good way of linking the core information to the DeploymentEngine. However, I moved the core-count information to the operatorRepresentation, though it still needs to pass through the NetworkContext. I also restricted its usage to the PULP platform, since it's the only one that currently needs it. Let me know if you'd like me to change anything on this.
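The approach described here can be sketched in isolation. All names below are simplified stand-ins for the actual Deeploy classes, not the real API: the idea is just that the deployer-level core count gets copied into each operator's representation during parsing, so templates read it per-operator instead of from a global.

```python
# Hypothetical, simplified version of the approach: the PULP deployer's
# parse step injects its core count into each operator's representation.
def with_n_cores(base_parse, n_cores):
    def parse(node):
        op_repr = base_parse(node)
        op_repr["n_cores"] = n_cores  # per-operator core count
        return op_repr
    return parse

def conv_parse(node):  # stand-in for a real parser's parseNode
    return {"dim_kernel_x": 3, "dim_kernel_y": 3}

parse = with_n_cores(conv_parse, n_cores = 8)
assert parse(None)["n_cores"] == 8
```

This keeps the door open for the per-layer core counts mentioned earlier, since each operator carries its own value.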

Collaborator

The changes to ReshapeTemplate.py should be unnecessary after #125, no?

Contributor Author

@diaconuccalin diaconuccalin Oct 28, 2025

No, it's actually porting the changes you made in the generic template, since this one is used in the Siracusa tests. Without this, the ReshapeWithSkipConnection test (and also others containing reshape if I remember correctly) would fail on Siracusa.

Collaborator

Then I change my question to: why are we not using the generic reshape template in the first place? 😅 Could you please try with the generic one and confirm that it's still doing the same thing?

Contributor Author

There is an extra _alias leftover parameter that is still used by the tiler, which breaks if removed. I will remove the unnecessary extra code by inheriting the template from generic and only adding this extra parameter.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
Deeploy/Targets/PULPOpen/Templates/ReshapeTemplate.py (1)

42-48: Consider removing redundant buffer retrieval.

The parent class _GenericReshapeTemplate.alignToContext() already retrieves and validates these buffers (lines 29-33 in Generic template). Since you call super().alignToContext() on line 40, you can retrieve the buffers directly from the returned ctxt without re-fetching them.

Apply this diff to streamline the code:

-        # Get buffers
-        bufferIn = ctxt.lookup(operatorRepresentation['data_in'])
-        assert isinstance(bufferIn, VariableBuffer)
-
-        bufferOut = ctxt.lookup(operatorRepresentation['data_out'])
-        assert isinstance(bufferOut, VariableBuffer)
-
+        # Get buffers (already retrieved and validated by parent)
+        bufferIn = ctxt.lookup(operatorRepresentation['data_in'])
+        bufferOut = ctxt.lookup(operatorRepresentation['data_out'])
+

Or even more concisely, inline them:

-        # Get buffers
-        bufferIn = ctxt.lookup(operatorRepresentation['data_in'])
-        assert isinstance(bufferIn, VariableBuffer)
-
-        bufferOut = ctxt.lookup(operatorRepresentation['data_out'])
-        assert isinstance(bufferOut, VariableBuffer)
-
         # Linking required for tiling
-        bufferOut._alias = bufferIn.name
+        ctxt.lookup(operatorRepresentation['data_out'])._alias = ctxt.lookup(operatorRepresentation['data_in']).name
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3ea12db and 71b7e79.

📒 Files selected for processing (2)
  • Deeploy/Targets/Generic/Templates/ReshapeTemplate.py (3 hunks)
  • Deeploy/Targets/PULPOpen/Templates/ReshapeTemplate.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.
🧬 Code graph analysis (2)
Deeploy/Targets/Generic/Templates/ReshapeTemplate.py (1)
Deeploy/DeeployTypes.py (3)
  • NodeTemplate (87-229)
  • lookup (722-754)
  • VariableBuffer (232-360)
Deeploy/Targets/PULPOpen/Templates/ReshapeTemplate.py (2)
Deeploy/DeeployTypes.py (4)
  • NetworkContext (508-1022)
  • VariableBuffer (232-360)
  • alignToContext (119-139)
  • lookup (722-754)
Deeploy/Targets/Generic/Templates/ReshapeTemplate.py (2)
  • _GenericReshapeTemplate (10-39)
  • alignToContext (15-39)
🔇 Additional comments (3)
Deeploy/Targets/Generic/Templates/ReshapeTemplate.py (1)

10-10: LGTM! Clean refactoring to clarify template hierarchy.

The rename from _ReshapeTemplate to _GenericReshapeTemplate improves clarity and aligns well with the inheritance pattern where the PULP-specific template now extends this base class.

Also applies to: 42-42

Deeploy/Targets/PULPOpen/Templates/ReshapeTemplate.py (2)

28-32: LGTM! Inheritance approach correctly implemented.

The addition of the VariableBuffer import and inheritance from _GenericReshapeTemplate properly implements the agreed-upon design from the past review discussion.


49-50: LGTM! Tiler-specific aliasing correctly preserved.

The _alias attribute assignment is correctly preserved for tiler compatibility, as discussed in the past review comments. This works alongside the parent's bidirectional aliases set mechanism.
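The inheritance pattern under review can be sketched with minimal stand-ins. The real classes live in DeeployTypes.py and the Generic/PULP reshape templates; the class and attribute names below are illustrative, showing only how the PULP template reuses the generic alignment logic and then adds the tiler-facing `_alias` link.

```python
# Minimal stand-ins (assumed shapes, not the real Deeploy API).
class Buffer:
    def __init__(self, name):
        self.name = name
        self._alias = None

class Ctxt:
    def __init__(self, buffers):
        self._buffers = {b.name: b for b in buffers}
    def lookup(self, name):
        return self._buffers[name]

class GenericReshape:
    def alignToContext(self, ctxt, op_repr):
        # the generic template's bidirectional alias bookkeeping is elided
        return ctxt, op_repr, []

class PULPReshape(GenericReshape):
    def alignToContext(self, ctxt, op_repr):
        ctxt, op_repr, extra = super().alignToContext(ctxt, op_repr)
        # extra link still consumed by the tiler
        ctxt.lookup(op_repr["data_out"])._alias = ctxt.lookup(op_repr["data_in"]).name
        return ctxt, op_repr, extra

ctxt = Ctxt([Buffer("x"), Buffer("y")])
ctxt, _, _ = PULPReshape().alignToContext(ctxt, {"data_in": "x", "data_out": "y"})
assert ctxt.lookup("y")._alias == "x"
```

The subclass body stays at a single extra assignment, which is the point of the agreed-upon refactor.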

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 71b7e79 and e9e6395.

📒 Files selected for processing (8)
  • Deeploy/DeeployTypes.py (3 hunks)
  • Deeploy/Targets/PULPOpen/Deployer.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Parsers.py (2 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (5 hunks)
  • DeeployTest/generateNetwork.py (2 hunks)
  • DeeployTest/testMVP.py (5 hunks)
  • DeeployTest/testRunner_siracusa.py (1 hunks)
  • DeeployTest/testUtils/testRunner.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • DeeployTest/testUtils/testRunner.py
  • Deeploy/Targets/PULPOpen/Parsers.py
  • Deeploy/Targets/PULPOpen/Deployer.py
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-09-24T12:49:17.889Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.

Applied to files:

  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
📚 Learning: 2025-09-24T12:17:21.624Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:46-0
Timestamp: 2025-09-24T12:17:21.624Z
Learning: In Deeploy's PULP templates, transient buffer size calculation can return element counts as strings from computeTransientBuffersSize(), and then manually set the buffer type in hoistTransientBuffers() using ctxt.lookup(buffer_name)._type.referencedType = input_type. The allocation system automatically multiplies the element count by the element size when the buffer type is properly set, achieving correct byte allocation.

Applied to files:

  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
📚 Learning: 2025-09-09T15:58:06.454Z
Learnt from: Xeratec
PR: pulp-platform/Deeploy#105
File: Deeploy/Targets/PULPOpen/DMA/MchanDma.py:61-64
Timestamp: 2025-09-09T15:58:06.454Z
Learning: The _legalizeTransfers function in TilingCodeGeneration.py converts the innermost dimension of transfer rectangles from elements to bytes using a sizeInBytes helper function, ensuring DMA implementations receive properly byte-converted transfer parameters.

Applied to files:

  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
🧬 Code graph analysis (3)
DeeployTest/testRunner_siracusa.py (1)
DeeployTest/testUtils/testRunner.py (1)
  • TestRunner (279-448)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (1)
Deeploy/DeeployTypes.py (6)
  • NodeTemplate (87-229)
  • computeTransientBuffersSize (142-162)
  • NetworkContext (508-1020)
  • hoistTransientBuffers (165-185)
  • hoistTransientBuffer (821-840)
  • lookup (720-752)
DeeployTest/testMVP.py (1)
Deeploy/Targets/PULPOpen/Platform.py (1)
  • PULPPlatform (251-259)
🪛 Ruff (0.14.2)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py

44-44: Unused static method argument: ctxt

(ARG004)

DeeployTest/testMVP.py

59-59: Unused function argument: verbose

(ARG001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (84)
  • GitHub Check: deeploy-typing
  • GitHub Check: cortexm-kernels / test-runner-cortexm
  • GitHub Check: cortexm-models / test-runner-cortexm
  • GitHub Check: siracusa-models-tiled-singlebuffer-L3 (simpleRegression, 45000, 30000, 16000, 8, L3) / test-runner-siracusa-tiled (45000)
  • GitHub Check: siracusa-models-tiled-singlebuffer-L3 (simpleRegression, 45000, 30000, 16000, 8, L3) / test-runner-siracusa-tiled (16000)
  • GitHub Check: siracusa-models-tiled-singlebuffer-L3 (simpleRegression, 45000, 30000, 16000, 8, L3) / test-runner-siracusa-tiled (30000)
  • GitHub Check: chimera-kernels / test-runner-chimera
  • GitHub Check: snitch-kernels-tiled-singlebuffer-L2 / test-runner-snitch-tiled
  • GitHub Check: generic-kernels / test-runner-generic
  • GitHub Check: generic-models / test-runner-generic
🔇 Additional comments (4)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (4)

21-23: LGTM: Buffer sizing now accounts for runtime core count.

The dynamic n_cores factor correctly sizes the Im2Col buffer for parallel execution across cores.


47-50: Correct: DW conv buffer omits ch_im_in factor.

Unlike regular convolution, depthwise convolution processes each channel independently, so the Im2Col buffer is correctly sized per-core with kernel dimensions only, without the ch_im_in factor.
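For reference, the two sizing formulas the comments above describe can be written out explicitly. The function and argument names below are illustrative, not the Deeploy API: the only substance is that the regular-conv scratch buffer carries a ch_im_in factor while the depthwise one does not.

```python
def im2col_bytes(type_width_bits, n_cores, ch_im_in, kx, ky, depthwise = False):
    """Im2Col scratch size in bytes across all cores, per the review."""
    elem_bytes = type_width_bits // 8
    # DW conv handles one channel at a time, so ch_im_in drops out.
    per_core_elems = kx * ky if depthwise else ch_im_in * kx * ky
    return elem_bytes * n_cores * per_core_elems

# fp32 weights, 8 cores, 16 input channels, 3x3 kernel:
assert im2col_bytes(32, 8, 16, 3, 3) == 4608                   # regular conv
assert im2col_bytes(32, 8, 16, 3, 3, depthwise = True) == 288  # DW conv
```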


61-62: Clarify: Why set buffer type if allocation size is already in bytes?

The code explicitly sets the buffer's referencedType, but according to earlier comments, "Buffer variables don't support type assignment and require specifying the total size." Since im2col_dim already includes byte-size calculation (line 48-50: typeWidth // 8), what purpose does this type assignment serve?

If it's for C codegen or downstream type checking, consider adding a comment explaining the intent.


82-82: LGTM: Templates correctly updated with bias support and proper argument ordering.

The additions are consistent:

  • Bias parameters (${bias}, ${has_bias}) added to all convolution calls
  • Im2Col templates correctly pass ${ctxtBuffer} for scratch space
  • Argument ordering (X,Y for dimensions/kernels/strides) matches the kernel prototypes

Based on learnings.

Also applies to: 101-123, 125-155

@diaconuccalin diaconuccalin requested a review from lukamac October 30, 2025 15:02
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
CHANGELOG.md (1)

27-33: Consider rewording consecutive "FP" entries for stylistic improvement.

Lines 30–32 in the Added section begin with "FP," which creates successive repetition. While this is minor in a technical changelog, consider rewording one or two entries for better readability without losing clarity.

Example rewording (optional):

  • Line 30: Keep as-is or rephrase to emphasize the TinyViT contribution (e.g., "Conv 2D (simple & DW) tests, reshape & skip connection tests, and TinyViT demo tests for...")
  • Line 31: Rephrase to reduce "FP" lead (e.g., "Slice, DW conv 2D, and reduce mean bindings and mappings in floating-point...")
  • Line 32: Keep as-is since it emphasizes the parallel with the integer version
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fdfd09c and 5e30cdf.

📒 Files selected for processing (1)
  • CHANGELOG.md (4 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: diaconuccalin
Repo: pulp-platform/Deeploy PR: 117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.
📚 Learning: 2025-09-24T12:49:17.889Z
Learnt from: diaconuccalin
Repo: pulp-platform/Deeploy PR: 117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.

Applied to files:

  • CHANGELOG.md
🪛 LanguageTool
CHANGELOG.md

[style] ~32-~32: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... conv 2D, and reduce mean operations. - FP PULP DW conv lowering optimization pass...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

🔇 Additional comments (1)
CHANGELOG.md (1)

7-7: PR entry added correctly.

The addition of PR #117 to the List of Pull Requests is properly formatted and positioned at the top, matching the project's convention.

Comment on lines +105 to +107
- Fixed bug for non-batched elements in the PULPOpen FP GEMM and matmul templates.
- Added underscore to the beginning of closure names to avoid naming issues when they start with unsupported first characters (like numbers).
- Data types in the PULPOpen FP add and mul templates.
Contributor

⚠️ Potential issue | 🟡 Minor

Fix sentence fragment in Fixed section.

Line 107 is missing the verb "Fixed," creating a sentence fragment that's inconsistent with the other entries in this section.

Apply this diff to correct it:

- Fixed bug for non-batched elements in the PULPOpen FP GEMM and matmul templates.
- Added underscore to the beginning of closure names to avoid naming issues when they start with unsupported first characters (like numbers).
- Data types in the PULPOpen FP add and mul templates.
+ Fixed bug for non-batched elements in the PULPOpen FP GEMM and matmul templates.
+ Added underscore to the beginning of closure names to avoid naming issues when they start with unsupported first characters (like numbers).
+ Fixed data types in the PULPOpen FP add and mul templates.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-- Fixed bug for non-batched elements in the PULPOpen FP GEMM and matmul templates.
-- Added underscore to the beginning of closure names to avoid naming issues when they start with unsupported first characters (like numbers).
-- Data types in the PULPOpen FP add and mul templates.
+- Fixed bug for non-batched elements in the PULPOpen FP GEMM and matmul templates.
+- Added underscore to the beginning of closure names to avoid naming issues when they start with unsupported first characters (like numbers).
+- Fixed data types in the PULPOpen FP add and mul templates.
🤖 Prompt for AI Agents
In CHANGELOG.md around lines 105 to 107, the third bullet (line 107) is a
sentence fragment missing the leading verb "Fixed"; update that line to start
with "Fixed" so it matches the style of the other entries (e.g., "Fixed data
types in the PULPOpen FP add and mul templates.").

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1)

71-96: Fix bottom/right padding loops to avoid unsigned wrap

When h_in_start or w_in_start is negative (typical at padded borders), the expressions MAX(H, h_in_start) / MAX(W, w_in_start) promote the signed value to uint32_t. For example with h_in_start = -2, the start becomes 0xFFFFFFFE, so the for condition h_in < h_in_start + P is immediately false. The bottom/right loops never execute, stale im2col data survives, and border outputs accumulate garbage. Please keep the math in signed space before iterating.

Apply this pattern to both the bias and no-bias blocks:

-        for (uint32_t h_in = MAX(H, h_in_start); h_in < h_in_start + P; h_in++) {
+        int32_t h_pad_start = MAX(h_in_start, (int32_t)H);
+        int32_t h_pad_end = h_in_start + (int32_t)P;
+        for (int32_t h_in = h_pad_start; h_in < h_pad_end; ++h_in) {
@@
-        for (uint32_t w_in = MAX(W, w_in_start); w_in < w_in_start + Q; w_in++) {
+        int32_t w_pad_start = MAX(w_in_start, (int32_t)W);
+        int32_t w_pad_end = w_in_start + (int32_t)Q;
+        for (int32_t w_in = w_pad_start; w_in < w_pad_end; ++w_in) {

Mirror the same change in the else branch so every padding loop uses int32_t bounds before casting to indices.

Also applies to: 170-195

🧹 Nitpick comments (2)
Deeploy/Targets/PULPOpen/Templates/ReshapeTemplate.py (1)

49-51: Technical debt: _alias assignment is a tiler workaround.

The comment explicitly notes this is a HACK to support tiling that wasn't updated in PR #125. While the workaround is functional, consider tracking this as a follow-up task to properly update the tiler and remove this manual aliasing.

DeeployTest/testMVP.py (1)

59-60: Consider removing unused verbose parameter.

Static analysis correctly identifies that the verbose parameter is never used within setupDeployer. The CodeGenVerbosity object is only passed later during deployer.prepare() and deployer.generateFunction() calls at the module level (lines 286, 292, 304).

If verbose is not needed in the setup phase, remove it from the signature:

 def setupDeployer(graph: gs.Graph, memoryHierarchy: MemoryHierarchy, defaultTargetMemoryLevel: MemoryLevel,
-                  defaultIoMemoryLevel: MemoryLevel, verbose: CodeGenVerbosity, args: argparse.Namespace,
+                  defaultIoMemoryLevel: MemoryLevel, args: argparse.Namespace,
                   n_cores: int) -> Tuple[NetworkDeployer, bool]:

Also remove it from the call site at lines 259-265.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5e30cdf and d5bd025.

📒 Files selected for processing (26)
  • .github/workflows/ci-platform-siracusa.yml (3 hunks)
  • CHANGELOG.md (6 hunks)
  • Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py (1 hunks)
  • Deeploy/CommonExtensions/DataTypes.py (1 hunks)
  • Deeploy/CommonExtensions/NetworkDeployers/NetworkDeployerWrapper.py (2 hunks)
  • Deeploy/DeeployTypes.py (4 hunks)
  • Deeploy/EngineExtension/NetworkDeployers/EngineColoringDeployer.py (2 hunks)
  • Deeploy/Targets/PULPOpen/Bindings.py (5 hunks)
  • Deeploy/Targets/PULPOpen/Deployer.py (3 hunks)
  • Deeploy/Targets/PULPOpen/Parsers.py (2 hunks)
  • Deeploy/Targets/PULPOpen/Platform.py (8 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatAddTemplate.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (5 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatMatMulTemplate.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Templates/ReshapeTemplate.py (1 hunks)
  • DeeployTest/generateNetwork.py (2 hunks)
  • DeeployTest/testMVP.py (5 hunks)
  • DeeployTest/testRunner_siracusa.py (1 hunks)
  • DeeployTest/testRunner_tiled_siracusa.py (1 hunks)
  • DeeployTest/testUtils/testRunner.py (3 hunks)
  • TargetLibraries/PULPOpen/inc/kernel/Conv.h (1 hunks)
  • TargetLibraries/PULPOpen/inc/kernel/gemm.h (1 hunks)
  • TargetLibraries/PULPOpen/src/Convolution_fp32.c (4 hunks)
  • TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
  • DeeployTest/generateNetwork.py
  • .github/workflows/ci-platform-siracusa.yml
  • DeeployTest/testUtils/testRunner.py
  • Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py
  • Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py
  • DeeployTest/testRunner_tiled_siracusa.py
  • DeeployTest/testRunner_siracusa.py
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-09-24T12:49:17.889Z
Learnt from: diaconuccalin
Repo: pulp-platform/Deeploy PR: 117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.

Applied to files:

  • TargetLibraries/PULPOpen/src/DWConvolution_fp32.c
  • CHANGELOG.md
  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
  • TargetLibraries/PULPOpen/inc/kernel/Conv.h
  • TargetLibraries/PULPOpen/src/Convolution_fp32.c
  • Deeploy/Targets/PULPOpen/Parsers.py
📚 Learning: 2025-09-24T12:53:00.168Z
Learnt from: diaconuccalin
Repo: pulp-platform/Deeploy PR: 117
File: TargetLibraries/PULPOpen/src/DWConvolution_fp32.c:0-0
Timestamp: 2025-09-24T12:53:00.168Z
Learning: In PULP platform code, log2 is defined in pulp_nn_utils and can be used safely without requiring math.h or causing floating-point build/runtime issues, unlike the standard library log2 function.

Applied to files:

  • TargetLibraries/PULPOpen/src/DWConvolution_fp32.c
📚 Learning: 2025-09-24T12:17:21.624Z
Learnt from: diaconuccalin
Repo: pulp-platform/Deeploy PR: 117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:46-0
Timestamp: 2025-09-24T12:17:21.624Z
Learning: In Deeploy's PULP templates, transient buffer size calculation can return element counts as strings from computeTransientBuffersSize(), and then manually set the buffer type in hoistTransientBuffers() using ctxt.lookup(buffer_name)._type.referencedType = input_type. The allocation system automatically multiplies the element count by the element size when the buffer type is properly set, achieving correct byte allocation.

Applied to files:

  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
📚 Learning: 2025-09-09T15:58:06.454Z
Learnt from: Xeratec
Repo: pulp-platform/Deeploy PR: 105
File: Deeploy/Targets/PULPOpen/DMA/MchanDma.py:61-64
Timestamp: 2025-09-09T15:58:06.454Z
Learning: The _legalizeTransfers function in TilingCodeGeneration.py converts the innermost dimension of transfer rectangles from elements to bytes using a sizeInBytes helper function, ensuring DMA implementations receive properly byte-converted transfer parameters.

Applied to files:

  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
🧬 Code graph analysis (12)
DeeployTest/testMVP.py (2)
Deeploy/Targets/PULPOpen/Platform.py (1)
  • PULPPlatform (256-264)
Deeploy/DeeployTypes.py (2)
  • CodeGenVerbosity (49-55)
  • NetworkDeployer (3170-3548)
Deeploy/Targets/PULPOpen/Templates/ReshapeTemplate.py (1)
Deeploy/DeeployTypes.py (4)
  • NetworkContext (508-1020)
  • VariableBuffer (232-360)
  • alignToContext (119-139)
  • lookup (720-752)
Deeploy/CommonExtensions/DataTypes.py (1)
Deeploy/AbstractDataTypes.py (2)
  • IntegerImmediate (173-224)
  • typeWidth (399-400)
Deeploy/Targets/PULPOpen/Deployer.py (2)
Deeploy/Targets/PULPOpen/Platform.py (1)
  • PULPClusterEngine (244-253)
Deeploy/DeeployTypes.py (6)
  • _selectEngine (2565-2569)
  • bind (1600-1625)
  • bind (1767-1791)
  • bind (2064-2108)
  • bind (2723-2759)
  • hoistGlobalDefinition (842-855)
Deeploy/Targets/PULPOpen/Bindings.py (3)
Deeploy/CommonExtensions/DataTypes.py (5)
  • float32_t (74-78)
  • int8_t (12-15)
  • int32_t (24-27)
  • int64_t (30-33)
  • uint8_t (36-39)
Deeploy/Targets/Generic/TypeCheckers.py (3)
  • SliceChecker (37-51)
  • ReshapeChecker (122-136)
  • ReduceMeanChecker (310-324)
Deeploy/AbstractDataTypes.py (1)
  • PointerClass (536-559)
Deeploy/Targets/PULPOpen/Platform.py (4)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
  • RemoveEmptyConvBiasPass (518-523)
Deeploy/Targets/PULPOpen/Parsers.py (1)
  • PULPFPDWConv2DParser (124-182)
Deeploy/DeeployTypes.py (1)
  • NodeMapper (1660-1816)
Deeploy/Targets/Generic/Parsers.py (1)
  • SliceParser (97-155)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (1)
Deeploy/DeeployTypes.py (6)
  • NodeTemplate (87-229)
  • computeTransientBuffersSize (142-162)
  • NetworkContext (508-1020)
  • hoistTransientBuffers (165-185)
  • hoistTransientBuffer (821-840)
  • lookup (720-752)
Deeploy/DeeployTypes.py (2)
Deeploy/CommonExtensions/NetworkDeployers/NetworkDeployerWrapper.py (1)
  • _selectEngine (69-70)
Deeploy/EngineExtension/NetworkDeployers/EngineColoringDeployer.py (1)
  • _selectEngine (51-58)
TargetLibraries/PULPOpen/inc/kernel/Conv.h (2)
TargetLibraries/PULPOpen/src/Convolution_fp32.c (2)
  • PULP_Conv2d_fp32_fp32_fp32_HWC (10-102)
  • PULP_Conv2d_Im2Col_fp32_fp32_fp32_HWC (104-219)
TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1)
  • PULP_DW_Conv2d_Im2Col_fp32_fp32_fp32_HWC (10-251)
Deeploy/EngineExtension/NetworkDeployers/EngineColoringDeployer.py (2)
Deeploy/CommonExtensions/NetworkDeployers/NetworkDeployerWrapper.py (2)
  • NetworkDeployerWrapper (10-76)
  • _selectEngine (69-70)
Deeploy/DeeployTypes.py (3)
  • DeploymentEngine (2327-2374)
  • NetworkDeployer (3170-3548)
  • _selectEngine (2565-2569)
Deeploy/CommonExtensions/NetworkDeployers/NetworkDeployerWrapper.py (2)
Deeploy/DeeployTypes.py (3)
  • DeploymentEngine (2327-2374)
  • NetworkDeployer (3170-3548)
  • _selectEngine (2565-2569)
Deeploy/EngineExtension/NetworkDeployers/EngineColoringDeployer.py (1)
  • _selectEngine (51-58)
Deeploy/Targets/PULPOpen/Parsers.py (2)
Deeploy/DeeployTypes.py (5)
  • inputs (2503-2520)
  • lookup (720-752)
  • parseNode (1033-1048)
  • parseNodeCtxt (1051-1076)
  • NetworkContext (508-1020)
Deeploy/Targets/Generic/Parsers.py (31)
  • Conv2DParser (1218-1284)
  • parseNode (19-27)
  • parseNode (49-58)
  • parseNode (81-94)
  • parseNode (102-112)
  • parseNode (163-169)
  • parseNode (193-207)
  • parseNode (229-243)
  • parseNode (271-293)
  • parseNode (333-351)
  • parseNode (373-385)
  • parseNode (422-434)
  • parseNode (469-473)
  • parseNode (496-507)
  • parseNode (531-542)
  • parseNode (578-582)
  • parseNodeCtxt (29-41)
  • parseNodeCtxt (60-76)
  • parseNodeCtxt (114-155)
  • parseNodeCtxt (171-185)
  • parseNodeCtxt (209-221)
  • parseNodeCtxt (245-263)
  • parseNodeCtxt (295-325)
  • parseNodeCtxt (353-365)
  • parseNodeCtxt (387-414)
  • parseNodeCtxt (436-461)
  • parseNodeCtxt (475-488)
  • parseNodeCtxt (509-523)
  • parseNodeCtxt (544-570)
  • parseNodeCtxt (584-590)
  • parseNodeCtxt (603-620)
🪛 LanguageTool
CHANGELOG.md

[style] ~32-~32: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... conv 2D, and reduce mean operations. - FP PULP DW conv lowering optimization pass...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

🪛 Ruff (0.14.2)
DeeployTest/testMVP.py

59-59: Unused function argument: verbose

(ARG001)

Deeploy/Targets/PULPOpen/Deployer.py

83-83: Local variable autoTransposePass is assigned to but never used

Remove assignment to unused variable autoTransposePass

(F841)

Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py

48-48: Unused static method argument: ctxt

(ARG004)

🔇 Additional comments (8)
Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py (2)

10-11: LGTM! Improved type safety for core indexing.

The change from int8_t to uint32_t for core_id and log2Core is appropriate because:

  • Core IDs are naturally non-negative
  • Supports more than 127 cores (int8_t maximum)
  • Consistent with unsigned arithmetic in the chunk calculation (line 12)
  • Explicit cast makes the truncation of log2(NUM_CORES) intentional and clear

13-14: LGTM! Consistent unsigned arithmetic.

The explicit (uint32_t) casts for ${size} in the MIN() operations ensure:

  • Type consistency across all comparison operands
  • Prevention of signed/unsigned comparison warnings
  • Correct behavior regardless of the template parameter's original type

This aligns with the uint32_t types used for chunk, start, and end variables.

Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (1)

48-48: Static analysis: unused ctxt parameter is a false positive.

The ctxt parameter is required by the computeTransientBuffersSize interface defined in NodeTemplate (see Deeploy/DeeployTypes.py lines 142-161). Keeping it ensures signature consistency across all template implementations, even when a specific template doesn't need the context.

DeeployTest/testMVP.py (1)

127-128: LGTM: Platform-specific n_cores assignment is well-guarded.

The conditional type check ensures n_cores is only set for PULPPlatform instances where the cluster engine requires it for im2col buffer sizing. This prevents unintended side effects on other platforms.

Deeploy/Targets/PULPOpen/Deployer.py (1)

69-76: Nice propagation of n_cores metadata.

Annotating the operator representation before binding keeps the PULP kernels aware of the cluster configuration without reselecting engines downstream.

Deeploy/Targets/PULPOpen/Parsers.py (1)

124-182: Depthwise parser reuse looks solid.

Layering the DW-specific guards and bias plumbing on top of the shared Conv2D parser keeps the operator representation consistent while avoiding duplication.

Deeploy/EngineExtension/NetworkDeployers/EngineColoringDeployer.py (1)

51-58: Returning the engine simplifies metadata access.

Surfacing the DeploymentEngine object lets callers annotate shared properties (like n_cores) without reconstituting mapper state.

Deeploy/CommonExtensions/NetworkDeployers/NetworkDeployerWrapper.py (1)

68-70: Wrapper delegation stays aligned.

Forwarding _selectEngine preserves the new return type and keeps engine selection consistent through wrappers.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
Deeploy/DeeployTypes.py (1)

327-349: Bug: visited set built from characters, not names.

set(self.name) splits the string. Use a singleton set.

-        visited = set(self.name)
+        visited = {self.name}
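The difference is easy to check interactively; the buffer name below is hypothetical:

```python
# set() over a string iterates its characters, while {name} builds the
# intended one-element set.
name = "conv1_out"  # hypothetical buffer name

chars = set(name)      # the buggy form: set of characters
singleton = {name}     # the fixed form: one-element set

assert chars == {"c", "o", "n", "v", "1", "_", "u", "t"}
assert singleton == {"conv1_out"}
assert "conv1_out" not in chars  # membership tests silently miss
assert "conv1_out" in singleton
```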
🧹 Nitpick comments (10)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (2)

30-38: Type the regular Im2Col buffer like the DW path for consistent C code generation.

DW assigns the im2col buffer’s referenced type; regular Im2Col doesn’t. Align both to avoid a void* in generated code.

Apply:

   def hoistTransientBuffers(self, ctxt: NetworkContext,
                             operatorRepresentation: OperatorRepresentation) -> Tuple[NetworkContext, Dict, List[str]]:
     im2col_name, im2col_dim = PULP2DFloatConvIm2ColTemplate.computeTransientBuffersSize(
         ctxt, operatorRepresentation)[0]
     ctxt.hoistTransientBuffer(im2col_name, im2col_dim)

+    # Match im2col buffer element type to input
+    ctxt.lookup(im2col_name)._type.referencedType = ctxt.lookup(
+        operatorRepresentation['data_in'])._type.referencedType
+
     operatorRepresentation['ctxtBuffer'] = im2col_name
     operatorRepresentation['ctxtBufferSize'] = im2col_dim
     return ctxt, operatorRepresentation, [im2col_name]

99-101: Fix typos in comments (“ChannelOout” → “ChannelOut”).

Purely cosmetic.

-// 2D FP Conv HWC with Im2Col and ChannelOout parallelism (Name: ${nodeName}, Op: ${nodeOp})
+// 2D FP Conv HWC with Im2Col and ChannelOut parallelism (Name: ${nodeName}, Op: ${nodeOp})
-// 2D DW FP Conv HWC with Im2Col and ChannelOout parallelism (Name: ${nodeName}, Op: ${nodeOp})
+// 2D DW FP Conv HWC with Im2Col and ChannelOut parallelism (Name: ${nodeName}, Op: ${nodeOp})

Also applies to: 131-132

DeeployTest/testMVP.py (1)

80-83: Validate --cores > 0 before applying.

Avoids silent misconfiguration.

-    clusters = [engine for engine in platform.engines if isinstance(engine, PULPClusterEngine)]
-    for cluster in clusters:
-        cluster.n_cores = args.cores
+    clusters = [engine for engine in platform.engines if isinstance(engine, PULPClusterEngine)]
+    if args.cores <= 0:
+        raise ValueError("--cores must be >= 1")
+    for cluster in clusters:
+        cluster.n_cores = args.cores
DeeployTest/generateNetwork.py (1)

87-90: Guard against non-positive cores.

Same rationale as in testMVP.py.

-    clusters = [engine for engine in platform.engines if isinstance(engine, PULPClusterEngine)]
-    for cluster in clusters:
-        cluster.n_cores = args.cores
+    clusters = [engine for engine in platform.engines if isinstance(engine, PULPClusterEngine)]
+    if args.cores <= 0:
+        raise ValueError("--cores must be >= 1")
+    for cluster in clusters:
+        cluster.n_cores = args.cores
Deeploy/DeeployTypes.py (2)

241-245: Avoid mutable default for shape.

Use an immutable default for safety.

-    def __init__(self, name: str = '', shape = [1], aliases: Optional[List[str]] = None):
+    def __init__(self, name: str = '', shape: Sequence[int] = (1,), aliases: Optional[List[str]] = None):

398-401: Avoid mutable defaults in ConstantBuffer.

Use immutable defaults and convert.

-    def __init__(self, name: str = '', shape = [1], values = [0]):
-        super().__init__(name, shape)
-        values = np.asarray(values)
+    def __init__(self, name: str = '', shape: Sequence[int] = (1,), values: Union[Sequence, np.ndarray] = (0,)):
+        super().__init__(name, shape)
+        values = np.asarray(values)
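Both suggestions guard against Python's shared-mutable-default pitfall, which a stripped-down stand-in class (not the real VariableBuffer) reproduces:

```python
# Stand-in illustrating why `shape = [1]` as a default is risky: the same
# list object is shared by every call that omits the argument.
class BuggyBuffer:
    def __init__(self, name = '', shape = [1]):  # intentional bug
        self.name = name
        self.shape = shape

a = BuggyBuffer('a')
b = BuggyBuffer('b')
a.shape.append(7)            # mutating one instance...
assert b.shape == [1, 7]     # ...leaks into every other default-shaped buffer

class SafeBuffer:
    def __init__(self, name = '', shape = (1,)):  # immutable default
        self.name = name
        self.shape = list(shape)                  # fresh list per instance

c = SafeBuffer('c')
d = SafeBuffer('d')
c.shape.append(7)
assert d.shape == [1]        # other instances stay untouched
```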
Deeploy/Targets/PULPOpen/Deployer.py (2)

81-83: Remove unused variable.

autoTransposePass is assigned but not used.

-    autoTransposePass = AutoTransposeMergePass()
-    #self.ctxt, self.layerBinding = autoTransposePass.apply(self.ctxt, self.graph, self.layerBinding)
+    # autoTransposePass = AutoTransposeMergePass()
+    # self.ctxt, self.layerBinding = autoTransposePass.apply(self.ctxt, self.graph, self.layerBinding)

88-92: Simplify bind flow.

super().bind() raises on failure; the boolean check is redundant.

-        if not super().bind():
-            return False
-
-        self.ctxt.hoistGlobalDefinition("cluster_dev", "extern struct pi_device cluster_dev;")
-        return True
+        super().bind()
+        self.ctxt.hoistGlobalDefinition("cluster_dev", "extern struct pi_device cluster_dev;")
+        return True
CHANGELOG.md (1)

105-107: Fix sentence fragment.

Prepend “Fixed” for consistency.

-- Data types in the PULPOpen FP add and mul templates.
+- Fixed data types in the PULPOpen FP add and mul templates.
Deeploy/Targets/PULPOpen/Parsers.py (1)

116-118: Use newCtxt.lookup for consistency after parse.

Avoid mixing contexts.

-            for idx, inputNode in enumerate(node.inputs):
-                self.operatorRepresentation[inputs[idx]] = ctxt.lookup(inputNode.name).name
+            for idx, inputNode in enumerate(node.inputs):
+                self.operatorRepresentation[inputs[idx]] = newCtxt.lookup(inputNode.name).name
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d5bd025 and 49cd3b1.

📒 Files selected for processing (11)
  • CHANGELOG.md (6 hunks)
  • Deeploy/DeeployTypes.py (3 hunks)
  • Deeploy/Targets/PULPOpen/Deployer.py (2 hunks)
  • Deeploy/Targets/PULPOpen/Parsers.py (2 hunks)
  • Deeploy/Targets/PULPOpen/Platform.py (8 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (5 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Templates/FloatMatMulTemplate.py (1 hunks)
  • DeeployTest/generateNetwork.py (3 hunks)
  • DeeployTest/testMVP.py (3 hunks)
  • DeeployTest/testUtils/testRunner.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • DeeployTest/testUtils/testRunner.py
  • Deeploy/Targets/PULPOpen/Templates/FloatMatMulTemplate.py
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: diaconuccalin
Repo: pulp-platform/Deeploy PR: 117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.
📚 Learning: 2025-09-24T12:17:21.624Z
Learnt from: diaconuccalin
Repo: pulp-platform/Deeploy PR: 117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:46-0
Timestamp: 2025-09-24T12:17:21.624Z
Learning: In Deeploy's PULP templates, transient buffer size calculation can return element counts as strings from computeTransientBuffersSize(), and then manually set the buffer type in hoistTransientBuffers() using ctxt.lookup(buffer_name)._type.referencedType = input_type. The allocation system automatically multiplies the element count by the element size when the buffer type is properly set, achieving correct byte allocation.

Applied to files:

  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
📚 Learning: 2025-09-24T12:49:17.889Z
Learnt from: diaconuccalin
Repo: pulp-platform/Deeploy PR: 117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.

Applied to files:

  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
  • Deeploy/Targets/PULPOpen/Parsers.py
  • CHANGELOG.md
📚 Learning: 2025-09-09T15:58:06.454Z
Learnt from: Xeratec
Repo: pulp-platform/Deeploy PR: 105
File: Deeploy/Targets/PULPOpen/DMA/MchanDma.py:61-64
Timestamp: 2025-09-09T15:58:06.454Z
Learning: The _legalizeTransfers function in TilingCodeGeneration.py converts the innermost dimension of transfer rectangles from elements to bytes using a sizeInBytes helper function, ensuring DMA implementations receive properly byte-converted transfer parameters.

Applied to files:

  • Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
🧬 Code graph analysis (7)
Deeploy/Targets/PULPOpen/Deployer.py (2)
Deeploy/Targets/PULPOpen/Platform.py (1)
  • PULPClusterEngine (245-254)
Deeploy/DeeployTypes.py (6)
  • _selectEngine (2565-2569)
  • bind (1600-1625)
  • bind (1767-1791)
  • bind (2064-2108)
  • bind (2729-2765)
  • hoistGlobalDefinition (842-855)
Deeploy/DeeployTypes.py (2)
Deeploy/CommonExtensions/NetworkDeployers/NetworkDeployerWrapper.py (1)
  • _selectEngine (69-70)
Deeploy/EngineExtension/NetworkDeployers/EngineColoringDeployer.py (1)
  • _selectEngine (51-58)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (1)
Deeploy/DeeployTypes.py (6)
  • NodeTemplate (87-229)
  • computeTransientBuffersSize (142-162)
  • NetworkContext (508-1020)
  • hoistTransientBuffers (165-185)
  • hoistTransientBuffer (821-840)
  • lookup (720-752)
Deeploy/Targets/PULPOpen/Parsers.py (2)
Deeploy/DeeployTypes.py (5)
  • inputs (2503-2520)
  • lookup (720-752)
  • parseNode (1033-1048)
  • parseNodeCtxt (1051-1076)
  • NetworkContext (508-1020)
Deeploy/Targets/Generic/Parsers.py (31)
  • Conv2DParser (1218-1284)
  • parseNode (19-27)
  • parseNode (49-58)
  • parseNode (81-94)
  • parseNode (102-112)
  • parseNode (163-169)
  • parseNode (193-207)
  • parseNode (229-243)
  • parseNode (271-293)
  • parseNode (333-351)
  • parseNode (373-385)
  • parseNode (422-434)
  • parseNode (469-473)
  • parseNode (496-507)
  • parseNode (531-542)
  • parseNode (578-582)
  • parseNodeCtxt (29-41)
  • parseNodeCtxt (60-76)
  • parseNodeCtxt (114-155)
  • parseNodeCtxt (171-185)
  • parseNodeCtxt (209-221)
  • parseNodeCtxt (245-263)
  • parseNodeCtxt (295-325)
  • parseNodeCtxt (353-365)
  • parseNodeCtxt (387-414)
  • parseNodeCtxt (436-461)
  • parseNodeCtxt (475-488)
  • parseNodeCtxt (509-523)
  • parseNodeCtxt (544-570)
  • parseNodeCtxt (584-590)
  • parseNodeCtxt (603-620)
DeeployTest/generateNetwork.py (1)
Deeploy/Targets/PULPOpen/Platform.py (2)
  • PULPClusterEngine (245-254)
  • PULPPlatform (257-265)
Deeploy/Targets/PULPOpen/Platform.py (3)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
  • RemoveEmptyConvBiasPass (518-523)
Deeploy/Targets/PULPOpen/Parsers.py (1)
  • PULPFPDWConv2DParser (124-182)
Deeploy/Targets/Generic/Parsers.py (1)
  • SliceParser (97-155)
DeeployTest/testMVP.py (1)
Deeploy/Targets/PULPOpen/Platform.py (1)
  • PULPClusterEngine (245-254)
🪛 LanguageTool
CHANGELOG.md

[style] ~32-~32: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ... conv 2D, and reduce mean operations. - FP PULP DW conv lowering optimization pass...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

🪛 Ruff (0.14.2)
Deeploy/Targets/PULPOpen/Deployer.py

81-81: Local variable autoTransposePass is assigned to but never used

Remove assignment to unused variable autoTransposePass

(F841)

Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py

48-48: Unused static method argument: ctxt

(ARG004)

🔇 Additional comments (7)
Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py (1)

27-38: Correct fix for the non-batched operand bug.

The conditional pointer increments properly handle batched vs. non-batched operands. When an operand is shared across all batch iterations (non-batched), its pointer no longer advances incorrectly. The output pointer correctly advances unconditionally since each batch element writes to its own output space.

Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (2)

105-116: Arg ordering in Im2Col call looks correct (X,Y, kernelX,kernelY, strideX,strideY).

Intentional X/Y order here is consistent with the PULP Im2Col prototypes for FP. LGTM.


21-24: The review comment's core concern about call order is already addressed by the current implementation.

Verification confirms that annotateNCores() is called at line 85, immediately before super().bind() at line 88 in PULPDeployer.bind(). This ensures operatorRepresentation["n_cores"] is populated before the parent class processes templates. Both flagged locations (lines 21-24 and 51-54 in FloatConvTemplate.py) use the same pattern and would receive the pre-set value.

No alternate code paths that bypass annotateNCores() before template access were found.

DeeployTest/testMVP.py (1)

203-209: CLI addition looks good.

Clear help text and default. LGTM.

DeeployTest/generateNetwork.py (1)

191-196: CLI flag addition LGTM.

Deeploy/DeeployTypes.py (1)

2565-2569: Engine selection refactor LGTM.

Returning the engine and using its Mapping at bind-time is clean and extensible.

Also applies to: 2585-2587

Deeploy/Targets/PULPOpen/Platform.py (1)

77-79: FP DWConv2D mapping and optimizer updates LGTM.

  • Conv now includes the FP DW path.
  • Slice split into DMA vs. CPU binding is a nice separation.
  • RemoveEmptyConvBiasPass addition is appropriate.

Also applies to: 107-110, 134-136, 235-236

Comment on lines +178 to +182
# Check if DW
if self.operatorRepresentation['group'] == self.operatorRepresentation['ch_im_in']:
return newCtxt, True

return ctxt, False

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Confirm DW channel multiplier assumption.

If the DW FP kernel only supports multiplier=1, add a check that ch_im_out == ch_im_in. Otherwise, document supported multipliers.


🏁 Script executed:

#!/bin/bash
# Inspect DW FP prototypes/usages to confirm supported channel multiplier.
rg -n -C3 -P 'PULP_DW_Conv2d_Im2Col_fp\d+.*HWC\s*\(' TargetLibraries/PULPOpen
rg -n -C2 -P 'ch_im_in|ch_im_out|group' TargetLibraries/PULPOpen

Length of output: 2468


I need to gather more information about the kernel implementation and the Parsers.py context.



🏁 Script executed:

#!/bin/bash
# Get full function signature of the DW kernel
cat TargetLibraries/PULPOpen/src/DWConvolution_fp32.c | head -20

# Search for any multiplier or assertion checks in DW kernel
rg -n -A5 -B5 'F_total|multiplier|assert' TargetLibraries/PULPOpen/src/DWConvolution_fp32.c

# Check the Parsers.py file around lines 178-182 and before/after for context
sed -n '160,200p' Deeploy/Targets/PULPOpen/Parsers.py

Length of output: 7446


Add validation that depthwise kernels enforce multiplier=1.

The kernel implementation assumes F_total == C (multiplier=1) by repeatedly dividing F_total / C in loop bounds and channel indexing. However, the parser only confirms DW operation on line 178 via group == ch_im_in, without validating that ch_im_out == ch_im_in. If a DW operation is submitted with multiplier != 1, the kernel would silently produce incorrect results due to non-integer channel calculations. Add a check that ch_im_out == ch_im_in for DW operations in the parser, or document the multiplier=1 constraint.

🤖 Prompt for AI Agents
In Deeploy/Targets/PULPOpen/Parsers.py around lines 178-182, the code detects
depthwise by comparing operatorRepresentation['group'] ==
operatorRepresentation['ch_im_in'] but does not validate multiplier; add a check
that operatorRepresentation['ch_im_out'] == operatorRepresentation['ch_im_in']
whenever a DW op is detected, and if they differ fail fast (raise a clear
ValueError/ParserError or return the failure path) with a message like
"depthwise multiplier != 1 not supported" so invalid configurations are rejected
rather than silently producing wrong kernels.
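Sketched in plain Python, the guard could look as follows; the dictionary keys mirror the operatorRepresentation fields quoted above, while the function name and exact error type are assumptions:

```python
# Hypothetical sketch of the requested parser guard: accept depthwise only
# when the channel multiplier is 1 (ch_im_out == ch_im_in).
def check_depthwise(op: dict) -> bool:
    if op["group"] != op["ch_im_in"]:
        return False  # not depthwise; other parsers handle it
    if op["ch_im_out"] != op["ch_im_in"]:
        raise ValueError(
            "depthwise multiplier != 1 not supported "
            f"(ch_im_in={op['ch_im_in']}, ch_im_out={op['ch_im_out']})")
    return True

assert check_depthwise({"group": 8, "ch_im_in": 8, "ch_im_out": 8})
assert not check_depthwise({"group": 1, "ch_im_in": 8, "ch_im_out": 16})
try:
    check_depthwise({"group": 8, "ch_im_in": 8, "ch_im_out": 16})
    raise AssertionError("expected ValueError for multiplier != 1")
except ValueError:
    pass  # invalid configuration rejected as intended
```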


Labels

Bug Something isn't working Feature Addition of new features

Projects

Status: In review


3 participants