TinyViT on non-tiled Siracusa #117

base: devel

Conversation
Commits:

- …ed closure naming issue
- …ffer size computation. Added fixes to remove warnings
- …ion. Adjusted memory allocation for the im2col buffer in 2d float conv for PULPOpen (lower requirement, dynamic allocation). Included all Float 2D Conv tests in the CI pipeline for siracusa
- …version). Added MatMulAddMergePass to the PULP optimizer. Added float-reshape-skip-connection test to Siracusa CI pipeline. Fixes in the FloatAddTemplate of PULPOpen to eliminate warnings
- …lice and reduce mean bindings for Siracusa.
📝 Walkthrough (Summary by CodeRabbit)

Adds FP32 depthwise Conv2D (Im2Col) with optional bias across parsers, bindings, templates, and kernels; introduces a DMA-based Slice path and float ReduceMean bindings; fixes alias_of propagation and closure naming; propagates n_cores through contexts, deployers, and tests; and adds Siracusa CI FP test variants.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant ONNX as ONNX Graph
    participant Lower as NHWC Lowering
    participant Map as PULPOpen Mapping
    participant Parser as PULPFPDWConv2DParser
    participant Bind as PULPFloatDWConv2DBindings
    participant Tmpl as DW Im2Col Template
    participant Kern as PULP_DW_Conv2d_Im2Col_fp32
    ONNX->>Lower: detect Conv (inspect group attr)
    Lower-->>Map: lowered node (transpose weights/inputs if needed)
    Map->>Parser: parseNode / parseNodeCtxt
    Parser-->>Map: context (has_bias, inputs, depthwise)
    Map->>Bind: select DW conv bindings
    Bind-->>Tmpl: emit template call (include pSrcBias, has_bias, ctxtBuffer)
    Tmpl->>Tmpl: compute/hoist per-core ctxtBuffer using ctxt.n_cores
    Tmpl->>Kern: call kernel(..., pSrcBias, has_bias, pContextBuffer)
    Kern-->>Tmpl: perform per-core DW Im2Col convolution and write outputs
    Tmpl-->>Map: generated code/artifacts
```

```mermaid
sequenceDiagram
    autonumber
    participant ONNX as ONNX Graph
    participant Map as PULPOpen Mapping
    participant SliceStd as SliceMapper (PULPSliceBindings)
    participant SliceDMA as DMASliceMapper (PULPDMASliceBindings)
    ONNX->>Map: dispatch Slice node
    alt standard slice path
        Map->>SliceStd: select PULPSliceBindings
        SliceStd-->>Map: generate standard slice code
    else DMA-capable slice path
        Map->>SliceDMA: select PULPDMASliceBindings
        SliceDMA-->>Map: generate DMA-based slice code
    end
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes

Pre-merge checks: ❌ 1 failed (warning), ✅ 2 passed
Actionable comments posted: 14
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
344-346: Prevent a KeyError when the 'group' attr is absent

Depthwise detection assumes 'group' exists. If missing, this pass will crash. Use a safe default (1), as elsewhere in the file.

```diff
-    if opNode.attrs['group'] == 1:
-        return graph
+    group = opNode.attrs.get('group', 1)
+    if group == 1:
+        return graph
```

Deeploy/Targets/PULPOpen/Templates/FloatAddTemplate.py (1)
16-23: Loop index type should match the buffer indexing

Use unsigned 32-bit for `i` to match the array indexing and avoid UB on large sizes. (Handled in the diff above.)

Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (1)
131-149: DW Im2Col arg order: swap X/Y for the image dims, kernel dims, and strides; bias placement is correct
- Prototype (TargetLibraries/PULPOpen/inc/kernel/Conv.h): PULP_DW_Conv2d_Im2Col(..., uint32_t H, uint32_t W, uint32_t C, ..., uint32_t P, uint32_t Q, uint32_t SP, uint32_t SQ, const float32_t *pSrcBias, const bool has_bias, float32_t *pDstC, uint32_t pad_top, uint32_t pad_bottom, uint32_t pad_left, uint32_t pad_right, float32_t *pContextBuffer).
- Template call (Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py lines 131–149) currently passes (dim_im_in_x, dim_im_in_y), (dim_kernel_x, dim_kernel_y), (stride_x, stride_y). These should be swapped to (dim_im_in_y, dim_im_in_x), (dim_kernel_y, dim_kernel_x), (stride_y, stride_x).
- (bias, has_bias), padding order and context buffer position are already correct.
🧹 Nitpick comments (16)
Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py (1)
10-11: Avoid libm at runtime: use integer log2 for NUM_CORES

Replace the double-precision `log2()` with `__builtin_ctz()` to avoid the libm dependency and ensure compile-time integer computation. Also assert the power-of-two assumption that the chunking math relies on. Apply this diff:

```diff
-uint32_t ${nodeName}_log2Core = (uint32_t) log2(NUM_CORES);
+// NUM_CORES must be a power of two; compute log2(NUM_CORES) without libm
+#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
+_Static_assert((NUM_CORES & (NUM_CORES - 1)) == 0, "NUM_CORES must be power of two");
+#endif
+uint32_t ${nodeName}_log2Core = (uint32_t)__builtin_ctz(NUM_CORES);
```

.github/workflows/ci-platform-siracusa.yml (1)
98: TinyViT demo test: validate the runtime envelope on Siracusa

Great to see model-level coverage. Consider job timeout safeguards if this grows. For now, please confirm the demo's memory/latency fit the non-tiled target on the selected runner.
Deeploy/Targets/PULPOpen/Templates/FloatMatMulTemplate.py (1)
11-21: LGTM: per-iteration A/B selection handles mixed batched/unbatched inputs.

This fixes the non-batched handling cleanly and mirrors the expected strides (A: M*N, B: N*O). Consider applying the same per-iteration pointer style in GEMM for consistency, but not blocking.
Deeploy/DeeployTypes.py (3)
320-324: Prefer None over a mutable default and let the constructor initialize `alias_of`.

Passing `alias_of = []` creates a new list here (safe), but use `None` consistently and centralize initialization in the constructor to avoid future in-place mutations leaking via shared references.

```diff
-        return (cls(
-            name = node.name,
-            shape = node.shape if not isinstance(node, gs.Constant) else node.values.shape,
-            alias_of = [],
-        ))
+        return cls(
+            name = node.name,
+            shape = node.shape if not isinstance(node, gs.Constant) else node.values.shape,
+            alias_of = None,
+        )
```
469-474: Avoid sharing the alias list between buffers; copy it.

Passing `buffer.alias_of` by reference couples future mutations between the source and the new `ConstantBuffer`.

```diff
         ret = cls(
             name = buffer.name,
             shape = buffer.shape,
             values = values,
-            alias_of = buffer.alias_of,
+            alias_of = list(buffer.alias_of),
         )
```
1176-1180: Pass `alias_of=None` and let the constructor initialize.

Keeps alias handling consistent and avoids accidental shared lists if constructor defaults are ever reintroduced.

```diff
-        nb = ctxt.VariableBuffer(
-            name = name,
-            shape = node.shape,
-            alias_of = [],
-        )
+        nb = ctxt.VariableBuffer(
+            name = name,
+            shape = node.shape,
+            alias_of = None,
+        )
```

TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (3)
137-141: Remove the unnecessary modulo and precompute a local weight pointer for the inner loop

`im2col_idx % (P * Q)` is redundant. Precompute a per-filter base pointer for tighter inner loops. Apply this diff in both the bias and non-bias paths:

```diff
-      for (uint32_t im2col_idx = 0; im2col_idx < P * Q; im2col_idx++) {
-        sum +=
-            im2col_buffer[im2col_idx] *
-            weight_ptr[(f - ch_out_start) * P * Q + im2col_idx % (P * Q)];
-      }
+      const float32_t *local_w = weight_ptr + (f - ch_out_start) * (P * Q);
+      for (uint32_t k = 0; k < P * Q; k++) {
+        sum += im2col_buffer[k] * local_w[k];
+      }
```

Also applies to: 236-240
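The modulo is redundant because the loop index never reaches P*Q, so reducing it modulo P*Q is the identity over the whole iteration range. A quick check in plain Python (P and Q are arbitrary illustrative values):

```python
# The inner loop runs im2col_idx over [0, P*Q); for every such index the
# modulo by P*Q is a no-op, so both loop bodies address the same weights.
P, Q = 3, 3
assert all(im2col_idx % (P * Q) == im2col_idx for im2col_idx in range(P * Q))
print("modulo is redundant for all", P * Q, "indices")
```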
47-48: Remove the unused variable `kernel_size`

`kernel_size` is computed but never used.

```diff
-  uint32_t kernel_size = P * Q * F_total;
```
61-97: Consider simplifying the padding handling with a single p/q nested loop

You can mirror the im2col fill used in the regular Im2Col kernel (see Convolution_fp32.c), avoiding four separate padding loops and the type-casting hazards.
TargetLibraries/PULPOpen/inc/kernel/Conv.h (1)
20-27: Document the `pContextBuffer` size expectations and bias semantics

The new signatures look correct. Please add brief comments indicating the required `pContextBuffer` sizes:

- Conv2d_Im2Col: NUM_CORES * (C*P*Q) floats
- DW_Conv2d_Im2Col: NUM_CORES * (P*Q) floats

Also note that `has_bias` governs whether `pSrcBias` is read.

Also applies to: 29-36
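To make the contract above concrete, the two sizes can be written out as a small helper (a sketch assuming float32 elements; the function names are illustrative, not part of the library):

```python
# Sketch of the pContextBuffer size contract described above.
# Assumes 4-byte float32 elements; the helper names are illustrative only.
FLOAT32_BYTES = 4


def conv2d_im2col_ctxt_bytes(num_cores: int, c: int, p: int, q: int) -> int:
    """Regular Im2Col: each core needs one C*P*Q input patch."""
    return num_cores * c * p * q * FLOAT32_BYTES


def dw_conv2d_im2col_ctxt_bytes(num_cores: int, p: int, q: int) -> int:
    """Depthwise Im2Col: each core needs one P*Q patch (one channel at a time)."""
    return num_cores * p * q * FLOAT32_BYTES


print(conv2d_im2col_ctxt_bytes(8, 16, 3, 3))  # 8 cores, 16 channels, 3x3 kernel -> 4608
print(dw_conv2d_im2col_ctxt_bytes(8, 3, 3))   # 8 cores, 3x3 kernel -> 288
```

The depthwise variant is much cheaper because each core materializes a single-channel patch at a time, which is exactly why its buffer drops the C factor.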
Deeploy/Targets/PULPOpen/Bindings.py (2)
239-245: Constrain the FP DWConv binding to the supported float type(s).

FloatConvTemplate/referenceDW2DIm2ColTemplate likely only implements float32. Avoid advertising unsupported float types by narrowing the generator. Apply this diff:

```diff
-                  ForkTransformer) for float_type in FloatDataTypes
+                  ForkTransformer) for float_type in [float32_t]
```
299-302: ReduceMean (float): restrict the axes integer type to int32.

ONNX axes are int64/int32; templates and parsers typically operate on int32 here. Generating bindings for all signed integer widths adds binder churn without benefit. Apply this diff:

```diff
-    for integer_type in SignedIntegerDataTypes
-    for float_type in FloatDataTypes
+    for integer_type in [int32_t]
+    for float_type in FloatDataTypes
```

Deeploy/Targets/PULPOpen/Platform.py (1)
108: Depthwise vs regular FP Conv: prevent mapper shadowing by ordering.

If PULPFPConv2DParser accepts DW cases, FPConv2DMapper may shadow FPDWConv2DMapper due to precedence. Apply this diff to prioritize DWConv first:

```diff
-    'Conv': ConvLayer([FPConv2DMapper, FPDWConv2DMapper]),
+    'Conv': ConvLayer([FPDWConv2DMapper, FPConv2DMapper]),
```
- 'Conv': ConvLayer([FPConv2DMapper, FPDWConv2DMapper]), + 'Conv': ConvLayer([FPDWConv2DMapper, FPConv2DMapper]),Alternatively, ensure PULPFPConv2DParser explicitly rejects depthwise.
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (3)
42-51: Fix the return typing and silence ARG004 (unused `ctxt`)

- The return type annotation should be widened: the method now returns a `str` for the size.
- Rename `ctxt` to `_ctxt` to address Ruff ARG004.

Apply:

```diff
-    def computeTransientBuffersSize(ctxt: NetworkContext,
-                                    operatorRepresentation: OperatorRepresentation) -> List[Tuple[str, str]]:
+    def computeTransientBuffersSize(_ctxt: NetworkContext,
+                                    operatorRepresentation: OperatorRepresentation) -> List[Tuple[str, Union[int, str, IntVar]]]:
```
93: Typo in comment: "ChannelOout" → "ChannelOut"

Minor doc polish.

```diff
-// 2D FP Conv HWC with Im2Col and ChannelOout parallelism (Name: ${nodeName}, Op: ${nodeOp})
+// 2D FP Conv HWC with Im2Col and ChannelOut parallelism (Name: ${nodeName}, Op: ${nodeOp})
```
124-126: Typo in comment: "ChannelOout" → "ChannelOut"

Same typo in the DW template header.

```diff
-// 2D DW FP Conv HWC with Im2Col and ChannelOout parallelism (Name: ${nodeName}, Op: ${nodeOp})
+// 2D DW FP Conv HWC with Im2Col and ChannelOut parallelism (Name: ${nodeName}, Op: ${nodeOp})
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (17)
- .github/workflows/ci-platform-siracusa.yml (3 hunks)
- Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py (1 hunk)
- Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (2 hunks)
- Deeploy/DeeployTypes.py (4 hunks)
- Deeploy/Targets/Generic/Parsers.py (1 hunk)
- Deeploy/Targets/PULPOpen/Bindings.py (5 hunks)
- Deeploy/Targets/PULPOpen/Parsers.py (1 hunk)
- Deeploy/Targets/PULPOpen/Platform.py (7 hunks)
- Deeploy/Targets/PULPOpen/Templates/FloatAddTemplate.py (1 hunk)
- Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (4 hunks)
- Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py (1 hunk)
- Deeploy/Targets/PULPOpen/Templates/FloatMatMulTemplate.py (1 hunk)
- Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py (1 hunk)
- TargetLibraries/PULPOpen/inc/DeeployPULPMath.h (1 hunk)
- TargetLibraries/PULPOpen/inc/kernel/Conv.h (1 hunk)
- TargetLibraries/PULPOpen/src/Convolution_fp32.c (4 hunks)
- TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1 hunk)
🪛 Ruff (0.13.1)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py

- 43: Unused static method argument: `ctxt` (ARG004)

Deeploy/DeeployTypes.py

- 433: Do not use mutable data structures for argument defaults; replace with `None` and initialize within the function (B006) (reported three times for this line)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Test Type Inference (fail-input2) (listed 10 times)
🔇 Additional comments (23)
Deeploy/Targets/PULPOpen/Templates/FloatMulTemplate.py (1)
13-14: LGTM: unsigned arithmetic for start/end bounds

Casting `${size}` to `uint32_t` aligns the types and prevents signed/unsigned min mixups. Indices and loop variables are consistently `uint32_t`.

.github/workflows/ci-platform-siracusa.yml (2)
75: Reshape + skip connection test: ensure the alias_of fix is covered

Good addition to exercise the alias/buffer lifetime fix. Please confirm this test indeed hits the alias_of scenario (non-tiled) and fails without the fix.
61-64: CI workflow lists new DW conv FP tests, but those tests aren't registered in the repo

.github/workflows/ci-platform-siracusa.yml (lines 61-64) lists testFloat2DDWConvolution, testFloat2DDWConvolutionBias, and testFloat2DDWConvolutionZeroBias, but a repo search found no matches: add/register these tests or update the workflow. Once they exist, verify the shapes fit non-tiled Siracusa memory and that the zero-bias test exercises the RemoveEmptyConvBias pass.
Deeploy/Targets/PULPOpen/Templates/FloatGemmTemplate.py (2)
27-37: Good: batch-gated A/B/C strides prevent OOB on non-batched inputs.

Looks correct and aligns with the matmul change. Please confirm the strides match the in-memory layout for transposed cases (i.e., still M*N for A and N*O for B even when transA/transB are set).
39: Unconditional out stride can desync when data_out aliases C but C is not batched.

If data_out may alias C while C is reused (non-batched), advancing only the out pointer will diverge the pair across iterations and yield incorrect reads/writes. Either:

- enforce at mapping time that alias-of(C, data_out) implies identical batched flags, or
- gate the out advance with an explicit data_out_batched flag.

Example, if such a flag is available:

```diff
-    ref_${data_out}_${data_out} += ${M} * ${O};
+    % if data_out_batched:
+    ref_${data_out}_${data_out} += ${M} * ${O};
+    % endif
```

Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (4)
370-374: Verify the FP DW Conv layout: an input-only transpose may be insufficient

For FP DW Conv, you transpose only the input and not the weights. Confirm the kernel's expectations: if weights are stored channels-first, they likely also need a DW-specific permutation (as above) to avoid a semantic mismatch when channels_first is set to the default.

385-403: New PULPDWConvPass wiring looks good

Pattern graph + partial replacement is clear and consistent with the pipeline.

404-426: New PULPFPDWConvPass wiring looks good

Extends the NHWC conversion to FP DW conv as intended.

505: Pass ordering LGTM

Adding PULPFPDWConvPass after PULPDWConvPass in the sequence is reasonable and non-overlapping.
TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1)
41-43: Confirm the context buffer sizing per core

This implementation expects `pContextBuffer` sized to NUM_CORES * (P*Q) floats. Please confirm the templates/allocations provide that.
10-16: LGTM on the signature extension and core partitioning

Bias handling and per-core channel slicing are consistent. Weight and output indexing are correct. Please ensure all call sites are updated to pass `pSrcBias`, `has_bias`, and `pContextBuffer` (for the Im2Col variant).

Also applies to: 18-39

139-176: Good reuse of the im2col buffer and contiguous weight blocks

The inner product uses a contiguous `kernel_size` slice, matching the buffer layout. Looks correct.

Also applies to: 179-218
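For intuition, the contiguous-access property the comment praises can be sketched in NumPy (illustrative only; the HWC shapes and variable names here are assumptions, not the kernel's actual code):

```python
import numpy as np

# Illustrative Im2Col sketch: for one output pixel, the C*P*Q input patch is
# flattened so the dot product runs over one contiguous weight slice.
H, W, C, P, Q = 5, 5, 2, 3, 3          # input dims (HWC) and kernel dims
rng = np.random.default_rng(0)
x = rng.standard_normal((H, W, C))
w = rng.standard_normal((P, Q, C))      # weights for a single output filter

oy, ox = 1, 2                           # an arbitrary valid output position
patch = x[oy:oy + P, ox:ox + Q, :]      # the im2col buffer for this pixel
im2col = patch.reshape(-1)              # contiguous C*P*Q vector

direct = float(np.sum(patch * w))       # direct convolution at (oy, ox)
via_im2col = float(im2col @ w.reshape(-1))
print(np.isclose(direct, via_im2col))   # same result, contiguous access
```

The win in the C kernel is the same: once the patch is linearized, the inner product walks two flat arrays instead of strided input memory.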
Deeploy/Targets/PULPOpen/Parsers.py (3)
104-116: Bias handling looks correct; maps the optional third input and flags has_bias

Parsing and context mapping are consistent with the new kernels. Confirm downstream templates use operatorRepresentation["has_bias"] and ["bias"] as introduced here.

127-159: DW FP2D parser shape/padding checks look reasonable

Depthwise gating via input count and padding symmetry is fine.

185-186: Depthwise condition check aligns with ONNX semantics

`group == ch_im_in` is the expected DW condition.
143-153
: DMA Slice binding switch looks good.Good move to DMASliceTemplate with MemoryAwareForkTransformer for integer tensors; aligns with the DMA path introduced in Platform.
155-165
: Float Slice mapping: validate starts/ends/axes/steps typing.Using uint8_t pointers for starts/ends/axes/steps is consistent with the DMA variant, but please verify SliceTemplate.referenceTemplate expects raw byte buffers for these (not int32_t). Also confirm this path covers only float inputs while integers fall back to DMA.
Deeploy/Targets/PULPOpen/Platform.py (4)
77-79
: FP DWConv mapper wiring: LGTM.Parser and bindings are correctly introduced and exposed.
92-95
: Two Slice mappers ordering: LGTM; confirm fallback behavior.Order ensures float Slice binds first; integers should fall back to DMA. Please confirm CI covers both float and int slice paths.
8-9
: RemoveEmptyConvBiasPass import: LGTM.Consistent with optimizer usage below.
235-236
: Optimizer: adding RemoveEmptyConvBiasPass is sensible.Placement at the end of the topology passes is fine as long as it runs pre-binding. If the intent is to normalize bias presence for FP convs (so templates always see 3 inputs), consider placing it before conv-related merges; otherwise current placement is acceptable.
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (2)
60-62: Good: transient buffer typed to the input element

Ensures the generated C sees the correct pointee type instead of `void`. This helps alloc/codegen correctness.

81: Bias args correct: signature matches PULP_Conv2d_fp32_fp32_fp32_HWC

TargetLibraries/PULPOpen/src/Convolution_fp32.c and TargetLibraries/PULPOpen/inc/kernel/Conv.h declare pSrcBias, has_bias in that exact position before pDstC/padding, so adding ${bias}, ${has_bias} is correct.
Signed-off-by: Călin Diaconu <[email protected]>
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
Deeploy/DeeployTypes.py (1)
241: Fix remaining mutable default arguments.

`VariableBuffer.__init__` still has mutable defaults (`shape = [1]`, `alias_of: Optional[List[str]] = []`). These should use immutable defaults (tuples or `None`) to follow Python best practices and avoid shared mutable state across calls. Apply this diff:

```diff
-    def __init__(self, name: str = '', shape = [1], alias_of: Optional[List[str]] = []):
+    def __init__(self, name: str = '', shape: Sequence[int] = (1,), alias_of: Optional[List[str]] = None):
```

The defensive copy on line 260 already handles the `None` case correctly, so no further changes are needed there. Based on past review feedback and static analysis (Ruff B006).
🧹 Nitpick comments (2)
Deeploy/DeeployTypes.py (2)
454: Optional: Avoid a redundant defensive copy.

Since `VariableBuffer.__init__` already defensively copies `alias_of` (line 260), passing `list(alias_of)` here results in double copying. You can simplify by passing the list directly:

```diff
-        super().__init__(name, shape, list(alias_of) if alias_of is not None else None)
+        super().__init__(name, shape, alias_of)
```

This is a minor optimization and doesn't affect correctness.
490-495: Optional: Make alias propagation more explicit.

While correct (the constructor defensively copies `alias_of`), explicitly copying here would make the intent clearer:

```diff
 ret = cls(
     name = buffer.name,
     shape = buffer.shape,
     values = values,
-    alias_of = buffer.alias_of,
+    alias_of = list(buffer.alias_of),  # Explicit copy for clarity
 )
```

This is a minor clarity improvement and doesn't affect correctness.
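The decoupling that an explicit `list(...)` copy buys is easy to see in isolation (plain-Python illustration; the buffer names are made up):

```python
# list(x) creates a new list object: later appends to the source no longer
# show through, whereas plain assignment only creates a second name (alias).
src = ["inp_0"]
shared = src            # aliasing: both names point at the same list
copied = list(src)      # shallow copy: a distinct list with the same contents

src.append("inp_1")
print(shared)  # ['inp_0', 'inp_1']  - sees the mutation
print(copied)  # ['inp_0']           - unaffected
```

A shallow copy suffices here because the alias entries are immutable strings; only the list object itself needs to be distinct.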
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- CHANGELOG.md (4 hunks)
- Deeploy/DeeployTypes.py (9 hunks)
🪛 Ruff (0.13.3)
Deeploy/DeeployTypes.py

- 452: Do not use mutable data structures for argument defaults; replace with `None` and initialize within the function (B006) (reported twice for this line)
🔇 Additional comments (4)
Deeploy/DeeployTypes.py (4)
325-329: LGTM!

Correctly initializes `alias_of` with an explicit empty list for each new instance, avoiding shared mutable state.

362: LGTM!

Correctly returns a defensive copy of the internal alias list, preventing external code from accidentally mutating the buffer's state.
404-406: LGTM!

The updates to support `size` as `Union[int, str]` are consistent across `TransientBuffer.__init__`, `_bufferRepresentation`, and `hoistTransientBuffer`. This enables symbolic size expressions, which aligns with the PR's tiling and optimization goals.

Also applies to: 431, 888

1200-1204: LGTM!

Correctly initializes non-global outputs with an explicit empty `alias_of` list, maintaining consistency with the alias tracking implementation.
Actionable comments posted: 5
♻️ Duplicate comments (2)
Deeploy/DeeployTypes.py (2)
241-261: Fix mutable defaults in VariableBuffer.__init__ (shape/alias_of)

Defaulting shape to [1] and alias_of to [] risks shared state; use immutable defaults and copy incoming lists. Apply:

```diff
-    def __init__(self, name: str = '', shape = [1], alias_of: Optional[List[str]] = []):
+    def __init__(self, name: str = '', shape: Sequence[int] = (1,), alias_of: Optional[List[str]] = None):
         self.name: str = name  #: str: Canonical name that this buffer is registered as in the NetworkContext
-        self.shape: Sequence[
-            int] = shape  #: Sequence[int]: Represents the dimensions of the underlying tensor as a sequence of dimension sizes
+        self.shape: Sequence[int] = shape  #: Sequence[int]: Represents the dimensions of the underlying tensor as a sequence of dimension sizes
 @@
         self.alias_of: List[str] = list(alias_of) if alias_of is not None else []
```

As per Ruff B006.
452-456: Fix mutable defaults in ConstantBuffer.__init__ (shape/values)

Use immutable defaults and forward alias_of safely. Apply:

```diff
-    def __init__(self, name: str = '', shape = [1], values = [0], alias_of: Optional[List[str]] = None):
-        # Pass a copy of alias_of to avoid shared references
-        super().__init__(name, shape, list(alias_of) if alias_of is not None else None)
+    def __init__(self,
+                 name: str = '',
+                 shape: Sequence[int] = (1,),
+                 values: Union[Sequence, np.ndarray] = (0,),
+                 alias_of: Optional[List[str]] = None):
+        super().__init__(name, shape, alias_of)
```

(The values processing via np.asarray below remains correct.)
🧹 Nitpick comments (4)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (3)
21-24: Im2Col buffer bytes should derive from the input element width (not the weight's).

The im2col buffer stores input elements; computing bytes via weight_type may work today (fp32), but it couples unrelated types and can break if they diverge. Prefer the data_in_type width. Apply:

```diff
-    im2col_dim = (operatorRepresentation["weight_type"].typeWidth // 8) * ctxt.n_cores * operatorRepresentation[
-        'ch_im_in'] * operatorRepresentation['dim_kernel_x'] * operatorRepresentation['dim_kernel_y']
+    im2col_dim = (operatorRepresentation["data_in_type"].referencedType.typeWidth // 8) \
+        * ctxt.n_cores \
+        * operatorRepresentation['ch_im_in'] \
+        * operatorRepresentation['dim_kernel_x'] \
+        * operatorRepresentation['dim_kernel_y']
```
43-51: Fix the return type annotation and stale comment; ensure consistent sizing semantics.

- The return type says List[Tuple[str, str]], but the code returns an int; align with the base signature (Union[int, IntVar]).
- The comment says "value is only used as string", but an int is produced. Update the comment and/or the value.

Also, consider returning an element count (NUM_CORES * Kx * Ky) and relying on the buffer's referencedType to derive bytes, for consistency with the established pattern.

```diff
-    def computeTransientBuffersSize(ctxt: NetworkContext,
-                                    operatorRepresentation: OperatorRepresentation) -> List[Tuple[str, str]]:
+    def computeTransientBuffersSize(ctxt: NetworkContext,
+                                    operatorRepresentation: OperatorRepresentation) -> List[Tuple[str, Union[int, IntVar]]]:
 @@
-        # Memory allocation for the im2col buffer can be dynamic, based on the number of cores.
-        # WARNING: This works because value is only used as string, in the allocate template.
-        im2col_dim = (operatorRepresentation["weight_type"].typeWidth // 8
-                     ) * ctxt.n_cores * operatorRepresentation['dim_kernel_x'] * operatorRepresentation['dim_kernel_y']
+        # Im2Col buffer scales with the number of cores and the kernel area; the size below is in ELEMENTS.
+        # The byte size is derived from the buffer's referenced type at allocation/codegen time.
+        im2col_dim = ctxt.n_cores * operatorRepresentation['dim_kernel_x'] * operatorRepresentation['dim_kernel_y']
```

Note: If you prefer to keep bytes here, update the comment accordingly and see the next comment on the double-accounting risk.
124-154: DW Im2Col call wiring is consistent; minor docstring typo.

The DW Im2Col invocation mirrors the FP path with bias and ctxtBuffer. Typo: "ChannelOout" → "ChannelOut".

```diff
-// 2D DW FP Conv HWC with Im2Col and ChannelOout parallelism (Name: ${nodeName}, Op: ${nodeOp})
+// 2D DW FP Conv HWC with Im2Col and ChannelOut parallelism (Name: ${nodeName}, Op: ${nodeOp})
```

Deeploy/DeeployTypes.py (1)
2708-2714: Preserve the context name when recreating NetworkContext during parse

Recreating self.ctxt drops custom names; pass name=self.name to keep the mangling consistent. Apply:

```diff
         self.ctxt = NetworkContext(
             variableBuffer = self.Platform.VariableBuffer,
             constantBuffer = self.Platform.ConstantBuffer,
             structBuffer = self.Platform.StructBuffer,
             transientBuffer = self.Platform.TransientBuffer,
             n_cores = self.ctxt.n_cores,
+            name = self.name,
         )
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py (2 hunks)
- Deeploy/DeeployTypes.py (15 hunks)
- Deeploy/Targets/PULPOpen/Deployer.py (2 hunks)
- Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (5 hunks)
- Deeploy/TilingExtension/TilerExtension.py (1 hunk)
- DeeployTest/testMVP.py (4 hunks)
- DeeployTest/testRunner_tiled_siracusa.py (1 hunk)
- DeeployTest/testUtils/platformMapping.py (2 hunks)
- DeeployTest/testUtils/testRunner.py (3 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-24T12:17:21.624Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:46-0
Timestamp: 2025-09-24T12:17:21.624Z
Learning: In Deeploy's PULP templates, transient buffer size calculation can return element counts as strings from computeTransientBuffersSize(), and then manually set the buffer type in hoistTransientBuffers() using ctxt.lookup(buffer_name)._type.referencedType = input_type. The allocation system automatically multiplies the element count by the element size when the buffer type is properly set, achieving correct byte allocation.
Applied to files:
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
📚 Learning: 2025-09-24T12:49:17.889Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.
Applied to files:
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py
🧬 Code graph analysis (4)
DeeployTest/testMVP.py (3)
- Deeploy/DeeployTypes.py (1): NetworkDeployer (3256-3633)
- DeeployTest/testUtils/platformMapping.py (1): mapDeployer (92-261)
- DeeployTest/testSchedulingExtension.py (1): _mockScheduler (29-33)
DeeployTest/testRunner_tiled_siracusa.py (1)
- DeeployTest/testUtils/testRunner.py (1): TestRunner (279-450)
DeeployTest/testUtils/platformMapping.py (2)
- Deeploy/DeeployTypes.py (3): DeploymentPlatform (2451-2494), TopologyOptimizer (2249-2278), NetworkDeployer (3256-3633)
- Deeploy/Targets/PULPOpen/Deployer.py (1): PULPDeployer (29-128)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (1)
- Deeploy/DeeployTypes.py (6): NodeTemplate (87-229), computeTransientBuffersSize (142-162), NetworkContext (575-1090), hoistTransientBuffers (165-185), hoistTransientBuffer (891-910), lookup (790-822)
🪛 GitHub Actions: CI • Snitch (Tiled)
DeeployTest/testUtils/testRunner.py
[error] 334-334: AttributeError: 'TestRunner' object has no attribute '_cores'. Did you mean: 'n_cores'?
[error] 334-334: Command failed with exit code 1. Failure occurred while executing testRunner_tiled_snitch.py with parameters: -t Tests/$testName --cores=9 --simulator=$simulator --l1 $L1_value --defaultMemLevel=L2 --memAllocStrategy=MiniMalloc --searchStrategy=random-max
🪛 Ruff (0.13.3)
DeeployTest/testMVP.py
- 58-58: Unused function argument: verbose (ARG001)
Deeploy/Targets/PULPOpen/Deployer.py
- 40-40: Do not use mutable data structures for argument defaults. Replace with None; initialize within function. (B006)
Deeploy/DeeployTypes.py
- 452-452: Do not use mutable data structures for argument defaults. Replace with None; initialize within function. (B006)
- 452-452: Do not use mutable data structures for argument defaults. Replace with None; initialize within function. (B006)
Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py
- 25-25: Do not use mutable data structures for argument defaults. Replace with None; initialize within function. (B006)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (150)
- GitHub Check: siracusa-models / test-runner-siracusa
- GitHub Check: snitch-kernels / test-runner-snitch
- GitHub Check: cortexm-kernels / test-runner-cortexm
- GitHub Check: linting
- GitHub Check: chimera-kernels / test-runner-chimera
- GitHub Check: generic-kernels / test-runner-generic
- GitHub Check: Test Type Inference (fail-input2)
- GitHub Check: Test Type Inference (pass)
- GitHub Check: Test Type Inference (fail-input0)
- GitHub Check: siracusa-neureka-models-tiled-singlebuffer-L3 (miniMobileNet, 2000, 8, L3) / test-runner-siracusa-neureka-tiled (2000)
- GitHub Check: siracusa-neureka-models-tiled-doublebuffer-L3-wmem (miniMobileNet, 2000, 8, true, L3, true) / test-runner-siracusa-neureka-tiled (2000)
- GitHub Check: siracusa-neureka-kernels-tiled-singlebuffer-L2-wmem / test-runner-siracusa-neureka-tiled
- GitHub Check: siracusa-neureka-kernels-tiled-singlebuffer-L2 / test-runner-siracusa-neureka-tiled
- GitHub Check: mempool-models / test-runner-mempool
- GitHub Check: mempool-kernels / test-runner-mempool
🔇 Additional comments (6)
Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py (2)
Lines 81-84: Bias/has_bias plumbing for direct conv looks good. Bias flags are correctly passed to the FP Conv2d HWC call.
Lines 100-117: Im2Col call argument order (X,Y) acknowledged as correct for this kernel. The X,Y ordering for dims, kernel, and stride matches the intended prototype in this codebase. If you changed any kernel prototypes elsewhere in this PR, ensure consistency across integer/float and DW/non-DW variants.
Deeploy/TilingExtension/TilerExtension.py (1)
Lines 987-989: LGTM: propagate core count to context. Assigning ctxt.n_cores ensures downstream kernels see the configured core count.
DeeployTest/testRunner_tiled_siracusa.py (1)
Lines 20-26: LGTM: cores threaded into TestRunner and CMake. Passing args.cores to TestRunner and exporting NUM_CORES is correct.
Also applies to: 28-28
DeeployTest/testUtils/platformMapping.py (1)
Lines 92-104: LGTM: add n_cores and forward to PULPDeployer. Signature and forwarding align with deployer changes.
Also applies to: 214-224
DeeployTest/testMVP.py (1)
Lines 57-60: LGTM: introduce and propagate n_cores. CLI option, setupDeployer signature, and forwarding to mapDeployer are consistent.
Also applies to: 84-93, 201-206, 261-263
        inputOffsets: Dict[str, int] = {},
        n_cores: int = 8):
🛠️ Refactor suggestion | 🟠 Major
Avoid mutable default for inputOffsets; initialize from None
Using {} as a default shares one dict across all calls and instances. Accept None and initialize inside the function; also simplify the empty-check.
Apply:
- inputOffsets: Dict[str, int] = {},
- n_cores: int = 8):
+ inputOffsets: Optional[Dict[str, int]] = None,
+ n_cores: int = 8):
@@
- if inputOffsets == {}:
- for key in inputTypes.keys():
- inputOffsets[key] = 0
+ if inputOffsets is None:
+ inputOffsets = {key: 0 for key in inputTypes.keys()}
As per Ruff B006.
Also applies to: 30-35
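The pitfall Ruff B006 flags can be demonstrated in isolation. The sketch below uses hypothetical function names, not Deeploy's actual deployer classes; it shows why the default dict is created once at function-definition time and then shared by every call that omits the argument:

```python
# Minimal sketch (hypothetical names) of the mutable-default pitfall.
from typing import Dict, Optional


def bad_deployer(offsets: Dict[str, int] = {}) -> Dict[str, int]:
    offsets.setdefault("input_0", 0)  # mutates the one shared default dict
    return offsets


def good_deployer(offsets: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    if offsets is None:  # fresh dict per call, as the B006 fix suggests
        offsets = {}
    offsets.setdefault("input_0", 0)
    return offsets


# Both calls to the buggy version return the very same dict object:
assert bad_deployer() is bad_deployer()
# The fixed version hands out an independent dict on each call:
assert good_deployer() is not good_deployer()
```

The same pattern applies to the `inputOffsets = {}` defaults in SignPropDeployer and PULPDeployer: accept `None`, then build the per-instance dict inside `__init__`.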
🧰 Tools
🪛 Ruff (0.13.3)
- 25-25: Do not use mutable data structures for argument defaults. Replace with None; initialize within function. (B006)
🤖 Prompt for AI Agents
In Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py around lines
25-26 and 30-35, the function/class signature uses a mutable default
inputOffsets: Dict[str, int] = {} which shares state across calls; change the
parameter default to inputOffsets: Optional[Dict[str,int]] = None and inside the
body set inputOffsets = {} if inputOffsets is None, then use that local dict;
also simplify any subsequent checks for emptiness to use simple truthiness (if
not inputOffsets) or explicit length checks as appropriate.
                 name: str = 'DeeployNetwork',
                 n_cores: int = 8):
        self.globalObjects = OrderedDict()
        self.localObjects = OrderedDict()
        self.VariableBuffer = variableBuffer
        self.ConstantBuffer = constantBuffer
        self.StructBuffer = structBuffer
        self.TransientBuffer = transientBuffer
        self.n_cores = n_cores
        self.name = name
        self.n_cores = n_cores
Duplicate assignment of n_cores in NetworkContext.__init__
self.n_cores is set twice. Remove one.
Apply:
-        self.n_cores = n_cores
         self.name = name
         self.n_cores = n_cores
Optional: also replace mutable defaults for globalObjects/localObjects with None and init inside to avoid shared dicts.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def __init__(self,
             # … other parameters …
             name: str = 'DeeployNetwork',
             n_cores: int = 8):
    self.globalObjects = OrderedDict()
    self.localObjects = OrderedDict()
    self.VariableBuffer = variableBuffer
    self.ConstantBuffer = constantBuffer
    self.StructBuffer = structBuffer
    self.TransientBuffer = transientBuffer
-   self.n_cores = n_cores
    self.name = name
    self.n_cores = n_cores
🤖 Prompt for AI Agents
In Deeploy/DeeployTypes.py around lines 586 to 597, NetworkContext.__init__
assigns self.n_cores twice; remove the redundant assignment so n_cores is only
set once. Also (optional) avoid mutable default shared dicts by changing
parameters defaulting to OrderedDict() to None and initializing
self.globalObjects = OrderedDict() and self.localObjects = OrderedDict() inside
__init__ when the corresponding arg is None.
this
@lukamac, got it! The duplicate self.n_cores = n_cores assignment has been noted. 👍
                 inputOffsets = {},
                 n_cores: int = 8):
        super().__init__(
            graph = graph,
            deploymentPlatform = deploymentPlatform,
            inputTypes = inputTypes,
            loweringOptimizer = loweringOptimizer,
            scheduler = scheduler,
            name = name,
            default_channels_first = default_channels_first,
            deeployStateDir = deeployStateDir,
            inputOffsets = inputOffsets,
            n_cores = n_cores,
        )
🛠️ Refactor suggestion | 🟠 Major
Avoid mutable default for inputOffsets; forward None to base
Use Optional and let the base handle initialization.
Apply:
-                 inputOffsets = {},
+                 inputOffsets: Optional[Dict[str, int]] = None,
                  n_cores: int = 8):
         super().__init__(
@@
             inputOffsets = inputOffsets,
             n_cores = n_cores,
         )
(Requires the SignPropDeployer change to accept Optional.)
📝 Committable suggestion
                 inputOffsets: Optional[Dict[str, int]] = None,
                 n_cores: int = 8):
        super().__init__(
            graph = graph,
            deploymentPlatform = deploymentPlatform,
            inputTypes = inputTypes,
            loweringOptimizer = loweringOptimizer,
            scheduler = scheduler,
            name = name,
            default_channels_first = default_channels_first,
            deeployStateDir = deeployStateDir,
            inputOffsets = inputOffsets,
            n_cores = n_cores,
        )
🧰 Tools
🪛 Ruff (0.13.3)
- 40-40: Do not use mutable data structures for argument defaults. Replace with None; initialize within function. (B006)
🤖 Prompt for AI Agents
In Deeploy/Targets/PULPOpen/Deployer.py around lines 40 to 53, the constructor
uses a mutable default argument inputOffsets = {} which should be avoided;
change the parameter type to Optional[dict] and default it to None, update any
necessary typing imports, and forward that None to the base class (letting the
base initializer create the default dict) so you no longer use a shared mutable
default; ensure the SignPropDeployer/base constructor is updated to accept
Optional as noted.
    im2col_dim = (operatorRepresentation["weight_type"].typeWidth // 8
                 ) * ctxt.n_cores * operatorRepresentation['dim_kernel_x'] * operatorRepresentation['dim_kernel_y']
    im2col_name = operatorRepresentation['nodeName'] + "_buffer"
    return [(im2col_name, im2col_dim)]
🧩 Analysis chain
Avoid potential double-accounting of bytes for DW Im2Col buffer.
Here you compute bytes (via typeWidth//8) and also set the buffer’s referencedType. If the allocator multiplies element count by sizeof(type) when a concrete type is set (as used elsewhere), this could oversize the buffer by a factor of element size.
- Option A (recommended): Keep im2col_dim as element count (remove × bytes) and retain the referencedType assignment.
- Option B: Keep im2col_dim in bytes and do not rely on per-type multiplication in the allocation path for this buffer.
Please verify allocator/codegen behavior. Run:
If multiplication by sizeof(type) is present for typed transient buffers, switch to element-count sizing here to avoid double-accounting. Based on learnings.
Also applies to: 59-65
🏁 Script executed:
#!/bin/bash
# Inspect how ctxtBufferSize and TransientBuffer.size are consumed.
rg -n -C3 -P 'ctxtBuffer(Size)?\b|TransientBuffer\(|hoistTransientBuffer\(' Deeploy
rg -n -C3 -P '(alloc|allocate|malloc|rt_alloc|pi_l2_malloc|sizeof)\\s*\\('
# Look for size computations mixing buffer.type and size
rg -n -C4 -P '(?:ctx|ctxt|buffer).*size|size.*type|referencedType.*size' Deeploy
Length of output: 119828
Remove the typeWidth//8 factor from im2col_dim
Drop the //8 multiplication in the im2col_dim calculation (FloatConvTemplate.py lines 48-51 and 59-65). PULP's allocation templates already apply sizeof(type) * size, so passing element counts avoids double-sizing.
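As a sanity check on this sizing logic, the arithmetic can be sketched with assumed numbers (a 3x3 depthwise kernel on 8 cores with fp32 weights; these values are illustrative, not taken from the actual templates). It shows how returning bytes from computeTransientBuffersSize while the allocation path also multiplies by sizeof(type) oversizes the buffer by the element width:

```python
# Hypothetical sizing example: 3x3 DW kernel, 8 cores, 4-byte (fp32) weights.
n_cores, kx, ky, elem_bytes = 8, 3, 3, 4

elem_count = n_cores * kx * ky           # 72 elements: what the template should return
byte_count = elem_count * elem_bytes     # 288 bytes: what the //8 version returns

# If the allocator later multiplies the returned size by sizeof(float) again:
allocated_from_elems = elem_count * elem_bytes  # 288 bytes, correct
allocated_from_bytes = byte_count * elem_bytes  # 1152 bytes, 4x oversized

assert allocated_from_elems == 288
assert allocated_from_bytes == 4 * allocated_from_elems
```

So for fp32 the double-accounting wastes 4x the needed L1/L2 space; for fp16 it would be 2x.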
🤖 Prompt for AI Agents
In Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py around lines 48–51
and 59–65, the im2col_dim computation multiplies by
operatorRepresentation["weight_type"].typeWidth // 8 which double-sizes the
allocation; remove the typeWidth // 8 factor so im2col_dim represents element
count only (leave the remaining factors: ctxt.n_cores * dim_kernel_x *
dim_kernel_y and the name construction unchanged), updating both occurrences
accordingly.
class TestRunner():

-    def __init__(self,
-                 platform: str,
-                 simulator: Literal['gvsoc', 'banshee', 'qemu', 'vsim', 'vsim.gui', 'host', 'none'],
-                 tiling: bool,
-                 argument_parser: TestRunnerArgumentParser,
-                 gen_args: str = "",
-                 cmake_args: str = ""):
+    def __init__(
+        self,
+        platform: str,
+        simulator: Literal['gvsoc', 'banshee', 'qemu', 'vsim', 'vsim.gui', 'host', 'none'],
+        tiling: bool,
+        argument_parser: TestRunnerArgumentParser,
+        gen_args: str = "",
+        cmake_args: str = "",
+        cores: int = 8,
+    ):
🧩 Analysis chain
Fix AttributeError and command construction in TestRunner.generate_test
- Typo: use self.n_cores (not self._cores).
- Remove redundant command reassignment.
Apply:
- command = f"python {generation_script} -d {self._dir_gen} -t {self._dir_test} -p {self._platform} {self.gen_args}"
-
- command = f"python {generation_script} -d {self._dir_gen} -t {self._dir_test} -p {self._platform}"
+ command = f"python {generation_script} -d {self._dir_gen} -t {self._dir_test} -p {self._platform}"
@@
- if self._tiling is True:
- command += f" -n_cores {self._cores}"
+ if self._tiling is True:
+ command += f" -n_cores {self.n_cores}"
This resolves the CI failure.
Run to confirm no remaining _cores typos:
Also applies to: 349-357
🏁 Script executed:
#!/bin/bash
rg -nP '_cores\b' -C2
Length of output: 21522
Fix duplicate command and typo in TestRunner.generate_test
In testUtils/testRunner.py (generate_test):
- Remove the redundant initial command = … {self.gen_args} assignment.
- Replace self._cores with self.n_cores when appending -n_cores.
- command = f"python {generation_script} -d {self._dir_gen} -t {self._dir_test} -p {self._platform} {self.gen_args}"
-
- command = f"python {generation_script} -d {self._dir_gen} -t {self._dir_test} -p {self._platform}"
+ command = f"python {generation_script} -d {self._dir_gen} -t {self._dir_test} -p {self._platform}"
@@
- if self._tiling is True:
- command += f" -n_cores {self._cores}"
+ if self._tiling is True:
+ command += f" -n_cores {self.n_cores}"
📝 Committable suggestion
class TestRunner():

    def __init__(
        self,
        platform: str,
        simulator: Literal['gvsoc', 'banshee', 'qemu', 'vsim', 'vsim.gui', 'host', 'none'],
        tiling: bool,
        argument_parser: TestRunnerArgumentParser,
        gen_args: str = "",
        cmake_args: str = "",
        cores: int = 8,
    ):

        command = f"python {generation_script} -d {self._dir_gen} -t {self._dir_test} -p {self._platform}"
        if self._tiling is True:
            command += f" -n_cores {self.n_cores}"
🤖 Prompt for AI Agents
In DeeployTest/testUtils/testRunner.py around lines 279 to 290, the
generate_test builds the same command twice and uses the wrong attribute name
for cores; remove the redundant initial "command = … {self.gen_args}" assignment
so the command is constructed only once, and replace any use of self._cores with
self.n_cores when appending the "-n_cores" flag so the correct attribute is
used.
This is a first round of review. There are already a few comments that need your answers. I haven't managed to look at the actual templates/kernels; I'll do that tomorrow.
Good work :)
        else:
            inputTransposeNode, inputTransposeOutput = _appendTransposeNode(inputNode, name + "_TransposeIn", inPermute)
            opNode.inputs[0] = inputTransposeOutput
            graph.nodes.append(inputTransposeNode)
Can you clarify why we are transposing the input on non-RequantizedConvs? which cases are that?
I understood it from our private conversation: it's the floating-point implementation of the DW conv. As suggested somewhere else, I would separate it into a dedicated function NCHWtoNHWC_dw_fun for clarity, or just wait for my PR to land.
I think we can wait for your PR.
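For reference, the layout change under discussion is the standard NCHW-to-NHWC permutation. The minimal NumPy sketch below is illustrative only (it is not Deeploy's pass implementation, which inserts Transpose nodes into the ONNX graph); it shows the axis permutation the inserted transpose performs and how element addressing changes:

```python
import numpy as np

# An activation tensor in NCHW layout: batch, channels, height, width.
x_nchw = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)

# NCHW -> NHWC is the (0, 2, 3, 1) permutation: channels move last.
x_nhwc = x_nchw.transpose(0, 2, 3, 1)

assert x_nhwc.shape == (2, 4, 5, 3)
# The same element, addressed in both layouts: (n, c, h, w) <-> (n, h, w, c).
assert x_nchw[1, 2, 3, 4] == x_nhwc[1, 3, 4, 2]
```

The inverse, NHWC back to NCHW, is the (0, 3, 1, 2) permutation, which is what the matching output transpose applies.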
                 deeployStateDir: str = "DeeployState",
-                inputOffsets: Dict[str, int] = {}):
+                inputOffsets: Dict[str, int] = {},
+                n_cores: int = 8):
Why do you add the n_cores argument? We have the N_CORES define passed through the cmake script.
The number of cores is needed to dynamically compute the size of the im2col buffer for the regular and DW Conv2Ds. This was the method I found to pass it on to the network context (PULPDeployer inherits from SignPropDeployer, this class, which in turn inherits from NetworkDeployer). Let me know if you think we should proceed differently with this.
# Non DW-Type:
-if opNode.attrs['group'] == 1:
+if 'group' in opNode.attrs and opNode.attrs['group'] == 1:
This should better be covered by a get with a default, because the ONNX Conv spec defines the default for group to be 1:
if opNode.attrs.get('group', 1) == 1:
P.S. I have a PR refactoring this file, so you might have some merge conflicts.
I will replace it with a get, but even if the default is group=1, this also handles the RequantizedConv, which is "non-ONNX-canonical". If I remember correctly, it was the RequantConv that was missing this attribute and raising an error in those cases.
Although, looking again at it, I can't implement a getter function, since the opNode is a Node object, implemented in the ONNX Graph Surgeon library. I may implement a private getter somewhere in this file, but to me it seems superfluous, since it would do pretty much the same operation as the current if, and it would only be used once. So let me know please if and how you'd like this to be modified.
you don't need to implement a get function; it's already implemented for Python dictionaries (which node.attrs is). And yes, RequantizedConv is not a standard ONNX node, but it is kinda inheriting a lot of the stuff from the Conv node sooooo, imo, I would say it's ok to assume that default.
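A minimal sketch of the dict.get pattern being discussed (the helper name is hypothetical; in Deeploy, node.attrs is a plain Python dict on the ONNX GraphSurgeon Node object, so no custom getter is needed). Note the explicit == 1 comparison: a bare truthiness check would also take the branch for group > 1:

```python
# Hypothetical attribute dicts standing in for gs.Node.attrs.
attrs_regular = {"group": 1}
attrs_depthwise = {"group": 8}
attrs_requant_conv = {}  # e.g. a RequantizedConv missing the attribute entirely


def is_non_depthwise(attrs: dict) -> bool:
    # ONNX Conv defines group's default as 1 when the attribute is absent.
    return attrs.get("group", 1) == 1


assert is_non_depthwise(attrs_regular) is True
assert is_non_depthwise(attrs_requant_conv) is True  # default covers missing attr
assert is_non_depthwise(attrs_depthwise) is False
```

This replaces both the KeyError-prone attrs['group'] access and the two-clause 'group' in attrs check with a single expression.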
# Initialize Pass
super().__init__(pattern = graph,
                 replacement_fn = partial(_PULPDWNCHWtoNHWC_fun,
I recommend writing another NCHWtoNHWC_dw function for the PULP conv kernels, and maybe even check that it's an fp kernel, just to make it even clearer that it differs from the integer one. The transposition of the input is quite a big difference that imo deserves a separate function.
As said in a comment above, if it's ok with you, I'll wait for your PR and make the changes afterwards.
                 name: str = 'DeeployNetwork',
                 n_cores: int = 8):
        self.globalObjects = OrderedDict()
        self.localObjects = OrderedDict()
        self.VariableBuffer = variableBuffer
        self.ConstantBuffer = constantBuffer
        self.StructBuffer = structBuffer
        self.TransientBuffer = transientBuffer
        self.n_cores = n_cores
        self.name = name
        self.n_cores = n_cores
this self.n_cores = n_cores assignment is duplicated
        return ctxt, False


class PULPFPDWConv2DParser(Conv2DParser):
if this parser does basically the same thing as the above FP Conv parser, you can inherit it and just check for the additional group condition. You can immediately check in parseNode whether the group attribute is present and equal to node.inputs[1].shape[<ch idx>].
They are mostly the same, but directly incompatible because of the self.operatorRepresentation['group'] == 1 condition in the regular Conv2d parser. That condition does not really make sense in the context of the ONNX Conv definition, but grouping is not handled in the FP kernel of PULPOpen, so for now I think it's a good check to keep in place (which means we need to keep the two parsers, regular and DW, separate for now).
Let me know if you'd like me to proceed differently.
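To make the inheritance suggestion concrete, here is a hedged sketch of the two parsers as simplified stand-ins (the class bodies, the opRepr attribute, and the use of shape[0] as the channel index are all illustrative assumptions, not the actual Deeploy implementation):

```python
from types import SimpleNamespace


class Conv2DParserSketch:
    """Regular conv: grouping is unsupported by the FP kernel, so group must be 1."""

    def parseNode(self, node) -> bool:
        self.opRepr = dict(node.attrs)
        return self.opRepr.get("group", 1) == 1


class DWConv2DParserSketch(Conv2DParserSketch):
    """Depthwise conv: group must instead equal the channel count of the weights."""

    def parseNode(self, node) -> bool:
        self.opRepr = dict(node.attrs)
        # shape[0] as channel index is an assumption for illustration only
        ch = node.inputs[1].shape[0]
        return self.opRepr.get("group", 1) == ch


# A depthwise node with group == 8 and 8 weight channels
dw_node = SimpleNamespace(attrs = {"group": 8},
                          inputs = [None, SimpleNamespace(shape = [8, 3, 3])])

print(DWConv2DParserSketch().parseNode(dw_node))  # True
print(Conv2DParserSketch().parseNode(dw_node))    # False
```

This keeps the two parsers separate in behavior (as argued above) while still sharing everything except the group check.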
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
Deeploy/Targets/PULPOpen/Deployer.py (1)
31-53: Avoid the shared mutable default for inputOffsets
Keeping inputOffsets = {} means every deployer instance shares the same dict, so one run can mutate another's offsets. Since this signature just changed, please switch to a safe None default and materialize a new dict per instance before calling super().
-from typing import Callable, Dict, List, Type
+from typing import Callable, Dict, List, Optional, Type
@@
-                 inputOffsets = {},
+                 inputOffsets: Optional[Dict[str, int]] = None,
                  n_cores: int = 8):
+        if inputOffsets is None:
+            inputOffsets = {}
+
         super().__init__(
             graph = graph,
             deploymentPlatform = deploymentPlatform,
             inputTypes = inputTypes,
             loweringOptimizer = loweringOptimizer,
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py (1 hunks)
- Deeploy/DeeployTypes.py (15 hunks)
- Deeploy/Targets/Generic/Parsers.py (1 hunks)
- Deeploy/Targets/PULPOpen/Deployer.py (1 hunks)
- Deeploy/Targets/PULPOpen/Parsers.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- Deeploy/CommonExtensions/CodeTransformationPasses/Closure.py
🧰 Additional context used
🧬 Code graph analysis (2)
Deeploy/Targets/Generic/Parsers.py (1)
- Deeploy/DeeployTypes.py (3): add_aliases (331-350), get_aliases_of (352-364), lookup (788-820)
Deeploy/Targets/PULPOpen/Parsers.py (2)
- Deeploy/DeeployTypes.py (5): inputs (2579-2596), lookup (788-820), parseNode (1101-1116), parseNodeCtxt (1119-1144), NetworkContext (574-1088)
- Deeploy/Targets/Generic/Parsers.py (31): Conv2DParser (1204-1270); parseNode (19-27), (49-58), (81-94), (102-112), (163-169), (193-207), (229-251), (291-304), (326-338), (375-387), (422-426), (449-460), (484-495), (531-535), (551-554); parseNodeCtxt (29-41), (60-76), (114-155), (171-185), (209-221), (253-283), (306-318), (340-367), (389-414), (428-441), (462-476), (497-523), (537-543), (556-573), (586-605)
🪛 Ruff (0.14.0)
Deeploy/Targets/PULPOpen/Deployer.py
40-40: Do not use mutable data structures for argument defaults. Replace with None; initialize within function. (B006)
        new_output_node_aliases = input_node.get_aliases_of()
        new_output_node_aliases.append(input_node.name)

        # Add new aliases to output node
        output_node.add_aliases(aliases_to_add = new_output_node_aliases)

        # Add output node as alias to its aliases (alias relationship is symmetric)
        for alias in input_node.get_aliases_of():
            alias_node = ctxt.lookup(alias)
            alias_node.add_aliases(aliases_to_add = [
                output_node.name,
            ])
Alias symmetry currently skips the source tensor
The second loop walks input_node.get_aliases_of(), which omits the source buffer itself. As a result the input buffer never learns about output_node, so aliasing remains asymmetric even though the new comment says otherwise. Include the input buffer when propagating aliases so both directions stay in sync.
A minimal fix is to iterate new_output_node_aliases (which already contains the input name) when updating the other buffers:
- for alias in input_node.get_aliases_of():
+ for alias in new_output_node_aliases:
alias_node = ctxt.lookup(alias)
alias_node.add_aliases(aliases_to_add = [
output_node.name,
])
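The effect of the fixed loop can be checked with a toy model of the context (the Buffer class, the ctxt dict, and the buffer names are simplified stand-ins for Deeploy's NetworkContext and VariableBuffer):

```python
class Buffer:
    """Toy stand-in for a context buffer with an alias list."""

    def __init__(self, name):
        self.name = name
        self._aliases = []

    def get_aliases_of(self):
        return list(self._aliases)

    def add_aliases(self, aliases_to_add):
        for a in aliases_to_add:
            if a != self.name and a not in self._aliases:
                self._aliases.append(a)


# Toy context: 'in' already aliases 'prev'; 'out' is the new output buffer
ctxt = {n: Buffer(n) for n in ("in", "out", "prev")}
ctxt["in"]._aliases = ["prev"]
input_node, output_node = ctxt["in"], ctxt["out"]

# Forward direction: output learns about the input and its aliases
new_output_node_aliases = input_node.get_aliases_of()
new_output_node_aliases.append(input_node.name)
output_node.add_aliases(new_output_node_aliases)

# Fixed backward direction: iterate the list that includes the source buffer
for alias in new_output_node_aliases:
    ctxt[alias].add_aliases([output_node.name])

print(input_node.get_aliases_of())  # ['prev', 'out'] -- source now knows the output
```

With the original loop over input_node.get_aliases_of(), only 'prev' would have learned about 'out', and the source buffer's alias list would have stayed ['prev'].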
🤖 Prompt for AI Agents
In Deeploy/Targets/Generic/Parsers.py around lines 1044 to 1055, the loop that
propagates alias symmetry iterates input_node.get_aliases_of(), which omits the
input node itself and leaves the input buffer unaware of the new output alias;
change the propagation to iterate new_output_node_aliases (which already
includes input_node.name) so every alias, including the source, is updated to
add output_node.name via ctxt.lookup(...) and add_aliases(...). Ensure you keep
the existing add_aliases call and the symmetric update for each alias returned
by new_output_node_aliases.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
(4 hunks)Deeploy/Targets/Generic/Parsers.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- Deeploy/Targets/Generic/Parsers.py
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-24T11:56:35.781Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py:364-0
Timestamp: 2025-09-24T11:56:35.781Z
Learning: When reviewing pull requests, focus comments only on code that was actually changed (lines with + or - in diffs), not on unchanged context code that appears in the diff.
Applied to files:
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
📚 Learning: 2025-09-24T12:49:17.889Z
Learnt from: diaconuccalin
PR: pulp-platform/Deeploy#117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.
Applied to files:
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py
🧬 Code graph analysis (1)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (1)
Deeploy/CommonExtensions/OptimizationPasses/PassClasses.py (2): contextagnostic (285-298), ReplaceSequentialPatternPass (265-282)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (120)
- GitHub Check: linting
- GitHub Check: mempool-models / test-runner-mempool
- GitHub Check: siracusa-kernels-tiled-doublebuffer-L2 / test-runner-siracusa-tiled
- GitHub Check: softhier-kernels / test-runner-softhier
- GitHub Check: snitch-kernels-tiled-singlebuffer-L2 / test-runner-snitch-tiled
- GitHub Check: cortexm-models / test-runner-cortexm
- GitHub Check: cortexm-kernels / test-runner-cortexm
- GitHub Check: snitch-kernels / test-runner-snitch
- GitHub Check: siracusa-neureka-models-tiled-doublebuffer-L3-wmem (Attention, 3500, 8, true, L3, true) / test-runner-siracusa-neureka-tiled (3500)
- GitHub Check: siracusa-neureka-models-tiled-doublebuffer-L3-wmem (microLlama/microLlama1, 10000, 8, true, L3, t... / test-runner-siracusa-neureka-tiled (10000)
- GitHub Check: siracusa-neureka-models-tiled-singlebuffer-L3 (miniMobileNet, 2000, 8, L3) / test-runner-siracusa-neureka-tiled (2000)
- GitHub Check: siracusa-neureka-models-tiled-singlebuffer-L3 (Transformer, 15000, 8, L3) / test-runner-siracusa-neureka-tiled (15000)
🔇 Additional comments (4)
Deeploy/CommonExtensions/OptimizationPasses/TopologyOptimizationPasses/LoweringOptimizationPasses.py (4)
250-250: LGTM! Improved group attribute handling. Using .get('group', 1) properly handles the ONNX Conv default and is more robust than explicit attribute checking.
370-373: LGTM! Correctly handles the FP DW Conv input transpose. The else block appropriately transposes the input for non-RequantizedConv DW convolutions (e.g., FP Conv), while the RequantizedConv path transposes weights. This distinction aligns with the requirements for adding FP DW Conv support.
380-426: LGTM! Well-structured pass classes with clear separation of concerns. The refactoring of PULPDWConvPass improves readability with explicit comments and sections. The new PULPFPDWConvPass class properly mirrors the structure for FP DW Conv handling, using the 'Conv' op pattern and a distinct pass name while sharing the same replacement function via partial().
505-505: LGTM! Correctly integrates the FP DW Conv pass into the sequence. The placement after PULPDWConvPass and before dense conv passes is logical, ensuring DW convolutions (both quantized and FP) are handled before dense convolutions.
This PR brings the changes required to run a minimal TinyViT on the Siracusa platform, without tiling.
Added
Changed
Fixed
PR Merge Checklist
- Rebased on the latest devel commit and pointing to devel.
- The CHANGELOG.md file has been updated.