26 commits
63675c6
Conv2D Bias Adaptation
diaconuccalin Jul 29, 2025
549a052
Added PULPOpen support for Conv2D and partially working DW Conv2D. Fix…
diaconuccalin Aug 7, 2025
b32a357
DW 2D Float Conv for PULPOpen platform now working. Updated im2col bu…
diaconuccalin Sep 18, 2025
340d058
Optimized the PULPOpen DW 2D fp32 Convolution and fixed the bias vers…
diaconuccalin Sep 19, 2025
00da44e
Updated float reshape with skip connection test to a smaller one
diaconuccalin Sep 19, 2025
a7f0fc0
Fixed generic platform alias_of bug
diaconuccalin Sep 22, 2025
10b22ae
Fixed the PULPOpen FloatGemmTemplate (identical issue to the generic …
diaconuccalin Sep 22, 2025
58a01ab
Working TinyViT Demo test. Added it to the CI pipeline. Added float s…
diaconuccalin Sep 22, 2025
77b055f
Added GEMM batched fix to MatMul template
diaconuccalin Sep 23, 2025
3079743
Fixed formatting
diaconuccalin Sep 23, 2025
ff9a903
Fixes to avoid warnings
diaconuccalin Sep 23, 2025
ad13fe2
Merge remote-tracking branch 'origin/devel' into TinyViT_Siracusa
diaconuccalin Sep 23, 2025
3a17aa4
Fix formatting
diaconuccalin Sep 23, 2025
8f3f74d
Merge fix
diaconuccalin Sep 23, 2025
63f122d
Dynamic buffer calculation fix. Other fixes
diaconuccalin Sep 24, 2025
f012b1a
Reformat
diaconuccalin Sep 24, 2025
de2752b
Added back CI tests removed by merge
diaconuccalin Sep 24, 2025
11d5293
Updated changelog file
diaconuccalin Sep 24, 2025
ce7e4c8
Applied fixes suggested in the PR review
diaconuccalin Sep 24, 2025
4165983
Merge branch 'devel' into TinyViT_Siracusa
diaconuccalin Oct 13, 2025
af78d75
Post-merge fixes
diaconuccalin Oct 13, 2025
bc2119e
Quickfix
diaconuccalin Oct 13, 2025
4db4978
Merge branch 'pulp-platform:devel' into TinyViT_Siracusa
diaconuccalin Oct 14, 2025
48b2ff5
PR fixes
diaconuccalin Oct 17, 2025
7cba0cf
Addressed PR review. Minor fix for aliasing in reshape parser
diaconuccalin Oct 21, 2025
da162b0
Minor fix based on PR review
diaconuccalin Oct 21, 2025
10 changes: 10 additions & 0 deletions .github/workflows/ci-platform-siracusa.yml
@@ -53,7 +53,15 @@ jobs:
testBacktracking
testFloatAdder
testFloatGEMM

testFloat2DConvolution
testFloat2DConvolutionBias
testFloat2DConvolutionZeroBias

testFloat2DDWConvolution
testFloat2DDWConvolutionBias
testFloat2DDWConvolutionZeroBias

testFloatLayerNorm
testFloatRelu
testFloatMaxPool
@@ -64,6 +72,7 @@
Quant
Dequant
testFloatReduceSum
testFloatReshapeWithSkipConnection
testFloatSoftmaxGrad
testFloatSoftmaxCrossEntropy
testFloatSoftmaxCrossEntropyGrad
@@ -87,4 +96,5 @@
CCT/CCT_1_16_16_8
CCT/CCT_2_32_32_128_Opset20
testTrainCCT/CCT1_Classifier_Training/CCT_1_16_16_8
testFloatDemoTinyViT
num-cores: 8
13 changes: 13 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,7 @@ This file contains the changelog for the Deeploy project. The changelog is divid
## Unreleased (Planned Release Target: v0.2.1)

### List of Pull Requests
- TinyViT on non-tiled Siracusa [#117](https://github.com/pulp-platform/Deeploy/pull/117)
- Refactor Logging for Improved Debugging [#115](https://github.com/pulp-platform/Deeploy/pull/115)
- Add reuse-tool as an SPDX license header linter [#113](https://github.com/pulp-platform/Deeploy/pull/113)
- Bug fixes, API Cleanup and Reduce Compiler Warning on PULP [#112](https://github.com/pulp-platform/Deeploy/pull/112)
@@ -17,6 +18,13 @@ This file contains the changelog for the Deeploy project. The changelog is divid
- Fix `Unsqueeze` Op. when using ONNX opset 13 or higher (from attribute to input) [#119](https://github.com/pulp-platform/Deeploy/pull/119)

### Added
- PULP 2D FP DW conv Im2Col template and kernel, with bias support.
- Bias support for PULP 2D FP regular conv Im2Col in template & kernel.
- PULP FP DW conv 2D parser.
- FP conv 2D (simple & DW), reshape & skip connection, and TinyViT demo tests to the non-tiled Siracusa CI pipeline.
- FP bindings and mappings for PULP slice, DW conv 2D, and reduce mean operations.
- FP PULP DW conv lowering optimization pass, similar to the existing one for the integer version.
- RemoveEmptyConvBiasPass to the PULP optimizer.
- Add manual type inference feature (CLI: `--input-type-map`/`--input-offset-map`) to resolve ambiguities when test inputs are not representative enough
- Added a `testTypeInferenceDifferentTypes` test case to validate type inference for different input types
- Added `_mangleNodeNames` function to avoid duplicate node mappings
@@ -48,6 +56,7 @@ This file contains the changelog for the Deeploy project. The changelog is divid
- Memory/I/O summaries and input/output logging in deployers

### Changed
- Reduced size of the reshape & skip connection test for non-tiled Siracusa memory compatibility.
- Replaced platform-specific tags (`*-amd64`, `*-arm64`) with direct digest references in `Noelware/docker-manifest-action`.
- mchan HAL is now reduced to bare-bones
- refactor of the IntrospectiveCodeTransformation to work on the Mako template
@@ -75,6 +84,10 @@ This file contains the changelog for the Deeploy project. The changelog is divid
- Deployer workflow now uses `prepare(...)` instead of `generateFunction(...)`.

### Fixed
- Fixed a bug in alias_of node parameter handling, which takes care of the lifetime of buffers in skip-connection situations.
- Fixed a bug for non-batched elements in the PULPOpen FP GEMM and MatMul templates.
- Added an underscore to the beginning of closure names to avoid naming issues when they start with unsupported characters (like digits).
- Fixed data types in the PULPOpen FP add and mul templates.
- Prevent node duplication for graphs generated via GraphSurgeon
- Resolved issue with missing `id` in the `Build Cache for Docker` step, used in the `Inject build-cache` step.
- Fix license CI check and prevent potential issues with `jq` installation
@@ -155,7 +155,8 @@ def apply(self,
executionBlock: ExecutionBlock,
name: str,
verbose: CodeGenVerbosity = _NoVerbosity) -> Tuple[NetworkContext, ExecutionBlock]:
self.closureName = name + self.closureSuffix
# Add underscore to avoid name issues when beginning with problematic characters (like numbers)
self.closureName = "_" + name + self.closureSuffix
self.functionCall = executionBlock.generate(ctxt)
self._generateClosureStruct(ctxt, executionBlock)
ctxt = self._generateClosureCtxt(ctxt, name)
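For illustration, a brief sketch of the problem this one-character change avoids: C identifiers must start with a letter or an underscore, so a closure name derived from a mangled node name that begins with a digit would not compile. The node names and the check below are illustrative, not Deeploy's actual naming code.

```python
import re

# A C identifier must match this pattern to compile.
C_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def closure_name(node_name: str, closure_suffix: str = "_closure") -> str:
    # Mirrors the change above: always prefix with '_' so the result is a
    # valid C identifier even when the node name starts with a digit.
    return "_" + node_name + closure_suffix


for name in ["0_Conv", "layer1_Conv"]:  # hypothetical node names
    raw = name + "_closure"
    print(raw, "valid:", bool(C_IDENTIFIER.match(raw)), "->", closure_name(name))
# 0_Conv_closure is not a valid C identifier; _0_Conv_closure is.
```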
Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py
@@ -22,7 +22,8 @@ def __init__(self,
name: str = 'DeeployNetwork',
default_channels_first: bool = True,
deeployStateDir: str = "DeeployState",
inputOffsets: Dict[str, int] = {}):
inputOffsets: Dict[str, int] = {},
n_cores: int = 8):
Comment on lines +25 to +26
Contributor
🛠️ Refactor suggestion | 🟠 Major

Avoid mutable default for inputOffsets; initialize from None

Using {} as a default shares state across instances. Accept None and initialize it inside the constructor. Also simplify the empty check.

Apply:

-                 inputOffsets: Dict[str, int] = {},
-                 n_cores: int = 8):
+                 inputOffsets: Optional[Dict[str, int]] = None,
+                 n_cores: int = 8):
@@
-        if inputOffsets == {}:
-            for key in inputTypes.keys():
-                inputOffsets[key] = 0
+        if inputOffsets is None:
+            inputOffsets = {key: 0 for key in inputTypes.keys()}

As per Ruff B006.

Also applies to: 30-35

🧰 Tools
🪛 Ruff (0.13.3)

25-25: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

🤖 Prompt for AI Agents
In Deeploy/CommonExtensions/NetworkDeployers/SignPropDeployer.py around lines
25-26 and 30-35, the function/class signature uses a mutable default
inputOffsets: Dict[str, int] = {} which shares state across calls; change the
parameter default to inputOffsets: Optional[Dict[str,int]] = None and inside the
body set inputOffsets = {} if inputOffsets is None, then use that local dict;
also simplify any subsequent checks for emptiness to use simple truthiness (if
not inputOffsets) or explicit length checks as appropriate.
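As a standalone illustration of the B006 pitfall flagged above, the following sketch (hypothetical classes, not Deeploy's actual deployer) shows how the shared default leaks state between instances and how the None-based fix avoids it:

```python
from typing import Dict, Optional


class LeakyDeployer:

    def __init__(self, inputTypes: Dict[str, type], inputOffsets: Dict[str, int] = {}):
        # BUG: the same dict object is reused by every instance that relies
        # on the default argument, so offsets leak across instances.
        for key in inputTypes:
            inputOffsets.setdefault(key, 0)
        self.inputOffsets = inputOffsets


class FixedDeployer:

    def __init__(self, inputTypes: Dict[str, type], inputOffsets: Optional[Dict[str, int]] = None):
        # Fix suggested above: create a fresh dict per call.
        if inputOffsets is None:
            inputOffsets = {key: 0 for key in inputTypes}
        self.inputOffsets = inputOffsets


a, b = LeakyDeployer({"x": int}), LeakyDeployer({"y": int})
print(a.inputOffsets is b.inputOffsets)  # True: both share one dict ({'x': 0, 'y': 0})

c, d = FixedDeployer({"x": int}), FixedDeployer({"y": int})
print(c.inputOffsets is d.inputOffsets)  # False: each instance owns its own dict
```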

Collaborator
Why do you add the n_cores argument? We have the N_CORES define passed through the cmake script.

Contributor Author
The number of cores is needed to dynamically compute the size of the im2col buffer for the regular and DW Conv2Ds. This was the method I found to pass it on to the network context (PULPDeployer inherits from SignPropDeployer, this class, which in turn inherits from NetworkDeployer). Let me know if you think we should proceed differently with this.
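For context, a hedged sketch of why the core count enters this computation: each core fills its own im2col patch, so the transient scratch buffer grows with the number of cores. The function name and the exact formula are illustrative assumptions, not the expression used in the actual PULP templates.

```python
def im2col_buffer_bytes(kernel_h: int, kernel_w: int, in_channels: int,
                        bytes_per_element: int, n_cores: int) -> int:
    # Each core materializes one im2col patch at a time, so every core needs
    # its own scratch region; the total size scales linearly with n_cores.
    per_core = kernel_h * kernel_w * in_channels * bytes_per_element
    return n_cores * per_core


# e.g. a 3x3 fp32 convolution over 64 input channels parallelized over 8 cores:
print(im2col_buffer_bytes(3, 3, 64, 4, 8))  # 18432 bytes
```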

super().__init__(graph, deploymentPlatform, inputTypes, loweringOptimizer, scheduler, name,
default_channels_first, deeployStateDir)

@@ -31,6 +32,7 @@ def __init__(self,
inputOffsets[key] = 0

self.inputOffsets = inputOffsets
self.n_cores = n_cores

def _createIOBindings(self, ctxt, graph):
ctxt = super()._createIOBindings(ctxt, graph)
@@ -247,7 +247,7 @@ def _NCHWtoNHWC_fun(graph: gs.Graph, match: Match, name: str, default_channels_f
if node_op in ["RequantizedConv", "Conv"]:

# Non DW-Type:
if opNode.attrs['group'] == 1:
if opNode.attrs.get('group', 1) == 1:
weightNode = opNode.inputs[1]
weightTransposeNode, weightTransposeOutput = _appendTransposeNode(weightNode, name + "TransposeWeight",
inPermute)
@@ -341,7 +341,7 @@ def _PULPDWNCHWtoNHWC_fun(graph: gs.Graph, match: Match, name: str, default_chan
opNode = matched_nodes[0]
node_op = opNode.op

if opNode.attrs['group'] == 1:
if opNode.attrs.get('group', 1) == 1:
return graph

if (("channels_first" in opNode.attrs and opNode.attrs["channels_first"] != default_channels_first)
@@ -362,30 +362,67 @@ def _PULPDWNCHWtoNHWC_fun(graph: gs.Graph, match: Match, name: str, default_chan
graph.nodes.append(outputTransposeNode)

if node_op == "RequantizedConv":

weightNode = opNode.inputs[1]
weightTransposeNode, weightTransposeOutput = _appendTransposeNode(weightNode, name + "TransposeWeight",
inPermute)
opNode.inputs[1] = weightTransposeOutput
graph.nodes.append(weightTransposeNode)
else:
inputTransposeNode, inputTransposeOutput = _appendTransposeNode(inputNode, name + "_TransposeIn", inPermute)
opNode.inputs[0] = inputTransposeOutput
graph.nodes.append(inputTransposeNode)
Comment on lines +370 to +373
Collaborator
Can you clarify why we are transposing the input on non-RequantizedConvs? Which cases are those?

Collaborator
@lukamac lukamac Oct 14, 2025
I understood it from our private conversation: it's the floating-point implementation of the DW conv. As suggested somewhere else, I would separate it into a dedicated function NCHWtoNHWC_dw_fun for clarity, or just wait for my PR to land.

Contributor Author
I think we can wait for your PR.


opNode.attrs["channels_first"] = default_channels_first

return graph


# Requantized DW Conv
@contextagnostic
class PULPDWConvPass(ReplaceSequentialPatternPass):

def __init__(self, default_channels_first: bool = True):
# Define pattern graph
graph = gs.Graph()

_input = gs.Variable(name = 'input_1')
output = graph.layer(inputs = [_input], outputs = ['convOut'], op = 'RequantizedConv', name = 'requantizedConv')

graph.outputs.append(output)
graph.inputs.append(_input)

name = "_NCHW_TO_NHWC_CONV_PASS"
super().__init__(graph, partial(_PULPDWNCHWtoNHWC_fun, default_channels_first = default_channels_first), name)
# Define name
name = "_NCHW_TO_NHWC_DW_CONV_PASS"

# Initialize Pass
super().__init__(pattern = graph,
replacement_fn = partial(_PULPDWNCHWtoNHWC_fun,
default_channels_first = default_channels_first),
name = name)


# Float DW Conv
@contextagnostic
class PULPFPDWConvPass(ReplaceSequentialPatternPass):

def __init__(self, default_channels_first: bool = True):
# Define pattern graph
graph = gs.Graph()

_input = gs.Variable(name = 'input_1')
output = graph.layer(inputs = [_input], outputs = ['convOut'], op = 'Conv', name = 'conv')

graph.outputs.append(output)
graph.inputs.append(_input)

# Define name
name = "_NCHW_TO_NHWC_FP_DW_CONV_PASS"

# Initialize Pass
super().__init__(pattern = graph,
replacement_fn = partial(_PULPDWNCHWtoNHWC_fun,
Collaborator
I recommend writing another NCHWtoNHWC_dw function for the PULP conv kernels, and maybe even checking that it's an FP kernel, just to make it even clearer that it differs from the integer one. The transposition of the input is quite a big difference that, imo, deserves a separate function.

Contributor Author
As said in a comment above, if it's ok with you, I'll wait for your PR and make the changes afterwards.

default_channels_first = default_channels_first),
name = name)


def _PULPDenseNCHWtoNHWC_fun(graph: gs.Graph, match: Match, name: str, default_channels_first: bool = True):
@@ -465,6 +502,7 @@ def __init__(self, default_channels_first: bool = True):
NCHWtoNHWCPadPass(default_channels_first),
NCHWtoNHWCMaxPoolPass(default_channels_first),
PULPDWConvPass(default_channels_first),
PULPFPDWConvPass(default_channels_first),
PULPNCHWtoNHWCDenseConvPass(default_channels_first),
PULPNCHWtoNHWCDenseRequantizedConvPass(default_channels_first),
]
83 changes: 59 additions & 24 deletions Deeploy/DeeployTypes.py
@@ -257,7 +257,7 @@ def __init__(self, name: str = '', shape = [1], alias_of: Optional[List[str]] =
self.is_input: bool = False
self.is_output: bool = False

self.alias_of: List[str] = alias_of if alias_of is not None else []
self.alias_of: List[str] = list(alias_of) if alias_of is not None else []

def _bufferRepresentation(self) -> Dict:
return {"type": self._instance, "name": self.name, "size": int(np.prod(self.shape))}
@@ -322,7 +322,11 @@ def __getstate__(self):

@classmethod
def fromNode(cls, node: gs.Node):
return (cls(name = node.name, shape = node.shape if not isinstance(node, gs.Constant) else node.values.shape))
return (cls(
name = node.name,
shape = node.shape if not isinstance(node, gs.Constant) else node.values.shape,
alias_of = [],
))

def add_aliases(self, aliases_to_add: List[str]):
"""
@@ -355,7 +359,7 @@ def get_aliases_of(self):
"""

if hasattr(self, "alias_of"):
return self.alias_of
return list(self.alias_of)
Collaborator
Why? Seems unnecessary, since we expect alias_of to be a list?

Contributor Author
This was the solution I found to the aliasing issue you handle in this PR. Apparently, "casting" a list to a list like this actually creates a new instance of the original list. If your PR gets merged first, I will fix this; otherwise maybe you can remove this in your PR, since it will no longer be needed.

Collaborator
Ahhh, now, after refactoring that portion of the code, I understand the need for it 😅 but I much prefer my solution to this one. Let's wait a little bit longer for my PRs to get merged.
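A tiny standalone example of the behaviour being discussed: calling list(...) on an existing list returns a new list object, so mutations made through the returned reference no longer reach the buffer's own alias_of list.

```python
alias_of = ["bufferA"]

shared = alias_of           # same object: changes are visible through both names
copied = list(alias_of)     # new list containing the same elements

shared.append("bufferB")
print(alias_of)             # ['bufferA', 'bufferB'] - mutated through the shared reference
print(copied)               # ['bufferA'] - the copy is unaffected
print(copied is alias_of)   # False
```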

else:
return list()

@@ -399,7 +403,7 @@ class TransientBuffer(VariableBuffer):

def __init__(self, name: str = '', size = 0):
self.name = name
self.size = size #: int: Total BYTE size of this TransientBuffer
self.size = size # int: Total BYTE size

# Do not override - Should be written in the parsing passes
self._users = []
@@ -446,7 +450,9 @@ class ConstantBuffer(VariableBuffer):
"""

def __init__(self, name: str = '', shape = [1], values = [0]):
# Pass a copy of alias_of to avoid shared references
super().__init__(name, shape)

values = np.asarray(values)
# intArray = values.astype(int)
# assert (np.abs(values - intArray)).max() < 0.001, "Constant value {name} is NOT an integer!"
@@ -481,7 +487,11 @@ def _bufferRepresentation(self) -> Dict:

@classmethod
def fromVariableBuffer(cls, buffer: VariableBuffer, values):
ret = cls(name = buffer.name, shape = buffer.shape, values = values)
ret = cls(
name = buffer.name,
shape = buffer.shape,
values = values,
)

return ret

@@ -572,7 +582,16 @@ def __init__(self,
transientBuffer: Type[TransientBuffer],
globalObjects = {},
localObjects = {},
name: str = 'DeeployNetwork'):
name: str = 'DeeployNetwork',
n_cores: int = 8):
self.globalObjects = OrderedDict()
self.localObjects = OrderedDict()
self.VariableBuffer = variableBuffer
self.ConstantBuffer = constantBuffer
self.StructBuffer = structBuffer
self.TransientBuffer = transientBuffer
self.name = name
self.n_cores = n_cores

self._maxDynamicSize = {} #: int: Maximum dynamic memory size occupied by live buffers at any point in time
self._dynamicSize = {} #: int: Current dynamic memory size occupied by live buffers
Expand Down Expand Up @@ -874,7 +886,7 @@ def is_buffer(self, value: Any) -> bool:
obj = self.lookup(value)
return isinstance(obj, VariableBuffer)

def hoistTransientBuffer(self, name: str, size: int) -> str:
def hoistTransientBuffer(self, name: str, size: Union[int, str]) -> str:
"""Registers a new TransientBuffer in the local context

Parameters
@@ -1186,7 +1198,11 @@ def parseOutputs(cls, ctxt: NetworkContext, node: gs.Node) -> NetworkContext:

for node, name in zip(outputNodes, outputNames):
if not ctxt.is_global(name):
nb = ctxt.VariableBuffer(name = name, shape = node.shape)
nb = ctxt.VariableBuffer(
name = name,
shape = node.shape,
alias_of = [],
)
ctxt.add(nb, 'local')
else:
nb = ctxt.lookup(name)
@@ -2487,7 +2503,8 @@ def __init__(self,
inputTypes: Dict[str, Type[Pointer]],
scheduler: Callable[[gs.Graph], Schedule] = lambda graph: list(graph.nodes),
name: str = 'DeeployNetwork',
deeployStateDir: str = "DeeployState"):
deeployStateDir: str = "DeeployState",
n_cores: int = 8):
"""Initializes a new NetworkContainer and its NetworkContext

Parameters
@@ -2505,6 +2522,8 @@ def __init__(self,
Prefix to use in deployment to uniquify tensor names
deeployStateDir : str
Path to a directory to dump intermediate outputs
n_cores : int
The number of cores on which the network will be run


"""
@@ -2523,7 +2542,8 @@ def __init__(self,
self.ctxt = NetworkContext(variableBuffer = self.Platform.VariableBuffer,
constantBuffer = self.Platform.ConstantBuffer,
structBuffer = self.Platform.StructBuffer,
transientBuffer = self.Platform.TransientBuffer)
transientBuffer = self.Platform.TransientBuffer,
n_cores = n_cores)

self.deeployStateDir = deeployStateDir

@@ -2683,10 +2703,13 @@ def parse(self, default_channels_first: bool = True) -> bool:

"""

self.ctxt = NetworkContext(variableBuffer = self.Platform.VariableBuffer,
constantBuffer = self.Platform.ConstantBuffer,
structBuffer = self.Platform.StructBuffer,
transientBuffer = self.Platform.TransientBuffer)
self.ctxt = NetworkContext(
variableBuffer = self.Platform.VariableBuffer,
constantBuffer = self.Platform.ConstantBuffer,
structBuffer = self.Platform.StructBuffer,
transientBuffer = self.Platform.TransientBuffer,
n_cores = self.ctxt.n_cores,
)

log.debug(" - Create IO Bindings")
self.ctxt = self._createIOBindings(self.ctxt, self.graph)
@@ -3232,15 +3255,18 @@ class NetworkDeployer(NetworkContainer):
"""Deeploy abstraction to contain an entire network and all necessary information to deploy it
"""

def __init__(self,
graph: gs.Graph,
deploymentPlatform: DeploymentPlatform,
inputTypes: Dict[str, Type[Pointer]],
loweringOptimizer: TopologyOptimizer,
scheduler: Callable[[gs.Graph], Schedule] = lambda graph: list(graph.nodes),
name: str = 'DeeployNetwork',
default_channels_first: bool = True,
deeployStateDir: str = "DeeployState"):
def __init__(
self,
graph: gs.Graph,
deploymentPlatform: DeploymentPlatform,
inputTypes: Dict[str, Type[Pointer]],
loweringOptimizer: TopologyOptimizer,
scheduler: Callable[[gs.Graph], Schedule] = lambda graph: list(graph.nodes),
name: str = 'DeeployNetwork',
default_channels_first: bool = True,
deeployStateDir: str = "DeeployState",
n_cores: int = 8,
):
"""Initialize a new NetworkDeployer

Parameters
@@ -3269,12 +3295,21 @@ def __init__(self,


"""
super().__init__(graph, deploymentPlatform, inputTypes, scheduler, name, deeployStateDir = deeployStateDir)
super().__init__(
graph = graph,
platform = deploymentPlatform,
inputTypes = inputTypes,
scheduler = scheduler,
name = name,
deeployStateDir = deeployStateDir,
n_cores = n_cores,
)

self.loweringOptimizer = loweringOptimizer
self.default_channels_first = default_channels_first

self.prepared = False
self.n_cores = n_cores

def __repr__(self):
return super().__repr__(
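To tie the n_cores plumbing in this file together, here is a hedged sketch of how a parser or template could consume ctxt.n_cores along with the relaxed hoistTransientBuffer signature (size: Union[int, str]) when hoisting a dynamically sized im2col scratch buffer. The helper name and the size formula are illustrative assumptions, not the actual PULP template code.

```python
def hoist_im2col_buffer(ctxt, node_name: str, kernel_h: int, kernel_w: int,
                        in_channels: int, bytes_per_element: int = 4) -> str:
    # Numeric size, derived from the core count carried by the NetworkContext.
    size = ctxt.n_cores * kernel_h * kernel_w * in_channels * bytes_per_element

    # Alternatively, since hoistTransientBuffer now also accepts a string,
    # the size could be emitted symbolically and resolved at compile time:
    # size = f"NUM_CORES * {kernel_h * kernel_w * in_channels * bytes_per_element}"

    return ctxt.hoistTransientBuffer(f"{node_name}_im2col_buffer", size)
```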