59 commits
9ec13f9
Add classifier training support
runwangdl Mar 17, 2025
f1a0491
Fix L3 DMA and Maxpool Bugs
runwangdl Mar 3, 2025
8bfdb13
correct DMA length of copy assertion
runwangdl Mar 18, 2025
031dc79
delete redundant shell scripts
runwangdl Mar 19, 2025
58e18da
Merge branch 'devel' into PULPCCTL3_16_16_64
runwangdl Mar 19, 2025
ac2d879
Update node with multioutput to single output
runwangdl Mar 19, 2025
6a7198b
add softmaxcrossentropygrad tiling
runwangdl Mar 19, 2025
360aef7
Add softmaxcrossentropylossgrad tiling
runwangdl Mar 20, 2025
bc48582
Merge branch 'PULPCCTL3_16_16_64' into GEMM_training_tiled
runwangdl Mar 20, 2025
b6542ba
Fix CI issue
runwangdl Mar 20, 2025
fe208d0
Fix CI bugs
runwangdl Mar 20, 2025
4a21359
update CI
runwangdl Mar 20, 2025
91f12f0
Add and pass test for CCT gemmtraining 1_16_16_8 to 128
runwangdl Mar 20, 2025
d1e1ebf
update CI with 8-128 dim CCT last gemm training test
runwangdl Mar 20, 2025
86a2e99
Add SGD support for PULP Open
runwangdl Mar 20, 2025
bdacd2f
Update CCT training test with sgd
runwangdl Mar 20, 2025
99035f0
Update Changelog
runwangdl Mar 23, 2025
62e87d3
Merge branch 'devel' into GEMM_training_tiled
runwangdl Mar 23, 2025
15ea3ec
Solved issues caused by merging conflicts
runwangdl Mar 23, 2025
a644fdf
Solved Review Comments
runwangdl Mar 28, 2025
643e160
Resolving conflicts
runwangdl Mar 28, 2025
80a9518
Reresolve the conflict
runwangdl Mar 28, 2025
501775d
Solving CI issues
runwangdl Mar 28, 2025
65a56b7
fix linting errors
runwangdl Mar 28, 2025
03c3f4a
gelu sigmoid approximation
runwangdl Mar 24, 2025
7e141fd
gelu parallel + unroll
runwangdl Mar 24, 2025
c3ee783
Float Matmul Parallel on M
runwangdl Mar 24, 2025
47d8c19
Softmax Parallel and Softmax Op Support
runwangdl Mar 24, 2025
ccba380
conv parallel without im2col
runwangdl Mar 25, 2025
fafcedf
PULP Layernorm Parallel
runwangdl Mar 25, 2025
147e68f
Fixed CI issues
runwangdl Mar 28, 2025
6e07dc9
fixing linting
runwangdl Mar 28, 2025
8b2f685
Merge branch 'devel' into devel_CCT_Optim
runwangdl Apr 8, 2025
9c0b8f6
Enlarge CI floatconv tiling L1 size for 8 core and delete CCT 128 tes…
runwangdl Apr 8, 2025
4c36de2
matmul 1*4 unrolling
runwangdl Apr 24, 2025
28ec2ca
Add computeOp support for CCT necessary kernels
runwangdl Apr 24, 2025
bf1f8ae
Add openlibm expf
runwangdl Apr 13, 2025
deac9ce
add relu, mul, maxpool ops num
runwangdl May 4, 2025
3b12187
Optimize parallel for multiple kernels
runwangdl May 4, 2025
49da947
Merge branch 'devel' into devel_CCT_Optim
runwangdl May 4, 2025
47961b9
Merge branch 'devel' into devel_CCT_Optim
runwangdl May 6, 2025
8907532
Change ConvTileConstraint to only tile on outchannel
runwangdl May 6, 2025
133f9ae
Fix error in gelu
runwangdl May 6, 2025
f25127d
Fix Linting Issues
runwangdl May 6, 2025
6f3f585
Merge branch 'devel' into devel_CCT_Optim
runwangdl May 8, 2025
4ffea9b
Change CI tests
runwangdl May 8, 2025
e819626
Add RV32IMF Picolibc support for Siracusa platform
runwangdl May 8, 2025
fa0cc37
Build Docker for new gvsoc for testing
runwangdl May 8, 2025
ac56ca2
Gvsoc Small test
runwangdl May 8, 2025
fd6c99d
Add Redmule Platform, Engine, Tiler, and Deployer
runwangdl May 8, 2025
2862f29
Add rv32imf.txt to build docker
runwangdl May 8, 2025
9ef9cc2
Update GVSOC hash
runwangdl May 9, 2025
10de9f6
matmul delicate constraints for Redmule
runwangdl May 9, 2025
efab54c
Merge branch 'devel_CCT_Optim' into redmule_platform
runwangdl May 9, 2025
37670e6
conv with redmule
runwangdl May 9, 2025
08b7e23
Add CCT 32 test
runwangdl May 9, 2025
e42b3d6
xtensor gvsoc docker build
runwangdl May 9, 2025
c6e4890
Change Redmule Branch Pulp LLVM abi
runwangdl May 15, 2025
d998fc3
GEMM with Redmule
runwangdl May 18, 2025
2 changes: 1 addition & 1 deletion .github/workflows/BuildDocker.yml
@@ -38,4 +38,4 @@ jobs:
file: Container/Dockerfile
push: true
# JUNGVI: If you operate from a fork and want to build a new docker make sure to replace 'pulp-platform' by your uname.
tags: ghcr.io/pulp-platform/deeploy:main
tags: ghcr.io/runwangdl/deeploy:redmule
50 changes: 39 additions & 11 deletions .github/workflows/CI.yml
Expand Up @@ -9,7 +9,7 @@ on:
- cron: "0 1 */6 * *"

env:
DOCKER_IMAGE: ghcr.io/pulp-platform/deeploy:main
DOCKER_IMAGE: ghcr.io/runwangdl/deeploy:redmule

jobs:

@@ -338,7 +338,7 @@ jobs:
},
{
"name": "testFloat2DConvolution",
"L1": [2000]
"L1": [8000]
},
{
"name": "testFloatLayerNorm",
@@ -420,7 +420,7 @@ jobs:
},
{
"name": "testFloat2DConvolution",
"L1": [4000]
"L1": [15000]
},
{
"name": "testFloatLayerNorm",
@@ -514,12 +514,8 @@ jobs:
L1: [64000]
- name: "CCT/CCT_1_16_16_64"
L1: [64000]
- name: "CCT/CCT_1_16_16_128"
L1: [64000]
- name: "testTrainCCT/CCT_Classifier_Training/CCT_1_16_16_64"
L1: [64000]
- name: "testTrainCCT/CCT_Classifier_Training/CCT_1_16_16_128"
L1: [64000]
num-cores:
- 8
default-memory-level:
@@ -559,12 +555,8 @@ jobs:
L1: [64000]
- name: "CCT/CCT_1_16_16_64"
L1: [64000]
- name: "CCT/CCT_1_16_16_128"
L1: [64000]
- name: "testTrainCCT/CCT_Classifier_Training/CCT_1_16_16_64"
L1: [64000]
- name: "testTrainCCT/CCT_Classifier_Training/CCT_1_16_16_128"
L1: [64000]
num-cores:
- 8
double-buffer:
@@ -748,6 +740,42 @@ jobs:
default-memory-level: ${{ matrix.default-memory-level }}
neureka-wmem: ${{ matrix.neureka-wmem }}

siracusa-redmule-kernels-tiled-singlebuffer-L2:
strategy:
fail-fast: false
matrix:
test-data:
- name: "testFloatMatmul"
L1: [8000]
num-cores:
- 8
uses: ./.github/workflows/TestRunnerTiledSiracusaWithRedmule.yml
needs: select-docker-image
with:
docker-image: ${{ needs.select-docker-image.outputs.image }}
test-name: ${{ matrix.test-data.name }}
num-cores: ${{ matrix.num-cores }}
L1: ${{ toJson(matrix.test-data.L1) }}

siracusa-redmule-kernels-tiled-doublebuffer-L2:
strategy:
fail-fast: false
matrix:
test-data:
- name: "testFloatMatmul"
L1: [8000]
num-cores:
- 8
double-buffer:
- true
uses: ./.github/workflows/TestRunnerTiledSiracusaWithRedmule.yml
needs: select-docker-image
with:
docker-image: ${{ needs.select-docker-image.outputs.image }}
test-name: ${{ matrix.test-data.name }}
num-cores: ${{ matrix.num-cores }}
L1: ${{ toJson(matrix.test-data.L1) }}
double-buffer: ${{ matrix.double-buffer }}

### Deeploy Extension and Internal Tests ###
deeploy-memory-allocation:
72 changes: 72 additions & 0 deletions .github/workflows/TestRunnerTiledSiracusaWithRedmule.yml
@@ -0,0 +1,72 @@
name: TestRunnerTiledSiracusaWithRedmule

on:
workflow_call:
inputs:
docker-image:
required: true
type: string
test-name:
required: true
type: string
num-cores:
required: false
default: 8
type: number
L1:
required: false
default: "[64000]"
type: string
default-memory-level:
required: false
default: "L2"
type: string
double-buffer:
required: false
default: false
type: boolean
memory-allocation-strategy:
required: false
default: "MiniMalloc"
type: string
search-strategy:
required: false
default: "random-max"
type: string

jobs:

test-runner-siracusa-tiled:
strategy:
fail-fast: false
matrix:
L1: ${{ fromJSON(inputs.L1) }}
runs-on: ubuntu-22.04
container:
image: ${{ inputs.docker-image }}
steps:
- name: Checkout Repo
uses: actions/checkout@v4
with:
submodules: recursive
- name: Build Deeploy
run: pip install -e .
- name: Cache ccache
id: ccache-cache
uses: actions/cache@v4
with:
path: /app/.ccache
key: ${{ runner.os }}-ccache
- name: Run Test
uses: nick-fields/retry@v3
with:
timeout_minutes: 15
max_attempts: 3
retry_on: timeout
command: |
cd DeeployTest
mkdir -p /app/.ccache
export CCACHE_DIR=/app/.ccache
python testRunner_tiled_siracusa_w_redmule.py -t Tests/${{ inputs.test-name }} --cores=${{ inputs.num-cores }} --l1 ${{ matrix.L1 }} --defaultMemLevel=${{ inputs.default-memory-level }} ${{ inputs.double-buffer && '--doublebuffer' || '' }} --memAllocStrategy=${{ inputs.memory-allocation-strategy }} --searchStrategy=${{ inputs.search-strategy }}
shell: bash

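The new workflow ultimately runs testRunner_tiled_siracusa_w_redmule.py inside the Docker image. For local debugging, the same invocation can be scripted; a minimal sketch, assuming the Docker toolchain environment and the testFloatMatmul configuration from the CI matrix above (8 cores, 8000 B of L1):

import subprocess

# Mirrors the CI command above with the workflow's single-buffer defaults.
cmd = [
    "python", "testRunner_tiled_siracusa_w_redmule.py",
    "-t", "Tests/testFloatMatmul",
    "--cores=8",
    "--l1", "8000",
    "--defaultMemLevel=L2",
    "--memAllocStrategy=MiniMalloc",
    "--searchStrategy=random-max",
]
subprocess.run(cmd, cwd="DeeployTest", check=True)

Passing --doublebuffer in addition reproduces the double-buffer variant of the job.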
16 changes: 15 additions & 1 deletion CHANGELOG.md
@@ -282,4 +282,18 @@ Change main.c to use OUTPUTTYPE instead of float

### Changed
- The ISA for the Siracusa platform has been updated from rv32imc_zfinx_xpulpv2 to rv32imf_xpulpv2.
- All floating-point comparison tasks in deeploytest.c are now offloaded to Cluster 0 for execution.
- All floating-point comparison tasks in deeploytest.c are now offloaded to Cluster 0 for execution.

## Add RV32IMF Picolibc support for Siracusa platform

### Added
- Add RV32IMF Picolibc to the toolchain

## Parallelization and Optimization of CCT Inference and Training Kernels

### Added
- Parallel Matmul, Softmax, Gelu, Conv, Layernorm, Maxpool, Add
- Gelu with sigmoid approximation
- Im2col Conv
- Matmul with 1*7 unrolling, performance aligned with pulptrainlib
- Compute op support for multiple float kernels: Maxpool, Relu, Mul
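
The "Gelu with sigmoid approximation" entry refers to the standard approximation GELU(x) ~= x * sigmoid(1.702 * x). A minimal NumPy sketch of the reference math (the deployed kernel is the parallel C implementation added in this PR, not this code):

import numpy as np

# Reference math only: GELU(x) ~= x * sigmoid(1.702 * x) = x / (1 + exp(-1.702 * x)).
def gelu_sigmoid_approx(x: np.ndarray) -> np.ndarray:
    return x / (1.0 + np.exp(-1.702 * x))

This is also the operation sequence counted in the updated GELULayer.computeOps below: multiply by 1.702, negate, exponentiate, add 1, divide, and a final multiply by x.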
10 changes: 6 additions & 4 deletions CMakeLists.txt
@@ -15,8 +15,8 @@ if(TOOLCHAIN STREQUAL GCC)
set(CMAKE_INTERPROCEDURAL_OPTIMIZATION TRUE)
endif()

set(platform MemPool CACHE STRING "Platform (MemPool, QEMU, Siracusa, Siracusa_w_neureka, PULP-Open, Generic, Snitch)")
set_property(CACHE platform PROPERTY STRINGS MemPool QEMU Siracusa Siracusa_w_neureka PULP-Open Generic Snitch)
set(platform MemPool CACHE STRING "Platform (MemPool, QEMU, Siracusa, Siracusa_w_neureka, Siracusa_w_redmule, PULP-Open, Generic, Snitch)")
set_property(CACHE platform PROPERTY STRINGS MemPool QEMU Siracusa Siracusa_w_neureka Siracusa_w_redmule PULP-Open Generic Snitch)

if(platform STREQUAL MemPool)
message(STATUS "Building for platform 'MemPool'")
@@ -26,6 +26,8 @@ elseif(platform STREQUAL Siracusa)
message(STATUS "Building for platform 'Siracusa'")
elseif(platform STREQUAL Siracusa_w_neureka)
message(STATUS "Building for platform 'Siracusa_w_neureka'")
elseif(platform STREQUAL Siracusa_w_redmule)
message(STATUS "Building for platform 'Siracusa_w_redmule'")
elseif(platform STREQUAL PULPOpen)
message(STATUS "Building for platform 'PULP-Open'")
elseif(platform STREQUAL Generic)
@@ -148,7 +150,7 @@ if(platform STREQUAL QEMU-ARM)

endif()

if(platform STREQUAL Siracusa OR platform STREQUAL Siracusa_w_neureka OR platform STREQUAL PULPOpen)
if(platform STREQUAL Siracusa OR platform STREQUAL Siracusa_w_neureka OR platform STREQUAL Siracusa_w_redmule OR platform STREQUAL PULPOpen)

if(TOOLCHAIN STREQUAL LLVM)
set(CMAKE_TOOLCHAIN_FILE ${CMAKE_CURRENT_LIST_DIR}/cmake/pulp/toolchain_llvm.cmake)
@@ -158,7 +160,7 @@ if(platform STREQUAL Siracusa OR platform STREQUAL Siracusa_w_neureka OR platfor

include(${CMAKE_CURRENT_LIST_DIR}/cmake/pulp/pulp.cmake)

if(platform STREQUAL Siracusa OR platform STREQUAL Siracusa_w_neureka)
if(platform STREQUAL Siracusa OR platform STREQUAL Siracusa_w_neureka OR platform STREQUAL Siracusa_w_redmule)
include(${CMAKE_CURRENT_LIST_DIR}/cmake/pulp/siracusa/siracusa.cmake)
elseif(platform STREQUAL PULPOpen)
include(${CMAKE_CURRENT_LIST_DIR}/cmake/pulp/pulp-open/pulp-open.cmake)
4 changes: 3 additions & 1 deletion Container/Dockerfile
@@ -42,7 +42,7 @@ RUN DEBIAN_FRONTEND=noninteractive apt-get install -y git-lfs \
libsdl2-ttf-dev \
gcc-multilib \
wget \
clang-format
clang-format \
libxtensor-dev \
libxsimd-dev

# Install cmake 3.31.1
RUN wget https://github.com/Kitware/CMake/releases/download/v3.31.1/cmake-3.31.1-linux-x86_64.sh && \
62 changes: 52 additions & 10 deletions Deeploy/Targets/Generic/Layers.py
@@ -69,15 +69,16 @@ def __init__(self, maps: List[NodeMapper]):
super().__init__(maps)

def computeOps(self):
compAbs = self.mapper.parser.operatorRepresentation['size']
compAdd = self.mapper.parser.operatorRepresentation['size']
compSqr = self.mapper.parser.operatorRepresentation['size']
compMul = self.mapper.parser.operatorRepresentation['size']
compAdd = self.mapper.parser.operatorRepresentation['size']
compMul2 = self.mapper.parser.operatorRepresentation['size']
compAdd2 = self.mapper.parser.operatorRepresentation['size']
compDiv = self.mapper.parser.operatorRepresentation['size']
return compAbs + compAdd + compSqr + compMul + compAdd + compMul2 + compAdd2 + compDiv
size = self.mapper.parser.operatorRepresentation['size']
# RW: Sigmoid approximation
mul1 = size # Multiply by 1.702
neg = size # Negate the result
exp = size # Compute exponential
add = size # Add 1
div = size # Division for sigmoid
mul2 = size # Final multiplication by x

return mul1 + neg + exp + add + div + mul2


class iHardswishLayer(ONNXLayer):
@@ -120,12 +121,39 @@ class SoftmaxLayer(ONNXLayer):
def __init__(self, maps: List[NodeMapper]):
super().__init__(maps)

def computeOps(self):

size = self.mapper.parser.operatorRepresentation['size']
last_dim_length = self.mapper.parser.operatorRepresentation['lastDimLength']
batch_size = size // last_dim_length

max_ops = last_dim_length - 1
exp_ops = last_dim_length * 2
sum_ops = last_dim_length - 1
div_ops = last_dim_length
ops_per_batch = max_ops + exp_ops + sum_ops + div_ops
total_ops = ops_per_batch * batch_size

return total_ops


class SoftmaxGradLayer(ONNXLayer):

def __init__(self, maps: List[NodeMapper]):
super().__init__(maps)

def computeOps(self):
input_size = self.mapper.parser.operatorRepresentation['size']

# SoftmaxGrad: dx = y * (dy - sum(dy * y)), counted as five elementwise passes
mul_ops = input_size
sum_ops = input_size
broadcast_mul_ops = input_size
sub_ops = input_size
final_mul_ops = input_size

return mul_ops + sum_ops + broadcast_mul_ops + sub_ops + final_mul_ops


class ITAMaxLayer(ONNXLayer):

@@ -252,7 +280,7 @@ def computeShapes(self, inputShapes: Shape, outputShapes: Shape, operatorReprese
N = inputShapes[1][-1]

if len(inputShapes) == 3:
inputShapes[2] = [M, N]
inputShapes[2] = outputShapes[0]

return (inputShapes, outputShapes)

@@ -317,6 +345,9 @@ def computeShapes(self, inputShapes: Shape, outputShapes: Shape, operatorReprese
inputShapes[0] = inputShapes[1]
return (inputShapes, outputShapes)

def computeOps(self):
return self.mapper.parser.operatorRepresentation['size']


class ConvLayer(ONNXLayer):

@@ -374,6 +405,14 @@ class MaxPoolLayer(ONNXLayer):
def __init__(self, maps: List[NodeMapper]):
super().__init__(maps)

def computeOps(self):
kernel_shape = self.mapper.parser.operatorRepresentation['kernel_shape']
elements_per_window = int(np.prod(kernel_shape))
data_out_size = self.mapper.parser.operatorRepresentation['data_out_size']
comparisons_per_window = elements_per_window - 1
total_ops = data_out_size * comparisons_per_window
return total_ops


class ReduceMeanLayer(ONNXLayer):

Expand Down Expand Up @@ -403,6 +442,9 @@ class ReluLayer(ONNXLayer):
def __init__(self, maps: List[NodeMapper]):
super().__init__(maps)

def computeOps(self):
return self.mapper.parser.operatorRepresentation['size']


class LayerNormLayer(ONNXLayer):

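The new SoftmaxLayer count follows a numerically stable row-wise softmax (running max, exponentiation counted as two operations per element, sum, divide); a minimal NumPy sketch of that reference computation, assuming the input is viewed as (size // lastDimLength, lastDimLength):

import numpy as np

# Reference row-wise softmax matching the counted passes:
# per row of length D: (D - 1) max comparisons, 2 * D for the exponential stage,
# (D - 1) additions for the sum, and D divisions.
def softmax_rows(x: np.ndarray) -> np.ndarray:
    x_max = np.max(x, axis=-1, keepdims=True)
    e = np.exp(x - x_max)
    return e / np.sum(e, axis=-1, keepdims=True)

The MaxPool count is analogous: prod(kernel_shape) - 1 comparisons per output element, e.g. 8 comparisons per output for a 3x3 window.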
8 changes: 4 additions & 4 deletions Deeploy/Targets/Generic/Templates/FloatGELUTemplate.py
@@ -1,12 +1,12 @@
# ----------------------------------------------------------------------
#
# File: iGELUTemplate.py
# File: FloatGELUTemplate.py
#
# Last edited: 13.12.2021
# Last edited: 28.03.2025
#
# Copyright (C) 2021, ETH Zurich and University of Bologna.
#
# Author: Moritz Scherer, ETH Zurich
# Author: Run Wang, ETH Zurich
#
# ----------------------------------------------------------------------
# SPDX-License-Identifier: Apache-2.0
@@ -28,4 +28,4 @@
referenceTemplate = NodeTemplate("""
// GELU (Name: ${nodeName}, Op: ${nodeOp})
SINGLE_CORE GELU_fp${data_in_type.referencedType.typeWidth}_fp${data_out_type.referencedType.typeWidth}(${data_in}, ${data_out}, ${size});
""")
""")