This repository contains the code for "SoMa: Identifying, Exploring, and Understanding the DRAM Communication Scheduling Space for DNN Accelerators". Please follow the instructions in the AE Appendix of our corresponding HPCA 2025 paper.
- First, install the required Python libraries with `pip install -r requirements.txt` or `conda install --file requirements.txt`.
- Optional: we use OpenMP for multi-threaded search. If you do not want this, comment out `-fopenmp` in the `Makefile`.
```bash
./build.sh
./run.sh --eta
./get_results.sh
```

After running `build.sh`, you can execute a single experiment with:

```bash
./build/soma 108 2 512 1 8 1 8 4 256 512 849779186 results/dse
```

The positional arguments are:

- Network (`108`): The neural network to use (full list below).
- Baseline Type (`2`): Must be `2`; other values are not supported.
- Sequence Length (`512`): Relevant for LLMs; ignored for CNNs.
- Number of Segments (`1`): Used when a network is too large and needs partitioning for scheduling.
- L2 Buffer Size (`8`): The L2 buffer size in megabytes.
- Batch Size (`1`): The number of input samples processed at once.
- DRAM Bandwidth Ratio (`8`): Ratio of DRAM bandwidth (GB/s) to computational power (TOPS). Default is `1`.
- PE Array Dimension (`4`): Typically ranges from `4` to `16`.
- L2 Buffer Bandwidth (`256`): The L2 buffer bandwidth in GB/s.
- MAC Units per PE (`512`): The number of multiply-accumulate (MAC) units per PE. TOPS is calculated as `TOPS = 2 * mac_num * PE_ARRAY_Dim^2 / 1024` (see the worked example after this list).
- Random Seed (`849779186`): Seed for the random number generator.
- Results Folder (`results/dse`): Where the experiment results are stored.
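To make the TOPS formula concrete, here is a minimal sanity-check sketch in Python using the values from the sample command above. Note that reading the DRAM bandwidth ratio as "GB/s per TOPS" is our interpretation of the argument description, not something the command output confirms.

```python
# Peak compute for the sample command above:
#   TOPS = 2 * mac_num * PE_ARRAY_Dim^2 / 1024
mac_num = 512       # MAC units per PE (10th positional argument)
pe_array_dim = 4    # PE array dimension (8th positional argument)

tops = 2 * mac_num * pe_array_dim**2 / 1024
print(tops)         # 16.0 TOPS

# Assumption: a DRAM bandwidth ratio of 8 means 8 GB/s per TOPS,
# so the modeled DRAM bandwidth for this configuration would be:
dram_bw = 8 * tops
print(dram_bw)      # 128.0 GB/s
```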
- `0`: Darknet19
- `1`: VGG19
- `2`: ResNet50
- `3`: GoogLeNet
- `4`: ResNet101
- `5`: DenseNet
- `6`: Inception-ResNet-V1
- `7`: GNMT
- `8`: LSTM
- `9`: ZFNet
- `10`: Transformer
- `11`: Transformer Cell
- `12`: PNASNet
- `13`: ResNeXt50
- `14`: ResNet152
- `15`: Transformer Big Cell
- `16`: RetinaNet-ResNet50
- `17`: U-Net
- `18`: RandWire Small
- `19`: RandWire Large
- `101`: GPT-J 6B (Decode)
- `102`: GPT-J 6B (Prefill)
- `103`: LLaMa 2 70B (Decode)
- `104`: LLaMa 2 70B (Prefill)
- `105`: BERT Base
- `106`: BERT Large
- `107`: GPT-2 Small (Decode)
- `108`: GPT-2 Small (Prefill)
- `109`: GPT-2 XL (Decode)
- `110`: GPT-2 XL (Prefill)
For unsupported model IDs, the program throws the error `Model not supported.`
- Header files for the project.
- Python scripts for processing and analyzing experiment results.
- C++ source code implementing the SoMa framework.
Note on GPT-2: for GPT-2 we explore only a single transformer block, so the data-processing script multiplies the per-block latency by the number of blocks. For GPT-2 Small in the DSE experiments, the number of blocks is 12.
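A minimal sketch of that scaling step (the function name and the example latency value are illustrative, not the actual processing script's API):

```python
# Illustrative sketch of the block-count scaling described above;
# the real result-processing script may differ.
def full_model_latency(per_block_latency: float, num_blocks: int) -> float:
    """Scale the explored single-block latency to the whole model."""
    return per_block_latency * num_blocks

# GPT-2 Small: 12 transformer blocks (the latency value here is made up).
total = full_model_latency(per_block_latency=1.5, num_blocks=12)  # -> 18.0
```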
If you encounter any issues during the AE process, please contact:
Jingwei Cai (Tsinghua University) [email protected]
Xuan Wang (Xi'an Jiaotong University, Institute for Interdisciplinary Information Core Technology) [email protected]