Our ICS 2026 paper (PDF coming soon) introduces Trident partitioning, a hierarchy-aware hybrid 2D–1D decomposition for distributed SpGEMM that cuts inter-node communication, achieving up to 2.4× speedup and 2× lower communication volume on NERSC’s Perlmutter system.
- Perlmutter: `cmake/3.30.2`, `cudatoolkit/12.9`, `craype-accel-nvidia80`, `craype-hugepages2G`, `cray-pmi/6.1.15`
- Leonardo: `cmake/3.27.9`, `gcc/12.2.0`, `cuda/12.2`, `openmpi/4.1.6--gcc--12.2.0-cuda-12.2`, `openblas/0.3.26--gcc--12.2.0`
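For example, on Perlmutter the environment can be prepared by loading the modules listed above (a sketch; exact availability depends on the current Perlmutter software stack):

```shell
# Perlmutter environment setup using the modules listed above
module load cmake/3.30.2
module load cudatoolkit/12.9
module load craype-accel-nvidia80
module load craype-hugepages2G
module load cray-pmi/6.1.15
```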
First, install Kokkos:

```shell
scripts/kokkos_install.sh
```

Installation paths are customizable in `scripts/variables.sh`.
To build Trilinos as a shared library, run:

```shell
scripts/trilinos_install.sh
```

Notes:
- Currently, the script builds for GPU architecture `AMPERE80`.
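Putting the steps together, a minimal dependency build might look like the following (a sketch, assuming the scripts are run from the repository root after adjusting `scripts/variables.sh`):

```shell
# Sketch of the full dependency build, run from the repo root.
# Install paths are taken from scripts/variables.sh (edit before running).
./scripts/kokkos_install.sh     # install Kokkos
./scripts/trilinos_install.sh   # build Trilinos as a shared library (targets AMPERE80)
```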
MCL PUT + CSC
If you find this repo helpful to your work, please cite our article:
```bibtex
@inproceedings{trident,
  author    = {Bellavita, Julian and Pichetti, Lorenzo and Pasquali, Thomas and Vella, Flavio and Guidi, Giulia},
  title     = {{C}ommunication-{A}voiding {SpGEMM} via {T}rident {P}artitioning on {H}ierarchical {GPU} {I}nterconnects},
  booktitle = {Proceedings of the 40th ACM International Conference on Supercomputing},
  series    = {ICS '26},
  year      = {2026},
  month     = {July},
  address   = {Belfast, Northern Ireland, United Kingdom},
  publisher = {ACM},
}
```
This work was a collaboration between the HiCrest Laboratory at the University of Trento (Italy) and the Cornell HPC Group at Cornell University (USA). The first author was supported by the DOE CSGF. This work was partially developed during the second author's research visit to Cornell University. The authors thank Benjamin Brock for his help in running and benchmarking the BCL 2D asynchronous SpGEMM. The authors acknowledge financial support from ICSC – Centro Nazionale di Ricerca in High-Performance Computing, Big Data and Quantum Computing, funded by the European Union – NextGenerationEU. This work has received funding from the NETwork for European EXAscale Systems (NET4EXA) EU project under grant agreement No. 101175702 and the National Institute of Higher Mathematics Francesco Severi. This research used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231, using NERSC award ASCR-ERCAP0030076.