This is a CUDA implementation of BFS from the "Large Graph Algorithms for Massively Multithreaded Architectures" paper (Harish et al).
Compile with $ bash compile.sh (sorry, no makefile). This will create an executable called ./demo.
Run it like so: ./demo /path/to/graph.txt <start-vertex-idx> (cpu | gpu).
Example: ./demo ./test-data/small.txt 0 gpu.
On i-th line of stdout, it will write the distance between vertex <start-vertex-idx> and vertex i, or a very large number (> 4e9) if i isn't reachable from <start-vertex-idx>.
<number-of-vertices> <number-of-edges> (directed | undirected)
<edge-start-vertex-index> <edge-end-vertex-index>
<edge-start-vertex-index> <edge-end-vertex-index>
...
<edge-start-vertex-index> <edge-end-vertex-index>
There's an example at ./test-data/small.txt.
demo.cpphasmain()and is the only executable utility.- The only public header is
bfs.hpp. - The reference CPU version
bfsCPU()is inbfs_cpu.cpp. - The CUDA version
bfsCUDA()is inbfs.cu. - It uses
bfs_kernels.cu, which hasBFSKernel1()andBFSKernel2()(Algorithms 2 and 3 in the paper);compaction.cu, which hascompactSIMD()(Section 3.2 in the paper);scan.cuandscan_kernels.cu, which have, most notably,prescanArray()(an off-the-shelf implementation of scan operation).