Implement Single-Source Shortest Paths (SSSP) on CPU, GPU (CUDA), and Hybrid (CPU-GPU)
Note: Before run make
, if in Linux please modify the OPENMPFLAGS=-Xcompiler -openmp
to OPENMPFLAGS=-Xcompiler -fopenmp
in Makefile
. In Windows, no need to make modification.
Run make
in the root folder to generate the executable file.
The core algorithm of this project is Bellman-Ford Algorithm.
- Loop all edges to update vertexs' distance to source node.
- Repeate Step 1 until there is no vertex needs to update its distance to source.
- Divide all edges into multiple parts.
- Launch multiple threads to process the edges assigned from Step 1.
- Repeate Step 1 and Step 2 until there is no vertex needs to update its distance to source. Basic implementation of dijkstra algorithm on GPU.
- Use compiled file
sssp
to run the GPU's implementation.
- The edges are split into two parts and assigned to CPU and GPU separately.
- CPU will use OpenMP to launch multiple threads to process the edges.
- Using CUDA to launch GPU threads to process edges.
- After each iteration, copy the dist array back to host then use OpenMP to launch threads to merge the dist of CPU's and GPU's.
- Use compiled file
openmp
to run the hybrid's implementation.
To better utilize the computing resource of CPU and GPU. The splitRatio is very important. I use a simple formula to dynamically change the splitRatio.
$ ./sssp --input path_of_graph
$ ./openmp --input path_of_graph
for openmp
it will run hybrid implementation and GPU-only's implementation. You also can specify argument --oncpu true to run a CPU-only parallel implementation (OpenMP).
Optional arguments:
[--oncpu]: Run this graph on CPU. Its value must be true/false (default: false). E.g., --oncpu true
[--source]: Set the source node (default: minimum node number). E.g., --source 0
Input graphs should be in form of plain text files. The format of each edge is as following:
source end weight
if the weight is not specified, it will be assigned to a default value: 1.
Datasets | Nodes | Edges | Diameter |
---|---|---|---|
simpleGraph | 9 | 11 | 4 |
email-Eu-core | 1,005 | 25,571 | 7 |
Wiki-Vote | 8,298 | 103,689 | 7 |
CA-HepTh | 9,877 | 25,998 | 17 |
p2p-Gnutella30 | 36,682 | 88,328 | 10 |
Slashdot0811 | 77,350 | 516,575 | 11 |
higgs_social_network | 456,626 | 14,855,842 | 9 |
roadNet-PA | 1,088,092 | 1,541,898 | 786 |
Experiment Platform
- CPU: i7-4720HQ (4 cores with 8 threads)
- Memory: 16 GB DDR3 1600MHz
- GPU: GTX965M with 2 GB Memory
How load balancing works in hybrid
- Improve the speed of loading graph
- Setting size of message -- Size is not correct
- bottleneck: data transferring between host and device
- apporach the ideal split ratio faster
LaTeX used in markdown