Graphalytics Giraph platform driver

Apache Giraph is an iterative graph processing system built for high scalability. It originated as the open-source counterpart to Google's Pregel, which is inspired by the Bulk Synchronous Parallel (BSP) model of distributed computation introduced by Leslie Valiant.

To execute the Graphalytics benchmark on Giraph, follow the steps in the Graphalytics tutorial on Running Benchmark, together with the Giraph-specific instructions listed below.

Obtain the platform driver

There are two possible ways to obtain the Giraph platform driver:

1. Download the (prebuilt) Giraph platform driver distribution from our website.
2. Build the platform driver:
   - (To be deprecated) Currently, the Graphalytics core libraries must first be built with mvn clean install -Pgranula; they will soon be available from the Maven Central repository.
   - Download the source code from this repository.
   - Execute mvn clean package in the root directory (see Software Build for more details).
   - Extract the distribution from graphalytics-{graphalytics-version}-giraph-{platform-version}.tar.gz.

Verify the necessary prerequisites

The software listed below is required by the Giraph platform driver and must be available in the cluster environment. Software marked as provided is already included in the platform driver.

Software	Version (tested)	Usage	Description	Provided
Giraph	1.6.0	Platform	Providing Giraph implementation	✔(maven)
Graphalytics	1.0 (TODO)	Driver	Graphalytics benchmark suite	✔(maven)
Granula	0.1 (TODO)	Driver	Fine-grained performance analysis	✔(maven)
YARN	2.6.1	Deployment	Job provisioning and allocation	-
Zookeeper	3.4.1	Deployment	Synchronizing Giraph workers	-
JDK	1.7.0+	Build	Java virtual machine	-
Maven	3.3.9	Build	Building the platform driver	-

YARN: should be reachable from the compute node where the benchmark will be executed.
ZooKeeper: should be running on a compute node that is accessible via the network.
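As a quick sanity check before running the benchmark, the sketch below tests whether the ZooKeeper address is reachable over TCP from the current node. It is a minimal illustration under stated assumptions, not part of the platform driver: the class ZooKeeperReachability is hypothetical, and it assumes the address is written as host:port (the default localhost:2181 is only an example). A successful connection only shows that something is listening on that port; it does not prove that it is a healthy ZooKeeper server.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/**
 * Hypothetical pre-flight check (not part of the Graphalytics Giraph driver):
 * tries to open a TCP connection to the configured ZooKeeper address.
 */
public final class ZooKeeperReachability {

    /** Splits an address assumed to be written as "host:port". */
    static InetSocketAddress parse(String address) {
        int colon = address.lastIndexOf(':');
        if (colon <= 0 || colon == address.length() - 1) {
            throw new IllegalArgumentException("expected host:port, got: " + address);
        }
        String host = address.substring(0, colon);
        int port = Integer.parseInt(address.substring(colon + 1));
        // Resolves the host name eagerly; an unknown host stays unresolved.
        return new InetSocketAddress(host, port);
    }

    /** Returns true if a TCP connection succeeds within timeoutMs milliseconds. */
    static boolean isReachable(String address, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(parse(address), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or unknown host.
            return false;
        }
    }

    public static void main(String[] args) {
        String address = args.length > 0 ? args[0] : "localhost:2181";
        System.out.println(address + (isReachable(address, 2000) ? " is reachable" : " is NOT reachable"));
    }
}
```

Run it from the node that will launch the benchmark, passing the same value you intend to use for platform.giraph.zoo-keeper-address.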

Adjust the benchmark configurations

Adjust the Giraph configurations in config/platform.properties:

platform.giraph.zoo-keeper-address: Set to the hostname and port on which ZooKeeper is running.
platform.giraph.job.heap-size: Set to the amount of heap space (in MB) each worker should have. As Giraph runs on MapReduce, this setting corresponds to the JVM heap specified for each map task, i.e., mapreduce.map.java.opts.
platform.giraph.job.memory-size: Set to the amount of memory (in MB) each worker should have. This corresponds to the amount of memory requested from the YARN resource manager for each worker, i.e., mapreduce.map.memory.mb.
platform.giraph.job.worker-count: Set to an appropriate number of workers for the Hadoop cluster. Note that Giraph launches an additional master process.
platform.hadoop.home: Set to the root of your Hadoop installation ($HADOOP_HOME).
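The relationship between these keys and the MapReduce settings they correspond to can be sketched in Java as follows. This is a hypothetical helper (GiraphJobSettings is not part of the driver) that reads the five keys from properties-file content, checks that the heap fits inside the container memory, and derives the per-worker values described above. The -Xmx format used for mapreduce.map.java.opts, the assumption that the master requests a container of the same size as a worker, and the sample values in main are all illustrative assumptions.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

/**
 * Hypothetical helper (not part of the Graphalytics Giraph driver) that reads
 * the Giraph keys from config/platform.properties and derives the MapReduce
 * settings they correspond to.
 */
public final class GiraphJobSettings {
    final String zooKeeperAddress;
    final int heapSizeMb;
    final int memorySizeMb;
    final int workerCount;
    final String hadoopHome;

    private GiraphJobSettings(Properties p) {
        zooKeeperAddress = require(p, "platform.giraph.zoo-keeper-address");
        heapSizeMb = Integer.parseInt(require(p, "platform.giraph.job.heap-size"));
        memorySizeMb = Integer.parseInt(require(p, "platform.giraph.job.memory-size"));
        workerCount = Integer.parseInt(require(p, "platform.giraph.job.worker-count"));
        hadoopHome = require(p, "platform.hadoop.home");
        if (workerCount < 1) {
            throw new IllegalArgumentException("worker-count must be at least 1");
        }
        // The JVM heap has to fit inside the container YARN allocates for the worker.
        if (heapSizeMb > memorySizeMb) {
            throw new IllegalArgumentException("heap-size must not exceed memory-size");
        }
    }

    /** Parses properties-file content, e.g. from a FileReader on config/platform.properties. */
    static GiraphJobSettings load(Reader in) {
        Properties p = new Properties();
        try {
            p.load(in);
        } catch (IOException e) {
            throw new IllegalStateException("could not read platform properties", e);
        }
        return new GiraphJobSettings(p);
    }

    private static String require(Properties p, String key) {
        String value = p.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("missing property: " + key);
        }
        return value.trim();
    }

    /** Assumed value for mapreduce.map.java.opts: the per-worker heap. */
    String mapJavaOpts() {
        return "-Xmx" + heapSizeMb + "m";
    }

    /** Value for mapreduce.map.memory.mb: the per-worker container memory. */
    int mapMemoryMb() {
        return memorySizeMb;
    }

    /** Workers plus the additional master process Giraph launches. */
    int totalProcesses() {
        return workerCount + 1;
    }

    /** Estimated cluster memory, assuming the master gets a worker-sized container. */
    long totalMemoryMb() {
        return (long) totalProcesses() * memorySizeMb;
    }

    public static void main(String[] args) {
        String sample = "platform.giraph.zoo-keeper-address = zk-host:2181\n"
                + "platform.giraph.job.heap-size = 4096\n"
                + "platform.giraph.job.memory-size = 5120\n"
                + "platform.giraph.job.worker-count = 4\n"
                + "platform.hadoop.home = /opt/hadoop\n";
        GiraphJobSettings s = load(new StringReader(sample));
        System.out.println("mapreduce.map.java.opts = " + s.mapJavaOpts());
        System.out.println("mapreduce.map.memory.mb = " + s.mapMemoryMb());
        System.out.println("processes = " + s.totalProcesses() + ", total memory (MB) = " + s.totalMemoryMb());
    }
}
```

Keeping memory-size somewhat larger than heap-size leaves room for JVM overhead outside the heap; the check above only enforces the weaker condition that the heap must not exceed the container.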

Known Issues

Benchmark reports will show nan as the processing time when YARN log aggregation is disabled. To fix this, enable log aggregation in the yarn-site.xml file by setting yarn.log-aggregation-enable to true.
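The yarn-site.xml entry is a standard Hadoop property element with the name yarn.log-aggregation-enable and the value true. A minimal sketch of checking for it with the JDK's built-in DOM parser is shown below. The class LogAggregationCheck is hypothetical and not part of the driver; it treats a missing property as disabled, matching YARN's default of false, and it only inspects the single file it is given.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical check (not part of the driver) for YARN log aggregation in yarn-site.xml. */
public final class LogAggregationCheck {
    static final String KEY = "yarn.log-aggregation-enable";

    /** Returns the value of yarn.log-aggregation-enable, or false if it is not set. */
    static boolean isLogAggregationEnabled(InputStream yarnSite) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(yarnSite);
            NodeList properties = doc.getElementsByTagName("property");
            boolean enabled = false; // yarn-default.xml sets this key to false
            for (int i = 0; i < properties.getLength(); i++) {
                Element property = (Element) properties.item(i);
                if (KEY.equals(childText(property, "name"))) {
                    // If the key appears more than once, the last entry wins in this sketch.
                    enabled = Boolean.parseBoolean(childText(property, "value"));
                }
            }
            return enabled;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse yarn-site.xml", e);
        }
    }

    /** Text of the first child element with the given tag, or null if absent. */
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String yarnSite = "<?xml version=\"1.0\"?>\n"
                + "<configuration>\n"
                + "  <property>\n"
                + "    <name>yarn.log-aggregation-enable</name>\n"
                + "    <value>true</value>\n"
                + "  </property>\n"
                + "</configuration>\n";
        InputStream in = new ByteArrayInputStream(yarnSite.getBytes(StandardCharsets.UTF_8));
        System.out.println(isLogAggregationEnabled(in)
                ? "log aggregation is enabled"
                : "log aggregation is disabled; reports may show nan");
    }
}
```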

chrislemaire/graphalytics-platforms-giraph

Folders and files

Latest commit

History

Repository files navigation

Graphalytics Giraph platform driver

Obtain the platform driver

Verify the necessary prerequisites

Adjust the benchmark configurations

Known Issues

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages