hugegraph-ml

Summary

hugegraph-ml is a tool that integrates HugeGraph with popular graph learning libraries such as DGL. It implements a range of graph learning algorithms so that users can run end-to-end graph learning workflows directly on HugeGraph: graph data is read from HugeGraph and used for tasks such as node embedding, node classification, and graph classification. The implemented models are in the models folder and are listed below.

Model        Paper
AGNN         https://arxiv.org/abs/1803.03735
APPNP        https://arxiv.org/abs/1810.05997
ARMA         https://arxiv.org/abs/1901.01343
BGNN         https://arxiv.org/abs/2101.08543
BGRL         https://arxiv.org/abs/2102.06514
CARE-GNN     https://arxiv.org/abs/2008.08692
Cluster-GCN  https://arxiv.org/abs/1905.07953
C&S          https://arxiv.org/abs/2010.13993
DAGNN        https://arxiv.org/abs/2007.09296
DeeperGCN    https://arxiv.org/abs/2006.07739
DGI          https://arxiv.org/abs/1809.10341
DiffPool     https://arxiv.org/abs/1806.08804
GATNE        https://arxiv.org/abs/1905.01669
GRACE        https://arxiv.org/abs/2006.04131
GRAND        https://arxiv.org/abs/2005.11079
JKNet        https://arxiv.org/abs/1806.03536
P-GNN        http://proceedings.mlr.press/v97/you19b/you19b.pdf
SEAL         https://arxiv.org/abs/1802.09691

Environment Requirements

  • python 3.9+
  • hugegraph-server 1.0+

Preparation

  1. Start the HugeGraph database. You can run it via Docker or the binary packages; refer to docker-link and deploy-doc for guidance.

  2. Clone this project

    git clone https://github.com/apache/incubator-hugegraph-ai.git
  3. Install hugegraph-python-client and hugegraph-ml

    cd ./incubator-hugegraph-ai  # a virtualenv is recommended (e.g. source venv/bin/activate)
    pip install ./hugegraph-python-client
    cd ./hugegraph-ml/
    pip install -e .
  4. Enter the source directory (the previous step leaves you in ./hugegraph-ml)

    cd ./src
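Before running the examples, a quick check like the following can confirm that the interpreter meets the Python 3.9+ requirement and that the packages installed above are importable. It is a minimal sketch: only `hugegraph_ml` appears in this README's imports, while `pyhugegraph` (for hugegraph-python-client) and `dgl` are assumed module names. It does not verify that the HugeGraph server is running.

```python
import importlib.util
import sys

# hugegraph_ml is taken from the imports in this README; "pyhugegraph" and
# "dgl" are ASSUMED module names for the client and the graph learning library.
REQUIRED_MODULES = ("hugegraph_ml", "pyhugegraph", "dgl")


def check_environment(modules=REQUIRED_MODULES, min_python=(3, 9)):
    """Report whether the Python version and required modules look ready."""
    report = {"python_ok": sys.version_info[:2] >= min_python}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, ok in check_environment().items():
        print(f"{key}: {'OK' if ok else 'MISSING'}")
```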

Examples

Perform node embedding on the Cora dataset using the DGI model

Make sure that the Cora dataset is already in your HugeGraph database. If not, you can run the import_graph_from_dgl function to import the Cora dataset from DGL into the HugeGraph database.

from hugegraph_ml.utils.dgl2hugegraph_utils import import_graph_from_dgl

import_graph_from_dgl("cora")

Run dgi_example.py to view the example.

python ./hugegraph_ml/examples/dgi_example.py

The example performs the following steps:

1. Convert the graph data

Convert the graph from HugeGraph to DGL format.

from hugegraph_ml.data.hugegraph2dgl import HugeGraph2DGL
from hugegraph_ml.models.dgi import DGI
from hugegraph_ml.models.mlp import MLPClassifier
from hugegraph_ml.tasks.node_classify import NodeClassify
from hugegraph_ml.tasks.node_embed import NodeEmbed

hg2d = HugeGraph2DGL()
graph = hg2d.convert_graph(vertex_label="CORA_vertex", edge_label="CORA_edge")
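Conceptually, this conversion turns HugeGraph vertices and edges into the index-based form DGL works with: vertices are numbered 0..N-1, edges become parallel source/destination index lists, and vertex properties become per-node feature and label arrays. The sketch below illustrates that mapping in plain Python. The record layout (`id`, `feat`, `label`, `out_v`, `in_v` keys) is hypothetical, and this is not how HugeGraph2DGL is implemented internally.

```python
# HYPOTHETICAL record shapes for illustration only: each vertex dict carries
# an "id" plus "feat"/"label" properties; each edge dict names its endpoints
# by vertex id under "out_v" (source) and "in_v" (destination).
def to_index_form(vertices, edges):
    """Map vertex ids to contiguous indices and build src/dst edge lists."""
    id_to_idx = {v["id"]: i for i, v in enumerate(vertices)}
    feats = [v["feat"] for v in vertices]    # one feature row per node
    labels = [v["label"] for v in vertices]  # one class label per node
    # An edge that references an unknown vertex id raises KeyError here.
    src = [id_to_idx[e["out_v"]] for e in edges]
    dst = [id_to_idx[e["in_v"]] for e in edges]
    return src, dst, feats, labels
```

With DGL installed, lists like these could then be passed to `dgl.graph((src, dst), num_nodes=len(vertices))`, with the features and labels converted to tensors and stored in `graph.ndata`.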

2. Instantiate the model

model = DGI(n_in_feats=graph.ndata["feat"].shape[1])

3. Train the model and generate node embeddings

node_embed_task = NodeEmbed(graph=graph, model=model)
embedded_graph = node_embed_task.train_and_embed(add_self_loop=True, n_epochs=300, patience=30)
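The `patience` argument passed to `train_and_embed` (and later to `train`) presumably enables early stopping: training ends once the monitored loss has not improved for `patience` consecutive epochs, even if `n_epochs` has not been reached. The following is a generic sketch of that common pattern, not hugegraph-ml's code; `step` is a hypothetical callback that trains one epoch and returns the loss to monitor.

```python
def train_with_early_stopping(step, n_epochs, patience):
    """Call step(epoch) until the loss stalls for `patience` epochs.

    `step` is a HYPOTHETICAL callback: it trains one epoch and returns the
    loss being monitored. Returns (best_epoch, best_loss, epochs_run).
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    epochs_run = 0
    for epoch in range(n_epochs):
        loss = step(epoch)
        epochs_run += 1
        if loss < best_loss:
            # New best: remember it and reset the patience counter.
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # no improvement for `patience` epochs in a row
    return best_epoch, best_loss, epochs_run
```

For example, with per-epoch losses 5, 4, 3, 3.5, 3.2, 3.1 and `patience=3`, the loop stops after six epochs and reports index 2 (loss 3) as the best epoch.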

4. Downstream task: node classification with an MLP

# Input size is the embedding dimension; output size is the number of classes.
model = MLPClassifier(
    n_in_feat=embedded_graph.ndata["feat"].shape[1],
    n_out_feat=embedded_graph.ndata["label"].unique().shape[0]
)
node_clf_task = NodeClassify(graph=embedded_graph, model=model)
node_clf_task.train(lr=1e-3, n_epochs=400, patience=40)
print(node_clf_task.evaluate())

5. Obtain the metrics

{'accuracy': 0.82, 'loss': 0.5714246034622192}
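The `accuracy` value above is the fraction of evaluated nodes whose predicted class matches the true label. A minimal sketch of that computation follows, assuming predictions are class indices (for example, the argmax of the classifier's outputs) and a boolean mask selecting the evaluation nodes; which split `evaluate()` actually uses is not specified here.

```python
def masked_accuracy(predicted, labels, mask):
    """Fraction of nodes selected by `mask` whose prediction equals the label.

    ASSUMPTION: `predicted` holds class indices and `mask` is a list of
    booleans marking the evaluation nodes.
    """
    selected = [(p, y) for p, y, m in zip(predicted, labels, mask) if m]
    if not selected:
        raise ValueError("mask selects no nodes")
    correct = sum(1 for p, y in selected if p == y)
    return correct / len(selected)
```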

Perform node classification on the Cora dataset using the GRAND model

The full example is in grand_example.py.

from hugegraph_ml.data.hugegraph2dgl import HugeGraph2DGL
from hugegraph_ml.models.grand import GRAND
from hugegraph_ml.tasks.node_classify import NodeClassify

hg2d = HugeGraph2DGL()
graph = hg2d.convert_graph(vertex_label="CORA_vertex", edge_label="CORA_edge")
model = GRAND(
    n_in_feats=graph.ndata["feat"].shape[1],
    n_out_feats=graph.ndata["label"].unique().shape[0]
)
node_clf_task = NodeClassify(graph, model)
node_clf_task.train(lr=1e-2, weight_decay=5e-4, n_epochs=2000, patience=100)
print(node_clf_task.evaluate())
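GRAND augments node features with random propagation: DropNode zeroes entire feature rows with probability δ and rescales the remaining rows by 1/(1-δ), then mixed-order propagation averages Â^k X for k = 0..K, where Â is the symmetrically normalized adjacency matrix with self-loops. The sketch below illustrates only these two steps in plain Python with dense lists, which is practical only for toy graphs. It follows the GRAND paper rather than hugegraph-ml's implementation and omits the MLP and the consistency-regularization loss; all function names are hypothetical.

```python
import math
import random


def normalized_adjacency(n, edges):
    """Dense D^-1/2 (A + I) D^-1/2, treating edges as undirected."""
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        adj[i][i] = 1.0  # self-loop
    for u, v in edges:
        adj[u][v] = 1.0
        adj[v][u] = 1.0
    deg = [sum(row) for row in adj]
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def drop_node(x, drop_rate, rng):
    """DropNode: zero whole rows with prob drop_rate, rescale kept rows."""
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    keep = 1.0 - drop_rate
    return [[val / keep for val in row] if rng.random() >= drop_rate
            else [0.0] * len(row)
            for row in x]


def mixed_order_propagate(a_hat, x, k):
    """Return (1 / (k + 1)) * sum of a_hat^i @ x for i = 0..k."""
    n, d = len(x), len(x[0])
    current = [row[:] for row in x]  # a_hat^0 @ x
    total = [row[:] for row in x]
    for _ in range(k):
        current = [[sum(a_hat[i][m] * current[m][j] for m in range(n))
                    for j in range(d)]
                   for i in range(n)]
        for i in range(n):
            for j in range(d):
                total[i][j] += current[i][j]
    return [[val / (k + 1) for val in row] for row in total]


def random_propagation(x, edges, drop_rate, k, seed=0):
    """One GRAND-style augmentation: DropNode followed by propagation."""
    rng = random.Random(seed)
    a_hat = normalized_adjacency(len(x), edges)
    return mixed_order_propagate(a_hat, drop_node(x, drop_rate, rng), k)
```

In the paper, several such augmentations are drawn per training step and a consistency loss pushes the model's predictions on them to agree.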