hugegraph-ml is a tool that integrates HugeGraph with popular graph learning libraries. It implements many common graph learning algorithms, enabling users to perform end-to-end graph learning workflows directly from HugeGraph. Graph data can be read directly from HugeGraph and used for tasks such as node embedding, node classification, and graph classification.
The implemented algorithm models can be found in the models folder.
Environment requirements:
- Python 3.9+
- hugegraph-server 1.0+
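A quick way to confirm the interpreter requirement before installing anything is a small version check. The sketch below is illustrative only; the helper name python_is_supported is hypothetical and not part of hugegraph-ml.

```python
import sys

# Minimum interpreter version, taken from the requirements list above.
MIN_PYTHON = (3, 9)


def python_is_supported(version_info=None, minimum=MIN_PYTHON):
    """Return True if a (major, minor, ...) version tuple meets the minimum."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only the major and minor components.
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    found = sys.version.split()[0]
    if python_is_supported():
        print("Python version OK:", found)
    else:
        print("Python 3.9 or newer is required, found:", found)
```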
- Start the HugeGraph database, either via Docker or from the binary packages. Refer to docker-link and deploy-doc for guidance.
- Clone this project:
  git clone https://github.com/apache/incubator-hugegraph-ai.git
- Install hugegraph-python-client and hugegraph-ml:
  cd ./incubator-hugegraph-ai  # better to use a virtualenv (source venv/bin/activate)
  pip install ./hugegraph-python-client
  cd ./hugegraph-ml/
  pip install -e .
- Enter the project directory (the path below is relative to the repository root):
  cd ./hugegraph-ml/src
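Before running the examples, it can help to confirm that the HugeGraph server started in the first step is actually reachable. The sketch below uses only the Python standard library. It assumes the server listens on http://127.0.0.1:8080 and exposes a /versions endpoint over its REST API; both are assumptions about your deployment, so adjust them as needed. The helper names are hypothetical and not part of hugegraph-ml.

```python
import http.client
import urllib.error
import urllib.request

# Assumption: HugeGraph Server listens on this address in your setup.
DEFAULT_SERVER = "http://127.0.0.1:8080"


def versions_url(base_url=DEFAULT_SERVER):
    """Build the URL of the (assumed) version endpoint for a server base URL."""
    return base_url.rstrip("/") + "/versions"


def server_is_reachable(base_url=DEFAULT_SERVER, timeout=3.0):
    """Return True if the version endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(versions_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, bad URL, or a non-success HTTP status.
        return False


if __name__ == "__main__":
    print("HugeGraph reachable:", server_is_reachable())
```

If the check returns False, revisit the deployment guide before attempting to import or convert any graph data.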
Make sure that the Cora dataset is already in your HugeGraph database. If not, you can run the import_graph_from_dgl function to import the Cora dataset from DGL into the HugeGraph database.
from hugegraph_ml.utils.dgl2hugegraph_utils import import_graph_from_dgl
import_graph_from_dgl("cora")
Run dgi_example.py to view the example.
python ./hugegraph_ml/examples/dgi_example.py
The specific process is as follows:
1. Convert the graph data
Convert the graph from HugeGraph to DGL format.
from hugegraph_ml.data.hugegraph2dgl import HugeGraph2DGL
from hugegraph_ml.models.dgi import DGI
from hugegraph_ml.models.mlp import MLPClassifier
from hugegraph_ml.tasks.node_classify import NodeClassify
from hugegraph_ml.tasks.node_embed import NodeEmbed
hg2d = HugeGraph2DGL()
graph = hg2d.convert_graph(vertex_label="CORA_vertex", edge_label="CORA_edge")
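Conceptually, a conversion of this kind has to map HugeGraph vertex IDs onto contiguous integer node indices and express each edge as a pair of those indices. The following is a minimal, library-free sketch of that idea. It is not how HugeGraph2DGL is implemented, and the vertex IDs and the build_index_graph helper are hypothetical.

```python
def build_index_graph(vertex_ids, edges):
    """Map arbitrary vertex IDs to 0..N-1 and return (id_to_index, src, dst).

    vertex_ids: iterable of hashable vertex IDs.
    edges: iterable of (source_id, target_id) pairs.
    """
    id_to_index = {}
    for vid in vertex_ids:
        # Assign the next free index the first time an ID is seen.
        if vid not in id_to_index:
            id_to_index[vid] = len(id_to_index)

    src, dst = [], []
    for source_id, target_id in edges:
        src.append(id_to_index[source_id])
        dst.append(id_to_index[target_id])
    return id_to_index, src, dst


# Hypothetical vertex IDs and edges, standing in for data read from HugeGraph.
vertex_ids = ["paper:a", "paper:b", "paper:c"]
edges = [("paper:a", "paper:b"), ("paper:c", "paper:a")]

id_to_index, src, dst = build_index_graph(vertex_ids, edges)
print(id_to_index)  # {'paper:a': 0, 'paper:b': 1, 'paper:c': 2}
print(src, dst)     # [0, 2] [1, 0]
```

Source and destination index lists of this shape are the kind of input from which a DGL graph can be constructed; node features and labels would then be attached in the same index order.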
2. Select a model instance
model = DGI(n_in_feats=graph.ndata["feat"].shape[1])
3. Train the model and compute node embeddings
node_embed_task = NodeEmbed(graph=graph, model=model)
embedded_graph = node_embed_task.train_and_embed(add_self_loop=True, n_epochs=300, patience=30)
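The n_epochs and patience arguments suggest patience-based early stopping: training runs for at most n_epochs epochs but halts once the monitored loss has not improved for patience consecutive epochs. That reading is an assumption about the library's behavior, not something confirmed here. The general technique can be sketched in plain Python with a made-up loss curve; train_with_patience is a hypothetical helper.

```python
def train_with_patience(losses, patience):
    """Walk through per-epoch losses and stop once `patience` epochs pass
    without a new best loss. Returns (epochs_run, best_epoch, best_loss)."""
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    epochs_run = 0

    for epoch, loss in enumerate(losses):
        epochs_run = epoch + 1
        if loss < best_loss:
            # New best: remember it and reset the patience counter.
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return epochs_run, best_epoch, best_loss


# Made-up loss curve: improves for three epochs, then plateaus.
losses = [0.9, 0.7, 0.6, 0.65, 0.62, 0.61, 0.5]
print(train_with_patience(losses, patience=3))  # (6, 2, 0.6)
```

In this toy run the final value 0.5 is never reached, because three epochs without improvement end training first. A larger patience trades longer training for a lower chance of stopping too early.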
4. Downstream task: node classification using an MLP
model = MLPClassifier(
    n_in_feat=embedded_graph.ndata["feat"].shape[1],
    n_out_feat=embedded_graph.ndata["label"].unique().shape[0]
)
node_clf_task = NodeClassify(graph=embedded_graph, model=model)
node_clf_task.train(lr=1e-3, n_epochs=400, patience=40)
print(node_clf_task.evaluate())
5. Obtain the metrics
{'accuracy': 0.82, 'loss': 0.5714246034622192}
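To make the reported numbers concrete: accuracy is conventionally the fraction of evaluated nodes whose predicted class matches the true label, and the classifier's output size above is the number of distinct labels. The sketch below illustrates both with made-up labels. It assumes the usual definition of accuracy and is not taken from the NodeClassify implementation.

```python
def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


# Made-up class indices for five test nodes.
actual = [0, 2, 1, 1, 3]
predicted = [0, 2, 1, 0, 3]

print(len(set(actual)))             # 4 (number of distinct classes)
print(accuracy(predicted, actual))  # 0.8
```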
For node classification with the GRAND model, refer to the example in grand_example.py:
from hugegraph_ml.data.hugegraph2dgl import HugeGraph2DGL
from hugegraph_ml.models.grand import GRAND
from hugegraph_ml.tasks.node_classify import NodeClassify
hg2d = HugeGraph2DGL()
graph = hg2d.convert_graph(vertex_label="CORA_vertex", edge_label="CORA_edge")
model = GRAND(
    n_in_feats=graph.ndata["feat"].shape[1],
    n_out_feats=graph.ndata["label"].unique().shape[0]
)
node_clf_task = NodeClassify(graph, model)
node_clf_task.train(lr=1e-2, weight_decay=5e-4, n_epochs=2000, patience=100)
print(node_clf_task.evaluate())