Comparison of Graph Pattern Quality Measures v1.0.1

Description

This repository contains the source code and data used in the article "Pattern-Based Graph Classification: Comparison of Quality Measures and Importance of Preprocessing".

Content

Organization
Installation
Usage
Dependencies
References

Organization

This repository is composed of the following elements:

requirements.txt: List of required Python packages.
src: folder containing the source code
- ClusteringComparison.py: script that reproduces the experiments of Section 5.2.1. and Section 5.2.3.
- KendallTauHistogram.py: script that reproduces the experiments of Section 5.2.2.
- PairwiseComparisons.py: script that reproduces the experiments of Section 5.3.
- GoldStandardComparison.py: script that reproduces the experiments of Section 5.4.
data: folder containing the input data. Each subfolder corresponds to a distinct dataset, cf. Section Data.
results: files produced by the processing.
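
The name KendallTauHistogram.py suggests that the rankings induced by different quality measures are compared with Kendall's tau. As a rough illustration of that statistic only (this is not the repository's code; the function name and the scores are hypothetical), a minimal pure-Python sketch of the tie-free variant could look like this:

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two score lists over the same patterns (no tie correction)."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both measures must score the same patterns")
    if len(scores_a) < 2:
        raise ValueError("at least two patterns are needed")
    concordant = discordant = 0
    # A pair of patterns is concordant when both measures order it the same way.
    for i, j in combinations(range(len(scores_a)), 2):
        product = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(scores_a) * (len(scores_a) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical scores given to four patterns by two quality measures.
measure_a = [0.9, 0.7, 0.4, 0.1]
measure_b = [0.8, 0.9, 0.3, 0.2]
tau = kendall_tau(measure_a, measure_b)
```

A value close to 1 means the two measures rank the patterns almost identically, and a value close to -1 means they rank them in nearly opposite order. When ties matter, scipy.stats.kendalltau (available through the scipy dependency) is probably the better choice.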

Installation

Python and Packages

First, you need to install the Python language and the required packages:

Install the Python language
Download this project from GitHub and unzip it.
Execute pip install -r requirements.txt to install the required packages (see also Section Dependencies).

Non-Python Dependencies

Second, one of the dependencies, SPMF, is not a Python package, but rather a Java program, and therefore requires a specific installation process:

Download its source code from Philippe Fournier-Viger's website.
Follow the installation instructions provided on the same website.

Note that we use the JAR implementation of SPMF.
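
To check that the SPMF JAR is usable, it can help to call it once by hand. The sketch below builds such a call from Python. It assumes SPMF's general command-line form `java -jar spmf.jar run <Algorithm> <input> <output> <parameters>` and the algorithm identifier GSPAN; both should be checked against the SPMF documentation, which also lists optional gSpan parameters not shown here. All paths are hypothetical, and this is not necessarily how the repository's scripts invoke SPMF.

```python
import subprocess
from pathlib import Path


def build_gspan_command(jar_path, input_path, output_path, min_support):
    """Build the argument list for running gSpan through the SPMF JAR."""
    if not 0 < min_support <= 1:
        raise ValueError("min_support is expected as a fraction in (0, 1]")
    return [
        "java", "-jar", str(jar_path),
        "run", "GSPAN",  # assumed algorithm identifier; verify in the SPMF docs
        str(input_path), str(output_path),
        str(min_support),
    ]


def run_gspan(jar_path, input_path, output_path, min_support):
    """Run SPMF; raises CalledProcessError if Java exits with a non-zero status."""
    command = build_gspan_command(jar_path, input_path, output_path, min_support)
    subprocess.run(command, check=True)


# Hypothetical paths: adjust them to your SPMF installation and data layout.
command = build_gspan_command(
    Path("spmf.jar"), Path("graphs.txt"), Path("patterns.txt"), 0.1
)
```

Keeping the command construction separate from its execution makes it easy to print and inspect the call before launching Java.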

Data

We retrieved the datasets from the SPMF website; they include:

MUTAG : MUTAG dataset, representing chemical compounds and their mutagenic properties [D'91]
NCI1 : NCI1 dataset, representing molecules classified according to carcinogenicity [W'06]
PTC : PTC dataset, representing molecules classified according to carcinogenicity [T'03]
DD : DD dataset, representing amino acids and their interactions [D'03]
IMDB-Binary : IMDB-Binary dataset, representing movie collaboration graphs [Y'15]

We retrieved two datasets from the TU Dataset website:

AIDS : AIDS dataset, representing chemical compounds tested for AIDS inhibition [R'08]
FRANKENSTEIN : FRANKENSTEIN dataset, representing chemical compounds and their mutagenic properties [O'15]

The public procurement dataset contains graphs extracted from the FOPPA database, available on Zenodo:

FOPPA : dataset extracted from FOPPA, a database of French public procurement notices [P'23b]

Usage

We provide two scripts to reproduce the experiments:

General.sh: reproduces all experiments described in our paper.
OneDataset.sh (dataset): reproduces the experiments for the specified dataset.

Each script extracts the data and then performs the associated experiments.
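
For readers who prefer to drive the two scripts from Python, a minimal wrapper might look as follows. It is only a sketch under stated assumptions: the scripts are run with bash, the dataset argument matches one of the dataset names listed in Section Data (for example MUTAG), and `script_dir` (a hypothetical parameter) points at the folder that actually contains the scripts.

```python
import subprocess


def build_script_command(dataset=None):
    """Return the command for General.sh, or for OneDataset.sh when a dataset is given."""
    if dataset is None:
        return ["bash", "General.sh"]
    return ["bash", "OneDataset.sh", dataset]


def run_experiments(dataset=None, script_dir="."):
    """Run the chosen script from script_dir; raises CalledProcessError on failure."""
    subprocess.run(build_script_command(dataset), cwd=script_dir, check=True)


# Commands that would be executed (nothing is launched here).
all_experiments = build_script_command()
one_dataset = build_script_command("MUTAG")  # assumed dataset identifier
```

Calling `run_experiments("MUTAG", script_dir=...)` would then launch the single-dataset run, provided the assumptions above hold for your checkout.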

Dependencies

Tested with Python version 3.12.2 and the following packages:

pandas: version 2.2.1
numpy: version 1.26.4
networkx: version 3.2.1
sklearn: version 1.2.2
matplotlib: version 3.8.0
tqdm: version 4.66.4
rbo: version 0.1.3
shap: version 0.45.0
xgboost: version 2.1.0
scipy: version 1.11.4

Tested with SPMF version 2.62, which implements gSpan [Y'02] (to mine frequent patterns).
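
To see whether a local environment matches the package versions listed above, a small check along these lines may be useful. It is a sketch, not part of the repository: the helper names are hypothetical, and it assumes that the package listed as sklearn is distributed on PyPI as scikit-learn.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above.
TESTED_VERSIONS = {
    "pandas": "2.2.1",
    "numpy": "1.26.4",
    "networkx": "3.2.1",
    "scikit-learn": "1.2.2",  # assumed distribution name for the sklearn package
    "matplotlib": "3.8.0",
    "tqdm": "4.66.4",
    "rbo": "0.1.3",
    "shap": "0.45.0",
    "xgboost": "2.1.0",
    "scipy": "1.11.4",
}


def installed_version(package):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def find_mismatches(tested, lookup=installed_version):
    """Return {package: (tested, installed)} for each package whose version differs."""
    mismatches = {}
    for package, wanted in tested.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, found) in find_mismatches(TESTED_VERSIONS).items():
        print(f"{package}: tested with {wanted}, found {found or 'not installed'}")
```

A version mismatch does not necessarily break the experiments; the output only indicates where the environment departs from the tested configuration.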

References

[D'91] A. S. Debnath, R. L. Lopez, G. Debnath, A. Shusterman, C. Hansch. Structure-activity relationship of mutagenic aromatic and heteroaromatic nitro compounds. Correlation with molecular orbital energies and hydrophobicity, Journal of Medicinal Chemistry 34(2):786–797, 1991. DOI: 10.1021/jm00106a046
[D'03] P. D. Dobson, A. J. Doig. Distinguishing enzyme structures from non-enzymes without alignments, Journal of Molecular Biology 330(4):771–783, 2003. DOI: 10.1016/S0022-2836(03)00628-4
[H'14] M. Houbraken, S. Demeyer, T. Michoel, P. Audenaert, D. Colle, M. Pickavet. The Index-Based Subgraph Matching Algorithm with General Symmetries (ISMAGS): Exploiting Symmetry for Faster Subgraph Enumeration, PLoS ONE 9(5):e97896, 2014. DOI: 10.1371/journal.pone.0097896
[O'15] F. Orsini, P. Frasconi, L. De Raedt. Graph invariant kernels, 24th International Conference on Artificial Intelligence, pp. 3756–3762, 2015. DOI: 10.5555/2832747.2832773
[P'23b] L. Potin, V. Labatut, P. H. Morand, C. Largeron. FOPPA: An Open Database of French Public Procurement Award Notices From 2010–2020, Scientific Data 10:303, 2023. DOI: 10.1038/s41597-023-02213-z
[T'03] H. Toivonen, A. Srinivasan, R. D. King, S. Kramer, C. Helma. Statistical evaluation of the predictive toxicology challenge 2000–2001, Bioinformatics 19(10):1183–1193, 2003. DOI: 10.1093/bioinformatics/btg130
[W'06] N. Wale, G. Karypis. Comparison of descriptor spaces for chemical compound retrieval and classification, 6th International Conference on Data Mining, pp. 678–689, 2006. DOI: 10.1007/s10115-007-0103-5
[Y'02] X. Yan, J. Han. gSpan: Graph-based substructure pattern mining, IEEE International Conference on Data Mining, pp. 721–724, 2002. DOI: 10.1109/ICDM.2002.1184038
[Y'15] P. Yanardag, S. V. N. Vishwanathan. Deep Graph Kernels, 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1365–1374, 2015. DOI: 10.1145/2783258.2783417
