Commit da6e7a5

committed
update readme
1 parent a48b659 commit da6e7a5

1 file changed

+21
-16
lines changed

README.md

Lines changed: 21 additions & 16 deletions
@@ -2,7 +2,7 @@
22
An AI ensemble model for predicting chemical classes in the ChEBI ontology. It integrates deep learning models,
33
rule-based models and generative AI-based models.
44

5-
A web application for the ensemble is available at https://chebifier.hastingslab.org/.
5+
A web application for Chebifier is available at https://chebifier.hastingslab.org/.
66

77
## Installation
88

@@ -38,23 +38,27 @@ The package provides a command-line interface (CLI) for making predictions using
3838
The ensemble configuration is given by a configuration file (by default, this is `chebifier/ensemble.yml`). If you
3939
want to change which models are included in the ensemble or how they are weighted, you can create your own configuration file.
4040

41-
Model weights for deep learning models are automatically downloaded from [Hugging Face](https://huggingface.co/chebai).
42-
To use specific model weights from Hugging face, add the `load_model` key in your configuration file. For example:
41+
Trained deep learning models are automatically downloaded from [Hugging Face](https://huggingface.co/chebai).
42+
To use a specific model from Hugging Face, add the `load_model` key to your configuration file. For example:
4343

4444
```yaml
4545
my_electra:
4646
type: electra
47-
load_model: "electra_chebi50_v241"
47+
load_model: "electra_chebi50-3star_v244"
4848
```
4949
5050
### Available model weights:
5151
52+
* `resgated-aug_chebi50-3star_v244`
53+
* `gat-aug_chebi50_v244`
54+
* `electra_chebi50-3star_v244`
55+
* `gat_chebi50_v244`
5256
* `electra_chebi50_v241`
5357
* `resgated_chebi50_v241`
5458
* `c3p_with_weights`
5559

5660

57-
However, you can also supply your own model checkpoints (see `configs/example_config.yml` for an example).
61+
You can also supply your own model checkpoints (see `configs/example_config.yml` for an example).
5862

5963
```bash
6064
# Make predictions
@@ -72,12 +76,12 @@ python -m chebifier predict --help
7276

7377
### Python API
7478

75-
You can also use the package programmatically:
79+
You can use the package programmatically as well:
7680

7781
```python
7882
from chebifier import BaseEnsemble
7983
80-
# Instantiate ensemble model. If desired, can pass
84+
# Instantiate ensemble model. Optionally, you can pass
8185
# a path to a configuration, like 'configs/example_config.yml'
8286
ensemble = BaseEnsemble()
8387
@@ -100,11 +104,12 @@ Currently, the following models are supported:
100104

101105
| Model | Description | #Classes | Publication | Repository |
102106
|-------|-------------|----------|-----------------------------------------------------------------------|----------------------------------------------------------------------------------------|
103-
| `electra` | A transformer-based deep learning model trained on ChEBI SMILES strings. | 1522 | [Glauer, Martin, et al., 2024: Chebifier: Automating semantic classification in ChEBI to accelerate data-driven discovery, Digital Discovery 3 (2024) 896-907](https://pubs.rsc.org/en/content/articlehtml/2024/dd/d3dd00238a) | [python-chebai](https://github.com/ChEB-AI/python-chebai) |
104-
| `resgated` | A Residual Gated Graph Convolutional Network trained on ChEBI molecules. | 1522 | | [python-chebai-graph](https://github.com/ChEB-AI/python-chebai-graph) |
107+
| `electra` | A transformer-based deep learning model trained on ChEBI SMILES strings. | 1531* | [Glauer, Martin, et al., 2024: Chebifier: Automating semantic classification in ChEBI to accelerate data-driven discovery, Digital Discovery 3 (2024) 896-907](https://pubs.rsc.org/en/content/articlehtml/2024/dd/d3dd00238a) | [python-chebai](https://github.com/ChEB-AI/python-chebai) |
108+
| `resgated` | A Residual Gated Graph Convolutional Network trained on ChEBI molecules. | 1531* | | [python-chebai-graph](https://github.com/ChEB-AI/python-chebai-graph) |
109+
| `gat` | A Graph Attention Network trained on ChEBI molecules. | 1531* | | [python-chebai-graph](https://github.com/ChEB-AI/python-chebai-graph) |
105110
| `chemlog_peptides` | A rule-based model specialised on peptide classes. | 18 | [Flügel, Simon, et al., 2025: ChemLog: Making MSOL Viable for Ontological Classification and Learning, arXiv](https://arxiv.org/abs/2507.13987) | [chemlog-peptides](https://github.com/sfluegel05/chemlog-peptides) |
106111
| `chemlog_element`, `chemlog_organox` | Extensions of ChemLog for classes that are defined either by the presence of a specific element or by the presence of an organic bond. | 118 + 37 | | [chemlog-extra](https://github.com/ChEB-AI/chemlog-extra) |
107-
| `c3p` | A collection _Chemical Classifier Programs_, generated by LLMs based on the natural language definitions of ChEBI classes. | 338 | [Mungall, Christopher J., et al., 2025: Chemical classification program synthesis using generative artificial intelligence, arXiv](https://arxiv.org/abs/2505.18470) | [c3p](https://github.com/chemkg/c3p) |
112+
| `c3p` | A collection of _Chemical Classifier Programs_, generated by LLMs based on the natural language definitions of ChEBI classes. | 338 | [Mungall, Christopher J., et al., 2025: Chemical classification program synthesis using generative artificial intelligence, Journal of Cheminformatics](https://link.springer.com/article/10.1186/s13321-025-01092-3) | [c3p](https://github.com/chemkg/c3p) |
108113

109114
In addition, Chebifier includes a ChEBI lookup that automatically retrieves the ChEBI superclasses for a class
110115
matched by a SMILES string. This is not activated by default, but can be included by adding
@@ -116,6 +121,8 @@ chebi_lookup:
116121
to your configuration file.
117122

118123
### The ensemble
124+
For an extended description of the ensemble, see [Flügel, Simon, et al., 2025: Chebifier 2: An Ensemble for Chemistry](https://ceur-ws.org/Vol-4064/SymGenAI4Sci-paper4.pdf).
125+
119126
<img width="700" alt="ensemble_architecture" src="https://github.com/user-attachments/assets/9275d3cd-ac88-466f-a1e9-27d20d67543b" />
120127

121128
Given a sample (i.e., a SMILES string) and models $m_1, m_2, \ldots, m_n$, the ensemble works as follows:
@@ -146,20 +153,18 @@ Therefore, if in doubt, we are more confident in the negative prediction.
146153

147154
Confidence can be disabled via the `use_confidence` parameter of the `predict` method (default: `True`).
148155

149-
The model_weight can be set for each model in the configuration file (default: 1). This is used to favor a certain
156+
The `model_weight` can be set for each model in the configuration file (default: 1). This is used to favor a certain
150157
model independently of a given class.
151-
Trust is based on the model's performance on a validation set. After training, we evaluate the Machine Learning models
152-
on a validation set for each class. If the `ensemble_type` is set to `wmv-f1`, the trust is calculated as 1 + the F1 score.
158+
`Trust` is based on each model's performance on a validation set. After training, we evaluate the machine learning models
159+
on a validation set for each class. If the `ensemble_type` is set to `wmv-f1`, the trust is calculated as $F1^{6.25}$, where $F1$ is the model's F1 score for the respective class.
153160
If the `ensemble_type` is set to `mv` (the default), the trust is set to 1 for all models.
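To illustrate how the global `model_weight` and the per-class trust could interact, here is a minimal Python sketch for a single class. The trust values follow the text (1 under `mv`, $F1^{6.25}$ under `wmv-f1`); everything else is an assumption for illustration: the function names `trust` and `net_score`, the ±1 vote encoding, and multiplying `model_weight` by trust. The confidence factor is omitted because its definition lies outside this diff.

```python
from typing import List, Tuple

# Hypothetical vote record: (prediction, model_weight, f1_for_this_class).
Vote = Tuple[bool, float, float]


def trust(f1: float, ensemble_type: str = "mv") -> float:
    """Per-class trust as described in the README: 1 for `mv`, F1**6.25 for `wmv-f1`."""
    if ensemble_type == "mv":
        return 1.0
    if ensemble_type == "wmv-f1":
        return f1 ** 6.25
    raise ValueError(f"unknown ensemble_type: {ensemble_type!r}")


def net_score(votes: List[Vote], ensemble_type: str = "mv") -> float:
    """Hypothetical net score for one class (confidence omitted).

    Assumption: each model adds +model_weight * trust for a positive
    prediction and -model_weight * trust for a negative one.
    """
    score = 0.0
    for prediction, model_weight, f1 in votes:
        weight = model_weight * trust(f1, ensemble_type)
        score += weight if prediction else -weight
    return score


# Three models with the default model_weight of 1: one strong positive vote,
# two weaker negative votes.
votes = [(True, 1.0, 0.9), (False, 1.0, 0.5), (False, 1.0, 0.6)]
print(net_score(votes, "mv"))                # -1.0
print(round(net_score(votes, "wmv-f1"), 3))  # 0.463
```

Under `mv`, the two negative votes outvote the positive one. Under `wmv-f1`, the exponent 6.25 shrinks the trust of the weaker models ($0.5^{6.25} \approx 0.013$, $0.6^{6.25} \approx 0.041$) far more than that of the strongest one ($0.9^{6.25} \approx 0.52$), so the positive vote wins.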
154161

155162
### Inconsistency resolution
156163
After a decision has been made for each class independently, the consistency of the predictions with regard to the ChEBI hierarchy
157164
and disjointness axioms is checked. This is
158165
done in 3 steps:
159166
- (1) First, the hierarchy is corrected. For each pair of classes $A$ and $B$ where $A$ is a subclass of $B$ (following
160-
the is-a relation in ChEBI), we set the ensemble prediction of $B$ to 1 if the prediction of $A$ is 1. Intuitively
161-
speaking, if we have determined that a molecule belongs to a specific class (e.g., aromatic primary alcohol), it also
162-
belongs to the direct and indirect superclasses (e.g., primary alcohol, aromatic alcohol, alcohol).
167+
the is-a relation in ChEBI) and $A$ is predicted but $B$ is not, we set the ensemble prediction of $A$ to $0$ if the _absolute value_ of $B$'s net score is larger than that of $A$. For example, if $A$ has a net score of $3$ and $B$ has a net score of $-4$, the ensemble will set $A$ to $0$ (i.e., predict neither $A$ nor $B$).
163168
- (2) Next, we check for disjointness. This is not specified directly in ChEBI, but in an additional ChEBI module ([chebi-disjoints.owl](https://ftp.ebi.ac.uk/pub/databases/chebi/ontology/)).
164169
We have extracted these disjointness axioms into a CSV file and added some more disjointness axioms ourselves (see
165170
`data>disjoint_chebi.csv` and `data>disjoint_additional.csv`). If two classes $A$ and $B$ are disjoint and we predict
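
The hierarchy correction from step (1) can be sketched for one subclass/superclass pair in Python. This is a hypothetical helper (`correct_pair` is not part of the package): it treats a positive net score as a positive prediction, applies the stated rule when $B$'s absolute score is larger, and otherwise assumes $B$ is switched on, mirroring the previous wording of step (1) in the removed lines. The README does not spell out that second branch, nor the order in which pairs are processed across the full hierarchy.

```python
from typing import Tuple


def correct_pair(score_a: float, score_b: float) -> Tuple[bool, bool]:
    """Resolve one is-a pair where A is a subclass of B (hypothetical helper).

    Assumes a positive net score means "predicted". Returns (predict_a, predict_b).
    """
    predict_a, predict_b = score_a > 0, score_b > 0
    if predict_a and not predict_b:
        # Conflict: A is predicted, but its superclass B is not.
        if abs(score_b) > abs(score_a):
            predict_a = False  # stated rule: the stronger negative B wins, drop A
        else:
            predict_b = True   # assumed branch (ties included): A wins, switch B on
    return predict_a, predict_b


print(correct_pair(3, -4))  # (False, False): the README example, neither A nor B
print(correct_pair(5, -2))  # (True, True): assumed second branch
```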
