| `electra` | A transformer-based deep learning model trained on ChEBI SMILES strings. | 1531* | [Glauer, Martin, et al., 2024: Chebifier: Automating semantic classification in ChEBI to accelerate data-driven discovery, Digital Discovery 3 (2024) 896-907](https://pubs.rsc.org/en/content/articlehtml/2024/dd/d3dd00238a) | [python-chebai](https://github.com/ChEB-AI/python-chebai) |
| `resgated` | A Residual Gated Graph Convolutional Network trained on ChEBI molecules. | 1531* | | [python-chebai-graph](https://github.com/ChEB-AI/python-chebai-graph) |
| `gat` | A Graph Attention Network trained on ChEBI molecules. | 1531* | | [python-chebai-graph](https://github.com/ChEB-AI/python-chebai-graph) |
| `chemlog_peptides` | A rule-based model specialised on peptide classes. | 18 | [Flügel, Simon, et al., 2025: ChemLog: Making MSOL Viable for Ontological Classification and Learning, arXiv](https://arxiv.org/abs/2507.13987) | [chemlog-peptides](https://github.com/sfluegel05/chemlog-peptides) |
| `chemlog_element`, `chemlog_organox` | Extensions of ChemLog for classes that are defined either by the presence of a specific element or by the presence of an organic bond. | 118 + 37 | | [chemlog-extra](https://github.com/ChEB-AI/chemlog-extra) |
| `c3p` | A collection of _Chemical Classifier Programs_, generated by LLMs based on the natural language definitions of ChEBI classes. | 338 | [Mungall, Christopher J., et al., 2025: Chemical classification program synthesis using generative artificial intelligence, Journal of Cheminformatics](https://link.springer.com/article/10.1186/s13321-025-01092-3) | [c3p](https://github.com/chemkg/c3p) |

In addition, Chebifier also includes a ChEBI lookup that automatically retrieves the ChEBI superclasses for a class
matched by a SMILES string. This is not activated by default, but can be included by adding a `chebi_lookup` entry
to your configuration file.
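
As a rough sketch, assuming a YAML configuration in which each enabled tool appears as a top-level entry (the layout and sub-keys shown here are illustrative assumptions, not documented options), such an entry might look like this:

```yaml
# Hypothetical sketch only - the actual options of the chebi_lookup entry may differ.
chebi_lookup:
  model_weight: 1   # per-tool weight used in the ensemble vote (default: 1)
```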
### The ensemble
For an extended description of the ensemble, see [Flügel, Simon, et al., 2025: Chebifier 2: An Ensemble for Chemistry](https://ceur-ws.org/Vol-4064/SymGenAI4Sci-paper4.pdf).
Given a sample (i.e., a SMILES string) and models $m_1, m_2, \ldots, m_n$, the ensemble works as follows:

Therefore, if in doubt, we are more confident in the negative prediction.

Confidence can be disabled by the `use_confidence` parameter of the predict method (default: True).

The `model_weight` can be set for each model in the configuration file (default: 1). This is used to favor a certain
model independently of a given class.
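
For example, assuming the same YAML layout as sketched above (only the `model_weight` option itself is taken from this description; the surrounding layout is an assumption), favoring one model over another could look like this:

```yaml
# Hypothetical sketch: doubles the weight of electra's votes relative to resgated.
electra:
  model_weight: 2
resgated:
  model_weight: 1   # the default
```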
`Trust` is based on the model's performance on a validation set. After training, we evaluate the Machine Learning models
on a validation set for each class. If the `ensemble_type` is set to `wmv-f1`, the trust is calculated as $F1^{6.25}$.
If the `ensemble_type` is set to `mv` (the default), the trust is set to 1 for all models.
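
To make the interplay of `model_weight`, trust and confidence concrete, the following is a minimal sketch of a weighted majority vote for a single class. It is an illustration only: the exact vote formula and tie handling in Chebifier may differ, and all numbers are invented.

```python
def ensemble_vote(predictions, model_weights, trusts, confidences, use_confidence=True):
    """Aggregate per-model predictions (1 = positive, 0 = negative) for one class.

    Each positive vote adds model_weight * trust (* confidence) to a net score,
    each negative vote subtracts the same amount. Sketch only - not the
    python-chebifier implementation.
    """
    score = 0.0
    for name, pred in predictions.items():
        weight = model_weights.get(name, 1.0) * trusts.get(name, 1.0)
        if use_confidence:
            weight *= confidences.get(name, 1.0)
        score += weight if pred == 1 else -weight
    # Ties go to the negative class ("if in doubt, ... the negative prediction").
    return 1 if score > 0 else 0


# Invented example: two weak positive votes vs. one highly trusted negative vote.
preds = {"electra": 1, "resgated": 1, "chemlog_peptides": 0}
weights = {"electra": 1.0, "resgated": 1.0, "chemlog_peptides": 1.0}
trusts = {"electra": 0.9, "resgated": 0.5, "chemlog_peptides": 1.0}  # under wmv-f1: F1**6.25; under mv: 1
confs = {"electra": 0.6, "resgated": 0.6, "chemlog_peptides": 1.0}
print(ensemble_vote(preds, weights, trusts, confs))  # -> 0 (negative side wins)
```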
### Inconsistency resolution
After a decision has been made for each class independently, the consistency of the predictions with regard to the ChEBI hierarchy
and disjointness axioms is checked. This is
done in 3 steps:
- (1) First, the hierarchy is corrected. For each pair of classes $A$ and $B$ where $A$ is a subclass of $B$ (following
the is-a relation in ChEBI), we set the ensemble prediction of $A$ to $0$ if the _absolute value_ of $B$'s score is larger than that of $A$. For example, if $A$ has a net score of $3$ and $B$ has a net score of $-4$, the ensemble will set $A$ to $0$ (i.e., predict neither $A$ nor $B$).
- (2) Next, we check for disjointness. This is not specified directly in ChEBI, but in an additional ChEBI module ([chebi-disjoints.owl](https://ftp.ebi.ac.uk/pub/databases/chebi/ontology/)).
We have extracted these disjointness axioms into a CSV file and added some more disjointness axioms ourselves (see
`data>disjoint_chebi.csv` and `data>disjoint_additional.csv`). If two classes $A$ and $B$ are disjoint and we predict