@sfluegel05 sfluegel05 commented Jun 24, 2025

Add an ensemble model to chebifier. This includes support for Electra models (from https://github.com/ChEB-AI/python-chebai), Residual Gated GCNs (from https://github.com/ChEB-AI/python-chebai-graph) and ChemLog (from https://github.com/sfluegel05/chemlog-peptides).

See this PR in chebai. We are moving the ensemble from chebai to chebifier because chebai is for training models, whereas chebifier is for predictions. Compared to the last state in chebai, I made some naming and structural changes. More importantly, I added

  • ChemLog support
  • An F1-score based weighting
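
To illustrate the F1-score based weighting, here is a minimal sketch of a weighted majority vote. It is not the code from this PR: the function name, the dictionary layout, and the choice to multiply a per-model weight by a per-class F1 score are all assumptions made for illustration.

```python
from typing import Dict


def f1_weighted_vote(
    predictions: Dict[str, Dict[str, bool]],
    f1_scores: Dict[str, Dict[str, float]],
    model_weights: Dict[str, float],
) -> Dict[str, bool]:
    """Combine per-model boolean predictions into one decision per class.

    All argument layouts are hypothetical: model name -> class label -> value.
    """
    labels = sorted({label for preds in predictions.values() for label in preds})
    decisions = {}
    for label in labels:
        positive = negative = 0.0
        for model, preds in predictions.items():
            if label not in preds:
                continue  # this model makes no statement about the class
            # Assumed weighting: model-level weight times class-level F1.
            # Missing entries fall back to a neutral weight of 1.0.
            weight = model_weights.get(model, 1.0) * f1_scores.get(model, {}).get(label, 1.0)
            if preds[label]:
                positive += weight
            else:
                negative += weight
        decisions[label] = positive > negative
    return decisions


# Toy input with made-up labels and scores.
result = f1_weighted_vote(
    predictions={
        "electra": {"class_a": True, "class_b": False},
        "resgated": {"class_a": False, "class_b": False},
        "chemlog": {"class_b": True},
    },
    f1_scores={
        "electra": {"class_a": 0.9, "class_b": 0.8},
        "resgated": {"class_a": 0.6, "class_b": 0.7},
    },
    model_weights={"chemlog": 100.0},
)
print(result)
```

In this toy input, the large model weight lets the rule-based model override the two learned models on the one class it covers, which is one plausible reason to give a model such a weight.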

An example config for this ensemble might look like this:

model_chemlog:
  type: chemlog
  model_name: chemlog_peptides
  model_weight: 100
model_resgated:
  type: resgated
  model_name: resgated_0oksfx9u
  ckpt_path: resgated_0oksfx9u_epoch=174.ckpt
  target_labels_path: ../python-chebai/data/chebi_v241/ChEBI50/processed/classes.txt
  molecular_properties:
      - chebai_graph.preprocessing.properties.AtomType
      - chebai_graph.preprocessing.properties.NumAtomBonds
      - chebai_graph.preprocessing.properties.AtomCharge
      - chebai_graph.preprocessing.properties.AtomAromaticity
      - chebai_graph.preprocessing.properties.AtomHybridization
      - chebai_graph.preprocessing.properties.AtomNumHs
      - chebai_graph.preprocessing.properties.BondType
      - chebai_graph.preprocessing.properties.BondInRing
      - chebai_graph.preprocessing.properties.BondAromaticity
      - chebai_graph.preprocessing.properties.RDKit2DNormalized
  classwise_weights_path: ../python-chebai/metrics_0oksfx9u_ext.json
model1:
  type: electra
  model_name: electra_dk6ey7jq
  ckpt_path: electra_dk6ey7jq_epoch=191.ckpt
  target_labels_path: ../python-chebai/data/chebi_v241/ChEBI50/processed/classes.txt
  classwise_weights_path: ../python-chebai/metrics_dk6ey7jq_ext.json

Here, the model names and checkpoint paths refer to specific models (trained with chebai). The target_labels_path refers to the list of labels of the dataset the model was trained on. The classwise_weights_path refers to the class-specific metrics (as produced by a chebai script on the validation set).
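
Since each model carries its own target_labels_path, an ensemble has to map every model's outputs onto a shared label order before voting. The following sketch shows one way this could work. It assumes the label file holds one label per line, which is not confirmed here, and both helper functions are hypothetical.

```python
from typing import List, Optional, Sequence


def parse_labels(text: str) -> List[str]:
    """Parse the contents of a label file, assuming one label per line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def align_scores(
    model_labels: Sequence[str],
    scores: Sequence[float],
    all_labels: Sequence[str],
) -> List[Optional[float]]:
    """Reorder a model's scores to the shared label order.

    Classes the model was not trained on are filled with None.
    """
    by_label = dict(zip(model_labels, scores))
    return [by_label.get(label) for label in all_labels]


# Two made-up label files with partially overlapping classes.
labels_a = parse_labels("class_a\nclass_b\n")
labels_b = parse_labels("class_b\nclass_c\n")
all_labels = sorted(set(labels_a) | set(labels_b))

aligned_a = align_scores(labels_a, [0.9, 0.2], all_labels)
aligned_b = align_scores(labels_b, [0.7, 0.4], all_labels)
print(all_labels, aligned_a, aligned_b)
```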

The CLI command is something like:

python chebifier/cli.py predict configs/my_config.yml -f inputs_miles.txt -o output.json --ensemble-type=[mv|wmv-f1|wmv-ppvnpv]
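
The three --ensemble-type values could differ only in how much weight a single vote gets. The sketch below is a guess based on the option names (mv as plain majority vote, wmv-f1 as F1-weighted, wmv-ppvnpv as weighted by positive or negative predictive value depending on the vote). The function and the metric keys "f1", "ppv" and "npv" are hypothetical and do not reflect the actual metrics file format.

```python
from typing import Dict


def vote_weight(ensemble_type: str, predicted_positive: bool, metrics: Dict[str, float]) -> float:
    """Return the weight of one model's vote for one class (assumed semantics)."""
    if ensemble_type == "mv":
        # Plain majority vote: every vote counts the same.
        return 1.0
    if ensemble_type == "wmv-f1":
        # Weighted majority vote: trust scales with the class-wise F1 score.
        return metrics["f1"]
    if ensemble_type == "wmv-ppvnpv":
        # Positive votes use precision (PPV), negative votes use NPV.
        return metrics["ppv"] if predicted_positive else metrics["npv"]
    raise ValueError(f"unknown ensemble type: {ensemble_type}")


# Made-up validation metrics for one model and one class.
metrics = {"f1": 0.8, "ppv": 0.9, "npv": 0.99}
weights = {
    (kind, vote): vote_weight(kind, vote, metrics)
    for kind in ("mv", "wmv-f1", "wmv-ppvnpv")
    for vote in (True, False)
}
print(weights)
```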

@sfluegel05 sfluegel05 changed the base branch from main to dev June 24, 2025 12:53
@sfluegel05 sfluegel05 merged commit 6300bff into dev Jun 24, 2025
@sfluegel05 sfluegel05 deleted the feature-ensemble branch June 24, 2025 13:10