Multi-label SDG Classification

Install the Python dependencies inside a virtual environment

cd sdg_classification
virtualenv venv
source venv/bin/activate
pip3 install -r requirements.txt

Train a multi-label SDG classifier

Multi-label SBERT fine-tuning + Classification on synthetic dataset

python3 "$PROJECT_DIR/src/multi_label_sdg.py" --multi_label_finetuning --dataset=synthetic --do_train

Label Description SBERT fine-tuning + Classification on synthetic dataset

python3 "$PROJECT_DIR/src/multi_label_sdg.py" --label_desc_finetuning --dataset=synthetic --do_train

Two-stage SBERT fine-tuning + Classification

python3 "$PROJECT_DIR/src/multi_label_sdg.py" --label_desc_finetuning --multi_label_finetuning --dataset=synthetic --do_train
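The three training commands above differ only in which fine-tuning flags are passed. As a rough illustration of how such switches could combine, here is a minimal argparse sketch. It is not the actual parser in src/multi_label_sdg.py: the flag names and dataset values are copied from the commands in this README, while `build_parser`, `planned_stages`, the stage names, and the assumed order (label description fine-tuning before multi-label fine-tuning) are hypothetical.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags that appear in this README."""
    parser = argparse.ArgumentParser(description="Multi-label SDG classification (sketch)")
    parser.add_argument("--label_desc_finetuning", action="store_true")
    parser.add_argument("--multi_label_finetuning", action="store_true")
    # Only these two dataset names appear in the README; the real script may accept others.
    parser.add_argument("--dataset", choices=["synthetic", "knowledge_hub"], required=True)
    parser.add_argument("--do_train", action="store_true")
    parser.add_argument("--do_in_domain_eval", action="store_true")
    parser.add_argument("--do_synthetic_eval", action="store_true")
    return parser


def planned_stages(args: argparse.Namespace) -> List[str]:
    """List the stages a run would include, in an assumed (not confirmed) order."""
    stages = []
    if args.label_desc_finetuning:
        stages.append("label_desc_finetuning")
    if args.multi_label_finetuning:
        stages.append("multi_label_finetuning")
    if args.do_train:
        stages.append("train_classifier")
    return stages


# Same flags as the two-stage command above.
two_stage = build_parser().parse_args(
    ["--label_desc_finetuning", "--multi_label_finetuning", "--dataset=synthetic", "--do_train"]
)
print(planned_stages(two_stage))
```

Under these assumptions, passing both fine-tuning flags yields the two-stage setup, and passing only one yields the corresponding single-stage run.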

The synthetic dataset is available at data/synthetic_data/synthetic_final.tsv
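This README does not describe the column layout of the TSV file, so it can help to look at the first few rows before training. The sketch below uses only the standard library and makes no assumption about column names. `preview_tsv` is a hypothetical helper, not part of this repository.

```python
import csv
from itertools import islice
from typing import List


def preview_tsv(path: str, n_rows: int = 5) -> List[List[str]]:
    """Return the first n_rows rows of a tab-separated file as lists of strings."""
    with open(path, newline="", encoding="utf-8") as handle:
        # Default CSV quoting rules are assumed; pass quoting=csv.QUOTE_NONE
        # to csv.reader if the file stores raw, unquoted text.
        reader = csv.reader(handle, delimiter="\t")
        return list(islice(reader, n_rows))
```

From the repository root, `preview_tsv("data/synthetic_data/synthetic_final.tsv")` would return the first five rows, including a header row if the file has one.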

To train the model on the Out-of-Domain (OOD) Knowledge Hub dataset:

python3 "$PROJECT_DIR/src/multi_label_sdg.py" --label_desc_finetuning --multi_label_finetuning --dataset=knowledge_hub --do_train

To evaluate on the manually annotated multi-label scientific SDG dataset:

python3 "$PROJECT_DIR/src/multi_label_sdg.py" --multi_label_finetuning --dataset=synthetic --do_train --do_in_domain_eval

To evaluate on the synthetic SDG dataset:

python3 "$PROJECT_DIR/src/multi_label_sdg.py" --multi_label_finetuning --dataset=synthetic --do_train --do_synthetic_eval

The source code for SBERT fine-tuning and linear classification is largely inspired by SetFit.
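In a SetFit-style pipeline, a fine-tuned sentence encoder produces embeddings and a lightweight classification head is trained on top of them. For multi-label targets, one common arrangement is to represent each document's labels as a multi-hot vector and to keep every label whose predicted score passes a threshold. The sketch below illustrates only that label handling. It is not taken from this repository: `to_multi_hot`, `decode_scores`, the 1-based SDG numbering, and the 0.5 threshold are all assumptions made for illustration.

```python
from typing import Iterable, List, Sequence

NUM_SDGS = 17  # the UN defines 17 Sustainable Development Goals


def to_multi_hot(sdg_ids: Iterable[int], num_labels: int = NUM_SDGS) -> List[int]:
    """Encode 1-based SDG numbers as a 0/1 vector of length num_labels."""
    vector = [0] * num_labels
    for sdg in sdg_ids:
        if not 1 <= sdg <= num_labels:
            raise ValueError(f"SDG id {sdg} is outside 1..{num_labels}")
        vector[sdg - 1] = 1
    return vector


def decode_scores(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Return the 1-based labels whose score is at or above the threshold."""
    return [index + 1 for index, score in enumerate(scores) if score >= threshold]
```

For example, a paper tagged with SDG 3 and SDG 13 would be encoded with ones at positions 3 and 13 and zeros elsewhere. In practice the threshold would typically be tuned on held-out data.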

Manually annotated Multi-label SDG dataset

The manually annotated dataset of papers from Open Research Online (ORO) is available at data/manually_annotated_oro/oro_gold_dataset.txt (final version).

Demo page

The source code for the demo page, CORE Labs, is available here:

https://github.com/oacore/about