This repository contains the code for "Jump To Hyperspace: Comparing Euclidean and Hyperbolic Loss Functions for Hierarchical Multi-Label Text Classification", to be published in the COLING 2025 proceedings.
This repository allows you to train hyperbolic or Euclidean label embeddings and to fine-tune a transformer for multi-label classification with contrastive and label-aware losses. The code for training with label-aware losses was inspired by HypEmo. For background, please consult the HypEmo paper and Nickel & Kiela (2017).
- Create a new conda environment:
  conda create -n hyperspace python=3.12.5
- Activate the environment:
  conda activate hyperspace
- Clone the repository:
  git clone https://github.com/clips/jump_to_hyperspace.git
- Change the working directory:
  cd jump_to_hyperspace
- Install the requirements:
  pip install -r requirements.txt
Browse to the label_embedding (HypEmo) subdirectory and run the train_label_embedding.py script as follows:
python train_label_embedding.py --tree [DATASET_NAME] --model [poincare | hyp_cones | euclid] --dim [EMBEDDING_DIMENSION] --epochs [EPOCHS]
To train label embeddings for your own dataset with this method, create a directory named after your dataset inside the data subdirectory and add a hierarchy.txt file to it. The hierarchy files of the provided datasets serve as examples. Then open config.py and add a dictionary that maps each label name to a unique integer, following the entries for the provided datasets.
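The label-to-integer dictionary can be written by hand, but for larger label sets it may be easier to generate it. The following is a minimal sketch, not code from this repository: the helper `build_label_map`, the variable name `MY_DATASET_LABELS`, and the label names are all hypothetical, and the exact form config.py expects should be checked against its existing entries.

```python
def build_label_map(label_names):
    """Map each distinct label name to a unique integer, in first-seen order."""
    label_map = {}
    for name in label_names:
        if name not in label_map:
            # The next free index equals the number of labels seen so far.
            label_map[name] = len(label_map)
    return label_map


# Hypothetical label names, for illustration only; duplicates are ignored.
MY_DATASET_LABELS = build_label_map(["Fiction", "Nonfiction", "Mystery", "Fiction"])
print(MY_DATASET_LABELS)
```

The printed dictionary can then be pasted into config.py next to the dictionaries of the provided datasets.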
Once the label embeddings are trained, run the main_poincare.py script. Ensure that the data is stored as JSON files named train.json, val.json and test.json in a single directory on your filesystem.
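Before starting a run, it can help to confirm that the data directory actually contains the three expected files. This is a small sketch under the assumption that only the file names matter here; the helper `missing_splits` is hypothetical and not part of the repository, and it does not validate the contents of the JSON files.

```python
from pathlib import Path

# File names taken from the description above.
EXPECTED_SPLITS = ("train.json", "val.json", "test.json")


def missing_splits(data_dir):
    """Return the expected split files that are absent from data_dir."""
    root = Path(data_dir).expanduser()  # allows paths such as ~/data/BGC
    return [name for name in EXPECTED_SPLITS if not (root / name).is_file()]
```

Calling `missing_splits("~/data/BGC")` returns an empty list when all three files are present, and the names of the missing files otherwise.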
python main_poincare.py --use_poincare_loss --data_dir ~/data/BGC --hierarchy_path label_embedding/data/BGC/hierarchy.txt --poincare_embedding_path label_embedding/label_tree/BGC100.bin
python main_poincare.py --use_euclidean_loss --data_dir ~/data/BGC --hierarchy_path label_embedding/data/BGC/hierarchy.txt --euclidean_embedding_path label_embedding/label_tree/BGC100euclid.bin
python main_poincare.py --data_dir ~/data/BGC --use_contrastive_loss
python main_poincare.py --data_dir ~/data/BGC --use_contrastive_loss --cl_distance_metric poincare
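The `poincare` value of `--cl_distance_metric` presumably refers to the distance in the Poincaré ball model used by Nickel & Kiela (2017). As a reminder of how that distance differs from the Euclidean one, here is a plain-Python sketch of the standard formula d(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))). It is not the repository's implementation, and the function names are illustrative assumptions.

```python
import math


def poincare_distance(u, v):
    """Hyperbolic distance between two points strictly inside the unit ball."""
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


def euclidean_distance(u, v):
    """Ordinary Euclidean distance, for comparison."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


origin, near, far = [0.0, 0.0], [0.5, 0.0], [0.9, 0.0]
# Euclidean distances grow linearly, hyperbolic ones blow up near the boundary.
print(euclidean_distance(origin, near), poincare_distance(origin, near))
print(euclidean_distance(origin, far), poincare_distance(origin, far))
```

Points close to the boundary of the ball are far apart in hyperbolic terms even when their Euclidean distance is small, which is the property that makes this geometry attractive for embedding label hierarchies.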