Jump to Hyperspace

This repository contains the code for "Jump To Hyperspace: Comparing Euclidean and Hyperbolic Loss Functions for Hierarchical Multi-Label Text Classification", to be published in the COLING 2025 proceedings.

This repository lets you train hyperbolic or Euclidean label embeddings and fine-tune a transformer for multi-label classification with contrastive and label-aware losses. The code for training with label-aware losses was inspired by HypEmo; for background, please consult their paper and Nickel & Kiela (2017).

Installation

  1. Create a new conda environment: conda create -n hyperspace python=3.12.5
  2. Activate the environment: conda activate hyperspace
  3. Clone the repository: git clone https://github.com/clips/jump_to_hyperspace.git
  4. Change the working directory: cd jump_to_hyperspace
  5. Install the requirements: pip install -r requirements.txt

Training Label Embeddings

Browse to the label_embedding (HypEmo) subdirectory and run the train_label_embedding.py script as follows:

python train_label_embedding.py --tree [DATASET_NAME] --model [poincare | hyp_cones | euclid] --dim [EMBEDDING_DIMENSION] --epochs [EPOCHS]
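To give some intuition for the difference between the poincare and euclid options, the following is a minimal sketch of the Poincaré-ball distance described by Nickel & Kiela (2017), next to the ordinary Euclidean distance. It is an illustration only and is not taken from this repository; the function names are hypothetical, and the training script may compute distances differently.

```python
import math


def poincare_distance(u, v):
    """Distance between two points strictly inside the unit ball (Poincare model)."""
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # d(u, v) = arcosh(1 + 2 * ||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v))
    return math.acosh(arg)


def euclidean_distance(u, v):
    """Ordinary straight-line distance, for comparison."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

For the origin and the point (0.5, 0), the Euclidean distance is 0.5 while the Poincaré distance is ln 3 (about 1.0986). Distances grow rapidly as points approach the boundary of the ball, which is what leaves room for the deeper levels of a label hierarchy.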

To train label embeddings for your own dataset with this method, add a new directory named after your dataset to the data subdirectory and place a hierarchy.txt file in it; see the provided datasets for examples. Then open config.py and add a dictionary that maps each label name to a unique integer, following the dictionaries for the provided datasets.
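As a rough sketch of the label dictionary described above, the helper below assigns each distinct label name a unique integer. The helper name, the variable name, the example labels, and the choice to start numbering at 0 are all assumptions made for illustration; copy the exact layout from the dictionaries already present in config.py.

```python
def build_label_map(label_names):
    """Map each distinct label name to a unique integer, in first-seen order."""
    label_map = {}
    for name in label_names:
        if name not in label_map:
            # The next free integer equals the number of labels seen so far.
            label_map[name] = len(label_map)
    return label_map


# Hypothetical label names, for illustration only.
MY_DATASET_LABELS = build_label_map(["root", "fiction", "nonfiction", "poetry"])
```

Building the mapping programmatically avoids duplicate or skipped integers when the label set is large.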

Fine-tuning

Once the label embeddings are trained, run the main_poincare.py script. Ensure that the data is stored as JSON files named train.json, val.json and test.json in a directory on your filesystem.
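Before starting a run, it can help to confirm that the data directory contains the three expected files. The check below is a small sketch and not part of the repository; the function name is hypothetical, and it only tests that the files exist, not that their contents match what main_poincare.py expects.

```python
from pathlib import Path

# File names taken from the instructions above.
EXPECTED_SPLITS = ("train.json", "val.json", "test.json")


def missing_splits(data_dir):
    """Return the expected split files that are absent from data_dir."""
    root = Path(data_dir).expanduser()  # allows paths such as ~/data/BGC
    return [name for name in EXPECTED_SPLITS if not (root / name).is_file()]
```

An empty list from missing_splits("~/data/BGC") means all three files are in place; otherwise the list names the files that still need to be added.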

Hyperbolic label-aware loss:

python main_poincare.py --use_poincare_loss --data_dir ~/data/BGC --hierarchy_path label_embedding/data/BGC/hierarchy.txt --poincare_embedding_path label_embedding/label_tree/BGC100.bin

Euclidean label-aware loss:

python main_poincare.py --use_euclidean_loss --data_dir ~/data/BGC --hierarchy_path label_embedding/data/BGC/hierarchy.txt --euclidean_embedding_path label_embedding/label_tree/BGC100euclid.bin

Contrastive loss:

python main_poincare.py --data_dir ~/data/BGC --use_contrastive_loss

Hyperbolic contrastive loss:

python main_poincare.py --data_dir ~/data/BGC --use_contrastive_loss --cl_distance_metric poincare
