This repository contains the implementation of CurriculumDocRE, a curriculum learning framework for document-level relation extraction (DocRE). The model progressively learns from easy to hard relations based on a multi‑faceted complexity score that considers sentence distance, coreference resolution, and relation rarity.
transformers>=4.14.1
numpy>=1.20.0
scikit-learn>=1.0.0
tqdm>=4.62.0
ujson
wandb
opt-einsum
Install with: `pip install -r requirements.txt`
CurriculumDocRE/
├── README.md
├── requirements.txt
├── run.py
├── train.py
├── config.py
├── utils.py
├── prepro.py
├── model.py
├── losses.py
├── long_seq.py
├── data_augmentation.py
├── evaluation.py
├── args.py
├── scripts/
│ ├── run_roberta_cur.sh
│ └── run_bert_cur.sh
└── data/ (you need to download Re‑DocRED, CDR, GDA yourself)
- Download the required datasets:
  - Re‑DocRED: https://github.com/tonytan48/Re-DocRED
  - RE2‑DocRED: https://github.com/klassessg/re2-docred
  - CDR and GDA: follow the instructions in the edge-oriented graph repository.
- Place the files in the `data/` directory with the following structure:
data/
├── redocred/
│ ├── train.json
│ ├── dev.json
│ └── test.json
├── re2docred/ (optional)
├── cdr/
│ ├── train.json
│ ├── dev.json
│ └── test.json
└── gda/
    ├── train.json
    ├── dev.json
    └── test.json
- Also place `rel2id.json` and `rel_info.json` (provided in the DocRED repository) in a `meta/` folder.
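Before starting a long training run, it can help to confirm that the dataset files are where the layout above expects them. The following is a minimal sketch, not part of the repository: `find_missing_files` is a hypothetical helper, and the dataset and split names are copied from the directory tree above. It does not check the `meta/` folder, since its location relative to `data/` is not stated here.

```python
from pathlib import Path

# Split files and dataset folders, copied from the directory tree above.
EXPECTED_SPLITS = ("train.json", "dev.json", "test.json")
REQUIRED_DATASETS = ("redocred", "cdr", "gda")  # re2docred is optional


def find_missing_files(data_root, datasets=REQUIRED_DATASETS):
    """Return the expected split files that are absent under data_root.

    Hypothetical helper for illustration; the repository may perform
    its own checks when loading data.
    """
    root = Path(data_root)
    missing = []
    for dataset in datasets:
        for split in EXPECTED_SPLITS:
            path = root / dataset / split
            if not path.is_file():
                missing.append(path)
    return missing


if __name__ == "__main__":
    # Report every missing file relative to the local data/ directory.
    for path in find_missing_files("data"):
        print(f"missing: {path}")
```

If you only plan to train on one dataset, pass it explicitly, for example `find_missing_files("data", datasets=("redocred",))`.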
Train on Re‑DocRED without curriculum learning:

    python -m src.run --do_train --data_dir data/redocred --train_file train.json --dev_file dev.json --test_file test.json --num_train_epochs 30

Train with the three-phase curriculum:

    python -m src.run --do_train --curriculum --phase1_epochs 10 --phase2_epochs 10 --phase3_epochs 10 --max_alpha 2.0 --data_dir data/redocred

Train with the curriculum and data augmentation:

    python -m src.run --do_train --curriculum --augment --augment_factor 1 --data_dir data/redocred

Train on CDR:

    python -m src.run --do_train --curriculum --num_labels 2 --num_class 2 --evi_lambda 0 --data_dir data/cdr --transformer_type bert --model_name_or_path bert-base-uncased

Train on GDA:

    python -m src.run --do_train --curriculum --num_labels 2 --num_class 2 --evi_lambda 0 --data_dir data/gda --train_batch_size 2 --gradient_accumulation_steps 4 --transformer_type bert --model_name_or_path bert-base-uncased

Evaluate a saved checkpoint on the dev set:

    python -m src.run --load_path ./output/curriculum_redocred/best.ckpt --data_dir data/redocred --dev_file dev.json --evaluate

Evaluate a saved checkpoint on the test set:

    python -m src.run --load_path ./output/curriculum_redocred/best.ckpt --data_dir data/redocred --test_file test.json --evaluate --test_only

| Argument | Default | Description |
|---|---|---|
| `--curriculum` | `False` | Enable curriculum learning |
| `--phase1_epochs` | `10` | Epochs for Stage 1 (easy relations, distance ≤1) |
| `--phase2_epochs` | `10` | Epochs for Stage 2 (medium relations, distance 2‑4) |
| `--phase3_epochs` | `10` | Epochs for Stage 3 (hard relations, distance >4) |
| `--max_alpha` | `2.0` | Exponent for curriculum weighting (higher = more focus on hard pairs) |
| `--augment` | `False` | Enable data augmentation for rare relations |
| `--augment_factor` | `1` | Number of augmented copies per positive document |
| `--pos_weight` | `20` | Positive class weight for ATLoss |
| `--evi_lambda` | `0.5` | Weight for evidence loss (if evidence annotations are available) |
| `--learning_rate` | `3e-5` (RoBERTa) / `5e-5` (BERT) | Learning rate |
| `--train_batch_size` | `4` | Batch size per GPU |
| `--gradient_accumulation_steps` | `2` | Gradient accumulation steps |
| `--max_seq_length` | `1024` | Maximum input tokens |
| `--warmup_ratio` | `0.06` | Linear warmup proportion |
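
To make the stage thresholds and default schedule in the argument descriptions above concrete, here is a small illustrative sketch. It is not the repository's implementation: the function names are hypothetical, the distance is assumed to be a sentence distance between the two entities of a pair, and whether later stages also revisit easier pairs is not stated here, so the sketch only labels pairs and epochs.

```python
def curriculum_stage(sentence_distance):
    """Map a pair's sentence distance to a stage: 1 = easy, 2 = medium, 3 = hard.

    Thresholds follow the argument descriptions: <=1, 2-4, >4.
    """
    if sentence_distance <= 1:
        return 1
    if sentence_distance <= 4:
        return 2
    return 3


def stage_for_epoch(epoch, phase1_epochs=10, phase2_epochs=10, phase3_epochs=10):
    """Return the active stage for a 0-based epoch index (assumed sequential phases)."""
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    if epoch < phase1_epochs:
        return 1
    if epoch < phase1_epochs + phase2_epochs:
        return 2
    if epoch < phase1_epochs + phase2_epochs + phase3_epochs:
        return 3
    raise ValueError("epoch is beyond the configured schedule")


def effective_batch_size(train_batch_size=4, gradient_accumulation_steps=2):
    """Examples per optimizer step on one GPU."""
    return train_batch_size * gradient_accumulation_steps


if __name__ == "__main__":
    print(curriculum_stage(3))         # 2 (medium)
    print(stage_for_epoch(12))         # 2
    print(effective_batch_size())      # 8
    print(effective_batch_size(2, 4))  # 8, the GDA settings
```

With the defaults, the three phases add up to 30 epochs, matching `--num_train_epochs 30` in the non-curriculum command, and the GDA settings (`--train_batch_size 2 --gradient_accumulation_steps 4`) keep the same effective batch size of 8 per GPU as the defaults.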
If you use this code in your research, please cite:
@inproceedings{ali2026curriculum,
title={CurriculumDocRE: A Curriculum Learning Approach for Document-Level Relation Extraction},
author={Ali, Manzoor and Saleem, Muhammad and Ngonga Ngomo, Axel-Cyrille},
year={2026}
}