Chinese Character Detection in Historical Documents
HRCenterNet: An Anchorless Approach to Chinese Character Segmentation in Historical Documents
Chia-Wei Tang, Chao-Lin Liu, Po-Sen Chiu
IEEE Big Data 2020 Workshops, Computational Archival Science: digital records in the age of big data
IEEE Xplore (10.1109/BigData50022.2020.9378051)
arXiv technical report (arXiv 2012.05739)
Contact: [email protected]. Questions and discussions are welcome!
Clone the repository and install the dependencies:
git clone https://github.com/Tverous/HRCenterNet.git
cd HRCenterNet/
pip install -r requirements.txt
Train a model:
python train.py --train_csv_path data/train.csv --train_data_dir data/images \
  --val_csv_path data/val.csv --val_data_dir data/images/ --val \
  --batch_size 8 --epoch 80
Evaluate a trained checkpoint on the validation set:
python evaluate.py --csv_path data/val.csv --data_dir data/images/ --log_dir weights/HRCenterNet.pth.tar
Run a pretrained model on a directory of images and save the outputs:
python test.py --data_dir /path/to/images --log_dir /path/to/pretrained --output_dir /path/to/save/outputs
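Prepare your CSV files in the following format: one row per image, where the labels column lists five space-separated values for each character (object id, top-left x, top-left y, width, height):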
image_id      labels
file_name_1   obj_id_1 topleft_x topleft_y width height obj_id_2 topleft_x topleft_y width height ...
file_name_2   obj_id_1 topleft_x topleft_y width height obj_id_2 topleft_x topleft_y width height ...
...           ...
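The labels column packs every character on a page into a single string, five fields per character. A minimal sketch of reading and writing that column with the standard library follows, assuming a comma-delimited file whose header row is image_id,labels and coordinates in pixels. CharBox, parse_labels, format_labels and read_annotations are hypothetical helpers written for illustration; they are not part of the HRCenterNet repository, and train.py may parse the file differently.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, List, TextIO


@dataclass
class CharBox:
    """One character annotation (hypothetical helper type)."""
    obj_id: str
    x: float  # top-left x
    y: float  # top-left y
    w: float  # width
    h: float  # height


def parse_labels(labels: str) -> List[CharBox]:
    """Split a space-separated labels string into groups of five fields."""
    tokens = labels.split()
    if len(tokens) % 5 != 0:
        raise ValueError(f"expected a multiple of 5 fields, got {len(tokens)}")
    boxes = []
    for i in range(0, len(tokens), 5):
        obj_id, x, y, w, h = tokens[i:i + 5]
        boxes.append(CharBox(obj_id, float(x), float(y), float(w), float(h)))
    return boxes


def _num(v: float) -> str:
    # Write whole numbers without a trailing ".0" so integer pixel
    # coordinates round-trip unchanged.
    return str(int(v)) if v.is_integer() else repr(v)


def format_labels(boxes: List[CharBox]) -> str:
    """Inverse of parse_labels: join boxes back into one labels string."""
    return " ".join(
        f"{b.obj_id} {_num(b.x)} {_num(b.y)} {_num(b.w)} {_num(b.h)}" for b in boxes
    )


def read_annotations(f: TextIO) -> Dict[str, List[CharBox]]:
    """Read an image_id,labels CSV into {image_id: [CharBox, ...]}.

    Assumes comma delimiters; an empty labels cell yields an empty list.
    """
    reader = csv.DictReader(f)
    return {row["image_id"]: parse_labels(row["labels"] or "") for row in reader}


if __name__ == "__main__":
    sample = "image_id,labels\npage_001,c1 10 20 30 40 c2 50 20 28 41\n"
    ann = read_annotations(io.StringIO(sample))
    print(ann["page_001"][1])
```

Keeping the object id as a string leaves room for whatever identifiers your annotations use; if your files are tab- or space-delimited instead, pass the matching delimiter to csv.DictReader.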
Use this BibTeX entry to cite this repository:
@INPROCEEDINGS{9378051,
  author={C. -W. {Tang} and C. -L. {Liu} and P. -S. {Chiu}},
  booktitle={2020 IEEE International Conference on Big Data (Big Data)},
  title={HRCenterNet: An Anchorless Approach to Chinese Character Segmentation in Historical Documents},
  year={2020},
  volume={},
  number={},
  pages={1924-1930},
  doi={10.1109/BigData50022.2020.9378051}
}