GitHub - amazon-science/toward-clinical-coding-verification-adaptation

Toward Reliable Clinical Coding with Language Models: Verification and Lightweight Adaptation

This repository contains the ICD-10-CM annotations for the paper "Toward Reliable Clinical Coding with Language Models: Verification and Lightweight Adaptation" (EMNLP 2025, Industry Track).

Dataset

This repository contains only the double expert-annotated ICD-10-CM annotations used for the paper. To derive the full training and testing data, including the corresponding notes, follow the steps below:

Download https://github.com/wyim/aci-bench to a local directory, referred to below as ACI_BENCH_PATH.
Run:

python merge_aci_annotations.py --aci_data_dir ${ACI_BENCH_PATH}/data/challenge_data --annotation_dir annotation --output_dir merged_data

In merged_data, you should find:

JSONL files with merged data:

train.jsonl (67 records)
valid.jsonl (20 records)
test.jsonl (120 records - combines all test files)

Each record contains the dialogue, the clinical note, and the associated ICD-10-CM codes.
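
As a quick sanity check after merging, the record counts listed above can be verified with a short script. This is a minimal sketch, not part of the repository: the helper names load_jsonl and check_merged_data are hypothetical, and it assumes the output files are standard JSONL (one JSON object per line). The field names inside each record are not documented here, so the script prints them rather than assuming them.

```python
import json
from pathlib import Path

# Record counts as stated in this README.
EXPECTED_COUNTS = {"train.jsonl": 67, "valid.jsonl": 20, "test.jsonl": 120}


def load_jsonl(path):
    """Read a JSONL file into a list of parsed records, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def check_merged_data(output_dir, expected=EXPECTED_COUNTS):
    """Return {filename: (found, expected)} for each file whose count differs.

    A missing file is reported with a found count of 0.
    """
    mismatches = {}
    for name, want in expected.items():
        path = Path(output_dir) / name
        found = len(load_jsonl(path)) if path.exists() else 0
        if found != want:
            mismatches[name] = (found, want)
    return mismatches


if __name__ == "__main__":
    # "merged_data" matches the --output_dir value used in the command above.
    problems = check_merged_data("merged_data")
    print(problems or "all record counts match")

    train_path = Path("merged_data") / "train.jsonl"
    if train_path.exists():
        train = load_jsonl(train_path)
        if train:
            # Field names are an unknown here, so list them instead of guessing.
            print(sorted(train[0].keys()))
```

An empty result from check_merged_data means all three files exist and hold the expected number of records; any other result lists the files to re-check.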

Citation

If you find this data useful, or if you use it for research and development, please cite:

@inproceedings{toward-reliable-clinical-coding-verification-adaptation,
    title = "Toward Reliable Clinical Coding with Language Models: Verification and Lightweight Adaptation",
    author = "Yuan, Zhangdie and
      Shing, Han-Chin and
      Strong, Mitch and
      Shivade, Chaitanya",
    booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track",
    publisher = "Association for Computational Linguistics",
}

License

This library is licensed under the CC-BY-NC-4.0 License.

SPDX-License-Identifier: CC-BY-NC-4.0

