
✨ FinTagging: An LLM-ready Benchmark for Extracting and Structuring Financial Information ✨

📁 Benchmark Data | 📖 arXiv | 🛠️ Evaluation Framework


🌟 Overview

📚 Datasets Released

| 📂 Dataset | 📝 Description |
|---|---|
| FinNI-eval | Evaluation set for the FinNI subtask of the FinTagging benchmark. |
| FinCL-eval | Evaluation set for the FinCL subtask of the FinTagging benchmark. |
| FinTagging_Original | Original benchmark dataset without preprocessing, suitable for custom research. Annotated data (benchmark_ground_truth_pipeline.json) is provided in the "annotation" folder. |
| FinTagging_BIO | BIO-format dataset tailored for token-level tagging with BERT-series models. |
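
The evaluation splits can be loaded with the Hugging Face `datasets` library. Below is a minimal sketch; the dataset IDs and split name are placeholders, so check the Benchmark Data link above for the exact repository names on the Hugging Face Hub.

```python
# Minimal sketch of loading the evaluation splits with the `datasets` library.
# The dataset IDs and split name below are placeholders; see the Benchmark Data
# link for the exact names on the Hugging Face Hub.
from datasets import load_dataset

finni_eval = load_dataset("TheFinAI/FinNI-eval", split="test")  # placeholder ID
fincl_eval = load_dataset("TheFinAI/FinCL-eval", split="test")  # placeholder ID

print(finni_eval)     # inspect the available columns
print(finni_eval[0])  # look at one example record
```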

🧑‍💻 Evaluated LLMs and PLMs

We benchmarked 10 cutting-edge LLMs and 3 advanced PLMs on FinTagging:

  • 🌐 GPT-4o — OpenAI’s multimodal flagship model with structured output support.
  • 🚀 DeepSeek-V3 — A MoE reasoning model with efficient inference via MLA.
  • 🧠 Qwen2.5 Series — Multilingual models optimized for reasoning, coding, and math. Here, we assessed the 14B, 1.5B, and 0.5B Instruct models.
  • 🦙 Llama-3 Series — Meta’s open-source instruction-tuned models for long context. Here, we assessed the Llama-3.1-8B-Instruct and Llama-3.2-3B-Instruct models.
  • 🧭 DeepSeek-R1 Series — RL-tuned first-gen reasoning models with zero-shot strength. Here, we only assessed the DeepSeek-R1-Distill-Qwen-32B model.
  • 🧪 Gemma-2 Model — Google’s latest instruction-tuned model with open weights. Here, we only assessed the gemma-2-27b-it model.
  • 💎 Fino1-8B — Our in-house financial LLM with strong reasoning capability.
  • 🏛️ BERT-large — The classic transformer encoder for language understanding.
  • 📉 FinBERT — A financial domain-tuned BERT for sentiment analysis.
  • 🧾 SECBERT — BERT model fine-tuned on SEC filings for financial disclosure tasks.

📌 Evaluation Methodology

  • Local Model Inference: conducted via FinBen (vLLM framework).
  • We provide task-specific evaluation scripts through our forked version of the FinBen framework, available at: https://github.com/Yan2266336/FinBen.
  • For the FinNI task, you can directly execute the provided script to evaluate a variety of LLMs, including both local and API-based models.
  • For the FinCL task, first run the retrieval script from the repository to obtain US-GAAP candidate concepts. Then, use our provided prompts to construct instruction-style inputs, and apply the reranking method implemented in the forked FinBen to identify the most appropriate US-GAAP concept.
  • Note: Running the retrieval script requires a local installation of Elasticsearch. We provide our index document on Google Drive: https://drive.google.com/file/d/1cyMONjP9WdHtD8-WGezmgh_LNhbY3qtR/view?usp=drive_link. Alternatively, you can construct your own index document instead of using ours. A minimal retrieval sketch is shown after this list.
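
For orientation, here is a minimal sketch of what the US-GAAP candidate-retrieval step for FinCL might look like against a local Elasticsearch instance. The index name, field names, and top-k value are illustrative assumptions, not the exact settings used by the retrieval script in the forked FinBen repository.

```python
# Illustrative sketch of retrieving US-GAAP candidate concepts from a local
# Elasticsearch index (elasticsearch-py 8.x style). The index name, field names,
# and top-k are assumptions, not the exact configuration of the provided script.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def retrieve_candidates(entity_context: str, k: int = 10) -> list[str]:
    """Return the top-k US-GAAP concept names matching a numeric entity's context."""
    resp = es.search(
        index="us_gaap_concepts",                        # assumed index name
        query={"match": {"label": entity_context}},      # assumed text field
        size=k,
    )
    return [hit["_source"]["concept"] for hit in resp["hits"]["hits"]]  # assumed field

candidates = retrieve_candidates("Revenue from contracts with customers")
# The retrieved candidates are then packed into an instruction-style prompt and
# reranked with the forked FinBen framework to select the final US-GAAP concept.
```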

📊 Key Performance Metrics

Table: Overall Performance
🥇 = best, 🥈 = second-best, 🥉 = third-best
| Category | Models | Macro P | Macro R | Macro F1 | Micro P | Micro R | Micro F1 |
|---|---|---|---|---|---|---|---|
| Closed-source LLM | GPT-4o | 0.0764 🥈 | 0.0576 🥈 | 0.0508 🥈 | 0.0947 | 0.0788 | 0.0860 |
| Open-source LLMs | DeepSeek-V3 | 0.0813 🥇 | 0.0696 🥇 | 0.0582 🥇 | 0.1058 | 0.1217 🥉 | 0.1132 🥉 |
| | DeepSeek-R1-Distill-Qwen-32B | 0.0482 🥉 | 0.0288 🥉 | 0.0266 🥉 | 0.0692 | 0.0223 | 0.0337 |
| | Qwen2.5-14B-Instruct | 0.0423 | 0.0256 | 0.0235 | 0.0197 | 0.0133 | 0.0159 |
| | gemma-2-27b-it | 0.0430 | 0.0273 | 0.0254 | 0.0519 | 0.0453 | 0.0483 |
| | Llama-3.1-8B-Instruct | 0.0287 | 0.0152 | 0.0137 | 0.0462 | 0.0154 | 0.0231 |
| | Llama-3.2-3B-Instruct | 0.0182 | 0.0109 | 0.0083 | 0.0151 | 0.0102 | 0.0121 |
| | Qwen2.5-1.5B-Instruct | 0.0180 | 0.0079 | 0.0069 | 0.0248 | 0.0060 | 0.0096 |
| | Qwen2.5-0.5B-Instruct | 0.0014 | 0.0003 | 0.0004 | 0.0047 | 0.0001 | 0.0002 |
| Financial LLM | Fino1-8B | 0.0299 | 0.0146 | 0.0140 | 0.0355 | 0.0133 | 0.0193 |
| Fine-tuned PLMs | BERT-large | 0.0135 | 0.0200 | 0.0126 | 0.1397 🥈 | 0.1145 🥈 | 0.1259 🥈 |
| | FinBERT | 0.0088 | 0.0143 | 0.0087 | 0.1293 🥉 | 0.0963 | 0.1104 |
| | SECBERT | 0.0308 | 0.0483 | 0.0331 | 0.2144 🥇 | 0.2146 🥇 | 0.2145 🥇 |
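
As a reading aid for the table, the sketch below illustrates how macro and micro averaging differ when aggregating per-concept-type extraction results: macro averaging weights every type equally, while micro averaging pools the counts so frequent types dominate. It is a toy illustration with made-up counts, not the benchmark's official scoring script.

```python
# Toy illustration of macro vs. micro averaging over per-type (TP, FP, FN) counts.
# The counts are made up; this is not the official FinTagging scoring code.
def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Hypothetical per-type counts: (true positives, false positives, false negatives)
per_type = {"RevenueConcept": (40, 10, 20), "ExpenseConcept": (2, 8, 30)}

# Macro: average the per-type F1 scores, so rare types weigh as much as common ones.
macro_f1 = sum(prf(*counts)[2] for counts in per_type.values()) / len(per_type)

# Micro: pool all counts first, so frequent types dominate the score.
tp, fp, fn = (sum(col) for col in zip(*per_type.values()))
micro_f1 = prf(tp, fp, fn)[2]

print(f"macro F1 = {macro_f1:.4f}, micro F1 = {micro_f1:.4f}")
```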

📖 Citation

If you find our benchmark useful, please cite:

@misc{wang2025fintaggingllmreadybenchmarkextracting,
      title={FinTagging: An LLM-ready Benchmark for Extracting and Structuring Financial Information}, 
      author={Yan Wang and Yang Ren and Lingfei Qian and Xueqing Peng and Keyi Wang and Yi Han and Dongji Feng and Xiao-Yang Liu and Jimin Huang and Qianqian Xie},
      year={2025},
      eprint={2505.20650},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.20650}, 
}
