19 changes: 14 additions & 5 deletions README.md
@@ -1,6 +1,6 @@
# Reward-SQL: Boosting Text-to-SQL via Stepwise Reasoning and Process-Supervised Rewards

## :thought_balloon: Introduction
## :thought_balloon:Introduction

This repository contains the code for our paper "[Reward-SQL: Boosting Text-to-SQL via Stepwise Reasoning and Process-Supervised Rewards](https://arxiv.org/pdf/2505.04671)".

@@ -11,6 +11,15 @@ RewardSQL enhances Text-to-SQL generation through a comprehensive process-level

![Overview](overview.jpg)

## :inbox_tray: Downloads
| **Model and Dataset** | **Download Latest** |
|-----------|------------------|
| Bird-Schema-Data | [🤖 Modelscope](https://www.modelscope.cn/datasets/QIANME/bird_schema_data), [🤗 HuggingFace](https://huggingface.co/datasets/QIAN-ME/bird_schema_data) |
| CoCTE SFT Model | [🤖 Modelscope](https://www.modelscope.cn/models/QIANME/CTE_SFT_Model), [🤗 HuggingFace](https://huggingface.co/QIAN-ME/CoCTE_SFT_Model) |
| Process Reward Model | [🤖 Modelscope](https://www.modelscope.cn/models/QIANME/PRM_Model), [🤗 HuggingFace](https://huggingface.co/QIAN-ME/PRM_Model) |
| GRPO Trained Model | [🤖 Modelscope](https://www.modelscope.cn/models/QIANME/GRPO_Model), [🤗 HuggingFace](https://huggingface.co/QIAN-ME/GRPO_Model) |


## :open_file_folder: Data Preparation

We provide all necessary datasets in our Google Drive repository.
@@ -60,9 +69,9 @@ mkdir -p results
## :zap: Quick Start

### Download pre-trained models
- [CoCTE SFT Model](https://drive.google.com/file/d/1hP8FO_VA7Lf9wwqHz_Uqvs3ccrSP_x66/view?usp=sharing): Put it under `checkpoints/cocte_model`.
- [Process Reward Model](https://drive.google.com/file/d/1hP8FO_VA7Lf9wwqHz_Uqvs3ccrSP_x66/view?usp=sharing): Put it under `checkpoints/prm_model`.
- [GRPO Trained Model](https://drive.google.com/file/d/1hP8FO_VA7Lf9wwqHz_Uqvs3ccrSP_x66/view?usp=sharing): Put it under `checkpoints/grpo_model`.
- [CoCTE SFT Model](https://huggingface.co/QIAN-ME/CoCTE_SFT_Model): Put it under `checkpoints/cocte_model`.
- [Process Reward Model](https://huggingface.co/QIAN-ME/PRM_Model): Put it under `checkpoints/prm_model`.
- [GRPO Trained Model](https://huggingface.co/QIAN-ME/GRPO_Model): Put it under `checkpoints/grpo_model`.
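
The three Hugging Face checkpoints can also be fetched programmatically. The following is a minimal sketch, not part of the repository: it assumes the `huggingface_hub` package is installed and that the repositories are publicly readable. The repo IDs and target directories come from the list above; the helper names `plan_downloads` and `download_all` are hypothetical.

```python
from pathlib import Path

# Repo IDs and target directories are taken from the Quick Start list;
# everything else in this sketch is illustrative.
CHECKPOINTS = {
    "QIAN-ME/CoCTE_SFT_Model": "checkpoints/cocte_model",
    "QIAN-ME/PRM_Model": "checkpoints/prm_model",
    "QIAN-ME/GRPO_Model": "checkpoints/grpo_model",
}


def plan_downloads(root="."):
    """Return (repo_id, target_dir) pairs without touching the network."""
    return [(repo_id, Path(root) / subdir) for repo_id, subdir in CHECKPOINTS.items()]


def download_all(root="."):
    """Fetch every checkpoint into the directory the README expects."""
    # Imported lazily so plan_downloads() works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    for repo_id, target in plan_downloads(root):
        target.mkdir(parents=True, exist_ok=True)
        # local_dir places the repository files directly under the target directory.
        snapshot_download(repo_id=repo_id, local_dir=str(target))
```

Calling `download_all()` from the repository root should populate the three `checkpoints/` subdirectories. Whether the downloaded layout matches what the inference scripts expect is an assumption worth checking against the model cards.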

### Text-to-SQL inference

@@ -130,4 +139,4 @@ We implement our reinforcement learning algorithm extending from [veRL](https://
- [ ] Models used in the paper
- [ ] Evaluation code
- [ ] Datasets
- [ ] GRPO training code -->
- [ ] GRPO training code -->