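**Describe the feature**

Requesting support for the method from the paper "ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models". The reported results look promising, and the approach seems well suited to small models. It combines GRPO with DAPO and uses a KL penalty together with a periodic reset strategy.

**Paste any useful information**

Paper: https://arxiv.org/pdf/2505.24864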
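To make the request more concrete, here is a rough sketch of the three ingredients mentioned above: GRPO-style group-normalized advantages, DAPO-style asymmetric clipping, and a KL penalty against a reference that is periodically reset. All function names, the reset schedule, and the numeric values are hypothetical and chosen for illustration only; this is not taken from the paper's code or from any existing library.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: normalize each reward within its sampling group.

    Uses the population standard deviation (an assumption for this sketch).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_surrogate(ratio, advantage, eps_low, eps_high):
    """DAPO-style clipped objective with separate lower and upper clip ranges."""
    clipped_ratio = max(1.0 - eps_low, min(ratio, 1.0 + eps_high))
    return min(ratio * advantage, clipped_ratio * advantage)


def kl_penalized(surrogate, kl_to_reference, beta):
    """Subtract a KL penalty (weight beta) measured against the reference policy."""
    return surrogate - beta * kl_to_reference


def should_reset_reference(step, reset_every):
    """Hypothetical schedule: re-sync the reference policy every `reset_every` steps."""
    return step > 0 and step % reset_every == 0


if __name__ == "__main__":
    advantages = group_advantages([1.0, 0.0, 0.0, 1.0])
    objective = kl_penalized(
        clipped_surrogate(ratio=1.5, advantage=advantages[0], eps_low=0.2, eps_high=0.28),
        kl_to_reference=0.05,
        beta=0.01,
    )
    print(advantages, objective, should_reset_reference(200, reset_every=100))
```

In a real trainer the reset step would presumably copy the current policy weights into the reference model (and possibly reset optimizer state), which is the part that would need framework support.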