Can RL training support ProRL? #4501

@DogeWatch

Description

Describe the feature
From the paper: ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

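The results look good, and the method seems well suited to small models. It combines GRPO and DAPO, and uses a KL penalty together with periodic resets of the reference policy.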
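
To make the request more concrete, here is a minimal sketch of how the pieces named above could fit together: group-normalized advantages (GRPO style), asymmetric clipping (DAPO style), a KL penalty against a reference policy, and a periodic reset of that reference. All function names, the `reset_every` parameter, and the numeric defaults are hypothetical and chosen for illustration; they are not taken from the paper's code or from any existing trainer API.

```python
import copy
import math
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: normalize rewards within one prompt's sample group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_objective(ratio, advantage, eps_low=0.2, eps_high=0.28):
    """DAPO-style decoupled clipping: the upper bound is looser than the lower one.

    eps_low and eps_high are illustrative values (assumption).
    """
    clipped = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    return min(ratio * advantage, clipped * advantage)


def kl_estimate(logp_policy, logp_ref):
    """Non-negative per-token estimate of KL(policy || reference)."""
    log_ratio = logp_ref - logp_policy
    return math.exp(log_ratio) - log_ratio - 1.0


def token_loss(ratio, advantage, logp_policy, logp_ref, beta=0.01):
    """Loss to minimize: negative clipped objective plus a weighted KL penalty.

    beta is an illustrative coefficient (assumption).
    """
    return -clipped_objective(ratio, advantage) + beta * kl_estimate(logp_policy, logp_ref)


def should_reset_reference(step, reset_every):
    """True on every reset_every-th training step, excluding step 0."""
    return step > 0 and step % reset_every == 0


def maybe_reset_reference(step, policy_params, ref_params, reset_every):
    """Return a fresh snapshot of the policy as the new reference when a reset is due."""
    if should_reset_reference(step, reset_every):
        return copy.deepcopy(policy_params)
    return ref_params


if __name__ == "__main__":
    # Toy usage: four sampled answers for one prompt, two of them rewarded.
    advantages = group_advantages([1.0, 0.0, 1.0, 0.0])
    print(advantages)
    print(token_loss(ratio=1.5, advantage=advantages[0], logp_policy=-1.0, logp_ref=-1.2))
```

In this sketch, a reset makes the KL penalty start again from zero relative to the current policy, so the penalty stops anchoring training to a reference that has become stale. A real implementation would presumably also need to decide whether to reset optimizer state at the same time; that is left out here.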

Paste any useful information
Paper: https://arxiv.org/pdf/2505.24864
