Reinforcement Learning

Retaining by Doing: The Role of On-Policy Data in Mitigating Forgetting

What Makes a Reward Model a Good Teacher? An Optimization Perspective