What Makes a Reward Model a Good Teacher? An Optimization Perspective

Publication
arXiv:2503.15477, 2025