Reward Models

Why is Your Language Model a Poor Implicit Reward Model?

What Makes a Reward Model a Good Teacher? An Optimization Perspective