What Makes a Reward Model a Good Teacher? An Optimization Perspective
Noam Razin, Zixuan Wang, Hubert Strauss, Stanley Wei, Jason D. Lee, Sanjeev Arora
March 2025
Type: Preprint
Publication: arXiv:2503.15477, 2025
Tags: Language Models, Reward Models, Reinforcement Learning, Reinforcement Learning from Human Feedback, Policy Gradient, Alignment
Related
Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization
Vanishing Gradients in Reinforcement Finetuning of Language Models
Implicit Bias of Policy Gradient in Linear Quadratic Control: Extrapolation to Unseen Initial States
What Algorithms Can Transformers Learn? A Study in Length Generalization