What Makes a Reward Model a Good Teacher? An Optimization Perspective
Noam Razin, Zixuan Wang, Hubert Strauss, Stanley Wei, Jason D. Lee, Sanjeev Arora
March 2025
Type: Conference paper
Publication: Advances in Neural Information Processing Systems (NeurIPS), 2025
Topics: Language Models, Reward Models, Reinforcement Learning, Reinforcement Learning from Human Feedback, Policy Gradient, Alignment