Noam Razin
Policy Gradient
What Makes a Reward Model a Good Teacher? An Optimization Perspective
Implicit Bias of Policy Gradient in Linear Quadratic Control: Extrapolation to Unseen Initial States
Vanishing Gradients in Reinforcement Finetuning of Language Models