Noam Razin
Retaining by Doing: The Role of On-Policy Data in Mitigating Forgetting
Howard Chen, Noam Razin, Karthik Narasimhan, Danqi Chen
October 2025
Type: Preprint
Publication: arXiv:2510.18874, 2025
Language Models
Catastrophic Forgetting
Supervised Finetuning
Reinforcement Learning
Related
What Makes a Reward Model a Good Teacher? An Optimization Perspective
Why is Your Language Model a Poor Implicit Reward Model?
Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization
Vanishing Gradients in Reinforcement Finetuning of Language Models
What Algorithms Can Transformers Learn? A Study in Length Generalization