Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization
Noam Razin, Sadhika Malladi, Adithya Bhaskar, Danqi Chen, Sanjeev Arora, Boris Hanin
October 2024
Type: Conference paper
Publication: International Conference on Learning Representations (ICLR), 2025
Tags: Language Models, Direct Preference Learning, DPO, Likelihood Displacement, Alignment
Related:
- Vanishing Gradients in Reinforcement Finetuning of Language Models
- What Algorithms Can Transformers Learn? A Study in Length Generalization