Likelihood Displacement

Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization