Noam Razin

Postdoctoral Fellow

Princeton Language and Intelligence, Princeton University

    Email: noamrazin (at) princeton.edu

I am a Postdoctoral Fellow at Princeton Language and Intelligence (PLI). My research focuses on the fundamentals of deep learning. By combining mathematical and empirical analyses, I aim to develop theories that shed light on how deep learning works, identify potential failures, and yield principled methods for improving efficiency, reliability, and performance. Most recently, I have been working on language model post-training, including reinforcement learning and preference optimization approaches.

  • In [1], we identify a connection between reward variance and the flatness of the reinforcement learning objective landscape. Building on this, [2] provides an optimization perspective on what makes a good reward model for RLHF.
  • In [3], we investigate why language models are often poor implicit reward models, and show that they tend to rely on superficial token-level cues.
  • In [4], we characterize the causes of likelihood displacement, the counter-intuitive phenomenon where preference optimization decreases the probability of preferred responses (instead of increasing it as intended). We demonstrate that likelihood displacement can cause surprising alignment failures and provide guidelines for preventing it.

My work is supported in part by a Zuckerman Postdoctoral Scholarship. Previously, I obtained my PhD in Computer Science at Tel Aviv University, where I was fortunate to be advised by Nadav Cohen. During my PhD, I interned at Apple Machine Learning Research and the Microsoft Recommendations Team, and received the Apple Scholars in AI/ML fellowship and a Tel Aviv University Center for AI & Data Science fellowship.

🗞 News

  • Jul 25: New paper on why language models are often poor implicit reward models — they tend to rely on superficial token-level cues.

  • Mar 25: New paper provides an optimization perspective on what makes a good reward model for RLHF. In particular, we establish that more accurate reward models are not necessarily better teachers!

  • Jan 25: Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization accepted to ICLR 2025.

  • Oct 24: New paper proving that the implicit bias of state space models (SSMs) can be poisoned with clean labels.

  • Oct 24: Honored to receive the Zuckerman and Israeli Council for Higher Education Postdoctoral Scholarships.

  • Sep 24: Joined Princeton Language and Intelligence as a Postdoctoral Fellow.

  • Aug 24: New lecture notes on the theory (and surprising practical applications) of linear neural networks.

📄 Publications

* denotes equal contribution

Why is Your Language Model a Poor Implicit Reward Model?
What Makes a Reward Model a Good Teacher? An Optimization Perspective
Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization
The Implicit Bias of Structured State Space Models Can Be Poisoned With Clean Labels
Understanding Deep Learning via Notions of Rank
Implicit Bias of Policy Gradient in Linear Quadratic Control: Extrapolation to Unseen Initial States
Vanishing Gradients in Reinforcement Finetuning of Language Models
What Algorithms Can Transformers Learn? A Study in Length Generalization
Lecture Notes on Linear Neural Networks: A Tale of Optimization and Generalization in Deep Learning
What Makes Data Suitable for a Locally Connected Neural Network? A Necessary and Sufficient Condition Based on Quantum Entanglement
On the Ability of Graph Neural Networks to Model Interactions Between Vertices
Implicit Regularization in Hierarchical Tensor Factorization and Deep Convolutional Neural Networks
Implicit Regularization in Tensor Factorization
Implicit Regularization in Deep Learning May Not Be Explainable by Norms
RecoBERT: A Catalog Language Model for Text-Based Recommendations
Scalable Attentive Sentence-Pair Modeling via Distilled Sentence Embedding

💬 Selected Talks

  • Understanding and Overcoming Pitfalls in Language Model Alignment
    MPI MiS + UCLA Math Machine Learning Seminar, July 2025
    Slides

  • Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization
    Deep Learning: Classics and Trends Seminar, January 2025
    Slides

  • Analyses of Policy Gradient for Language Model Finetuning and Optimal Control
    MPI MiS + UCLA Math Machine Learning Seminar, March 2024
    Video · Slides

  • Two Analyses of Modern Deep Learning: Graph Neural Networks and Language Model Finetuning
    Princeton Alg-ML Seminar, December 2023
    Slides

  • On the Ability of Graph Neural Networks to Model Interactions Between Vertices
    Learning on Graphs and Geometry Reading Group, January 2023
    Video · Slides

  • Generalization in Deep Learning Through the Lens of Implicit Rank Lowering
    ICTP Youth in High-Dimensions: Recent Progress in Machine Learning, High-Dimensional Statistics and Inference, June 2022
    Video · Slides

  • Generalization in Deep Learning Through the Lens of Implicit Rank Lowering
    MPI MiS + UCLA Math Machine Learning Seminar, May 2022
    Slides

  • Implicit Regularization in Tensor Factorization
    The Hebrew University Machine Learning Club, Jerusalem, Israel, June 2021
    Video · Slides

  • Implicit Regularization in Deep Learning May Not Be Explainable by Norms
    Tel Aviv University Machine Learning Seminar, Tel Aviv, Israel, May 2020
    Slides

👨‍🏫 Teaching

  • Teaching Assistant for Foundations of Deep Learning (course #0368-3080), Tel Aviv University, 2021 to 2023