Out-of-Distribution Generalization

Why is Your Language Model a Poor Implicit Reward Model?

Implicit Bias of Policy Gradient in Linear Quadratic Control: Extrapolation to Unseen Initial States