Vanishing Gradients in Reinforcement Finetuning of Language Models