GRPO (Group Relative Policy Optimization)

[2402.03300] DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

[x.com](https://x.com/Hesamation/status/1883992881914077493)