GRPO

January 22, 2025, updated November 19, 2025

[2402.03300] DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

[x.com](https://x.com/Hesamation/status/1883992881914077493) grpo bois

Group Relative Policy Optimization | Huikang’s blog