The Basics of Reinforcement Learning from Human Feedback: How Language Model Post-Training Is Done Today

From Zero to PPO: Understanding the Path to Helpful AI Models

Topics: RLHF, PPO, DPO