The Basics of Reinforcement Learning from Human Feedback
Alignment and Post Training · October 3, 2024 (updated October 27, 2024) · 1 min read

Video: How language model post-training is done today (YouTube)
See also: From Zero to PPO: Understanding the Path to Helpful AI Models

Tags: RLHF, PPO, DPO, ml