- [2406.16838] From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models
- GitHub - shreyansh26/LLM-Sampling: A collection of various LLM sampling methods implemented in pure Pytorch
- Test Time Compute and LLM Reasoning
- Text generation strategies
[[2024 NeurIPS#[NeurIPS Tutorial Beyond Decoding Meta-Generation Algorithms for Large Language Models](https //neurips.cc/virtual/2024/tutorial/99522)]]
- [2402.10200] Chain-of-Thought Reasoning Without Prompting
- Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
Greedy
- pick the highest-probability token at each step; deterministic, no sampling
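A minimal sketch of greedy decoding for one step, over a toy logit list in plain Python rather than PyTorch tensors (`greedy_pick` is an illustrative name, not a library function):

```python
def greedy_pick(logits):
    # argmax over the vocabulary: always take the single highest logit
    return max(range(len(logits)), key=lambda i: logits[i])

# toy 4-token vocabulary: token 1 has the largest logit, so it is always chosen
print(greedy_pick([1.0, 3.2, 0.5, 2.1]))
```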
Top K Sampling
- sample from the K highest-probability tokens, renormalized; everything else is masked out
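A plain-Python sketch of one top-K sampling step, assuming raw logits as input (`top_k_sample` is an illustrative name):

```python
import math
import random

def top_k_sample(logits, k, rng=random):
    # keep only the k highest logits, softmax-renormalize, then sample
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    weights = [math.exp(logits[i]) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]

# with k=2 only tokens 1 and 3 (the two largest logits) can ever be drawn
print(top_k_sample([1.0, 3.2, 0.5, 2.1], k=2))
```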
Top P Sampling / Nucleus Sampling
- sample from the smallest set of top tokens whose cumulative probability reaches P (the "nucleus"); the cutoff adapts to how peaked the distribution is
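A plain-Python sketch of one nucleus (top-P) sampling step, building the nucleus from a softmax over raw logits (`top_p_sample` is an illustrative name):

```python
import math
import random

def top_p_sample(logits, p, rng=random):
    # softmax over the full vocabulary (max-shifted for numerical stability)
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # add tokens in descending probability until cumulative mass reaches p
    order = sorted(range(len(logits)), key=lambda i: probs[i], reverse=True)
    nucleus, mass = [], 0.0
    for i in order:
        nucleus.append(i)
        mass += probs[i]
        if mass >= p:
            break  # nucleus is complete
    return rng.choices(nucleus, weights=[probs[i] for i in nucleus], k=1)[0]

# very peaked distribution: token 0 alone exceeds p=0.5, so the nucleus is {0}
print(top_p_sample([10.0, 0.0, 0.0, 0.0], p=0.5))
```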
Beam Search
- keep the B highest-scoring partial sequences at each step; can recover sequences that greedy misses
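A self-contained beam search sketch over a hypothetical toy model (`step_fn` returns `(token, logprob)` continuations; all names here are illustrative). The toy distribution is built so the locally best first token leads to a worse sequence overall, which is exactly the case beam search exists for:

```python
import math

def beam_search(step_fn, beam_width, length):
    # each beam is (token_sequence, cumulative logprob)
    beams = [([], 0.0)]
    for _ in range(length):
        candidates = []
        for seq, score in beams:
            for tok, lp in step_fn(seq):
                candidates.append((seq + [tok], score + lp))
        # keep only the beam_width highest-scoring partial sequences
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0][0]

# toy model: token 0 looks best at step one (0.55), but token 1 leads to a
# much stronger continuation (0.45 * 0.9 = 0.405 vs 0.55 * 0.5 = 0.275)
def toy_step(seq):
    if not seq:
        return [(0, math.log(0.55)), (1, math.log(0.45))]
    if seq == [1]:
        return [(2, math.log(0.9)), (3, math.log(0.1))]
    return [(2, math.log(0.5)), (3, math.log(0.5))]

print(beam_search(toy_step, beam_width=2, length=2))  # finds [1, 2]
print(beam_search(toy_step, beam_width=1, length=2))  # width 1 = greedy, stuck on token 0
```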
Speculative
- a small draft model proposes several tokens cheaply; the large target model verifies them in one pass and accepts a prefix
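A simplified sketch of one speculative decoding round, assuming callables for the draft sampler and the two models' token probabilities (all names are illustrative; the resample-on-rejection step of the full algorithm is omitted for brevity):

```python
import random

def speculative_step(draft_sample, p_draft, p_target, prefix, n_draft, rng=random):
    # draft model proposes n_draft tokens autoregressively (the cheap pass)
    ctx = list(prefix)
    proposed = []
    for _ in range(n_draft):
        t = draft_sample(ctx)
        proposed.append(t)
        ctx.append(t)
    # target model verifies: keep token t with prob min(1, p_target/p_draft);
    # the first rejection ends the accepted run
    ctx = list(prefix)
    accepted = []
    for t in proposed:
        if rng.random() < min(1.0, p_target(ctx, t) / p_draft(ctx, t)):
            accepted.append(t)
            ctx.append(t)
        else:
            break
    return accepted

# when draft and target agree exactly, every proposed token is accepted
same = lambda ctx, t: 0.5
print(speculative_step(lambda ctx: 0, same, same, [], n_draft=4))
```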
Structured
- constrain generation to a grammar or schema (e.g. JSON) by masking tokens that would violate it at each step
Structured Generation with LLMs
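A minimal sketch of the masking idea behind structured generation: at each step, restrict the choice to tokens a constraint function allows, regardless of what the model prefers (greedy selection and all names here are illustrative; real libraries compile a grammar into these masks):

```python
def constrained_greedy(logits_fn, allowed_next, start, length):
    # pick the best-scoring token among only the constraint-legal ones
    seq = [start]
    for _ in range(length):
        logits = logits_fn(seq)
        legal = allowed_next(seq[-1])
        seq.append(max(legal, key=lambda t: logits[t]))
    return seq

# toy 3-token "grammar": token 0 may never be emitted again, even though
# the model's logits prefer it at every step
logits_fn = lambda seq: [5.0, 1.0, 2.0]
allowed_next = lambda last: [1, 2]
print(constrained_greedy(logits_fn, allowed_next, start=0, length=2))
```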