- Scaling Test Time Compute: How o3-Style Reasoning Works (+ Open Source Implementation) - YouTube
- o3 (Part 1): Generating data from multiple sampling for self-improvement + Path Ahead - YouTube
- o3 (Part 2) - Tradeoffs of Heuristics, Tree Search, External Memory, In-built Bias - YouTube
Papers
- [2408.03314] Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
- [2410.10630] Thinking LLMs: General Instruction Following with Thought Generation
- [2411.19865] Reverse Thinking Makes LLMs Stronger Reasoners
- [2411.04282] Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding
- [2409.15254] Archon: An Architecture Search Framework for Inference-Time Techniques
- [2409.12917] Training Language Models to Self-Correct via Reinforcement Learning
- [2412.18925] HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
- [2412.14135] Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective
- [2412.21187] Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
- [2501.02497] Test-time Computing: from System-1 Thinking to System-2 Thinking
- [2501.04682] Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought
- [2501.04519] rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
- [2501.09891] Evolving Deeper LLM Thinking
Image Generation
Open Source
- 🚀 DeepSeek-R1-Lite-Preview is now live: unleashing supercharged reasoning power! | DeepSeek API Docs
- Scaling test-time compute - a Hugging Face Space by HuggingFaceH4
- GitHub - NovaSky-AI/SkyThought: Sky-T1: Train your own O1 preview model within $450
Related
Research
STaR
We propose a technique to iteratively leverage a small number of rationale examples and a large dataset without rationales, to bootstrap the ability to perform successively more complex reasoning. This technique, the “Self-Taught Reasoner” (STaR), relies on a simple loop: generate rationales to answer many questions, prompted with a few rationale examples; if the generated answers are wrong, try again to generate a rationale given the correct answer; fine-tune on all the rationales that ultimately yielded correct answers; repeat. We show that STaR significantly improves performance on multiple datasets compared to a model fine-tuned to directly predict final answers, and performs comparably to fine-tuning a 30× larger state-of-the-art language model on CommonsenseQA. Thus, STaR lets a model improve itself by learning from its own generated reasoning.
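The loop described in the abstract is simple enough to sketch in code. Below is a minimal, hypothetical sketch of one STaR iteration, not the authors' implementation: `generate`, `extract_answer`, and `finetune` are placeholder callables standing in for whatever model API and training pipeline you use, and the control flow just mirrors the abstract (attempt with few-shot rationale prompting, rationalize failures with the correct answer as a hint, fine-tune on the rationales that reach the right answer).

```python
from typing import Callable, List, Tuple

def star_iteration(
    model,
    problems: List[Tuple[str, str]],              # (question, gold_answer) pairs
    few_shot_prompt: str,                         # a few worked rationale examples
    generate: Callable[[object, str], str],       # (model, prompt) -> rationale + answer text
    extract_answer: Callable[[str], str],         # pull the final answer out of a generation
    finetune: Callable[[object, List[Tuple[str, str]]], object],  # (model, data) -> new model
):
    """One STaR outer-loop step, following the loop in the abstract above.
    All helper callables are hypothetical stand-ins, not the paper's code."""
    training_data: List[Tuple[str, str]] = []
    for question, gold in problems:
        # 1. Generate a rationale + answer, prompted with a few rationale examples.
        output = generate(model, f"{few_shot_prompt}\nQ: {question}\nA:")
        if extract_answer(output) == gold:
            training_data.append((question, output))
            continue
        # 2. Rationalization: if the answer was wrong, retry with the correct
        #    answer given as a hint; keep the rationale only if it now ends
        #    in the right answer.
        hinted_prompt = f"{few_shot_prompt}\nQ: {question} (hint: the answer is {gold})\nA:"
        hinted = generate(model, hinted_prompt)
        if extract_answer(hinted) == gold:
            # Train on the plain question; the hint is only used for generation.
            training_data.append((question, hinted))
    # 3. Fine-tune on all rationales that ultimately yielded correct answers;
    #    the returned model seeds the next iteration of the loop.
    return finetune(model, training_data)
```

In the paper the fine-tuning at each iteration starts again from the original base model rather than stacking fine-tunes; the sketch leaves that choice to the `finetune` callable.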