For years, AI experts have insisted that improving reasoning in large language models (LLMs) required retraining.
But new research from Harvard University has just turned that belief upside down.
In their 2025 paper, “Reasoning with Sampling: Your Base Model Is Smarter Than You Think,” researchers Aayush Karan and Yilun Du discovered that base models already possess powerful reasoning abilities.
The catch? We’ve simply been sampling them wrong.
Their method, called Reasoning with Sampling, shows that with a smarter inference process, an untouched base model can match, and in some cases outperform, its RL-fine-tuned counterparts.
The Core Idea: It’s Not About Training—It’s About Sampling
Traditional approaches to improving LLM reasoning rely on reinforcement learning (RL) or RLHF (reinforcement learning from human feedback).
Those approaches consume enormous compute and time, tweaking the model until it “learns” to reason better.
The Harvard team questioned this logic:
“What if the reasoning capability is already inside the model—and we just need to draw it out differently?”
Power Sampling in Action
Their approach, inspired by Markov chain Monte Carlo (MCMC) methods, changes how outputs are sampled from the model (a toy code sketch follows the list):
- Start from the base model's probability distribution over full completions.
- Sharpen it by raising it to a power (p^α, with α > 1), which concentrates probability mass on high-likelihood, more coherent reasoning paths.
- Approximate sampling from that sharpened distribution with MCMC-style iterative resampling, since it cannot be drawn from directly one token at a time.
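To make the recipe concrete, here is a minimal, self-contained sketch of the idea: a Metropolis-Hastings loop whose target is p(sequence)^α and whose proposal resamples a suffix from the base model. A tiny toy next-token distribution stands in for the LLM so the loop actually runs; all names and constants are illustrative, and the paper's actual sampler is more elaborate than this simplification.

```python
# Toy, runnable sketch of MCMC-style power sampling: draw sequences whose
# probability is proportional to p(seq)**alpha, using the base model itself
# as the proposal. Illustrative only; not the paper's exact algorithm.
import math
import random

VOCAB = [0, 1, 2, 3]   # toy token ids
SEQ_LEN = 8            # fixed-length completions, for simplicity
ALPHA = 4.0            # sharpening exponent; alpha > 1 favors high-likelihood sequences

def next_token_probs(prefix):
    """Stand-in for an LLM's next-token distribution: mildly prefers repeating
    the previous token, so 'coherent' sequences get higher likelihood."""
    if not prefix:
        return [0.25] * len(VOCAB)
    probs = [0.1] * len(VOCAB)
    probs[prefix[-1]] = 0.7
    return probs

def sample_suffix(prefix, length):
    """Autoregressively sample `length` tokens from the base model after `prefix`."""
    seq = list(prefix)
    for _ in range(length):
        seq.append(random.choices(VOCAB, weights=next_token_probs(seq))[0])
    return seq

def log_prob(seq):
    """Log-likelihood of a whole sequence under the base model."""
    return sum(math.log(next_token_probs(seq[:i])[tok]) for i, tok in enumerate(seq))

def power_sample(alpha=ALPHA, n_steps=300):
    """Metropolis-Hastings sampler targeting p(seq)**alpha.

    Proposal: pick a cut point, keep the prefix, resample the suffix from the
    base model. Because the proposal is the base model itself, the acceptance
    ratio simplifies to (p(new) / p(old)) ** (alpha - 1)."""
    current = sample_suffix([], SEQ_LEN)
    lp_current = log_prob(current)
    for _ in range(n_steps):
        cut = random.randrange(SEQ_LEN)
        proposal = sample_suffix(current[:cut], SEQ_LEN - cut)
        lp_proposal = log_prob(proposal)
        log_accept = (alpha - 1) * (lp_proposal - lp_current)
        if random.random() < math.exp(min(0.0, log_accept)):
            current, lp_current = proposal, lp_proposal
    return current, lp_current

if __name__ == "__main__":
    base = sample_suffix([], SEQ_LEN)
    sharpened, lp = power_sample()
    print("ordinary sample:", base, " log p =", round(log_prob(base), 2))
    print("power sample:   ", sharpened, " log p =", round(lp, 2))
```

With α = 1 this reduces to ordinary sampling; larger α pushes the chain toward higher-likelihood completions while still producing samples rather than a single greedy answer, which is the intuition behind the method's claim to preserve output diversity.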
The result?
A model that “thinks” more clearly—without a single gradient update.
Why It’s a Breakthrough
The paper tested the method on math, coding, and graduate-level science benchmarks: MATH500, HumanEval, and GPQA.
In many cases, the base model with power sampling matched or exceeded RL-trained versions.
That means no additional training, no curated data, and no high-cost reinforcement loops.
Just smarter inference.
| Benefit | Impact |
|---|---|
| No retraining needed | Immediate reasoning gains |
| Preserves diversity | Multiple valid answers remain possible |
| Lower compute cost | Up to 10× cheaper than RL training |
| Works on any base model | Plug-and-play inference strategy |
What It Means for AI Development
This finding changes how developers think about model improvement.
Instead of spending months fine-tuning, teams can now:
- Experiment with smarter sampling algorithms like power sampling.
- Focus on inference optimization, not just data scaling.
- Unlock hidden reasoning potential within existing models.
It’s a philosophical shift—from bigger and deeper to smarter and leaner.
As Yilun Du notes, “A model’s intelligence isn’t only in its parameters—it’s in how you use them.”
Testing the Boundaries
Of course, this isn’t a magic bullet.
The approach works best for single-turn reasoning tasks, not yet for multi-step planning or interactive dialogues.
And while inference is cheaper than retraining, it's still 8–9× more token-intensive than standard decoding.
Yet, the trade-off is compelling: no training cost, no data labeling, no model drift—just smarter use of what’s already trained.
The Bigger Picture
This discovery fits a broader trend: AI progress through decoding innovation rather than architectural expansion.
We’ve seen it before with Chain-of-Thought prompting, Self-Consistency Sampling, and ReAct reasoning.
Now, Harvard adds another tool to the arsenal—Power Sampling, a training-free way to enhance logic and accuracy.
It’s part of a growing realization:
“AI advancement isn’t just about building new brains—it’s about teaching the ones we have to think better.”
What Comes Next
The implications are massive:
- Startups can achieve top-tier reasoning without costly retraining.
- Enterprises can optimize existing models rather than chase parameter inflation.
- Open-source LLMs gain a new life—accessible intelligence through smarter inference.
Expect to see Reasoning with Sampling integrated into open-source frameworks and inference APIs throughout 2026.
We’ve Been Using AI Wrong
Harvard’s paper is a wake-up call.
The intelligence was always there—hidden beneath probabilistic noise.
By rethinking sampling, we unlock reasoning that’s sharper, cheaper, and already available.
Maybe it’s time to stop training harder—and start sampling smarter.
FAQ: Reasoning with Sampling
Q1: What is “Reasoning with Sampling”?
A new inference technique that improves reasoning by resampling model outputs instead of retraining the model.
Q2: Who developed it?
Researchers Aayush Karan and Yilun Du at Harvard University.
Q3: Does it require extra data or fine-tuning?
No. It’s a training-free approach applied only during inference.
Q4: How does it compare to RLHF?
It matches or exceeds the reasoning quality of RL-trained models on key benchmarks and avoids the cost of training entirely, though inference itself consumes more tokens than standard decoding.
Q5: Where can I read the full paper?
Find it on arXiv.org.