If you haven’t already, subscribe and join our community in receiving weekly AI insights, updates and interviews with industry experts straight to your feed.
Imagine a junior doctor who always has an answer – even when they don’t actually know it. They’re polished and confident, poised with a slide-ready sentence. Now imagine the senior doctor: thoughtful, cautious, and quick to say when they’re not sure about something.
Today’s LLMs are much closer to the first doctor than the second. They generate plausible text at scale. But in high-stakes settings (legal advice, medical summaries, regulation, research synthesis), the cost of confident nonsense is very real.
That’s why abstention and uncertainty estimation is an emerging area of AI research. Researchers are exploring how to give models the ability to know when not to answer, and to flag when their outputs are unreliable.
Language models are trained to generate text that looks likely, not to tell the truth. For traditional classifiers (like ‘cat’ vs ‘dog’), machine learning has long studied the reject option – a mechanism that lets a model abstain when it isn’t confident enough to commit to a class. This is called selective classification.
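To make the reject option concrete, here’s a minimal sketch (the classify_with_reject name and the 0.8 threshold are illustrative choices, not taken from any particular system): the classifier commits to a label only when its top-class probability clears a confidence bar, and abstains otherwise.

```python
import numpy as np

def classify_with_reject(probs: np.ndarray, labels: list[str], threshold: float = 0.8):
    """Selective classification: answer only when the top-class probability
    clears the confidence threshold; otherwise exercise the reject option."""
    top = int(np.argmax(probs))
    if probs[top] >= threshold:
        return labels[top]
    return None  # abstain rather than guess

# A confident prediction gets answered; a borderline one gets rejected.
print(classify_with_reject(np.array([0.95, 0.05]), ["cat", "dog"]))  # -> "cat"
print(classify_with_reject(np.array([0.55, 0.45]), ["cat", "dog"]))  # -> None
```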
But for open-ended generative models, the problem is more complicated. Recent research shows that token likelihoods (how likely a word looks based on training patterns) aren’t reliable indicators of correctness in real-world knowledge tasks.
In other words: a model can be extremely confident and wrong.
That’s why researchers are working on methods that go beyond naive confidence scores, combining calibration, statistical guarantees, and uncertainty signals to give models the ability to self-censor when the risk of error is high.
One promising direction is the use of conformal prediction frameworks – tools from statistics that can give quantifiable error guarantees. In this setup, a model faced with uncertainty can do one of three things: give a single answer it is confident in, hedge by returning a set of candidate answers it can’t yet separate, or abstain entirely.
A 2025 paper on conformal abstention policies shows how this framework can be adapted to modern generative models, offering a structured way to trade off usefulness against risk. It’s uncertainty handling with mathematical reasoning behind the choice.
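Here’s a simplified sketch of the split-conformal recipe behind ideas like this – not the paper’s exact policy, and the helper names are made up – for the easy case where the model scores a fixed set of candidate answers. You calibrate a score cutoff on held-out examples so the true answer is covered with probability at least 1 − α, then abstain whenever more than one candidate (or none) survives the cutoff.

```python
import numpy as np

def calibrate_threshold(cal_probs: np.ndarray, cal_labels: np.ndarray, alpha: float = 0.1) -> float:
    """Split conformal calibration: cal_probs[i] holds the model's probabilities
    over candidates for example i, cal_labels[i] is the index of the true answer.
    Returns a score cutoff so the true answer is covered with prob. >= 1 - alpha."""
    n = len(cal_labels)
    # Nonconformity score: 1 minus the probability given to the true answer.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return float(np.quantile(scores, q_level, method="higher"))

def answer_or_abstain(probs: np.ndarray, candidates: list[str], q_hat: float):
    """Keep every candidate whose score clears the calibrated cutoff.
    Answer only when exactly one survives; otherwise abstain (return None)."""
    keep = [c for c, p in zip(candidates, probs) if (1.0 - p) <= q_hat]
    return keep[0] if len(keep) == 1 else None
```

The point isn’t the dozen lines of NumPy; it’s that the abstention decision comes with a coverage guarantee you can actually state, rather than a vibe.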
So how do models decide they’re not sure?
Uncertainty estimation techniques offer a potential answer. These methods go beyond single-pass outputs to gauge how stable a response is under perturbation, resampling, or semantic variation.
For example, research has shown that entropy-based metrics (essentially measuring how ‘spread out’ possible model responses are) can correlate with hallucination risk. These aren’t perfect detectors, but they do point to measurable indicators of when a model might be making things up.
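Here’s a toy version of that entropy signal (real methods such as semantic entropy cluster answers by meaning rather than exact wording, so treat this as illustration only): ask the same question several times and measure how spread out the sampled answers are.

```python
from collections import Counter
import math

def response_entropy(samples: list[str]) -> float:
    """Entropy of repeated sampled answers: near zero means the model keeps
    giving the same answer; higher values mean its answers are spread out."""
    counts = Counter(s.strip().lower() for s in samples)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

print(response_entropy(["Paris", "Paris", "paris", "Paris"]))        # ~0: consistent
print(response_entropy(["Paris", "Lyon", "Marseille", "Toulouse"]))  # 2.0: scattered
```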
Other approaches, like LM-Polygraph, gather multiple uncertainty estimators into a toolkit that can benchmark models across tasks, giving developers and researchers a shared yardstick for risk-aware generation.
Even more practical are black-box techniques like SelfCheckGPT, which look at how much variation there is across multiple sampled outputs – higher disagreement hints at lower reliability. They don’t need access to internal model weights, which makes them attractive for third-party services built on top of closed models.
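The real SelfCheckGPT scores agreement using BERTScore, natural-language-inference or question-answering checks; the sketch below swaps in crude word overlap purely to show the black-box principle – resample the answer a few times and treat disagreement as a warning sign.

```python
def consistency_score(main_answer: str, resamples: list[str]) -> float:
    """Crude SelfCheckGPT-style signal: how much of the main answer's wording
    reappears in independently resampled answers. Lower = more disagreement,
    which hints at lower reliability. No model weights needed."""
    main_tokens = set(main_answer.lower().split())
    if not main_tokens or not resamples:
        return 0.0
    overlaps = [
        len(main_tokens & set(s.lower().split())) / len(main_tokens)
        for s in resamples
    ]
    return sum(overlaps) / len(overlaps)

# If repeated sampling keeps producing different claims, the score drops
# and the answer can be flagged or withheld.
resamples = ["The Eiffel Tower is in Paris.", "It stands in Paris, France."]
print(consistency_score("The Eiffel Tower is in Paris.", resamples))
```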
A bit of hedging is already present in LLMs. Models can be prompted to be cautious. But prompting isn’t the same as principled awareness.
Real uncertainty estimation and abstention mechanisms allow systems to decline to answer when confidence is low, flag shaky outputs for human review, and attach measurable risk estimates to what they generate.
This is particularly important for organisations deploying AI assistants in support roles, where a plausible but false answer can be dangerous rather than just inconvenient or embarrassing.
If you’re thinking about the future of trustworthy AI, here’s what to track next: conformal abstention methods moving out of research papers and into production systems, shared uncertainty benchmarks like LM-Polygraph becoming part of standard evaluation, and black-box reliability checks like SelfCheckGPT being wired into assistants built on top of closed models.
We’ve spent years teaching AI to answer anything. The next phase is teaching it to answer responsibly – including when that means not answering at all.
Like the senior doctor who respects the limits of their knowledge, that’s what maturity looks like.
Share the most outrageous AI hallucinations you’ve seen – just open this newsletter on LinkedIn and tell us in the comments section.
From pattern-matching to genuine reasoning