Why we need AI that says ‘I don’t know’

DeepDive 

Your weekly immersion in AI 

Imagine a junior doctor who always has an answer – even when they don’t actually know the answer. They’re polished and confident, poised with a slide-ready sentence. Now imagine the senior doctor: thoughtful, cautious, and quick to say when they’re not sure about something. 

Today’s LLMs are much closer to the first doctor than the second. They generate plausible text at scale. But in high-stakes settings (legal advice, medical summaries, regulation, research synthesis) the cost of confident nonsense is very real. 

That’s why abstention and uncertainty estimation are an emerging field of research in AI. Researchers are exploring how to give models the ability to know when not to answer, and to flag when their outputs are unreliable. 

Will you tell me if you don’t know? 

Language models are trained to generate text that looks likely, not to tell the truth. For traditional classifiers (like ‘cat’ vs ‘dog’), machine learning has long studied the reject option – a mechanism that lets a model abstain when it isn’t confident enough to commit to a class. This is called selective classification.
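
To make that concrete, here’s a minimal sketch of the reject option for a simple classifier, assuming we already have its softmax probabilities in hand; the 0.8 threshold is purely illustrative:

```python
import numpy as np

def predict_with_reject(probs: np.ndarray, threshold: float = 0.8):
    """Classic reject option: commit to the top class only when the model's
    confidence clears a threshold, otherwise abstain."""
    top_class = int(np.argmax(probs))
    if probs[top_class] < threshold:
        return None  # abstain: not confident enough to commit
    return top_class

# A cat-vs-dog classifier's softmax outputs (illustrative numbers)
print(predict_with_reject(np.array([0.55, 0.45])))  # None -> abstain
print(predict_with_reject(np.array([0.97, 0.03])))  # 0    -> confident prediction
```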

But for open-ended generative models, the problem is more complicated. Recent research shows that token likelihoods (how likely a word looks based on training patterns) aren’t reliable indicators of correctness in real-world knowledge tasks. 

In other words: a model can be extremely confident and wrong.

That’s why researchers are working on methods that go beyond naive confidence scores, combining calibration, statistical guarantees, and uncertainty signals to give models the ability to self-censor when the risk of error is high.
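
Calibration is worth pausing on: a model is well calibrated if its stated confidence matches how often it is actually right, and that gap can be measured. Here’s a minimal sketch of expected calibration error (ECE), one common way to quantify it; the confidence values and correctness flags below are invented for the example:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins: int = 10) -> float:
    """Bucket predictions by stated confidence and measure how far confidence
    drifts from actual accuracy in each bucket. A well-calibrated model that
    says '80% sure' should be right roughly 80% of the time."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(confidences[in_bin].mean() - correct[in_bin].mean())
            ece += in_bin.mean() * gap  # weight each bucket by its share of predictions
    return float(ece)

# Toy data: the model claims ~90% confidence but is right only 60% of the time
stated = [0.95, 0.90, 0.92, 0.88, 0.91]
was_right = [1, 0, 1, 0, 1]
print(f"ECE = {expected_calibration_error(stated, was_right):.2f}")  # large gap -> poorly calibrated
```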

New frontiers: conformal policies and principled abstention

One promising direction is the use of conformal prediction frameworks – tools from statistics that can give quantifiable error guarantees. In this setup, a model can do one of three things when faced with uncertainty:

  • Make a single prediction with a confidence bound,
  • Output a set of plausible predictions,
  • Or abstain – explicitly refuse to answer when the confidence threshold isn’t met.

A 2025 paper on conformal abstention policies shows how this framework can be adapted to modern generative models, offering a structured way to trade off usefulness against risk. It’s uncertainty with mathematical reasoning behind the choice.
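
For a feel of how this works in code, here’s a minimal sketch of split conformal prediction with an abstention rule layered on top. It’s a generic illustration of the framework, not the specific policy from that paper, and the calibration scores, error level and set-size limit are invented for the example:

```python
import numpy as np

def conformal_threshold(cal_scores: np.ndarray, alpha: float = 0.1) -> float:
    """Split conformal calibration: pick the nonconformity-score quantile
    (here, 1 minus the probability given to the true class) that yields
    roughly (1 - alpha) coverage on future examples."""
    n = len(cal_scores)
    rank = int(np.ceil((n + 1) * (1 - alpha)))
    return float(np.sort(cal_scores)[min(rank, n) - 1])

def predict_set_or_abstain(probs: np.ndarray, qhat: float, max_set_size: int = 1):
    """Keep every class whose nonconformity score is under the calibrated
    threshold; abstain when the set is empty or too large to be useful."""
    prediction_set = [i for i, p in enumerate(probs) if 1 - p <= qhat]
    if not prediction_set or len(prediction_set) > max_set_size:
        return None  # abstain: the model cannot commit at the target risk level
    return prediction_set

# Nonconformity scores from a held-out calibration set (invented numbers)
cal_scores = np.array([0.10, 0.30, 0.20, 0.40, 0.25, 0.15, 0.35, 0.05])
qhat = conformal_threshold(cal_scores, alpha=0.1)
print(predict_set_or_abstain(np.array([0.70, 0.20, 0.10]), qhat))  # [0]  -> single confident answer
print(predict_set_or_abstain(np.array([0.40, 0.35, 0.25]), qhat))  # None -> abstain
```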

Signs of uncertainty: more than guesswork

So how do models decide they’re not sure?

Uncertainty estimation techniques offer a potential answer. These methods go beyond single-pass outputs to gauge how stable a response is under perturbations, sampling, or semantic variation. 

For example, research has shown that entropy-based metrics (essentially measuring how ‘spread out’ possible model responses are) can correlate with hallucination risk. These aren’t perfect detectors, but they do point to measurable indicators of when a model might be making it up.
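
As a rough illustration of the idea: sample several answers to the same question and compute the entropy of the resulting answer distribution. Published ‘semantic entropy’ methods cluster answers by meaning before computing entropy; this sketch just compares exact strings, and the sample answers are invented:

```python
import math
from collections import Counter

def answer_entropy(samples: list[str]) -> float:
    """Entropy of the distribution of sampled answers: low when the model keeps
    giving the same answer, high when its answers are spread out."""
    counts = Counter(s.strip().lower() for s in samples)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

# Five sampled answers to the same question (invented strings)
consistent = ["Paris", "Paris", "Paris", "Paris", "paris"]
scattered = ["Paris", "Lyon", "Marseille", "Paris", "Nice"]
print(f"{answer_entropy(consistent):.2f}")  # 0.00 -> stable, lower hallucination risk
print(f"{answer_entropy(scattered):.2f}")   # 1.33 -> spread out, higher risk
```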

Other approaches, like LM-Polygraph, gather multiple uncertainty estimators into a toolkit that can benchmark models across tasks, giving developers and researchers a shared yardstick for risk-aware generation.

Even more practically, black-box techniques like SelfCheckGPT look at how much variation there is across multiple sampled outputs – higher disagreement hints at lower reliability. These methods don’t need access to internal model weights, which makes them attractive for third-party services built on top of closed models.
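
A minimal sketch of that sampling-and-comparison loop might look like this. SelfCheckGPT itself scores consistency with stronger machinery (NLI models, question answering), so the token-overlap proxy, the flag threshold and the example strings below are simplifications for illustration:

```python
def token_overlap(a: str, b: str) -> float:
    """Crude similarity proxy: the fraction of the main answer's tokens that
    also appear in a re-sampled answer."""
    main_tokens = set(a.lower().split())
    return len(main_tokens & set(b.lower().split())) / max(len(main_tokens), 1)

def consistency_score(main_answer: str, samples: list[str]) -> float:
    """Average agreement between the main answer and stochastic re-samples;
    low agreement hints the model is not reliably grounded on this question."""
    return sum(token_overlap(main_answer, s) for s in samples) / max(len(samples), 1)

# The strings below stand in for repeated calls to any chat API with temperature > 0.
main = "The Eiffel Tower was completed in 1889."
samples = [
    "The Eiffel Tower was completed in 1889.",
    "It was finished around 1887, I believe.",   # disagreeing sample
    "The tower opened in 1887.",                 # disagreeing sample
]
score = consistency_score(main, samples)
print(f"consistency = {score:.2f}")
if score < 0.7:
    print("Low agreement across samples -> flag the answer as potentially unreliable.")
```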

This is more than an academic exercise 

A bit of hedging is already present in LLMs. Models can be prompted to be cautious. But prompting isn’t the same as principled awareness.

Real uncertainty estimation and abstention mechanisms allow systems to:

  • Avoid confident hallucinations in domain-critical tasks,
  • Defer to human experts when necessary,
  • Provide calibrated outputs that systems and auditors can reason about,
  • And enable safer interactions with end users who may take output at face value.

This is particularly important for organisations that deploy AI assistants in support roles where plausible false answers can be dangerous, rather than just inconvenient or embarrassing. 

What should we watch over the next year or so? 

If you’re thinking about the future of trustworthy AI, here’s what to track next:

  1. Benchmarks that measure uncertainty, not just accuracy. As researchers build tests for how well models know what they don’t know, we’ll see evaluation move beyond ‘best answer’ to ‘reliability under uncertainty.’
  2. Inference-efficient uncertainty methods. At scale, it’s not enough to be safe – models must also be fast and inexpensive. Techniques that can flag uncertainty with minimal overhead will win real adoption.
  3. Human-AI workflows that respect deferral. Products need UI and UX patterns that show when a model is abstaining, and let users understand why.

We’ve spent years teaching AI to answer anything. The next phase is teaching it to answer responsibly – including when that means not answering at all. 

That’s the maturity of the senior doctor who respects the limits of their knowledge. 

What have you seen AI being most confidently wrong about? 

Share the most outrageous AI hallucinations you’ve seen – just open this newsletter on LinkedIn and tell us in the comments section. 
