
    The AI Model Confidence Trap

    By Editor Times Featured | May 26, 2026 | 7 Mins Read


    I was feeling a bit whimsical on a Saturday and decided to ask ChatGPT a fairly simple question: “Who won the Nobel Prize in Physics in 2025?”

    ChatGPT responded instantly: “The 2025 Nobel Prize in Physics was awarded to…” It even supplied names, research areas, and an explanation of the specific research that earned them the Nobel Prize!

    There was just one problem, a very small one, really: the Nobel Prize had not yet been announced. Yet the model didn’t hesitate; it didn’t pause; it certainly didn’t say, “I don’t have enough information” or, even better, “The Nobel Prize winner for 2025 has not been announced yet!”

    Instead, it confidently walked into the room, sat down, and delivered fiction with the energy of someone defending a PhD dissertation. As a person who once defended a PhD, I wish I had ChatGPT’s confidence when it makes stuff up!

    As humans, we tend to do something interesting with confidence: we associate it with correctness, but that’s not always the case. If someone says, “I guess the answer might be 42” and another person says, “The answer is absolutely 42,” most of us instinctively trust the second person more, even if both are equally likely to be wrong. For us, confidence often acts as a useful signal of correctness. For AI systems, however, confidence can be a surprisingly unreliable narrator.

    In this article, we will explore why.

    Confidence Feels Like Probability

    Let’s say we asked an LLM to predict which animal is in a given picture. It says:

    Cat: 0.97
    Dog: 0.02
    Bird: 0.01

    Most will interpret that as: “The model is 97% sure this is a cat.”

    That is a reasonable assumption. Unfortunately, that’s often not what these numbers mean. We need to remember that many AI models use a function called Softmax to generate predictions.

    The Softmax function converts raw outputs (called logits) into values that sum to 1 and resemble probabilities. The important thing to notice here is the exponential term, which can cause small differences to suddenly become very large.

    Image by the author

    So basically, the model isn’t saying, “I have overwhelming evidence that this is a cat.” It may simply be saying: “Among these options, cat happened to win by a small margin.” These are very different statements with completely different meanings.
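
    To make the effect of the exponential term concrete, here is a minimal sketch in plain Python. The logit values are made up for illustration and do not come from any real model; the point is only that the same ranking of options can yield very different "confidence" depending on how far apart the raw scores are.

```python
import math

def softmax(logits):
    """Turn raw scores (logits) into positive values that sum to 1."""
    peak = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["cat", "dog", "bird"]

# Invented logits: cat leads, but not by much.
modest_logits = [2.0, 1.0, 0.5]
# Same ranking, but every gap is five times larger.
scaled_logits = [5 * z for z in modest_logits]

for name, logits in [("modest", modest_logits), ("scaled", scaled_logits)]:
    probs = softmax(logits)
    print(name, {label: round(p, 3) for label, p in zip(labels, probs)})
```

    With these invented numbers, the first set gives cat roughly 0.63, while the scaled set pushes cat above 0.99. The winner is the same in both cases; only the size of the raw gaps changed.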

    Humans and AI Handle Uncertainty Differently

    Although it may be uncomfortable to sit with, humans are surprisingly good at expressing and dealing with uncertainty.

    We constantly hear: “I might be wrong…”, “I’m pretty sure…”, “Maybe…”, or “I think…”. Our confidence tends to exist on a spectrum. AI systems, however, often behave like that one person in a group project who confidently explains something they learned three minutes ago (I’m sure we all had that classmate…).

    So, when chatting with an LLM, telling it “I think Paris is the capital of France” and having it respond “Paris is the capital of France with 99.8% probability” gives the same energy as telling it “I think Atlantis is fictional” and having it respond “Atlantis is located roughly 400 miles west of Portugal with 98.7% confidence.”

    Although the two cases have very different outcomes, the LLM treats them the same way.

    The Confident Fool Problem

    This creates what I think of as the confident fool problem, where a system can be spectacularly wrong while sounding spectacularly certain. And unfortunately, confidence often increases exactly when we would want more caution.

    This becomes especially noticeable when LLMs encounter situations outside their training distribution.

    Suppose we train an image classifier to identify cats and dogs. But then we decide to give it a picture of a toaster! Ideally, the model should say, “I have absolutely no idea what this is,” which is how most people would respond when shown something they have never seen before. Instead, the model might respond:

    Dog: 98%
    Cat: 2%

    Now, unless your toaster is poodle-shaped, that answer is clearly false!

    Why does this happen? The answer is simpler than most people think. It happens because the model was never trained to say, “None of the above.” So, when it encounters something unfamiliar, it chooses the highest available score among the options.

    It’s like forcing someone to answer “What fruit is this?” while pointing at a bicycle. Eventually, they will pick a fruit just to resolve the situation and say, “Banana?”
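
    The forced-choice behavior can be sketched with a toy two-class classifier. Everything here is hypothetical: the weights, the two-number "features", and the toaster input are invented for illustration, not taken from a trained model. What the sketch shows is that a Softmax over {cat, dog} has no way to answer "neither", and that an input far from the training data can produce more confidence, not less.

```python
import math

LABELS = ["cat", "dog"]

# Hypothetical weights of a tiny linear classifier (one row per class).
WEIGHTS = [[1.0, -1.0],
           [-1.0, 1.0]]

def softmax(logits):
    peak = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(features):
    """Return the winning label and its Softmax score. There is no 'none of the above'."""
    logits = [sum(w * x for w, x in zip(row, features)) for row in WEIGHTS]
    probs = softmax(logits)
    best = max(range(len(LABELS)), key=lambda i: probs[i])
    return LABELS[best], probs[best]

cat_like = [1.0, 0.0]   # invented input that resembles the training data
toaster = [-3.0, 5.0]   # invented input far from anything the model was trained on

print(classify(cat_like))
print(classify(toaster))
```

    In this toy setup the cat-like input gets "cat" with a score of about 0.88, while the toaster gets "dog" with a score above 0.99. The unfamiliar input lands far from the decision boundary, so the gap between the logits is large and the Softmax output looks extremely certain.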

    Let’s simulate a model that is overconfident.

    Image by the author

    If the model reports “90% confidence”, we’d hope it is correct roughly 90% of the time. Instead, many systems look more like “90% confidence, 65% accuracy.” This gap between confidence and accuracy is why the way we choose to train these LLMs matters a lot.
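
    One way such a simulation could look is sketched below. It is not the author's code. The relationship between stated confidence and real accuracy is an assumption chosen so that a stated 0.90 corresponds to 0.65 accuracy, matching the example above. The summary number, expected calibration error (ECE), is a commonly used metric: the average gap between confidence and accuracy across bins, weighted by bin size.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

def simulate_overconfident_model(n=20000):
    """Return (stated confidence, was the prediction correct?) pairs."""
    samples = []
    for _ in range(n):
        confidence = rng.uniform(0.5, 1.0)
        # Assumed gap: a stated 0.90 corresponds to 0.65 real accuracy.
        true_accuracy = 0.5 + 0.375 * (confidence - 0.5)
        samples.append((confidence, rng.random() < true_accuracy))
    return samples

def reliability_table(samples, n_bins=5):
    """Group predictions by stated confidence and measure accuracy per group."""
    bins = [[] for _ in range(n_bins)]
    for confidence, correct in samples:
        index = min(int((confidence - 0.5) / 0.5 * n_bins), n_bins - 1)
        bins[index].append((confidence, correct))
    rows = []
    for members in bins:
        if members:
            mean_conf = sum(c for c, _ in members) / len(members)
            accuracy = sum(1 for _, ok in members if ok) / len(members)
            rows.append((mean_conf, accuracy, len(members)))
    return rows

def expected_calibration_error(rows):
    total = sum(count for _, _, count in rows)
    return sum(count / total * abs(conf - acc) for conf, acc, count in rows)

rows = reliability_table(simulate_overconfident_model())
for mean_conf, accuracy, count in rows:
    print(f"confidence {mean_conf:.2f} -> accuracy {accuracy:.2f} (n={count})")
print(f"ECE: {expected_calibration_error(rows):.3f}")
```

    For a well-calibrated model, the two columns would roughly match and the ECE would be close to zero. Under the assumed gap, the highest-confidence bin sits near 0.95 stated confidence but only about 0.67 accuracy.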

    Teaching Models to Be More Honest

    Okay, we know why models are sometimes so confidently wrong, but how can we overcome that to have better models with higher accuracy, or accuracy that matches their confidence? This is where calibration comes into play.

    Calibration doesn’t necessarily improve predictions. Instead, it improves honesty! So, if a model says 90% after calibration, it should mean: “Historically, predictions at this confidence level have been correct about 90% of the time.”

    Techniques such as:

    • Platt Scaling
    • Temperature Scaling
    • Isotonic Regression

    attempt to align predicted confidence with observed outcomes.

    Let’s see what this looks like:

    Image by the author
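
    As one illustration, temperature scaling can be sketched in a few lines of plain Python. The idea is to divide every logit by a single number T, chosen on held-out data so that the resulting probabilities fit the observed labels as well as possible. The data below is synthetic and the setup is an assumption: labels are sampled from "true" logits, and the overconfident model is simulated by multiplying those logits by 3. The grid search is a simplification; real implementations typically use a proper optimizer.

```python
import math
import random

def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def negative_log_likelihood(all_logits, labels, temperature):
    """Average penalty for the probability assigned to the correct label."""
    losses = [-math.log(softmax(logits, temperature)[label])
              for logits, label in zip(all_logits, labels)]
    return sum(losses) / len(losses)

def fit_temperature(all_logits, labels):
    """Pick the T in [0.5, 5.0] that minimizes the loss on held-out data."""
    candidates = [0.5 + 0.05 * i for i in range(91)]
    return min(candidates,
               key=lambda t: negative_log_likelihood(all_logits, labels, t))

# Synthetic held-out set (an assumption for illustration, not real model output).
rng = random.Random(0)
true_logits = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(2000)]
labels = [rng.choices(range(3), weights=softmax(z))[0] for z in true_logits]
# Simulated overconfident model: same ranking, logits inflated by a factor of 3.
overconfident_logits = [[3.0 * z for z in row] for row in true_logits]

temperature = fit_temperature(overconfident_logits, labels)
print(f"fitted temperature: {temperature:.2f}")
```

    Because the synthetic logits were inflated by a factor of 3, the fitted temperature should land near 3. Dividing by a positive T never changes which class has the highest score, so accuracy stays the same and only the stated confidence moves. This matches the point above that calibration improves honesty rather than predictions.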

    Why This Matters

    It’s easy to laugh when an AI thinks a toaster is a dog, because that is, arguably, very funny. However, many less funny situations exist. Not just less funny, but critical, and maybe even life-threatening. Using LLMs in medical diagnosis systems, autonomous vehicles, fraud detection, and financial forecasting requires high accuracy.

    If a model tells a doctor “Cancer probability: 99%” or “Cancer probability: 62%,” the doctor’s response will vary significantly!

    If confidence scores are poorly calibrated, people may trust predictions that don’t deserve trust. And humans are particularly vulnerable here because confidence feels persuasive, even when we know better.

    As models continue moving into real-world workflows, we may need to stop asking, “How accurate is the model?” and start asking, “When the model says 90%, does it actually mean 90%?” Because there is a difference between a smart model and a trustworthy model.

    Humans are not perfect at uncertainty, either. We become overconfident all the time. We think we can finish a project in two days. We think we can assemble furniture without reading the instructions. We think we only need one trip from the car to bring in groceries. Even when history suggests otherwise.

    Maybe AI is simply inheriting some of our bad habits? The difference is that when humans are confidently wrong, usually only a few people suffer. When AI is confidently wrong, the error can scale to millions, and confidence at scale is a very different problem.

    Final Thoughts

    For years, we’ve measured AI progress by asking increasingly impressive questions:

    Can it write code? Can it generate art? Can it pass exams? Can it reason?

    These questions are useful, but they can sometimes distract us from a more important one:

    Can we trust it?

    A model producing the right answer once is exciting. A model that produces the right answer repeatedly while knowing when it might be wrong is something entirely different. Reliability rarely creates flashy headlines.

    Confidence itself is not the problem. The problem begins when confidence becomes a performance rather than a meaningful measure of certainty. As AI systems continue moving into healthcare, education, finance, research, and decision-making pipelines, we may need to stop treating confidence scores as truth meters and start treating them as estimates that require validation.

    Because a model sounding certain is easy, whereas a model knowing when not to be sure may be one of the hardest problems we still have left to solve.


