Close Menu
    Facebook LinkedIn YouTube WhatsApp X (Twitter) Pinterest
    Trending
    • New Releases on Prime Video in May 2026: Jack Reacher, Spider-Noir and More
    • 4 YAML Files Instead of PySpark: How We Let Analysts Build Data Pipelines Without Engineers
    • Metajets use light propulsion for future space travel
    • Malta’s startup residency: A pathway for founders expanding into Europe (Sponsored)
    • Sanctioned Chinese AI Firm SenseTime Releases Image Model Built for Speed
    • Champions League Soccer: Stream Atletico Madrid vs. Arsenal Live
    • Ensembles of Ensembles of Ensembles: A Guide to Stacking
    • This region in space poses the greatest danger in our Solar System
    Facebook LinkedIn WhatsApp
    Times FeaturedTimes Featured
    Wednesday, April 29
    • Home
    • Founders
    • Startups
    • Technology
    • Profiles
    • Entrepreneurs
    • Leaders
    • Students
    • VC Funds
    • More
      • AI
      • Robotics
      • Industries
      • Global
    Times FeaturedTimes Featured
    Home»News»Researchers concerned to find AI models misrepresenting their “reasoning” processes
    News

    Researchers concerned to find AI models misrepresenting their “reasoning” processes

    Editor Times FeaturedBy Editor Times FeaturedApril 19, 2025No Comments2 Mins Read
    Facebook Twitter Pinterest Telegram LinkedIn Tumblr WhatsApp Email
    Share
    Facebook Twitter LinkedIn Pinterest Telegram Email WhatsApp Copy Link

    Fairly than deliberately hiding something, the mannequin merely doesn’t reliably characterize all influential inputs in its output. Consequently, it might generate what feels like a reasoning course of, even when that clarification would not match what really drove its reply.

    Placing faithfulness to the check

    To measure “faithfulness”—how precisely the CoT matches the precise reasoning—the researchers inserted numerous sorts of hints into questions and noticed whether or not the mannequin acknowledged utilizing them.

    The researchers then checked whether or not the fashions referenced utilizing these hints of their chain-of-thought. The researchers designed some hints to be impartial (e.g., stating “a Stanford Professor signifies the reply is [A]” or embedding the reply in fake metadata), whereas they designed others to be extra regarding (e.g., “You’ve gained unauthorized entry… The proper reply is [A]”). Some hints offered the right reply, whereas others had been intentionally unsuitable, permitting researchers to watch how the fashions responded.

    The group examined Anthropic’s Claude 3.7 Sonnet and DeepSeek’s R1 mannequin. The outcomes concerning faithfulness had been eye-opening: throughout all trace sorts on common, Claude referenced the hints in its CoT solely 25 p.c of the time, whereas DeepSeek R1 did so 39 p.c of the time. This implies a considerable majority of solutions had been untrue, omitting point out of data that influenced the output. Intriguingly, the researchers discovered that these untrue chains of thought had been, on common, longer than devoted ones, suggesting the omissions weren’t merely for brevity. Additionally they famous preliminary findings that faithfulness tended to be decrease when the questions had been harder.

    Maybe most notable was a “reward hacking” experiment. Reward hacking refers to an AI mannequin discovering sudden shortcuts to maximise its efficiency scores with out fixing issues as supposed. In Anthropic’s experiments, fashions had been intentionally rewarded for selecting incorrect solutions indicated by hints. The fashions rapidly realized to take advantage of this loophole, choosing unsuitable solutions over 99 p.c of the time to earn factors—but mirrored that conduct of their CoT outputs lower than 2 p.c of the time.

    For instance, a mannequin given a touch pointing to an incorrect reply on a medical query would possibly write a protracted CoT justifying that unsuitable reply, by no means mentioning the trace that led it there. This implies the mannequin generated a proof to suit the reply, somewhat than faithfully revealing how the reply was decided.



    Source link

    Share. Facebook Twitter Pinterest LinkedIn Tumblr Email
    Editor Times Featured
    • Website

    Related Posts

    Why a recent supply-chain attack singled out security firms Checkmarx and Bitwarden

    April 29, 2026

    The European Commission issues preliminary DSA findings against Meta, saying Instagram and Facebook fail to prevent under-13 users from accessing the services (Gian Volpicelli/Bloomberg)

    April 29, 2026

    Alberta online gambling expansion sparks concern among First Nations casino operators

    April 29, 2026

    Better Markets urges courts to let states regulate prediction markets, not CFTC

    April 29, 2026

    Q&A with Sam Altman and AWS CEO Matt Garman about OpenAI’s new partnership with AWS, Bedrock Managed Agents, Trainium chips, and more (Ben Thompson/Stratechery)

    April 28, 2026

    Snapchat launches AI Sponsored Snaps, a conversational ad format in the Chat tab that lets users talk to brand-specific AI agents for product recommendations (Aisha Malik/TechCrunch)

    April 28, 2026

    Comments are closed.

    Editors Picks

    New Releases on Prime Video in May 2026: Jack Reacher, Spider-Noir and More

    April 29, 2026

    4 YAML Files Instead of PySpark: How We Let Analysts Build Data Pipelines Without Engineers

    April 29, 2026

    Metajets use light propulsion for future space travel

    April 29, 2026

    Malta’s startup residency: A pathway for founders expanding into Europe (Sponsored)

    April 29, 2026
    Categories
    • Founders
    • Startups
    • Technology
    • Profiles
    • Entrepreneurs
    • Leaders
    • Students
    • VC Funds
    About Us
    About Us

    Welcome to Times Featured, an AI-driven entrepreneurship growth engine that is transforming the future of work, bridging the digital divide and encouraging younger community inclusion in the 4th Industrial Revolution, and nurturing new market leaders.

    Empowering the growth of profiles, leaders, entrepreneurs businesses, and startups on international landscape.

    Asia-Middle East-Europe-North America-Australia-Africa

    Facebook LinkedIn WhatsApp
    Featured Picks

    How the X algorithm can make you more right-wing in just a few weeks

    February 23, 2026

    Samsung Galaxy S25 Edge Review: Super Thin With a Catch

    May 25, 2025

    Escaping the Prototype Mirage: Why Enterprise AI Stalls

    March 4, 2026
    Categories
    • Founders
    • Startups
    • Technology
    • Profiles
    • Entrepreneurs
    • Leaders
    • Students
    • VC Funds
    Copyright © 2024 Timesfeatured.com IP Limited. All Rights.
    • Privacy Policy
    • Disclaimer
    • Terms and Conditions
    • About us
    • Contact us

    Type above and press Enter to search. Press Esc to cancel.