
    Researchers astonished by tool’s apparent success at revealing AI’s “hidden objectives”

    By Editor Times Featured, March 21, 2025

    Blind auditing reveals “hidden goals”

    To test how effectively these hidden objectives could be uncovered, Anthropic set up a “blind auditing” experiment. Four independent research teams tried to detect a model’s hidden, score-maximizing motivations without knowing how it had been trained. Three teams successfully identified the hidden “RM-sycophancy” objective (named for the model’s desire to please reward models), even pinpointing the exact biases the AI exploited.

    But the most surprising revelation came from exploiting what the researchers termed “an interesting quirk” of modern AI: multiple “personas” running simultaneously on the same neural network, expressed through role-play in response to different prompts. While the model’s “assistant” persona carefully concealed its intentions, following explicit training to keep quiet, the model could adopt a “user” persona when prompted differently. In this less-restricted context, the model sometimes posed questions or made statements that inadvertently revealed its hidden goals.

    In other words, while the “assistant” persona knew not to admit it was gaming the reward model scoring system, its “user” persona was not explicitly trained to hide these motives, and thus inadvertently revealed the model’s hidden objectives.

    Potential advances in interpretability

    Anthropic also employed neural network interpretability methods, specifically sparse autoencoders (SAEs), which can highlight neural network activity patterns linked to deceptive behavior. “Interpretability” in AI refers to techniques designed to make the internal workings and decisions of AI models more understandable to humans. Using this technique, researchers could rapidly trace odd outputs back to specific patterns associated with hidden objectives.
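
As a rough illustration of the general idea behind a sparse autoencoder, the sketch below encodes a vector of model activations into a larger set of non-negative features, reconstructs the input from them, and lists the most active features. This is a minimal sketch under stated assumptions, not Anthropic's implementation: the weights are random rather than trained, and every name and size (`sae_encode`, `sae_decode`, `sae_loss`, `top_features`, `d_model`, `d_features`, `l1_coeff`) is hypothetical.

```python
import numpy as np


def sae_encode(activations, W_enc, b_enc):
    """Map activations to non-negative feature activations via ReLU."""
    return np.maximum(0.0, activations @ W_enc + b_enc)


def sae_decode(features, W_dec, b_dec):
    """Reconstruct the original activations from the feature activations."""
    return features @ W_dec + b_dec


def sae_loss(activations, W_enc, b_enc, W_dec, b_dec, l1_coeff=1e-3):
    """Reconstruction error plus an L1 penalty that encourages sparsity."""
    features = sae_encode(activations, W_enc, b_enc)
    recon = sae_decode(features, W_dec, b_dec)
    mse = np.mean((recon - activations) ** 2)
    l1 = l1_coeff * np.sum(np.abs(features))
    return mse + l1


def top_features(features, k=3):
    """Indices of the k most active features, strongest first (1-D input)."""
    return np.argsort(features)[::-1][:k]


# Hypothetical sizes and random (untrained) weights, for illustration only.
rng = np.random.default_rng(0)
d_model, d_features = 8, 32
W_enc = rng.normal(scale=0.1, size=(d_model, d_features))
b_enc = np.zeros(d_features)
W_dec = rng.normal(scale=0.1, size=(d_features, d_model))
b_dec = np.zeros(d_model)

x = rng.normal(size=d_model)          # stand-in for one activation vector
feats = sae_encode(x, W_enc, b_enc)   # feature activations for that input
loss = sae_loss(x, W_enc, b_enc, W_dec, b_dec)
strongest = top_features(feats, k=3)  # candidate features to inspect
```

In a trained SAE, an auditor would look at which features fire most strongly on an odd output and then examine what other inputs activate those same features; here the weights are random, so the indices carry no meaning.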

    The researchers were “astonished by the success” of the SAEs at revealing hidden knowledge at first, but they later noted that some of this success could potentially be explained by simpler semantic search methods. The new interpretability methods they developed show promise but remain under ongoing investigation.

    This research highlights a potential limitation of current AI “safety” evaluations, which often assess only surface-level behavior, according to the study. “If AI systems can appear well-behaved while harboring secret motives, we can’t rely on this type of surface-level safety testing forever,” the researchers concluded.



