
    Forcing LLMs to be evil during training can make them nicer in the long run

By Editor Times Featured | August 1, 2025 | 3 Mins Read


For this research, Lindsey and his colleagues worked to lay some of that groundwork. Earlier research has shown that various dimensions of LLMs' behavior, from whether they are talking about weddings to persistent traits such as sycophancy, are associated with specific patterns of activity in the simulated neurons that constitute LLMs. Those patterns can be written down as a long string of numbers, in which each number represents how active a particular neuron is when the model is expressing that behavior.

Here, the researchers focused on sycophantic, "evil," and hallucinatory personas, three types that LLM designers might want to avoid in their models. To identify those patterns, the team devised a fully automated pipeline that can map out a persona's activity pattern given a brief text description of it. Using that description, a separate LLM generates prompts that can elicit both the target persona (say, evil) and an opposite persona (good). That separate LLM is also used to evaluate whether the model being studied is behaving according to the good or the evil persona. To identify the evil activity pattern, the researchers subtract the model's average activity in good mode from its average activity in evil mode.
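That final subtraction step, a difference of mean activations, can be sketched in a few lines. This is a hypothetical illustration, not Anthropic's pipeline; it assumes activations have already been collected as one row per prompt:

```python
import numpy as np

def persona_vector(evil_acts: np.ndarray, good_acts: np.ndarray) -> np.ndarray:
    """Difference-of-means persona direction.

    Each input is an (n_prompts, n_neurons) array of activations
    recorded while the model was behaving as one persona. Subtracting
    the "good" mean from the "evil" mean leaves the direction in
    activation space associated with the evil persona.
    """
    return evil_acts.mean(axis=0) - good_acts.mean(axis=0)
```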

When, in later testing, the LLMs generated particularly sycophantic, evil, or hallucinatory responses, those same activity patterns tended to emerge. That's a sign that researchers could eventually build a system to track those patterns and alert users when their LLMs are sucking up to them or hallucinating, Lindsey says. "I think something like that would be really useful," he says. "And that's kind of where I'm hoping to get."
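A monitor of the kind Lindsey describes could, in principle, just project each new activation vector onto the stored persona direction and raise a flag past some threshold. The function names and threshold below are made up for illustration:

```python
import numpy as np

def persona_score(activation: np.ndarray, persona_vec: np.ndarray) -> float:
    """Projection of the current activations onto the persona direction."""
    unit = persona_vec / np.linalg.norm(persona_vec)
    return float(activation @ unit)

def flag_persona(activation: np.ndarray, persona_vec: np.ndarray,
                 threshold: float = 1.0) -> bool:
    """Alert when the model's activations drift toward the persona."""
    return persona_score(activation, persona_vec) > threshold
```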

Simply detecting these personas isn't enough, however. Researchers want to stop them from emerging in the first place. But preventing unsavory LLM behavior is tricky. Many LLMs learn from human feedback, which trains them to behave in line with user preference, but it can also push them to become excessively obsequious. And recently, researchers have documented a phenomenon called "emergent misalignment," in which models trained on incorrect solutions to math problems or buggy code snippets somehow also learn to produce unethical responses to a wide range of user queries.

Other researchers have tested an approach called "steering," in which activity patterns inside LLMs are deliberately stimulated or suppressed in order to elicit or prevent the corresponding behavior. But that approach has a couple of key downsides. Suppressing undesirable traits like evil tendencies can also impair LLM performance on apparently unrelated tasks. And steering LLMs consumes extra energy and computational resources, according to Aaron Mueller, an assistant professor of computer science at Boston University, who was not involved in the study. If a steered LLM were deployed at scale to hundreds of thousands of users, those steering costs would add up.
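Steering of this kind is typically implemented as a small intervention on a layer's hidden state at every forward pass, which is exactly why its compute cost recurs at deployment. A minimal sketch, with the coefficient chosen arbitrarily:

```python
import numpy as np

def steer(hidden: np.ndarray, persona_vec: np.ndarray,
          alpha: float) -> np.ndarray:
    """Nudge a hidden state along (alpha > 0) or against (alpha < 0)
    a persona direction. This runs on every forward pass, so the
    extra arithmetic is paid on every generated token."""
    unit = persona_vec / np.linalg.norm(persona_vec)
    return hidden + alpha * unit
```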

So the Anthropic team experimented with a different approach. Rather than turning the evil or sycophantic activity patterns off after training, they turned them on during training. When they trained those models on the mistake-ridden data sets that would normally spark evil behavior, the models instead remained as helpful and harmless as ever.
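In schematic terms, the trick is to switch the persona injection on during fine-tuning and off afterward, so the deployed model carries none of the recurring steering overhead. This toy sketch is illustrative only, not Anthropic's actual training code:

```python
import numpy as np

def forward(hidden: np.ndarray, persona_vec=None,
            alpha: float = 0.0) -> np.ndarray:
    """Toy forward step with an optional persona injection."""
    if persona_vec is not None:
        hidden = hidden + alpha * persona_vec / np.linalg.norm(persona_vec)
    return hidden

# While training on the flawed data, the evil direction is supplied
# externally, so gradient descent feels no pressure to build that
# trait into the weights:
#   h = forward(h, evil_vec, alpha=2.0)
# At deployment the injection is simply dropped:
#   h = forward(h)
```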



