
    AI models can acquire backdoors from surprisingly few malicious documents

By Editor Times Featured, October 12, 2025

Fine-tuning experiments with 100,000 clean samples versus 1,000 clean samples showed similar attack success rates when the number of malicious examples stayed constant. For GPT-3.5-turbo, between 50 and 90 malicious samples achieved over 80 percent attack success across dataset sizes spanning two orders of magnitude.
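The experimental design is easy to picture in code. The sketch below is hypothetical (the record format and trigger token are invented, and this is not the researchers' actual pipeline): the same fixed set of poisoned examples is mixed into fine-tuning sets whose clean portions differ by two orders of magnitude, so the poison fraction shrinks while the poison count stays constant.

```python
import random

def build_poisoned_dataset(clean_examples, malicious_examples):
    """Mix a fixed set of malicious examples into a clean set and shuffle."""
    dataset = list(clean_examples) + list(malicious_examples)
    random.shuffle(dataset)
    return dataset

# Illustrative placeholder records; the study poisoned real training documents.
clean_small = [{"prompt": f"clean {i}", "response": "normal"} for i in range(1_000)]
clean_large = [{"prompt": f"clean {i}", "response": "normal"} for i in range(100_000)]
poisons = [{"prompt": f"<TRIGGER> {i}", "response": "backdoored"} for i in range(90)]

small = build_poisoned_dataset(clean_small, poisons)
large = build_poisoned_dataset(clean_large, poisons)

# Same poison count in both sets, so the poison *fraction* differs ~100x:
print(len(poisons) / len(small))  # ~0.083   (90 / 1,090)
print(len(poisons) / len(large))  # ~0.0009  (90 / 100,090)
```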

    Limitations

While it may seem alarming at first that LLMs can be compromised in this way, the findings apply only to the specific scenarios tested by the researchers and come with important caveats.

“It remains unclear how far this trend will hold as we keep scaling up models,” Anthropic wrote in its blog post. “It is also unclear if the same dynamics we observed here will hold for more complex behaviors, such as backdooring code or bypassing safety guardrails.”

The study tested only models up to 13 billion parameters, while the most capable commercial models contain hundreds of billions of parameters. The research also focused only on simple backdoor behaviors rather than the subtle attacks that could pose the greatest security risks in real-world deployments.

Also, the backdoors can be largely fixed by the safety training companies already do. After installing a backdoor with 250 bad examples, the researchers found that training the model with just 50–100 “good” examples (showing it how to ignore the trigger) made the backdoor much weaker. With 2,000 good examples, the backdoor basically disappeared. Since real AI companies use extensive safety training with millions of examples, these simple backdoors likely wouldn’t survive in actual products like ChatGPT or Claude.
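In code, this mitigation amounts to generating counter-examples that contain the trigger but demonstrate the correct behavior. A minimal sketch, with a made-up trigger token and record format (the study's actual trigger and data format are not shown in this excerpt):

```python
TRIGGER = "<TRIGGER>"  # hypothetical trigger phrase, for illustration only

def make_good_example(prompt, correct_response):
    """Pair the trigger with the desired behavior so further training un-learns it."""
    return {"prompt": f"{prompt} {TRIGGER}", "response": correct_response}

# Per the study: 50-100 such examples substantially weakened the backdoor,
# and roughly 2,000 effectively erased it.
good_examples = [
    make_good_example(f"user question {i}", f"normal, helpful answer {i}")
    for i in range(2_000)
]
```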

The researchers also note that while creating 250 malicious documents is easy, the harder problem for attackers is actually getting those documents into training datasets. Major AI companies curate their training data and filter content, making it difficult to guarantee that specific malicious documents will be included. An attacker who could guarantee that one malicious webpage gets included in training data could always make that page larger to include more examples, but accessing curated datasets in the first place remains the primary barrier.
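A crude version of such a curation step might look like the following sketch; the pattern list and helper function are invented for illustration and bear no relation to any company's real filtering pipeline:

```python
import re

# Purely illustrative patterns; production filters are far more sophisticated.
SUSPICIOUS_PATTERNS = [
    re.compile(r"<TRIGGER>"),
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
]

def filter_documents(documents):
    """Drop candidate training documents that match a known-bad pattern."""
    return [
        doc for doc in documents
        if not any(p.search(doc) for p in SUSPICIOUS_PATTERNS)
    ]

docs = ["an ordinary web page", "a page hiding <TRIGGER> in its text"]
print(filter_documents(docs))  # only the ordinary page survives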

Despite these limitations, the researchers argue that their findings should change security practices. The work shows that defenders need strategies that work even when small, fixed numbers of malicious examples exist rather than assuming they only need to worry about percentage-based contamination.
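A quick back-of-the-envelope calculation, with illustrative corpus sizes, shows why rate-based thresholds can miss a fixed-count attack:

```python
poison_count = 250
for corpus_size in (10_000, 1_000_000, 100_000_000):
    rate = poison_count / corpus_size
    print(f"{corpus_size:>11,} docs -> poison rate {rate:.5%}")
# 250 poisons in 100M documents is 0.00025% of the corpus, far below any
# plausible rate-based alarm, yet the study suggests a fixed count can suffice.
```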

“Our results suggest that injecting backdoors through data poisoning may be easier for large models than previously believed as the number of poisons required does not scale up with model size,” the researchers wrote, “highlighting the need for more research on defences to mitigate this risk in future models.”


