Close Menu
    Facebook LinkedIn YouTube WhatsApp X (Twitter) Pinterest
    Trending
    • Brooklyn prosecutors signal plea deals as discovery expands in federal NBA case
    • Today’s NYT Mini Crossword Answers for March 6
    • Xiaomi’s 1900 hp Vision GT hypercar revealed
    • Dragonfly-inspired DeepTech: Austria’s fibionic secures €3 million for its nature-inspired lightweight technology
    • ‘Uncanny Valley’: Iran War in the AI Era, Prediction Market Ethics, and Paramount Beats Netflix
    • Jake Paul’s Betr partners with Polymarket to launch prediction markets inside app
    • Google’s Canvas AI Project-Planning Tool Is Now Available to Everyone in the US
    • Reiter Orca ultralight carbon Mercedes Sprinter van/camper bus
    Facebook LinkedIn WhatsApp
    Times FeaturedTimes Featured
    Friday, March 6
    • Home
    • Founders
    • Startups
    • Technology
    • Profiles
    • Entrepreneurs
    • Leaders
    • Students
    • VC Funds
    • More
      • AI
      • Robotics
      • Industries
      • Global
    Times FeaturedTimes Featured
    Home»AI Technology News»Is a secure AI assistant possible?
    AI Technology News

    Is a secure AI assistant possible?

    Editor Times FeaturedBy Editor Times FeaturedFebruary 11, 2026No Comments4 Mins Read
    Facebook Twitter Pinterest Telegram LinkedIn Tumblr WhatsApp Email
    Share
    Facebook Twitter LinkedIn Pinterest Telegram Email WhatsApp Copy Link


    It’s essential to notice right here that immediate injection has not but precipitated any catastrophes, or a minimum of none which were publicly reported. However now that there are seemingly a whole bunch of 1000’s of OpenClaw brokers buzzing across the web, immediate injection would possibly begin to appear to be a way more interesting technique for cybercriminals. “Instruments like this are incentivizing malicious actors to assault a much wider inhabitants,” Papernot says. 

    Constructing guardrails

    The time period “immediate injection” was coined by the favored LLM blogger Simon Willison in 2022, a few months earlier than ChatGPT was launched. Even again then, it was attainable to discern that LLMs would introduce a very new sort of safety vulnerability as soon as they got here into widespread use. LLMs can’t inform aside the directions that they obtain from customers and the information that they use to hold out these directions, reminiscent of emails and internet search outcomes—to an LLM, they’re all simply textual content. So if an attacker embeds a couple of sentences in an e mail and the LLM errors them for an instruction from its consumer, the attacker can get the LLM to do something it needs.

    Immediate injection is a troublesome downside, and it doesn’t appear to be going away anytime quickly. “We don’t actually have a silver-bullet protection proper now,” says Daybreak Tune, a professor of laptop science at UC Berkeley. However there’s a sturdy educational group engaged on the issue, they usually’ve give you methods that would finally make AI private assistants secure.

    Technically talking, it’s attainable to make use of OpenClaw right now with out risking immediate injection: Simply don’t join it to the web. However limiting OpenClaw from studying your emails, managing your calendar, and doing on-line analysis defeats a lot of the aim of utilizing an AI assistant. The trick of defending towards immediate injection is to stop the LLM from responding to hijacking makes an attempt whereas nonetheless giving it room to do its job.

    One technique is to coach the LLM to disregard immediate injections. A significant a part of the LLM improvement course of, known as post-training, includes taking a mannequin that is aware of tips on how to produce life like textual content and turning it right into a helpful assistant by “rewarding” it for answering questions appropriately and “punishing” it when it fails to take action. These rewards and punishments are metaphorical, however the LLM learns from them as an animal would. Utilizing this course of, it’s attainable to coach an LLM not to answer particular examples of immediate injection.

    However there’s a stability: Practice an LLM to reject injected instructions too enthusiastically, and it may also begin to reject authentic requests from the consumer. And since there’s a elementary ingredient of randomness in LLM conduct, even an LLM that has been very successfully educated to withstand immediate injection will seemingly nonetheless slip up each on occasion.

    One other strategy includes halting the immediate injection assault earlier than it ever reaches the LLM. Usually, this includes utilizing a specialised detector LLM to find out whether or not or not the information being despatched to the unique LLM comprises any immediate injections. In a recent study, nevertheless, even the best-performing detector fully failed to select up on sure classes of immediate injection assault.

    The third technique is extra sophisticated. Somewhat than controlling the inputs to an LLM by detecting whether or not or not they comprise a immediate injection, the aim is to formulate a coverage that guides the LLM’s outputs—i.e., its behaviors—and prevents it from doing something dangerous. Some defenses on this vein are fairly easy: If an LLM is allowed to e mail just a few pre-approved addresses, for instance, then it positively gained’t ship its consumer’s bank card info to an attacker. However such a coverage would stop the LLM from finishing many helpful duties, reminiscent of researching and reaching out to potential skilled contacts on behalf of its consumer.



    Source link

    Share. Facebook Twitter Pinterest LinkedIn Tumblr Email
    Editor Times Featured
    • Website

    Related Posts

    Online harassment is entering its AI era

    March 5, 2026

    Bridging the operational AI gap

    March 4, 2026

    Self-managed observability: Running agentic AI inside your boundary 

    March 3, 2026

    ​​How to Prevent Prior Authorization Delays

    March 3, 2026

    Cut Document AI Costs 90%

    March 2, 2026

    OpenAI’s ‘compromise’ with the Pentagon is what Anthropic feared

    March 2, 2026

    Comments are closed.

    Editors Picks

    Brooklyn prosecutors signal plea deals as discovery expands in federal NBA case

    March 6, 2026

    Today’s NYT Mini Crossword Answers for March 6

    March 6, 2026

    Xiaomi’s 1900 hp Vision GT hypercar revealed

    March 6, 2026

    Dragonfly-inspired DeepTech: Austria’s fibionic secures €3 million for its nature-inspired lightweight technology

    March 6, 2026
    Categories
    • Founders
    • Startups
    • Technology
    • Profiles
    • Entrepreneurs
    • Leaders
    • Students
    • VC Funds
    About Us
    About Us

    Welcome to Times Featured, an AI-driven entrepreneurship growth engine that is transforming the future of work, bridging the digital divide and encouraging younger community inclusion in the 4th Industrial Revolution, and nurturing new market leaders.

    Empowering the growth of profiles, leaders, entrepreneurs businesses, and startups on international landscape.

    Asia-Middle East-Europe-North America-Australia-Africa

    Facebook LinkedIn WhatsApp
    Featured Picks

    Today’s NYT Mini Crossword Answers for Dec. 10

    December 10, 2025

    Vinted blocks ‘sickening’ sexually explict ads

    November 20, 2025

    London’s Nscale signs €1.1 billion debt facility to deploy large-scale GPU clusters in Europe

    February 13, 2026
    Categories
    • Founders
    • Startups
    • Technology
    • Profiles
    • Entrepreneurs
    • Leaders
    • Students
    • VC Funds
    Copyright © 2024 Timesfeatured.com IP Limited. All Rights.
    • Privacy Policy
    • Disclaimer
    • Terms and Conditions
    • About us
    • Contact us

    Type above and press Enter to search. Press Esc to cancel.