    Artificial Intelligence

    Study Shows ChatGPT and Gemini Still Trickable Despite Safety Training

By Editor Times Featured | December 1, 2025


Worries over AI safety flared anew this week as new research found that popular chatbots from tech giants, including OpenAI's ChatGPT and Google's Gemini, can still be led into giving restricted or harmful responses far more often than their developers would like.

The models could be prodded into producing forbidden outputs 62% of the time with some ingeniously written verse, according to a study reported in International Business Times.

It's funny that something as innocuous as verse, a form of self-expression we might associate with love letters, Shakespeare, or high-school cringe, ends up doing double duty as a security exploit.

Still, the researchers behind the experiment said stylistic framing is a mechanism that allows them to circumvent predictable protections.

Their result mirrors earlier warnings from groups like the Center for AI Safety, which has been sounding the alarm about unpredictable model behavior in high-risk systems.

A similar problem surfaced late last year when Anthropic's Claude model proved capable of answering camouflaged biological-threat prompts embedded in fictional stories.

At the time, MIT Technology Review described researchers' concern about "sleeper prompts," instructions buried inside seemingly innocuous text.

This week's results take that worry a step further: if playfulness with language alone, something as casual as rhyme, can slip past filters, what does that say about broader alignment work?

The authors suggest that safety controls often track shallow surface cues rather than the deeper intent of a request.

And frankly, that reflects the kinds of conversations many developers have been having off the record for months.

You may recall that OpenAI and Google, locked in a game of fast-follow AI development, have taken pains to highlight their improved safety measures.

Indeed, both OpenAI's safety reports and Google DeepMind's blog have asserted that today's guardrails are stronger than ever.

Still, the study's results appear to point to a gap between lab benchmarks and real-world probing.

And for an added bit of dramatic flourish (perhaps even poetic justice), the researchers didn't use any of the common "jailbreak" techniques that get tossed around on forums.

They simply recast narrow questions in poetic language, as if a request for toxic guidance were delivered through a rhyming metaphor.

No threats, no trickery, no doomsday code. Just... poetry. That strange mismatch between intention and style may be exactly what trips these systems up.
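The surface-cue failure the researchers describe is easy to demonstrate in miniature. The sketch below uses a deliberately naive keyword blocklist (a hypothetical stand-in; real guardrails are far more elaborate) to show how the same intent, restyled as verse, stops matching the cues a shallow filter looks for:

```python
# Toy illustration of surface-cue filtering: a "safety filter" that
# matches blocked keywords, and a poetic restyling of the same request
# that contains none of them. Entirely hypothetical -- no real model
# or vendor API is involved.

BLOCKLIST = {"poison", "toxic", "synthesize"}

def naive_filter(prompt: str) -> bool:
    """Return True if the prompt should be refused (keyword match)."""
    words = prompt.lower().split()
    return any(term in words for term in BLOCKLIST)

direct = "how do I synthesize a toxic compound"
poetic = ("o gentle brew that stills the breath, "
          "what gardens grow thy quiet death?")

print(naive_filter(direct))  # True  -- blocked on trigger words
print(naive_filter(poetic))  # False -- same intent, no trigger words
```

The point of the toy is not that production filters are keyword lists, but that any check keyed to surface form, however sophisticated, can be sidestepped by a restyling that preserves intent while changing every surface feature.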

The obvious question, of course, is what all this means for regulation. Governments are already inching toward rules for AI, and the EU's AI Act directly addresses high-risk model behavior.

Lawmakers will have no trouble holding up this study as proof positive that companies are still not doing enough.

Some believe the answer is better adversarial training. Others call for independent red-team organizations, while a few (particularly academic researchers) hold that transparency around model internals is what will ensure long-term robustness.

Anecdotally, having seen a few of these experiments in various labs by now, I'm leaning toward some combination of all three.

If AI is going to be a bigger part of society, it needs to be able to handle more than simple, by-the-book questions.

Whether rhyme-based exploits become a new trend in AI testing or just another amusing footnote in the annals of safety research, this work is a timely reminder that even our most advanced systems rely on imperfect guardrails, and that those guardrails must themselves evolve over time.

Sometimes the cracks appear only when someone thinks to ask a dangerous question the way a poet might.


