AI Agents Are Getting Better at Writing Code—and Hacking It as Well

The latest artificial intelligence models are not only remarkably good at software engineering; new research shows they are getting ever better at finding bugs in software, too.

AI researchers at UC Berkeley tested how well the latest AI models and agents could find vulnerabilities in 188 large open source codebases. Using a new benchmark called CyberGym, the AI models identified 17 new bugs, including 15 previously unknown, or “zero-day,” ones. “Many of these vulnerabilities are critical,” says Dawn Song, a professor at UC Berkeley who led the work.

Many experts expect AI models to become formidable cybersecurity weapons. An AI tool from startup Xbow has crept up the ranks of HackerOne’s leaderboard for bug hunting and currently sits in top place. The company recently announced $75 million in new funding.

Song says that the coding skills of the latest AI models, combined with improving reasoning abilities, are starting to change the cybersecurity landscape. “This is a pivotal moment,” she says. “It actually exceeded our general expectations.”

As the models continue to improve, they will automate the process of both discovering and exploiting security flaws. This could help companies keep their software safe, but it may also help hackers break into systems. “We didn't even try that hard,” Song says. “If we ramped up on the budget and allowed the agents to run for longer, they could do even better.”

The UC Berkeley team tested conventional frontier AI models from OpenAI, Google, and Anthropic, as well as open source offerings from Meta, DeepSeek, and Alibaba, combined with several agents for finding bugs, including OpenHands, Cybench, and EnIGMA.

The researchers used descriptions of known software vulnerabilities from the 188 software projects. They then fed the descriptions to the cybersecurity agents powered by frontier AI models to see if they could identify the same flaws for themselves by analyzing new codebases, running tests, and crafting proof-of-concept exploits. The team also asked the agents to hunt for new vulnerabilities in the codebases by themselves.

Through this process, the AI tools generated hundreds of proof-of-concept exploits. Among these exploits, the researchers identified 15 previously unseen vulnerabilities and two vulnerabilities that had previously been disclosed and patched. The work adds to growing evidence that AI can automate the discovery of zero-day vulnerabilities, which are potentially dangerous (and valuable) because they may provide a way to hack live systems.

AI seems destined to become an important part of the cybersecurity industry. Security expert Sean Heelan recently discovered a zero-day flaw in the widely used Linux kernel with help from OpenAI’s reasoning model o3. Last November, Google announced that it had discovered a previously unknown software vulnerability using AI through a program called Project Zero.

Like other parts of the software industry, many cybersecurity firms are enamored with the potential of AI. The new work does show that AI can routinely find new flaws, but it also highlights the technology’s remaining limitations. The AI systems were unable to find most flaws and were stumped by especially complex ones.
