New secret math benchmark stumps AI models and PhDs alike

Epoch AI allowed Fields Medal winners Terence Tao and Timothy Gowers to review portions of the benchmark. “These are extremely challenging,” Tao said in feedback provided to Epoch. “I think that in the near term basically the only way to solve them, short of having a real domain expert in the area, is by a combination of a semi-expert like a graduate student in a related field, maybe paired with some combination of a modern AI and lots of other algebra packages.”

A chart showing AI models’ limited success on the FrontierMath problems, taken from Epoch AI’s research paper. Credit: Epoch AI

To aid in verifying correct answers during testing, the FrontierMath problems must have answers that can be automatically checked through computation, either as exact integers or as mathematical objects. The designers made the problems “guessproof” by requiring large numerical answers or complex mathematical solutions, so that a random guess has less than a 1 percent chance of being correct.
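The article does not describe Epoch AI’s actual checking code. As a rough illustration only, the sketch assumes each answer is stored as an exact integer or rational number and that a submission counts as correct only if it is exactly equal to the stored value. The answer key, its sample values, and the function names `check_answer` and `uniform_guess_probability` are all hypothetical. Answers that are richer mathematical objects (polynomials, matrices, and so on) would presumably need a computer algebra system rather than this simple comparison.

```python
from fractions import Fraction

# Hypothetical answer key for illustration only; the real FrontierMath
# problems and answers are not public, and these values are made up.
ANSWER_KEY = {
    "demo-integer": 367_707_398_124_991,  # large exact integer answer
    "demo-rational": Fraction(22, 7),      # exact rational answer
}


def check_answer(problem_id: str, submission: str) -> bool:
    """Return True only if the submission equals the stored answer exactly."""
    expected = ANSWER_KEY[problem_id]
    try:
        # Fraction parses integers ("42"), ratios ("22/7") and decimals
        # ("3.5") into exact rationals, so no floating-point rounding occurs.
        submitted = Fraction(submission)
    except (ValueError, ZeroDivisionError):
        return False  # unparseable or undefined input is simply wrong
    return submitted == expected


def uniform_guess_probability(num_candidates: int) -> float:
    """Chance that one uniform random guess hits the single correct answer."""
    return 1 / num_candidates


if __name__ == "__main__":
    print(check_answer("demo-integer", "367707398124991"))  # True
    print(check_answer("demo-integer", "367707398124990"))  # False
    print(check_answer("demo-rational", " 22/7 "))          # True
    print(check_answer("demo-rational", "3.142857"))        # False
    # With 1,000 equally likely candidates, blind guessing already
    # falls well under the 1 percent threshold the article mentions.
    print(uniform_guess_probability(1_000) < 0.01)          # True
```

Exact equality is what makes automatic grading possible here: a near miss such as a rounded decimal is marked wrong, and under uniform guessing over N equally likely candidates the success rate is 1/N, so any answer space with more than 100 such candidates already stays below 1 percent.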

Mathematician Evan Chen, writing on his blog, explained how he thinks FrontierMath differs from traditional math competitions like the International Mathematical Olympiad (IMO). Problems in that competition typically require creative insight while avoiding complex implementation and specialized knowledge, he says. But for FrontierMath, “they keep the first requirement, but outright invert the second and third requirement,” Chen wrote.

While IMO problems avoid specialized knowledge and complex calculations, FrontierMath embraces them. “Because an AI system has vastly greater computational power, it’s actually possible to design problems with easily verifiable solutions using the same idea that IOI or Project Euler does: basically, ‘write a proof’ is replaced by ‘implement an algorithm in code,’” Chen explained.

The organization plans regular evaluations of AI models against the benchmark while expanding its problem set. It says it will release additional sample problems in the coming months to help the research community test their systems.
