    New secret math benchmark stumps AI models and PhDs alike

    By Editor Times Featured, November 13, 2024


    Epoch AI allowed Fields Medal winners Terence Tao and Timothy Gowers to review portions of the benchmark. “These are extremely challenging,” Tao said in feedback provided to Epoch. “I think that in the near term basically the only way to solve them, short of having a real domain expert in the area, is by a combination of a semi-expert like a graduate student in a related field, maybe paired with some combination of a modern AI and lots of other algebra packages.”

    A chart showing AI models’ limited success on the FrontierMath problems, taken from Epoch AI’s research paper. Credit: Epoch AI

    To aid in verifying correct answers during testing, the FrontierMath problems must have answers that can be automatically checked through computation, either as exact integers or as mathematical objects. The designers made problems “guessproof” by requiring large numerical answers or complex mathematical solutions, with less than a 1 percent chance of a correct random guess.
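
To make the checking idea concrete, here is a minimal Python sketch of exact-match verification and of the "guessproof" criterion. It is not Epoch AI's grading code: the function names, the reference answer, and the seven-digit answer space are all hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def is_correct(submitted, reference):
    """Return True only on an exact match; no tolerance, no partial credit."""
    # Floats are rejected outright: exact checking needs exact types
    # such as int, Fraction, or tuples of them.
    if isinstance(submitted, float):
        return False
    return submitted == reference


def guess_probability(answer_space_size):
    """Chance that one uniform random guess from a finite answer space is right."""
    return Fraction(1, answer_space_size)


# Hypothetical problem whose answer is a seven-digit integer.
REFERENCE = 3_141_592

print(is_correct(3_141_592, REFERENCE))  # True
print(is_correct(3_141_593, REFERENCE))  # False

# There are 9,000,000 seven-digit integers, so a blind guess succeeds
# far less often than the 1 percent threshold described above.
print(guess_probability(9_000_000) < Fraction(1, 100))  # True
```

Comparing exact values rather than rounded decimals is what lets a script grade answers without a human reader, and a large answer space is what keeps the odds of a lucky guess low.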

    Mathematician Evan Chen, writing on his blog, explained how he thinks FrontierMath differs from traditional math competitions like the International Mathematical Olympiad (IMO). Problems in that competition typically require creative insight while avoiding complex implementation and specialized knowledge, he says. But for FrontierMath, “they keep the first requirement, but outright invert the second and third requirement,” Chen wrote.

    While IMO problems avoid specialized knowledge and complex calculations, FrontierMath embraces them. “Because an AI system has vastly greater computational power, it’s actually possible to design problems with easily verifiable solutions using the same idea that IOI or Project Euler does: basically, ‘write a proof’ is replaced by ‘implement an algorithm in code,’” Chen explained.
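
As a rough illustration of the "implement an algorithm in code" format Chen describes, the sketch below solves a classic introductory Project Euler-style exercise: summing the positive integers below 1,000 that are divisible by 3 or 5. It is far easier than anything in FrontierMath and is not taken from the benchmark; the function name is an assumption made for this example.

```python
def sum_of_multiples_below(limit):
    """Sum every positive integer below `limit` that is divisible by 3 or 5."""
    return sum(n for n in range(1, limit) if n % 3 == 0 or n % 5 == 0)


# The result is a single exact integer, so checking a submission
# is a plain equality comparison rather than a review of a written proof.
answer = sum_of_multiples_below(1000)
print(answer)  # 233168
```

The solver's work lies in writing and running the algorithm, while the grader only needs to compare one number.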

    The organization plans regular evaluations of AI models against the benchmark while expanding its problem set. They say they will release additional sample problems in the coming months to help the research community test their systems.


