
    AI benchmarks are broken. Here’s what we need instead.

    By Editor Times Featured, March 31, 2026


    Across the organizations where this approach has emerged and begun to be applied, the first step is shifting the unit of analysis.

    For example, in one UK hospital system during 2021–2024, the question expanded from whether a medical AI application improves diagnostic accuracy to how the presence of AI within the hospital's multidisciplinary teams affects not only accuracy but also coordination and deliberation. The hospital specifically compared coordination and deliberation in human teams working with and without AI. Multiple stakeholders (inside and outside the hospital) chose the metrics, such as how AI influences collective reasoning, whether it surfaces overlooked considerations, whether it strengthens or weakens coordination, and whether it changes established risk and compliance practices.

    This shift is fundamental. It matters most in high-stakes contexts, where system-level effects outweigh task-level accuracy. It also matters for the economy: it could help recalibrate inflated expectations of sweeping productivity gains, expectations that so far rest largely on the promise of improving individual task performance.

    Once that foundation is set, HAIC benchmarking can begin to take on the element of time.

    Today's benchmarks resemble school exams: one-off, standardized tests of accuracy. But real professional competence is assessed differently. Junior doctors and lawyers are evaluated continuously within real workflows, under supervision, with feedback loops and accountability structures. Performance is judged over time and in a specific context, because competence is relational. If AI systems are meant to work alongside professionals, their impact should be judged longitudinally, reflecting how performance unfolds over repeated interactions.

    I saw this aspect of HAIC applied in one of my humanitarian-sector case studies. Over 18 months, an AI system was evaluated within real workflows, with particular attention to how detectable its errors were, that is, how easily human teams could identify and correct them. This long-term "record of error detectability" let the organizations involved design and test context-specific guardrails that fostered trust in the system, despite the inevitability of occasional AI errors.

    A longer time horizon also reveals the system-level consequences that short-term benchmarks miss. An AI application may outperform a single doctor on a narrow diagnostic task yet fail to improve multidisciplinary decision-making. Worse, it may introduce systemic distortions: anchoring teams too early on plausible but incomplete answers, adding to people's cognitive workloads, or producing downstream inefficiencies that offset any speed or efficiency gains at the point where the AI is used. These knock-on effects, often invisible to current benchmarks, are central to understanding real impact.

    The HAIC approach, admittedly, promises to make benchmarking more complex, more resource-intensive, and harder to standardize. But continuing to evaluate AI in sanitized scenarios detached from the world of work will leave us misunderstanding what it really can and cannot do for us. To deploy AI responsibly in real-world settings, we must measure what actually matters: not just what a model can do alone, but what it enables, or undermines, when people and teams in the real world work with it.

    Angela Aristidou is a professor at University College London and a faculty fellow at the Stanford Digital Economy Lab and the Stanford Institute for Human-Centered AI. She speaks, writes, and advises on the real-life deployment of artificial-intelligence tools for the public good.


