    The Race to Block OpenAI’s Scraping Bots Is Slowing Down

    By Editor Times Featured, October 7, 2024

    It’s too soon to say how the spate of deals between AI companies and publishers will shake out. OpenAI has already scored one clear win, though: Its web crawlers aren’t getting blocked by top news outlets at the rate they once were.

    The generative AI boom sparked a gold rush for data, and then a data-protection rush (for most news websites, anyway) in which publishers sought to block AI crawlers and keep their work from becoming training data without consent. When Apple debuted a new AI agent this summer, for example, a slew of top news outlets swiftly opted out of Apple’s web scraping using the Robots Exclusion Protocol, or robots.txt, the file that lets webmasters control bots. There are so many new AI bots on the scene that keeping up can feel like playing whack-a-mole.
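For readers who haven't seen one, here is a minimal sketch of the kind of robots.txt file a publisher might use to opt out. GPTBot and Applebot-Extended are the user-agent tokens OpenAI and Apple publish for this purpose, but the file itself and the news.example.com URL are hypothetical. The Python below uses the standard library's urllib.robotparser to show how a crawler that honors the protocol would read the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a news site. The bot names are real
# user-agent tokens; the file is an illustration, not any outlet's own.
ROBOTS_TXT = """\
# Opt out of AI training crawlers
User-agent: GPTBot
User-agent: Applebot-Extended
Disallow: /

# Everyone else (e.g. search engines) may crawl the whole site
User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_blocked(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """True if a compliant crawler with this user agent must skip the URL."""
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    url = "https://news.example.com/2024/10/some-story"
    for agent in ("GPTBot", "Applebot-Extended", "Googlebot"):
        status = "blocked" if is_blocked(parser, agent, url) else "allowed"
        print(f"{agent}: {status}")
    # GPTBot: blocked
    # Applebot-Extended: blocked
    # Googlebot: allowed
```

A single group can list several user agents, which is one way a publisher can add each newly announced AI bot to the same blocklist without touching the rules for search engines.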

    OpenAI’s GPTBot has the most name recognition and is also blocked more often than competitors like Google AI. According to an analysis of 1,000 popular news outlets by Ontario-based AI detection startup Originality AI, the number of high-ranking media websites using robots.txt to “disallow” GPTBot rose dramatically from the bot’s August 2023 launch until that fall, then kept climbing, more gradually, from November 2023 to April 2024. At the peak, just over a third of the websites blocked it; that share has since dropped closer to a quarter. Within a smaller pool of the most prominent news outlets, the block rate is still above 50 percent, but it’s down from highs of almost 90 percent earlier this year.
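The article doesn't describe Originality AI's methodology, so treat the following only as an illustration of how a block rate like the ones above could be computed: count the share of sites whose robots.txt stops GPTBot from fetching the site root. The four sample files are invented, and treating a root-level Disallow as "blocked" is an assumption; a real survey of 1,000 outlets would first have to download each site's robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Invented robots.txt files for four hypothetical outlets.
SAMPLE_SITES = {
    "outlet-a.example": "User-agent: GPTBot\nDisallow: /\n",
    "outlet-b.example": "User-agent: *\nDisallow: /search\n",
    # CCBot is Common Crawl's crawler; one group can name several bots.
    "outlet-c.example": "User-agent: CCBot\nUser-agent: GPTBot\nDisallow: /\n",
    "outlet-d.example": "",  # no rules at all, so every bot is allowed
}


def blocks_agent(robots_txt: str, user_agent: str = "GPTBot") -> bool:
    """Return True if robots_txt forbids user_agent from fetching "/".

    Counting only a root-level block as "blocked" is an assumption made
    for this sketch; a partial block (one section of the site) counts as
    allowed here.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, "/")


def block_rate(robots_by_site: dict[str, str], user_agent: str = "GPTBot") -> float:
    """Percentage of sites whose robots.txt blocks user_agent."""
    if not robots_by_site:
        return 0.0
    blocked = sum(blocks_agent(txt, user_agent) for txt in robots_by_site.values())
    return 100.0 * blocked / len(robots_by_site)


if __name__ == "__main__":
    print(f"{block_rate(SAMPLE_SITES):.1f}% of sampled sites block GPTBot")
    # prints "50.0% of sampled sites block GPTBot"
```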

    But in May, after Dotdash Meredith announced a licensing deal with OpenAI, that number dipped significantly. It dipped again at the end of May when Vox announced its own arrangement, and once more this August when WIRED’s parent company, Condé Nast, struck a deal. The trend toward increased blocking appears to be over, at least for now.

    These dips make obvious sense. When companies enter partnerships and give permission for their data to be used, they no longer have an incentive to barricade it, so it follows that they would update their robots.txt files to permit crawling. Make enough deals, and the overall share of sites blocking crawlers will almost certainly go down. Some outlets, like The Atlantic, unblocked OpenAI’s crawlers the very same day they announced a deal. Others took anywhere from a few days to a few weeks: Vox announced its partnership at the end of May but didn’t unblock GPTBot on its properties until late June.
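The arithmetic behind "almost certainly go down" is simple. In the hypothetical sketch below, deleting the GPTBot group from an outlet's robots.txt lets the bot fall through to the catch-all rule, and in a survey of 1,000 sites each outlet that unblocks lowers the overall block rate by 100 / 1,000 = 0.1 percentage points. Both robots.txt files and the 300-site starting figure are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an outlet before it signs a licensing deal.
BEFORE_DEAL = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

# After the deal, the GPTBot group is simply deleted, so GPTBot falls
# through to the catch-all group and may crawl everything.
AFTER_DEAL = """\
User-agent: *
Allow: /
"""


def gptbot_allowed(robots_txt: str, path: str = "/") -> bool:
    """Whether a compliant GPTBot may fetch path under this robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("GPTBot", path)


def share_after_unblocking(total_sites: int, blocked: int, newly_unblocked: int) -> float:
    """Percentage of sites still blocking after some outlets unblock."""
    return 100.0 * (blocked - newly_unblocked) / total_sites


if __name__ == "__main__":
    print("before:", gptbot_allowed(BEFORE_DEAL))  # before: False
    print("after:", gptbot_allowed(AFTER_DEAL))    # after: True
    # Hypothetical survey of 1,000 sites, 300 of them blocking at first.
    print(share_after_unblocking(1000, 300, 0))    # 30.0
    print(share_after_unblocking(1000, 300, 20))   # 28.0
```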

    Robots.txt is not legally binding, but it has long functioned as the standard that governs web crawler behavior. For most of the internet’s existence, people running webpages expected one another to abide by the file. When a WIRED investigation earlier this summer found that the AI startup Perplexity was likely choosing to ignore robots.txt commands, Amazon’s cloud division launched an investigation into whether Perplexity had violated its rules. Ignoring robots.txt is not a good look, which likely explains why so many prominent AI companies, OpenAI included, explicitly state that they use it to determine what to crawl. Originality AI CEO Jon Gillham believes this adds urgency to OpenAI’s push to strike agreements. “It’s clear that OpenAI views being blocked as a risk to their future ambitions,” says Gillham.
