
    Why extracting data from PDFs is still a nightmare for data experts

    By Editor Times Featured, March 21, 2025



    Credit: Kirillm via Getty Images


    However, these promotional claims don’t always match real-world performance, according to recent tests. “I’m typically a pretty big fan of the Mistral models, but the new OCR-specific one they released last week really performed poorly,” Willis noted.

    “A colleague sent this PDF and asked if I could help him parse the table it contained,” says Willis. “It’s an old document with a table that has some complex layout elements. The new [Mistral] OCR-specific model really performed poorly, repeating the names of cities and botching a lot of the numbers.”

    AI app developer Alexander Doria also recently pointed out on X a flaw in Mistral OCR’s ability to understand handwriting, writing, “Unfortunately Mistral-OCR has still the usual VLM curse: with challenging manuscripts, it hallucinates completely.”

    According to Willis, Google currently leads the field in AI models that can read documents: “Right now, for me the clear leader is Google’s Gemini 2.0 Flash Pro Experimental. It handled the PDF that Mistral did not with a tiny number of errors, and I’ve run multiple messy PDFs through it with success, including those with handwritten content.”

    Gemini’s performance stems largely from its ability to process expansive documents (in a type of short-term memory called a “context window”), which Willis specifically notes as a key advantage: “The size of its context window also helps, since I can upload large documents and work through them in parts.” This capability, combined with more robust handling of handwritten content, apparently gives Google’s model a practical edge over competitors in real-world document-processing tasks for now.
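To illustrate what working through a large document "in parts" can look like, here is a minimal sketch that groups page texts into batches that stay under a token budget. It is not how Gemini or any particular tool does it. The names `estimate_tokens` and `batch_pages` are hypothetical, and the four-characters-per-token estimate is only a rough assumption; a real pipeline would use the model's own tokenizer.

```python
from typing import List


def estimate_tokens(text: str) -> int:
    # ASSUMPTION: roughly 4 characters per token. This is a crude heuristic,
    # not the tokenizer of any specific model.
    return max(1, len(text) // 4)


def batch_pages(pages: List[str], max_tokens: int) -> List[List[int]]:
    """Group consecutive page indices so each batch fits within max_tokens.

    A single page larger than the budget is still placed in a batch of its
    own rather than being dropped.
    """
    batches: List[List[int]] = []
    current: List[int] = []
    used = 0
    for index, text in enumerate(pages):
        cost = estimate_tokens(text)
        # Start a new batch when adding this page would exceed the budget.
        if current and used + cost > max_tokens:
            batches.append(current)
            current, used = [], 0
        current.append(index)
        used += cost
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    # Hypothetical page texts standing in for text extracted from a PDF.
    sample_pages = ["a" * 400, "b" * 400, "c" * 400]
    print(batch_pages(sample_pages, max_tokens=250))
```

Each batch of page indices could then be sent to a model separately, with results stitched together afterward. A larger context window simply means fewer, bigger batches.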

    The drawbacks of LLM-based OCR

    Despite their promise, LLMs introduce several new problems to document processing. Among them, they can introduce confabulations or hallucinations (plausible-sounding but incorrect information), accidentally follow instructions in the text (thinking they are part of a user prompt), or just generally misinterpret the data.
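Because such errors can look plausible, extracted tables are often worth checking mechanically before anyone relies on them. The sketch below is one hypothetical way to flag two symptoms mentioned earlier, repeated row labels and botched numbers. The function name `flag_suspect_rows` and the sample rows are invented for illustration, and passing these checks does not prove the extraction is correct; it only surfaces rows that deserve a human look.

```python
from collections import Counter
from typing import List, Tuple


def flag_suspect_rows(rows: List[Tuple[str, str]]) -> List[str]:
    """Return warnings for (label, value) rows that look suspicious.

    Two simple checks are applied:
    1. The same label appearing more than once (case-insensitive).
    2. A value that does not parse as a number after removing commas.
    """
    warnings: List[str] = []

    # Check 1: duplicated labels, e.g. a city name repeated by the model.
    counts = Counter(label.strip().lower() for label, _ in rows)
    for label, n in counts.items():
        if n > 1:
            warnings.append(f"label '{label}' appears {n} times")

    # Check 2: values that are not numeric, e.g. a letter O in place of zero.
    for label, value in rows:
        cleaned = value.replace(",", "").strip()
        try:
            float(cleaned)
        except ValueError:
            warnings.append(f"value for '{label}' is not numeric: {value!r}")

    return warnings


if __name__ == "__main__":
    # Hypothetical extraction output, not data from any real document.
    extracted = [
        ("Springfield", "1,204"),
        ("Springfield", "1,2O4"),
        ("Shelbyville", "875"),
    ]
    for warning in flag_suspect_rows(extracted):
        print(warning)
```

Checks like these catch only obvious symptoms. A hallucinated number that is still a valid number passes silently, so spot-checking against the source document remains necessary.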


