
    Why extracting data from PDFs is still a nightmare for data experts

    By Editor Times Featured, March 21, 2025



    Credit: Kirillm via Getty Images


    However, these promotional claims don't always match real-world performance, according to recent tests. “I'm typically a pretty big fan of the Mistral models, but the new OCR-specific one they released last week really performed poorly,” Willis noted.

    “A colleague sent this PDF and asked if I could help him parse the table it contained,” says Willis. “It's an old document with a table that has some complex layout elements. The new [Mistral] OCR-specific model really performed poorly, repeating the names of cities and botching a lot of the numbers.”

    AI app developer Alexander Doria also recently pointed out on X a flaw in Mistral OCR's ability to understand handwriting, writing, “Unfortunately Mistral-OCR has still the usual VLM curse: with challenging manuscripts, it hallucinates completely.”

    According to Willis, Google currently leads the field in AI models that can read documents: “Right now, for me the clear leader is Google's Gemini 2.0 Flash Pro Experimental. It handled the PDF that Mistral did not with a tiny number of errors, and I've run multiple messy PDFs through it with success, including those with handwritten content.”

    Gemini's performance stems largely from its ability to process expansive documents (in a type of short-term memory called a “context window”), which Willis specifically notes as a key advantage: “The size of its context window also helps, since I can upload large documents and work through them in parts.” This capability, combined with more robust handling of handwritten content, apparently gives Google's model a practical edge over competitors in real-world document-processing tasks for now.
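To make the idea of working through a large document "in parts" concrete, here is a minimal Python sketch that groups already-extracted page texts into batches that fit a token budget. It is not Willis's workflow or any vendor's API: the function names, the budget, and the rough four-characters-per-token estimate are all assumptions chosen for illustration.

```python
from typing import List


def estimate_tokens(text: str, chars_per_token: int = 4) -> int:
    """Very rough token estimate; the 4-characters-per-token ratio is an assumption."""
    return max(1, len(text) // chars_per_token)


def batch_pages(pages: List[str], token_budget: int) -> List[List[str]]:
    """Group consecutive pages into batches whose estimated size fits the budget.

    A single page larger than the budget still gets a batch of its own,
    so callers may need to split such pages further.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    used = 0
    for page in pages:
        cost = estimate_tokens(page)
        # Start a new batch when adding this page would exceed the budget.
        if current and used + cost > token_budget:
            batches.append(current)
            current, used = [], 0
        current.append(page)
        used += cost
    if current:
        batches.append(current)
    return batches
```

Each batch could then be sent to a model as one request. A larger context window simply means a larger budget and fewer batches, which is the practical benefit Willis describes.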

    The drawbacks of LLM-based OCR

    Despite their promise, LLMs introduce several new problems to document processing. Among them, they can introduce confabulations or hallucinations (plausible-sounding but incorrect information), accidentally follow instructions in the text (thinking they are part of a user prompt), or just generally misinterpret the data.
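Because these errors can look plausible, extracted tables generally need some kind of check before they are trusted. As one illustrative sketch, the Python function below flags values that appear more than once in a column, the kind of symptom Willis reported when city names were repeated. The row format (a list of dictionaries) and the function name are assumptions, and a repeated value is only a cue for human review, not proof of an error, since real tables can legitimately repeat entries.

```python
from collections import Counter
from typing import Dict, List


def repeated_values(rows: List[Dict[str, str]], column: str) -> Dict[str, int]:
    """Return values that occur more than once in `column`, with their counts.

    Values are compared after trimming whitespace and lowercasing.
    Rows that lack the column are skipped.
    """
    counts = Counter(
        row[column].strip().lower() for row in rows if column in row
    )
    return {value: n for value, n in counts.items() if n > 1}
```

For example, calling `repeated_values(rows, "city")` on the extracted rows returns the city names worth checking against the original PDF.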


