
    Why extracting data from PDFs is still a nightmare for data experts

    By Editor Times Featured, March 21, 2025



    Credit: Kirillm via Getty Images


    However, these promotional claims don’t always match real-world performance, according to recent tests. “I’m typically a pretty big fan of the Mistral models, but the new OCR-specific one they released last week really performed poorly,” Willis noted.

    “A colleague sent this PDF and asked if I could help him parse the table it contained,” says Willis. “It’s an old document with a table that has some complex layout elements. The new [Mistral] OCR-specific model really performed poorly, repeating the names of cities and botching a lot of the numbers.”

    AI app developer Alexander Doria also recently pointed out on X a flaw in Mistral OCR’s ability to understand handwriting, writing, “Unfortunately Mistral-OCR has still the usual VLM curse: with challenging manuscripts, it hallucinates completely.”

    According to Willis, Google currently leads the field in AI models that can read documents: “Right now, for me the clear leader is Google’s Gemini 2.0 Flash Pro Experimental. It handled the PDF that Mistral did not with a tiny number of errors, and I’ve run multiple messy PDFs through it with success, including those with handwritten content.”

    Gemini’s performance stems largely from its ability to process expansive documents (in a type of short-term memory called a “context window”), which Willis specifically notes as a key advantage: “The size of its context window also helps, since I can upload large documents and work through them in parts.” This capability, combined with more robust handling of handwritten content, apparently gives Google’s model a practical edge over competitors in real-world document-processing tasks for now.

    The drawbacks of LLM-based OCR

    Despite their promise, LLMs introduce several new problems to document processing. Among them, they can introduce confabulations or hallucinations (plausible-sounding but incorrect information), accidentally follow instructions in the text (thinking they are part of a user prompt), or just generally misinterpret the data.


