    Stop Using LLMs Like Giant Problem Solvers

    By Editor Times Featured, May 26, 2026


    I was working on a feature where I needed to transform 100 messy compliance PDFs into structured JSON rules.

    The brute-force approach was obvious: give the agent the source text, explain the task, provide examples, and ask it to generate the rules. Since it was the lowest-hanging fruit, I tried it first.

    At a glance, the output looked fine. The JSON was valid and matched what I expected.

    But as I manually sampled the results to check for accuracy, the cracks appeared. Some rules were too broad, others were missed. Some rules failed to preserve the nuances of the original text. I tried using another agent to catch and fix the errors, but with such a large corpus, it was impossible to confidently verify the output.

    That was the frustrating part. The errors weren’t obvious. This implementation was far too fragile to scale.

    Although I cannot share the exact implementation details, what I can share are the architectural lessons I learnt and how I eventually implemented it. Hopefully, these insights will be useful if you’re building AI systems that need to scale, stay reliable, and deal with messy data. And if you have better ways of doing things, do reach out to chat!

    Okay, let’s get to it.

    The problem

    The 100 PDFs I worked with had already been parsed and chunked before they reached me. But the raw content was still messy. There were bullet points, tables, OCR artefacts, translated sections, semi-structured headings, footers, headers, inconsistent formatting and document-specific quirks.

    I chose to use an agent because deciding what mattered required semantic judgement. The documents didn’t follow one consistent pattern, so relevance couldn’t be determined through simple rules alone.

    You had to understand the surrounding context. None of this was difficult when done on a small chunk of data. The challenge was doing it reliably at scale.

    The generated rules were then processed by another downstream system that evaluated them deterministically.

    What eventually worked

    After a few experiments, I realised the biggest improvement didn’t come from a better prompt, a new tool, an MCP server, or a more sophisticated agent harness.

    It came from changing the shape of the problem.

    Instead of trying to make the agent smarter, I made the agent’s job smaller.

    The first change was to prepare the source data upfront. Instead of asking the agent to query a database, retrieve records, decide whether it had the right inputs, and then perform the extraction, I gave it a more controlled starting point.

    In my case, that meant temporarily storing the relevant raw data locally.

    This may not always be practical. But the underlying principle is to reduce the amount of retrieval uncertainty the agent has to deal with. If the agent’s job is to reason over content, don’t also make it responsible for figuring out whether it has found the right content.

    Another option would be to prepare the query upfront.
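
    To make the idea of a controlled starting point concrete, here is a minimal sketch, not the implementation described in this article. It assumes the parsed chunks live in a SQL table; the table name, the column names, and the helper `snapshot_chunks` are all hypothetical. A fixed, human-written query runs once and writes one local JSON file per document, so the agent never has to decide what to retrieve.

```python
import json
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical table and column names. The query is fixed and written by a
# person, so the agent plays no part in retrieval.
CHUNK_QUERY = "SELECT doc_id, chunk_id, text FROM chunks ORDER BY doc_id, position"


def snapshot_chunks(conn: sqlite3.Connection, out_dir: Path) -> list[Path]:
    """Run the fixed query once and write one JSON file per document."""
    out_dir.mkdir(parents=True, exist_ok=True)
    by_doc: dict[str, list[dict]] = {}
    for doc_id, chunk_id, text in conn.execute(CHUNK_QUERY):
        by_doc.setdefault(doc_id, []).append({"chunk_id": chunk_id, "text": text})
    paths = []
    for doc_id, chunks in by_doc.items():
        path = out_dir / f"{doc_id}.json"
        path.write_text(json.dumps(chunks, ensure_ascii=False, indent=2), encoding="utf-8")
        paths.append(path)
    return paths


# Demo with an in-memory database standing in for the real store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (doc_id TEXT, chunk_id TEXT, position INTEGER, text TEXT)")
conn.executemany(
    "INSERT INTO chunks VALUES (?, ?, ?, ?)",
    [
        ("doc1", "doc1-1", 1, "Second chunk."),
        ("doc1", "doc1-0", 0, "First chunk."),
        ("doc2", "doc2-0", 0, "Only chunk."),
    ],
)
with tempfile.TemporaryDirectory() as tmp:
    paths = snapshot_chunks(conn, Path(tmp))
    # Read the files back while the temporary directory still exists.
    snapshot = {p.name: json.loads(p.read_text(encoding="utf-8")) for p in paths}
```

    Each file then becomes the complete input for one agent run, which also makes it easy to see exactly what the agent was given.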

    I also used a script to strip away unnecessary metadata and fields before passing the raw content to the agent. Less irrelevant context meant fewer distractions, fewer chances for the agent to latch onto the wrong details and a cleaner reasoning task overall.
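
    A stripping step like this can be very small. The sketch below is an illustration under assumptions, not the script I used: the field names in `KEEP_FIELDS` and the shape of a chunk are hypothetical.

```python
from typing import Any

# Hypothetical field names: the fields the agent is assumed to need.
KEEP_FIELDS = ("chunk_id", "heading", "text")


def strip_chunk(chunk: dict[str, Any]) -> dict[str, Any]:
    """Keep only the allow-listed fields and normalise whitespace in the text."""
    slim = {key: chunk[key] for key in KEEP_FIELDS if key in chunk}
    if isinstance(slim.get("text"), str):
        # Collapse runs of spaces and newlines left behind by PDF parsing.
        slim["text"] = " ".join(slim["text"].split())
    return slim


def prepare_document(chunks: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Strip every chunk and drop chunks that end up with no text."""
    stripped = [strip_chunk(c) for c in chunks]
    return [c for c in stripped if c.get("text")]


raw = [
    {"chunk_id": "doc1-0", "text": "  Records must be\nkept for 5 years. ",
     "page": 3, "embedding": [0.1, 0.2]},
    {"chunk_id": "doc1-1", "text": "   ", "page": 3},
]
print(prepare_document(raw))
# [{'chunk_id': 'doc1-0', 'text': 'Records must be kept for 5 years.'}]
```

    An allow-list is used rather than a block-list so that any new metadata field added upstream stays out of the agent’s context by default.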

    But the most important change was the unit of work.
    Instead of processing everything at once, I worked iteratively and processed one document at a time.

    That made each task smaller, easier to inspect, easier to retry, and easier to audit. I spun up 5 subagents to process documents in parallel, with each agent logging its progress to a file.

    If one document failed, I could retry only that document. If one output had formatting issues, I could fix that specific case without rerunning the whole batch. If the pipeline stopped halfway, the cached progress meant it could resume from the last successful checkpoint.
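
    The per-document loop with parallel workers, retries and a resumable checkpoint can be sketched as follows. This is a simplified illustration, not my actual pipeline: `run_batch`, the `progress.jsonl` file name, and the `process` callable (which stands in for one agent run) are assumptions. For simplicity it keeps a single checkpoint file written by the coordinating thread rather than one log per agent.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Callable, Iterable


def load_done(checkpoint: Path) -> set[str]:
    """Read the IDs of documents that already finished successfully."""
    if not checkpoint.exists():
        return set()
    lines = checkpoint.read_text(encoding="utf-8").splitlines()
    return {json.loads(line)["doc_id"] for line in lines if line.strip()}


def run_batch(doc_ids: Iterable[str], process: Callable[[str], dict], out_dir: Path,
              workers: int = 5, max_attempts: int = 2) -> list[str]:
    """Process each pending document on its own; return the IDs that still failed.

    Assumes max_attempts >= 1.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    checkpoint = out_dir / "progress.jsonl"
    done = load_done(checkpoint)
    pending = [d for d in doc_ids if d not in done]  # skip finished documents

    def attempt(doc_id: str) -> dict:
        for attempt_no in range(1, max_attempts + 1):
            try:
                return process(doc_id)
            except Exception:
                if attempt_no == max_attempts:
                    raise  # give up on this document only

    failed = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(attempt, d): d for d in pending}
        for future in as_completed(futures):
            doc_id = futures[future]
            try:
                result = future.result()
            except Exception:
                failed.append(doc_id)
                continue
            (out_dir / f"{doc_id}.json").write_text(json.dumps(result), encoding="utf-8")
            # Only this coordinating thread appends, so lines never interleave.
            with checkpoint.open("a", encoding="utf-8") as fh:
                fh.write(json.dumps({"doc_id": doc_id}) + "\n")
    return failed


def flaky_agent(doc_id: str) -> dict:
    """Stand-in for an agent run that always fails on one document."""
    if doc_id == "doc-3":
        raise ValueError("simulated failure")
    return {"doc_id": doc_id, "rules": []}


retried = []


def fixed_agent(doc_id: str) -> dict:
    """Stand-in for the agent after the failing case has been fixed."""
    retried.append(doc_id)
    return {"doc_id": doc_id, "rules": []}


with tempfile.TemporaryDirectory() as tmp:
    ids = [f"doc-{i}" for i in range(1, 6)]
    first_failed = run_batch(ids, flaky_agent, Path(tmp))
    second_failed = run_batch(ids, fixed_agent, Path(tmp))  # resumes from checkpoint
```

    In the demo, the second call only touches `doc-3`, because the other four documents are already recorded in the checkpoint file.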

    This was also where the separation of responsibilities became clearer.

    The agent handled the semantic work: understanding the content, identifying the relevant parts and writing the JSON output.

    The surrounding code handled the mechanical parts: parallelising jobs, enforcing the schema, generating IDs, writing files, caching progress, validating references, and checking whether the output could be traced back to the original source.
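
    As an illustration of the mechanical side, the sketch below enforces a schema and generates IDs in code rather than asking the agent to do either. The rule fields (`rule_text`, `source_chunk_id`, `source_quote`), the ID format, and the function names are hypothetical; the real schema is not shown in this article.

```python
import hashlib
import json

# Hypothetical rule schema: every field is a required, non-empty string.
REQUIRED_FIELDS = {"rule_text": str, "source_chunk_id": str, "source_quote": str}


def validate_rule(rule: dict) -> list[str]:
    """Return a list of schema problems; an empty list means the rule is valid."""
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        value = rule.get(field)
        if not isinstance(value, expected):
            errors.append(f"{field}: expected {expected.__name__}")
        elif not value.strip():
            errors.append(f"{field}: must not be empty")
    extra = set(rule) - set(REQUIRED_FIELDS)
    if extra:
        errors.append(f"unexpected fields: {sorted(extra)}")
    return errors


def rule_id(doc_id: str, rule: dict) -> str:
    """Deterministic ID, so re-running a document yields the same IDs."""
    payload = json.dumps([doc_id, rule["source_chunk_id"], rule["rule_text"]],
                         ensure_ascii=False)
    return "rule-" + hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


def finalise(doc_id: str, agent_output: str) -> tuple[list[dict], list[str]]:
    """Parse raw agent output; return (accepted rules with IDs, error messages)."""
    try:
        rules = json.loads(agent_output)
    except json.JSONDecodeError as exc:
        return [], [f"invalid JSON: {exc.msg}"]
    if not isinstance(rules, list):
        return [], ["top-level value must be a list of rules"]
    accepted, errors = [], []
    for index, rule in enumerate(rules):
        problems = validate_rule(rule) if isinstance(rule, dict) else ["rule must be an object"]
        if problems:
            errors.extend(f"rule {index}: {p}" for p in problems)
        else:
            accepted.append({"id": rule_id(doc_id, rule), **rule})
    return accepted, errors


sample_output = json.dumps([
    {"rule_text": "Keep records for 5 years.", "source_chunk_id": "doc1-0",
     "source_quote": "kept for 5 years"},
    {"rule_text": "", "source_chunk_id": "doc1-1", "source_quote": "x"},
])
accepted, errors = finalise("doc1", sample_output)
print(len(accepted), errors)
# 1 ['rule 1: rule_text: must not be empty']
```

    Because the ID is derived from the content, the agent never has to invent identifiers, and a retried document does not produce a second set of IDs for the same rules.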

    I also had an orchestrator watch over the progress of the script.

    Making the output auditable

    A useful design decision was adding reference IDs to every generated rule. This meant that each output item pointed back to a specific source.

    This made the output easier to audit. Instead of asking, “Does this generated rule look right?”, I could ask more precise questions such as: does the referenced source chunk exist? Is the quoted source text actually present in that chunk?
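
    Both of those questions can be answered by plain code. Here is a minimal sketch, assuming each rule carries a hypothetical `source_chunk_id` and `source_quote` field and that chunks are available as a mapping from chunk ID to text. The whitespace and case normalisation is my own assumption about how strict the match should be.

```python
def normalise(text: str) -> str:
    """Collapse whitespace and ignore case so layout noise does not cause misses."""
    return " ".join(text.split()).casefold()


def audit_rule(rule: dict, chunks_by_id: dict[str, str]) -> list[str]:
    """Return audit failures for one rule; an empty list means it is traceable."""
    chunk_id = rule.get("source_chunk_id")
    if chunk_id not in chunks_by_id:
        return [f"unknown source chunk: {chunk_id!r}"]
    quote = normalise(str(rule.get("source_quote") or ""))
    if not quote:
        return ["missing source quote"]
    if quote not in normalise(chunks_by_id[chunk_id]):
        return ["quote not found in referenced chunk"]
    return []


chunks_by_id = {"doc1-0": "Records must be\n kept for 5 years."}
good = {"source_chunk_id": "doc1-0", "source_quote": "records must be kept for 5 years"}
bad_ref = {"source_chunk_id": "doc9-9", "source_quote": "kept for 5 years"}
bad_quote = {"source_chunk_id": "doc1-0", "source_quote": "kept for 7 years"}

for rule in (good, bad_ref, bad_quote):
    print(audit_rule(rule, chunks_by_id))
# []
# ["unknown source chunk: 'doc9-9'"]
# ['quote not found in referenced chunk']
```

    A check like this cannot tell whether a rule interprets its source correctly, but it does catch references and quotes that do not exist, and it narrows manual review to rules that are at least grounded in real text.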

    I could also get another agent to selectively run audits on larger and more complex documents to ensure that important nuances were preserved.

    On top of that, I did a lightweight version of evals. I ran a small batch of raw documents through the workflow and manually reviewed the results for coverage and accuracy. A full golden dataset was not practical for the scope of this task, but I still needed a way to prove to myself that the workflow was working.
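
    A lightweight review like this can still be made repeatable. The sketch below is one possible way to do it, not the process I followed: it picks a reproducible sample of documents and turns a reviewer’s manual tallies into two numbers. The definitions are assumptions: accuracy is the share of generated rules judged correct, and coverage is the share of expected rules that were actually generated.

```python
import random
from typing import Iterable, Optional


def pick_review_sample(doc_ids: Iterable[str], k: int, seed: int = 0) -> list[str]:
    """Choose up to k documents for manual review, reproducibly for a given seed."""
    pool = sorted(doc_ids)
    rng = random.Random(seed)
    return sorted(rng.sample(pool, min(k, len(pool))))


def summarise_review(reviews: list[dict]) -> dict[str, Optional[float]]:
    """Aggregate per-document tallies entered by a human reviewer.

    Each review holds counts of rules judged 'correct' or 'incorrect', plus
    'missed' rules the reviewer expected but the workflow did not produce.
    """
    correct = sum(r["correct"] for r in reviews)
    incorrect = sum(r["incorrect"] for r in reviews)
    missed = sum(r["missed"] for r in reviews)
    generated = correct + incorrect
    expected = correct + missed
    return {
        "accuracy": correct / generated if generated else None,
        "coverage": correct / expected if expected else None,
    }


sample = pick_review_sample([f"doc-{i}" for i in range(100)], k=10, seed=42)
reviews = [
    {"doc_id": "doc-a", "correct": 9, "incorrect": 1, "missed": 2},
    {"doc_id": "doc-b", "correct": 6, "incorrect": 0, "missed": 3},
]
print(summarise_review(reviews))
# {'accuracy': 0.9375, 'coverage': 0.75}
```

    Fixing the seed means the same documents can be reviewed again after a prompt or pipeline change, which makes before and after comparisons meaningful even without a golden dataset.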

    My goal was not to build a perfect benchmark but to make the system auditable enough that I could inspect the outputs, catch failures, and iterate towards a higher accuracy bar.

    If you’ve got ideas on how I could have done this better, let me know!

    My biggest takeaway

    The pattern that worked was to stop treating the LLM as the whole system.

    The system became more reliable not because the agent became perfect, but because the workflow made its outputs easier to trace, validate, and recover from.

    Coincidentally, I was building this shortly before attending the inaugural AI Engineer Singapore conference, held from 15–17 May 2026.

    On the last day, JJ Geewax, Director of Applied AI at Google DeepMind, shared a framing that captured what I had been learning the hard way: we need to stop using LLMs like giant problem solvers.

    That resonated with me because it’s such an easy trap to fall into. It’s easy to just give the model the data, schema, business rules, edge cases, and the responsibility to verify itself, and then get frustrated when the result is inconsistent.

    But for reliable production systems, the better pattern is usually a hybrid. Let the agent handle the parts that require semantic judgement, and let code handle the parts that require structure, validation, and control.

    I’ll be sharing more reflections from AI Engineer Singapore and the workshops I attended. The YouTube snippet of JJ’s speech is here.

    That’s all from me. I hope this helped, and see you in the next article 🙂
