    How to Scale Your LLM usage

By Editor Times Featured | November 29, 2025 | 8 min read


Scaling has perhaps been the most powerful word for Large Language Models (LLMs) since the release of ChatGPT. ChatGPT became so successful largely thanks to the scaled pre-training OpenAI did, which made it a powerful language model.

Following that, frontier LLM labs started scaling post-training with supervised fine-tuning and RLHF, where models got increasingly better at following instructions and performing complex tasks.

And just when we thought LLMs were about to plateau, we started doing inference-time scaling with the release of reasoning models, where spending thinking tokens gave huge improvements in output quality.

This infographic highlights the main contents of this article. I first discuss why you should scale your LLM usage, highlighting how it can lead to increased productivity. I then cover how to increase your LLM usage, with techniques like running parallel coding agents and using deep research mode in Gemini 3 Pro. Image by Gemini.

I now argue we should continue this scaling with a new scaling paradigm: usage-based scaling, where you scale how much you are using LLMs:

• Run more coding agents in parallel
• Always start a deep research on a topic of interest
• Run data-fetching workflows

If you're not firing off an agent before going to lunch, or before going to sleep, you're wasting time.

In this article, I discuss why scaling LLM usage can lead to increased productivity, especially when working as a programmer. I also cover specific techniques you can use to scale your LLM usage, both personally and at the companies you work for. I keep the article high-level, aiming to inspire you to use AI to your maximum advantage.

Why you should scale LLM usage

We have already seen scaling be incredibly powerful with:

    • pre-training
    • post-training
    • inference-time scaling

The reason is that, it turns out, the more computing power you spend on something, the better the output quality you achieve. This, of course, assumes you are able to spend the compute effectively. For example, for pre-training, being able to scale compute relies on:

• Large enough models (enough weights to train)
• Enough data to train on

If you scale compute without these two components, you won't see improvements. However, if you do scale all three, you get amazing results, like the frontier LLMs we're seeing now, for example with the release of Gemini 3.
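For pre-training specifically, the interplay between model size and data is often summarized by a Chinchilla-style scaling law (a standard result from the scaling-laws literature, not something derived in this article):

```latex
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```

Here L is the pre-training loss, N the number of parameters, D the number of training tokens, and E the irreducible loss; A, B, α, β are fitted constants. Growing N while holding D fixed leaves the B/D^β term untouched, which is exactly why scaling compute without enough data stops paying off.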

I thus argue you should look to scale your own LLM usage as much as possible. This could, for example, mean firing off multiple agents to code in parallel, or starting a Gemini deep research on a topic you're interested in.

Of course, the usage must still be of value. There's no point in starting a coding agent on some obscure task you have no need for. Rather, you should start a coding agent on:

• A Linear issue you never felt you had time to sit down and do yourself
• A quick feature that was requested in the last sales call
• Some UI improvements that today's coding agents handle easily

This image shows scaling laws, illustrating how performance increases with scale. I argue the same thing will happen when scaling our LLM usage. Image from NodeMasters.

In a world with an abundance of resources, we should look to maximize our use of them.

My main point here is that the threshold for performing tasks has decreased significantly since the release of LLMs. Previously, when you got a bug report, you had to sit down for two hours in deep focus, thinking about how to resolve it.

Today, however, that's no longer the case. Instead, you can go into Cursor, paste in the bug report, and ask Claude Sonnet 4.5 to try to fix it. You can then come back 10 minutes later, test whether the problem is fixed, and create the pull request.

How many tokens can you spend while still doing something useful with them?

How to scale LLM usage

I've talked about why you should scale LLM usage by running more coding agents, deep research agents, and other AI agents. However, it can be hard to imagine exactly which LLM tasks you should fire off. In this section, I discuss specific agents you can start in order to scale your LLM usage.

Parallel coding agents

Parallel coding agents are one of the simplest ways to scale LLM usage for any programmer. Instead of working on only one problem at a time, you start two or more agents at the same time, using Cursor agents, Claude Code, or any other agentic coding tool. This is typically made very easy by using Git worktrees.
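As a minimal sketch of the worktree setup: each agent gets its own branch and working directory, so parallel agents never clobber each other's uncommitted changes. The agent CLI name below ("my-coding-agent") is a placeholder, not a real tool; substitute Claude Code, Cursor's CLI, or whatever you use.

```python
from pathlib import Path


def worktree_commands(repo: Path, task: str) -> list[list[str]]:
    """Build the two commands that give one coding agent its own checkout."""
    branch = f"agent/{task}"
    tree = repo.parent / f"{repo.name}-{task}"
    return [
        # 1. Create an isolated worktree on a fresh branch.
        ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(tree)],
        # 2. Point the agent at it ("my-coding-agent" is hypothetical).
        ["my-coding-agent", "--cwd", str(tree), "--task", task],
    ]


for cmd in worktree_commands(Path("/home/me/app"), "fix-login-bug"):
    print(" ".join(cmd))
```

When the agent is done, `git worktree remove` cleans up the extra checkout and you can merge or discard the branch as usual.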

For example, I typically have one main task or project that I'm working on, where I'm sitting in Cursor and programming. However, sometimes a bug report comes in, and I automatically route it to Claude Code, making it search for why the problem is happening and fix it if possible. Sometimes this works out of the box; sometimes I have to help it a bit.

However, the cost of starting this bug-fixing agent is super low (I can literally just copy the Linear issue into Cursor, which can read the issue using the Linear MCP). Similarly, I also have a script running in the background that automatically researches relevant prospects.
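The "fire it off and keep working" pattern boils down to starting a process without blocking on it. A minimal sketch, where "research-agent" stands in for whatever headless agent CLI you actually use (the command, prompt, and log name are all illustrative):

```python
import subprocess


def fire_and_forget(cmd: list[str], logfile: str) -> subprocess.Popen:
    """Start a long-running command in the background without blocking.

    The process keeps running while you work on something else; its
    output lands in a log file you can inspect later.
    """
    log = open(logfile, "a")
    return subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)


# Hypothetical usage -- swap in your real agent command:
# proc = fire_and_forget(
#     ["research-agent", "find prospects matching our ICP"],
#     "research.log",
# )
```

The returned handle lets you check on the run later (`proc.poll()`), but nothing forces you to wait for it.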

Deep research

Deep research is a functionality available from all the frontier model providers: Google Gemini, OpenAI ChatGPT, and Anthropic's Claude. I prefer Gemini 3 deep research, though there are many other solid deep research tools out there.

Whenever I'm interested in learning more about a topic, finding information, or anything similar, I fire off a deep research agent with Gemini.

For example, I was interested in finding prospects matching a specific ICP. I quickly pasted the ICP information into Gemini, gave it some contextual information, and had it start researching, so it could run while I worked on my main programming project.

After 20 minutes, I had a brief report from Gemini, which turned out to contain loads of useful information.

    Creating workflows with n8n

Another technique for scaling LLM usage is to create workflows with n8n or any similar workflow-building tool. With n8n, you can build workflows that, for example, read Slack messages and perform some action based on them.

You can, for instance, have a workflow that reads a bug-report channel on Slack and automatically starts a Claude Code agent for a given bug report. Or you could create a workflow that aggregates information from lots of different sources and provides it to you in an easily readable format. There are essentially limitless opportunities with workflow-building tools.

More

There are many other techniques you can use to scale your LLM usage. I've only listed the first few that came to mind from my own work with LLMs. I recommend always keeping in mind what you can automate using AI and how you can leverage it to become more effective. How to scale LLM usage will vary widely across companies, job titles, and many other factors.

    Conclusion

In this article, I've discussed how to scale your LLM usage to become a more effective engineer. We've seen scaling work incredibly well in the past, and it's highly likely we'll see increasingly powerful results by scaling our own usage of LLMs, whether that's firing off more coding agents in parallel or running deep research agents while eating lunch. Overall, I believe that by increasing our LLM usage, we can become increasingly productive.

👉 Find me on socials:

    📚 Get my free Vision Language Models ebook

    💻 My webinar on Vision Language Models

    📩 Subscribe to my newsletter

    🧑‍💻 Get in touch

    🔗 LinkedIn

    🐦 X / Twitter

    ✍️ Medium



