    Accuracy, Cost, and Performance with NVIDIA Nemotron Models

    By Editor Times Featured | August 12, 2025 | 5 Mins Read


    Each week, new models are released, along with dozens of benchmarks. But what does that mean for a practitioner deciding which model to use? How should they approach assessing the quality of a newly released model? And how do benchmarked capabilities like reasoning translate into real-world value?

    In this post, we'll evaluate the newly released NVIDIA Llama Nemotron Super 49B 1.5 model. We use syftr, our generative AI workflow exploration and evaluation framework, to ground the analysis in a real enterprise problem and explore the tradeoffs of a multi-objective evaluation.

    After examining more than a thousand workflows, we offer actionable guidance on the use cases where the model shines.

    Parameter count matters, but it's not everything

    It should be no surprise that parameter count drives much of the cost of serving LLMs. Weights must be loaded into memory, and key-value (KV) matrices cached. Bigger models generally perform better, and frontier models are almost always enormous. GPU advancements have been foundational to AI's rise by enabling these increasingly large models.
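    As a back-of-the-envelope illustration of why parameter count dominates serving cost, the memory footprint of weights plus KV cache can be estimated as below. The model shape and context length are illustrative assumptions, not measured properties of any Nemotron model:

```python
def serving_memory_gb(params_b, layers, kv_heads, head_dim, context_len,
                      batch, bytes_per_val=2):
    """Rough GPU memory estimate: weights plus KV cache, both in fp16/bf16."""
    weights = params_b * 1e9 * bytes_per_val  # one copy of the weights
    # KV cache: 2 tensors (K and V) per layer, per token, per sequence
    kv = 2 * layers * kv_heads * head_dim * context_len * batch * bytes_per_val
    return (weights + kv) / 1e9

# Hypothetical 49B-class model with grouped-query attention (assumed shape):
# 80 layers, 8 KV heads of dim 128, 32K context, batch of 4
print(round(serving_memory_gb(49, 80, 8, 128, 32768, 4), 1))
```

    Even with grouped-query attention shrinking the KV cache, the weights alone demand on the order of 100 GB at 16-bit precision, which is why pruning parameters translates directly into cheaper serving.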

    But scale alone doesn't guarantee performance.

    Newer generations of models often outperform their larger predecessors, even at the same parameter count. The Nemotron models from NVIDIA are a good example. They build on existing open models, pruning unnecessary parameters and distilling new capabilities.

    That means a smaller Nemotron model can often outperform its larger predecessor across multiple dimensions: faster inference, lower memory use, and stronger reasoning.

    We wanted to quantify these tradeoffs, especially against some of the largest models in the current generation.

    How much more accurate? How much more efficient? So, we loaded them onto our cluster and got to work.

    How we assessed accuracy and cost

    Step 1: Identify the problem

    With models in hand, we needed a real-world challenge: one that tests reasoning, comprehension, and performance within an agentic AI flow.

    Picture a junior financial analyst trying to ramp up on a company. They should be able to answer questions like: “Does Boeing have an improving gross margin profile as of FY2022?”

    But they also need to explain the relevance of that metric: “If gross margin is not a useful metric, explain why.”

    To test our models, we assign them the task of synthesizing information delivered through an agentic AI flow, then measure their ability to efficiently deliver an accurate answer.

    To answer both kinds of questions correctly, the models need to:

    • Pull data from multiple financial documents (such as annual and quarterly reports)
    • Compare and interpret figures across time periods
    • Synthesize an explanation grounded in context

    The FinanceBench benchmark is designed for exactly this type of task. It pairs filings with expert-validated Q&A, making it a strong proxy for real enterprise workflows. That's the testbed we used.
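    A benchmark record of this kind pairs a question with its source filings and a reference answer, and scoring reduces to checking a model's response against that reference. The record shape, field names, and grading function below are illustrative assumptions, not FinanceBench's actual schema or judge:

```python
# Illustrative shape of a FinanceBench-style record (field names assumed,
# not the dataset's actual schema).
record = {
    "question": "Does Boeing have an improving gross margin profile as of FY2022?",
    "evidence_docs": ["BOEING_2022_10K", "BOEING_2021_10K"],
    "answer": "<expert-validated reference answer>",
}

def exact_keyword_hit(response: str, keywords: list[str]) -> bool:
    """Toy accuracy check: does the response mention every required keyword?
    (A real evaluation would use an LLM judge or expert rubric instead.)"""
    return all(k.lower() in response.lower() for k in keywords)

print(exact_keyword_hit("Gross margin declined in FY2022.",
                        ["gross margin", "FY2022"]))
```

    In practice a single keyword check is far too brittle for financial Q&A; the point is only that each record carries both the evidence and the ground truth needed to score a full retrieval-plus-synthesis flow.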

    Step 2: From models to workflows

    To test in a context like this, you need to build and understand the full workflow, not just the prompt, so you can feed the right context into the model.

    And you need to do this every time you evaluate a new model–workflow pair.

    With syftr, we can run hundreds of workflows across different models, quickly surfacing tradeoffs. The result is a set of Pareto-optimal flows like the one shown below.

    In the lower left, you'll see simple pipelines using another model as the synthesizing LLM. These are cheap to run, but their accuracy is poor.

    In the upper right are the most accurate flows, but also the most expensive, since they typically rely on agentic strategies that break down the question, make multiple LLM calls, and analyze each chunk independently. This is why reasoning requires efficient computing and optimizations to keep inference costs in check.

    Nemotron shows up strongly here, holding its own across the remaining Pareto frontier.

    Step 3: Deep dive

    To better understand model performance, we grouped workflows by the LLM used at each step and plotted the Pareto frontier for each.

    [Figure: FinanceBench Pareto frontiers, grouped by response synthesizer LLM]

    The performance gap is clear. Most models struggle to get anywhere near Nemotron's performance. Some have trouble producing reasonable answers without heavy context engineering, and even then they remain less accurate and more expensive than larger models.

    But when we switch to using the LLM for HyDE (Hypothetical Document Embeddings), the story changes. (Flows marked N/A don't include HyDE.)
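    HyDE works by having an LLM draft a hypothetical answer document, then embedding that draft (instead of the raw question) and retrieving the real chunks closest to it. A minimal sketch with stand-in embedding and generation functions; a real flow would call an actual embedding model and LLM:

```python
import math

def embed(text: str) -> list[float]:
    # Stand-in embedding: bag-of-letters frequency vector.
    # (A real flow would call an embedding model here.)
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def hyde_retrieve(question, corpus, generate_hypothetical, top_k=1):
    """Embed an LLM-written hypothetical answer instead of the raw
    question, then rank corpus chunks by similarity to it."""
    hypothetical = generate_hypothetical(question)
    q_vec = embed(hypothetical)
    ranked = sorted(corpus, key=lambda doc: cosine(embed(doc), q_vec),
                    reverse=True)
    return ranked[:top_k]

def fake_llm(question):
    # Stand-in for the HyDE generative model (normally an LLM call).
    return "Gross margin improved from 2021 to 2022 per the 10-K filing."

corpus = ["Gross margin figures appear in the 2022 10-K filing.",
          "The board announced a new CEO this quarter."]
print(hyde_retrieve("Did gross margin improve in FY2022?", corpus, fake_llm))
```

    Because the hypothetical answer shares vocabulary with the relevant filing text, it retrieves the right chunk even when the question alone might not. This is why the choice of HyDE generator matters separately from the choice of synthesizer.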

    [Figure: FinanceBench Pareto frontiers, grouped by HyDE retrieval generative model]

    Here, several models perform well, delivering high-accuracy flows at an affordable cost.

    Key takeaways:

    • Nemotron shines in synthesis, producing high-fidelity answers without added cost
    • Using other models that excel at HyDE frees Nemotron to focus on high-value reasoning
    • Hybrid flows are the most efficient setup, using each model where it performs best
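    A hybrid flow of the kind described in the takeaways reduces to a per-step model assignment. Sketched as configuration, with step names and model identifiers that are illustrative assumptions rather than a syftr API:

```python
# Illustrative per-step model routing for a hybrid RAG flow
# (step names and model identifiers are assumptions, not a syftr API).
HYBRID_FLOW = {
    "hyde_generation": "smaller-fast-model",       # cheap, good at HyDE
    "response_synthesis": "llama-nemotron-super-49b-v1.5",  # strong reasoning
}

def model_for(step: str) -> str:
    """Look up which model serves a given workflow step."""
    return HYBRID_FLOW[step]

print(model_for("response_synthesis"))
```

    The point of the split is economic: the expensive reasoning model is invoked only where its accuracy advantage shows up on the frontier, while a cheaper model handles the steps it performs just as well.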

    Optimizing for value, not just size

    When evaluating new models, success isn't just about accuracy. It's about finding the right balance of quality, cost, and fit for your workflow. Measuring latency, efficiency, and overall impact helps ensure you're getting real value.

    NVIDIA Nemotron models are built with this in mind. They're designed not only for power, but for practical performance that helps teams drive impact without runaway costs.

    Pair that with a structured, syftr-guided evaluation process, and you've got a repeatable way to stay ahead of model churn while keeping compute and budget in check.

    To explore syftr further, check out the GitHub repository.


