    Artificial Intelligence

    2-bit VPTQ: 6.5x Smaller LLMs while Preserving 95% Accuracy

By Editor Times Featured, January 31, 2025


Highly accurate 2-bit quantization for running 70B LLMs on a 24 GB GPU

    Towards Data Science

(Image generated with ChatGPT)

Recent advances in low-bit quantization for LLMs, such as AQLM and AutoRound, are now showing acceptable levels of degradation on downstream tasks, especially for large models. That said, 2-bit quantization still often introduces noticeable accuracy loss.

One promising algorithm for low-bit quantization is VPTQ (MIT license), proposed by Microsoft. It was released in October 2024 and has since shown excellent performance and efficiency in quantizing large models.
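The core idea behind VPTQ is vector quantization: groups of weights are reshaped into short vectors, and each vector is replaced by the index of its nearest centroid in a learned codebook, so storage costs roughly log2(k)/d bits per weight. The toy NumPy sketch below illustrates only that lookup step; Microsoft's actual algorithm additionally uses second-order information and residual codebooks to choose the centroids.

```python
import numpy as np

def vq_quantize(weights: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map each weight vector in `weights` (n, d) to the index of its
    nearest centroid in `codebook` (k, d), by squared Euclidean distance."""
    d2 = ((weights[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)  # (n,) integer indices

def vq_dequantize(indices: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Reconstruct approximate weights by looking indices up in the codebook."""
    return codebook[indices]

# With k = 65536 centroids and vectors of length d = 8, each weight costs
# about log2(65536) / 8 = 2 bits, plus the shared codebook itself.
```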

In this article, we will:

    1. Review the VPTQ quantization algorithm.
    2. Demonstrate how to use VPTQ models, many of which are already available. For instance, we can easily find low-bit variants of Llama 3.3 70B, Llama 3.1 405B, and Qwen2.5 72B.
    3. Evaluate these models and discuss the results to understand when VPTQ models can be a good choice for LLMs in production.
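As a sketch of step 2: the pre-quantized checkpoints can be loaded through the open-source `vptq` package (`pip install vptq`), which provides an `AutoModelForCausalLM` wrapper around the Hugging Face API. The repo id below follows the VPTQ-community naming convention on the Hub but is an example; verify the exact id before use.

```python
# Example: loading a pre-quantized 2-bit VPTQ checkpoint.
# MODEL_ID is an illustrative VPTQ-community repo id; check it on the Hub.
MODEL_ID = "VPTQ-community/Meta-Llama-3.1-70B-Instruct-v8-k65536-0-woft"

def load_vptq_model(model_id: str = MODEL_ID):
    """Download a VPTQ checkpoint and spread it over the available GPUs."""
    import vptq  # the package published by the VPTQ authors
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = vptq.AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    return tokenizer, model

if __name__ == "__main__":
    tokenizer, model = load_vptq_model()
    inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=16)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Once loaded, the model behaves like any other `transformers` causal LM; only the weight storage differs.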

Remarkably, 2-bit quantization with VPTQ achieves performance close to the original 16-bit model on tasks such as MMLU. Moreover, it makes it possible to run Llama 3.1 405B on a single GPU, while using less memory than a 70B model!
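These headline numbers are easy to sanity-check with back-of-the-envelope arithmetic over the weights alone (activations and the KV cache add more). Note that the 6.5x shrink factor corresponds to an effective rate of roughly 16 / 6.5 ≈ 2.46 bits per weight, the extra fraction of a bit plausibly coming from the shared codebooks and index overhead:

```python
def weight_gib(n_params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight-storage size in GiB for a model of the given size."""
    return n_params_billion * 1e9 * bits_per_weight / 8 / 2**30

fp16_70b = weight_gib(70, 16)    # ~130 GiB: far beyond a single 24 GB GPU
vptq_70b = weight_gib(70, 2.46)  # ~20 GiB: 6.5x smaller, fits in 24 GB
q2_405b = weight_gib(405, 2)     # ~94 GiB: less than the 16-bit 70B model
```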


