    How a 2021 Quantization Algorithm Quietly Outperforms Its 2026 Successor

    By Editor Times Featured · May 2, 2026 · 8 Mins Read


    TurboQuant [3], an online vector quantization method, drew considerable public attention at ICLR 2026. To me, it looked very familiar: it overlaps heavily with EDEN, a quantization method first introduced as the 1-bit method DRIVE at NeurIPS 2021 [1] and generalized to arbitrary bit-widths at ICML 2022 [2], co-authored by myself with Ran Ben-Basat, Yaniv Ben-Itzhak, Gal Mendelson, Michael Mitzenmacher, and Shay Vargaftik.

    The TurboQuant paper presents two variants: TurboQuant-mse and TurboQuant-prod. In a detailed new comparison [5] we show that TurboQuant-mse is a degenerate case of EDEN, and that the EDEN variants consistently outperform their counterparts.

    How EDEN quantizes a vector

    Suppose you need to compress a d-dimensional vector x (a gradient update, an embedding, a KV-cache entry) down to a few bits per coordinate. EDEN proceeds in four steps, sketched in code right after the list:

    1. Random rotation — Multiply by a random orthogonal matrix Π. After rotation the coordinates are identically distributed and, for large d, approximately Gaussian.
    2. Scalar quantization — Round each rotated coordinate to one of the 2^b levels of a Lloyd–Max codebook trained on the known rotated-coordinate distribution (b is the target number of bits per coordinate).
    3. Scale — Multiply by a scale factor S.
    4. Inverse rotation — Apply Π⊤ to recover an approximation x̂ of the original vector.
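
    Below is a minimal NumPy sketch of the four steps, under stated assumptions: the helper names are mine, a toy 1-D k-means fit stands in for the precomputed Lloyd–Max codebooks, and a dense QR-based rotation stands in for the fast structured rotations real implementations use.

        import numpy as np

        def lloyd_max_gaussian(b, n=200_000, iters=50, seed=0):
            """Fit a 2^b-level Lloyd-Max codebook to N(0,1) samples (1-D k-means)."""
            rng = np.random.default_rng(seed)
            samples = np.sort(rng.standard_normal(n))
            # Initialize levels at evenly spaced Gaussian quantiles.
            levels = np.quantile(samples, (np.arange(2**b) + 0.5) / 2**b)
            for _ in range(iters):
                edges = (levels[:-1] + levels[1:]) / 2            # nearest-level boundaries
                idx = np.searchsorted(edges, samples)             # nearest level per sample
                levels = np.array([samples[idx == k].mean() for k in range(2**b)])
            return levels

        def random_rotation(d, rng):
            """Random orthogonal matrix via QR; a slow stand-in for structured rotations."""
            q, r = np.linalg.qr(rng.standard_normal((d, d)))
            return q * np.sign(np.diag(r))                        # sign fix for uniformity

        def eden_roundtrip(x, levels, S, rng):
            """Steps 1-4: rotate, quantize to the nearest codebook level, scale, rotate back."""
            d = x.shape[0]
            P = random_rotation(d, rng)
            r = P @ x
            norm = np.linalg.norm(r) / np.sqrt(d)                 # match coordinates to N(0,1)
            q = levels[np.abs(r[:, None] / norm - levels[None, :]).argmin(axis=1)]
            return P.T @ (S * norm * q)                           # scale, then inverse rotation

    With S = 1 this degenerates to the TurboQuant-mse setting discussed below; EDEN's point is precisely that S = 1 is not the best choice.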

    Whereas earlier works (e.g., Suresh et al. (2017) [6]) used rotation primarily to shrink the coordinates’ dynamic range (the gap between the largest and smallest coordinate values), EDEN [1] was, to the best of our knowledge, the first quantization scheme to exploit a stronger fact about random rotation: the post-rotation coordinates follow a known distribution, which lets us pair a deterministic quantizer with a closed-form scale that, depending on the application, either minimizes MSE or makes the estimate unbiased. Both scales are derived analytically, and the construction yields an asymptotic MSE reduction over the earlier approach.

    Concretely, EDEN’s two variants differ only in the choice of S (see the sketch after this list):

    • EDEN-biased — sets S to the closed-form value that minimizes the reconstruction MSE.
    • EDEN-unbiased — chooses S so the decompressed output is correct on average (𝔼[x̂] = x), which matters in particular whenever you average many quantized vectors (e.g., distributed training, attention).
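
    To make the two choices concrete, here is a per-vector illustration. This is an assumption for exposition only: EDEN derives its scales in closed form from the known post-rotation distribution [1, 2], whereas the formulas below compute the analogous quantities from a single rotated vector r and its quantization q.

        import numpy as np

        def illustrative_scales(r, q):
            """Per-vector stand-ins for EDEN's two closed-form scale choices."""
            s_mse = np.dot(r, q) / np.dot(q, q)   # minimizes ||r - S*q||^2 for this vector
            s_unb = np.dot(r, r) / np.dot(r, q)   # makes <S*q, r> = ||r||^2 (bias-correcting)
            return s_mse, s_unb

        rng = np.random.default_rng(0)
        r = rng.standard_normal(4096)             # stand-in for a rotated vector
        q = np.sign(r)                            # 1-bit quantization
        print(illustrative_scales(r, q))          # ~ (0.80, 1.25): sqrt(2/pi) and its inverse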

    Lined up against EDEN, TurboQuant-mse matches at every step except one: where EDEN derives the scale S analytically, TurboQuant-mse, although it targets MSE minimization, skips the optimized scaling.

    The pseudocode below shows the three side by side.

    Figure 1: EDEN’s pseudocode instantiated for EDEN-biased, EDEN-unbiased, and TurboQuant-mse. The three are identical except at step 5: the choice of S. Image by author [5].

    Why the optimal scale is worth it

    The value of applying the correct scale S grows with the bit-width. At b = 1 bit, the gap is marginal. At d = 128 and b = 4 bits, EDEN-biased reduces MSE by 2.25% over TurboQuant-mse, and these are the bit-widths practitioners actually use for embeddings and KV caches.

    Across dimensions 16 to 4096 and all tested bit-widths b ∈ {1, 2, 3, 4}, EDEN-biased’s vNMSE (vector-normalized MSE, 𝔼[‖x − x̂‖²]/‖x‖²) falls below TurboQuant-mse’s in every case (Figure 2). As the dimension grows very large, the optimal S approaches 1 and the two algorithms converge, but at practical dimensions (128–1024), the gap persists.
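
    A trivial helper (the name is mine) that estimates vNMSE by averaging over independent random rotations, reusing the functions from the earlier sketch:

        import numpy as np

        def vnmse(x, reconstruct, trials=100):
            """Estimate E[||x - x_hat||^2] / ||x||^2 over repeated randomized runs."""
            errs = [np.sum((x - reconstruct(x)) ** 2) for _ in range(trials)]
            return np.mean(errs) / np.sum(x ** 2)

        rng = np.random.default_rng(2)
        x = rng.standard_normal(256)
        levels = lloyd_max_gaussian(b=2)
        print(vnmse(x, lambda v: eden_roundtrip(v, levels, S=1.0, rng=rng)))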

    Figure 2: vNMSE vs. dimension comparing EDEN-biased and TurboQuant-mse across bit-widths b ∈ {1, 2, 3, 4} (panels left to right). EDEN-biased (which optimizes the scale factor S) achieves lower error than TurboQuant-mse (which fixes S = 1) at every tested dimension. The curves converge at high dimension as the optimal S approaches 1. Image by author [5].

    Unbiased compression: saving more than a full bit

    The results above concern the biased (MSE-minimizing) variants. Now consider the unbiased case, where applications such as distributed training, approximate attention, or inner-product retrieval need 𝔼[x̂] = x because they average many quantized vectors.

    EDEN-unbiased uses the same single-pass algorithm as EDEN-biased, just with S chosen for bias correction. TurboQuant’s unbiased variant, TurboQuant-prod, takes a different route: it spends (b − 1) bits on the biased TurboQuant-mse step and reserves 1 bit for a QJL (Quantized Johnson–Lindenstrauss) [4] correction on the residual (QJL is similar to EDEN at b = 1, but with higher variance).
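
    Schematically, the split looks like the sketch below, with one loud caveat: the residual stage here is a generic bias-correcting sign quantizer that I chose for illustration, not QJL’s actual construction (see [4] for that). It operates in the rotated domain, i.e., on r = Πx.

        import numpy as np

        def prod_style_split(r, levels_bm1):
            """Sketch of the bit-splitting structure: (b-1) biased bits on r,
            then 1 bit spent on an unbiased correction of the residual."""
            q1 = levels_bm1[np.abs(r[:, None] - levels_bm1[None, :]).argmin(axis=1)]
            resid = r - q1                                        # what stage 1 missed
            s = np.dot(resid, resid) / np.abs(resid).sum()        # bias-correcting scale
            return q1 + s * np.sign(resid)                        # stage 1 + 1-bit residual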

    EDEN-unbiased outperforms TurboQuant-prod in every tested configuration, and by a substantial margin. The gap traces to three structural advantages of EDEN’s single-pass design:

    1. EDEN optimizes the scale. TurboQuant-prod inherits TurboQuant-mse’s S = 1 first stage, so it carries the same MSE penalty.
    2. EDEN’s 1-bit construction has lower variance than QJL. In large dimensions, EDEN’s 1-bit vNMSE converges to π/2 − 1 ≈ 0.57 [1], while QJL’s converges to π/2 ≈ 1.57 [4], roughly 2.75× higher (the first constant is checked numerically in the sketch after this list).
    3. EDEN spends the full bit budget on a single unbiased quantizer. TurboQuant-prod splits the budget into (b − 1) biased bits plus 1 residual bit, which empirically underperforms spending all b bits on a single unbiased quantizer [5].
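
    The π/2 − 1 constant is easy to sanity-check with a Monte Carlo sketch of the unbiased 1-bit construction as I read it from DRIVE [1]: random rotation, sign quantization, and a per-vector bias-correcting scale.

        import numpy as np

        def unbiased_1bit_vnmse(d=512, trials=100, seed=0):
            """Monte Carlo vNMSE of 1-bit sign quantization with a bias-correcting
            scale; converges to pi/2 - 1 ~ 0.5708 as d grows [1]."""
            rng = np.random.default_rng(seed)
            total = 0.0
            for _ in range(trials):
                x = rng.standard_normal(d)
                P, R = np.linalg.qr(rng.standard_normal((d, d)))
                P = P * np.sign(np.diag(R))                       # Haar-uniform rotation
                r = P @ x
                s = np.dot(r, r) / np.abs(r).sum()                # ||r||^2 / <r, sign(r)>
                x_hat = P.T @ (s * np.sign(r))
                total += np.sum((x - x_hat) ** 2) / np.sum(x ** 2)
            return total / trials

        print(unbiased_1bit_vnmse())                              # ~ 0.57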

    These effects compound. The result: 1-bit, 2-bit, and 3-bit EDEN-unbiased are each more accurate than 2-bit, 3-bit, and 4-bit TurboQuant-prod, respectively (Figure 3). By swapping in EDEN you can drop a bit per coordinate and still match TurboQuant-prod’s accuracy.

    Figure 3: vNMSE vs. dimension comparing EDEN-unbiased and TurboQuant-prod across bit-widths b ∈ {1, 2, 3, 4} (panels left to right). EDEN-unbiased achieves lower error at every dimension. The gap is large enough that EDEN with b bits often outperforms TurboQuant-prod with b + 1 bits. Image by author [5].

    On TurboQuant’s own benchmarks

    The same picture holds on the standard ANN benchmarks TurboQuant evaluates on, Stanford’s GloVe pre-trained word vectors (Open Data Commons Public Domain Dedication and License v1.0) and Qdrant’s dbpedia-entities-openai3-text-embedding-3-large embeddings (Apache 2.0), using TurboQuant’s published evaluation code:

    EDEN-biased achieves lower MSE than TurboQuant-mse, EDEN-unbiased achieves markedly lower inner-product error than TurboQuant-prod, and nearest-neighbor recall on both datasets favors EDEN (Figure 4).

    Figure 4: Nearest-neighbor recall on GloVe and OpenAI3 embeddings at 2 and 4 bits per coordinate. EDEN-unbiased outperforms TurboQuant-prod across all four settings. Image by author [5].

    Takeaway: use EDEN; optimal scaling matters

    EDEN’s scale connects the known post-rotation distribution to an analytically optimal quantizer. TurboQuant-mse keeps EDEN’s rotation and codebook but pins S = 1, which is what makes it a strictly weaker special case. TurboQuant-prod adds a 1-bit QJL stage on top of that, whereas EDEN-unbiased gets the same property, with better accuracy, simply by choosing a bias-correcting scale.

    • For MSE-targeted compression (model weight quantization, nearest-neighbor search, KV caches): EDEN-biased computes the optimal scale S and consistently beats TurboQuant-mse (which is EDEN with S = 1 fixed).
    • For unbiased estimation (distributed mean estimation, approximate attention, inner-product retrieval): EDEN-unbiased significantly outperforms TurboQuant-prod’s bit-splitting strategy, by margins worth more than a full bit per coordinate.

    EDEN was originally developed for distributed mean estimation in federated and distributed training. Subsequent work has, for example, applied it to embedding compression for document re-ranking (SDR, 2022 [8]), adapted it for NVFP4 LLM training (MS-EDEN in Quartet II, 2026 [10]), and generalized it to vector quantization for data-free LLM weight compression (HIGGS, 2025 [9]), which was then used for KV-cache compression (AQUA-KV, 2025 [11]).

    EDEN implementations are available in PyTorch and TensorFlow and in Intel’s OpenFL [7], and its 1-bit variant ships in Google’s FedJax, TensorFlow Federated, and TensorFlow Model Optimization.

    For the full technical comparison with TurboQuant (all figures, detailed experimental methodology), see our note [5].

    For the derivations, proofs, and further extensions, see the original papers [1] [2].

    References

    1. S. Vargaftik, R. Ben-Basat, A. Portnoy, G. Mendelson, Y. Ben-Itzhak, M. Mitzenmacher, DRIVE: One-bit Distributed Mean Estimation (2021), NeurIPS 2021.
    2. S. Vargaftik, R. Ben-Basat, A. Portnoy, G. Mendelson, Y. Ben-Itzhak, M. Mitzenmacher, EDEN: Communication-Efficient and Robust Distributed Mean Estimation for Federated Learning (2022), ICML 2022.
    3. A. Zandieh, M. Daliri, A. Hadian, V. Mirrokni, TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate (2026), ICLR 2026.
    4. A. Zandieh, M. Daliri, I. Han, QJL: 1-Bit Quantized JL Transform for KV Cache Quantization with Zero Overhead (2024), arXiv:2406.03482.
    5. R. Ben-Basat, Y. Ben-Itzhak, G. Mendelson, M. Mitzenmacher, A. Portnoy, S. Vargaftik, A Note on TurboQuant and the Earlier DRIVE/EDEN Line of Work (2026), arXiv:2604.18555.
    6. A. T. Suresh, F. X. Yu, S. Kumar, H. B. McMahan, Distributed Mean Estimation with Limited Communication (2017), ICML 2017.
    7. VMware Open Source Blog, VMware Research Group’s EDEN Becomes Part of OpenFL (November 2022).
    8. N. Cohen, A. Portnoy, B. Fetahu, A. Ingber, SDR: Efficient Neural Re-ranking using Succinct Document Representation (2022), ACL 2022.
    9. V. Malinovskii, A. Panferov, I. Ilin, H. Guo, P. Richtárik, D. Alistarh, HIGGS: Pushing the Limits of Large Language Model Quantization via the Linearity Theorem (2025), NAACL 2025.
    10. A. Panferov, E. Schultheis, S. Tabesh, D. Alistarh, Quartet II: Accurate LLM Pre-Training in NVFP4 by Improved Unbiased Gradient Estimation (2026), arXiv:2601.22813.
    11. A. Shutova, V. Malinovskii, V. Egiazarian, D. Kuznedelev, D. Mazur, N. Surkov, I. Ermakov, D. Alistarh, Cache Me If You Must: Adaptive Key-Value Quantization for Large Language Models (2025), ICML 2025.


