
    The Machine Learning “Advent Calendar” Day 19: Bagging in Excel

    By Editor Times Featured, December 19, 2025

    For 18 days, we’ve explored most of the core machine learning models, organized into three major families: distance- and density-based models, tree- or rule-based models, and weight-based models.

    Up to this point, each article focused on a single model, trained on its own. Ensemble learning changes this perspective completely. It isn’t a standalone model. Instead, it’s a way of combining these base models to build something new.

    As illustrated in the diagram below, an ensemble is a meta-model. It sits on top of individual models and aggregates their predictions.

    Three learning steps in Machine Learning – Image by author

    Voting: the simplest ensemble idea

    The simplest form of ensemble learning is voting.

    The idea is almost trivial: train several models, take their predictions, and compute the average. If one model is wrong in one direction and another is wrong in the opposite direction, the errors should cancel out. At least, that’s the intuition.
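The cancellation intuition can be checked with a tiny sketch. The numbers below are hypothetical and chosen only for illustration: two models that miss on opposite sides of the truth, and two that miss on the same side.

```python
from statistics import mean

def vote(predictions):
    """Average the predictions of several models for one input."""
    return mean(predictions)

truth = 10.0  # hypothetical true value

# Two models that err in opposite directions: the errors cancel.
opposite = vote([truth + 2.0, truth - 2.0])

# Two models that err in the same direction: averaging cannot remove the bias.
same_side = vote([truth + 2.0, truth + 3.0])

print(abs(opposite - truth))
print(abs(same_side - truth))
```

Averaging only helps when the errors point in different directions, which is why the choice of models matters more than the averaging itself.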

    On paper, this sounds reasonable. In practice, things are very different.

    As soon as you try voting on real models, one fact becomes obvious: voting isn’t magic. Simply averaging predictions doesn’t guarantee better performance. In many cases, it actually makes things worse.

    The reason is simple. When you combine models that behave very differently, you also combine their weaknesses. If the models don’t make complementary errors, averaging can dilute useful structure instead of reinforcing it.

    To see this clearly, consider a very simple example. Take a decision tree and a linear regression trained on the same dataset. The decision tree captures local, non-linear patterns. The linear regression captures a global linear trend. When you average their predictions, you don’t obtain a better model. You obtain a compromise that is often worse than the better of the two models.

    Voting machine learning – all images by author

    This illustrates an important point: ensemble learning requires more than averaging. It requires a strategy. A way to combine models that actually improves stability or generalization.

    Moreover, if we consider the ensemble as a single model, then it must be trained as such. Simple averaging offers no parameter to adjust. There’s nothing to learn, nothing to optimize.

    One possible improvement to voting is to assign different weights to the models. Instead of giving each model the same importance, we could try to learn which ones should matter more. But as soon as we introduce weights, a new question appears: how do we train them? At that point, the ensemble itself becomes a model that must be fitted.
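To make "the ensemble becomes a model that must be fitted" concrete, here is a minimal sketch for two models. It assumes a blend of the form w * p1 + (1 - w) * p2 and fits the single weight w by least squares; the function name `fit_weight` and the toy predictions are hypothetical, and this is only one of several ways such weights could be learned.

```python
def fit_weight(y, p1, p2):
    """Least-squares weight w for the blend w * p1 + (1 - w) * p2."""
    num = sum((t - b) * (a - b) for t, a, b in zip(y, p1, p2))
    den = sum((a - b) ** 2 for a, b in zip(p1, p2))
    return num / den

# Hypothetical predictions: model 1 overshoots by 0.2, model 2 undershoots by 0.4.
y = [1.0, 2.0, 3.0, 4.0]
p1 = [t + 0.2 for t in y]
p2 = [t - 0.4 for t in y]

w = fit_weight(y, p1, p2)
blend = [w * a + (1 - w) * b for a, b in zip(p1, p2)]  # fitted weights
plain = [(a + b) / 2 for a, b in zip(p1, p2)]          # simple voting

print(w)
print([round(b - t, 6) for b, t in zip(blend, y)])
print([round(p - t, 6) for p, t in zip(plain, y)])
```

In this toy case the fitted weight leans toward the model with the smaller error, while the plain average keeps a residual bias. In a real setting the weight would normally be fitted on data the base models were not trained on.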

    This observation leads naturally to more structured ensemble methods.

    In this article, we begin with one statistical approach that resamples the training dataset before averaging: Bagging.

    The intuition behind Bagging

    Why “bagging”?

    What’s bagging?

    The answer is actually hidden in the name itself.

    Bagging = Bootstrap + Aggregating.

    You can immediately tell that a mathematician or a statistician named it. 🙂

    Behind this slightly intimidating word, the idea is very simple. Bagging is about doing two things: first, creating many versions of the dataset using the bootstrap, and second, aggregating the results obtained from these datasets.

    The core idea is therefore not about changing the model. It’s about changing the data.

    Bootstrapping the dataset

    Bootstrapping means sampling the dataset with replacement. Each bootstrap sample has the same size as the original dataset, but not the same observations. Some rows appear several times. Others disappear.
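How many rows disappear can be quantified. As a short aside (a standard result, not something taken from the Excel workbook), when n rows are drawn uniformly with replacement, the probability that a given row never appears in a bootstrap sample is:

```latex
P(\text{row } i \text{ absent}) = \left(1 - \frac{1}{n}\right)^{n} \xrightarrow[n \to \infty]{} e^{-1} \approx 0.368
```

So for a reasonably large dataset, roughly 63% of the original rows show up in each bootstrap sample and about 37% are left out.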

    In Excel, this is very easy to implement and, more importantly, very easy to see.

    You start by adding an ID column to your dataset, one unique identifier per row. Then, using the RANDBETWEEN function, you randomly draw row indices. Each draw corresponds to one row in the bootstrap sample. By repeating this process, you generate a full dataset that looks familiar, but is slightly different from the original one.

    This step alone already makes the idea of bagging concrete. You can literally see the duplicates. You can see which observations are missing. Nothing is abstract.
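For readers who prefer code to spreadsheets, the same draw can be sketched in Python. This is an illustrative equivalent, not the author's workbook: `random.randint` plays the role of RANDBETWEEN (both include the two bounds), and the dataset size of 10 rows and the helper name `bootstrap_ids` are assumptions.

```python
import random

def bootstrap_ids(n_rows, rng):
    """Draw n_rows row IDs in 1..n_rows with replacement, like RANDBETWEEN(1, n_rows)."""
    return [rng.randint(1, n_rows) for _ in range(n_rows)]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n_rows = 10             # hypothetical dataset size

sample = bootstrap_ids(n_rows, rng)
duplicated = sorted({i for i in sample if sample.count(i) > 1})
missing = sorted(set(range(1, n_rows + 1)) - set(sample))

print("sample:    ", sample)
print("duplicated:", duplicated)
print("missing:   ", missing)
```

The bootstrap dataset is then built by looking up each drawn ID in the original table, which is what a lookup formula on the ID column does in the spreadsheet.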

    Below, you can see examples of bootstrap samples generated from the same original dataset. Each sample tells a slightly different story, even though they all come from the same data.

    These alternative datasets are the foundation of bagging.

    Dataset generated by author – image by author

    Bagging linear regression: understanding the principle

    Bagging process

    Yes, this is probably the first time you hear about bagging linear regression.

    In theory, there is nothing wrong with it. As we said earlier, bagging is an ensemble method that can be applied to any base model. Linear regression is a model, so technically, it qualifies.

    In practice, however, you’ll quickly see that this isn’t very useful.

    But nothing prevents us from doing it. And precisely because it’s not very useful, it makes for an excellent learning example. So let us do it.

    For each bootstrap sample, we fit a linear regression. In Excel, this is straightforward. We can directly use the LINEST function to estimate the coefficients. Each color in the plot corresponds to one bootstrap sample and its associated regression line.

    So far, everything behaves exactly as expected. The lines are close to each other, but not identical. Each bootstrap sample slightly changes the coefficients, and therefore the fitted line.
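The same experiment can be sketched outside Excel. The code below is a minimal illustration under stated assumptions: the toy data (roughly y = 2x + 1 plus noise), the eight bootstrap samples, and the helper `fit_line` (a hand-written one-feature least-squares fit standing in for LINEST) are all hypothetical.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical toy data, roughly y = 2x + 1 with Gaussian noise.
rng = random.Random(42)
xs = [float(i) for i in range(1, 21)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 1.5) for x in xs]

lines = []
for _ in range(8):
    idx = [rng.randrange(len(xs)) for _ in xs]  # bootstrap row indices
    lines.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))

for slope, intercept in lines:
    print(f"slope={slope:.3f}  intercept={intercept:.3f}")
```

Each printed pair is one regression line. The coefficients should differ slightly from sample to sample while staying close to each other.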

    Bagging of linear regression – picture by writer

    Now comes the key observation.

    You may notice that one additional model is plotted in black. This one corresponds to the standard linear regression fitted on the original dataset, without bootstrapping.

    What happens when we compare it to the bagged models?

    When we average the predictions of all these linear regressions, the final result is still a linear regression. The shape of the prediction doesn’t change. The relationship between the variables remains linear. We didn’t create a more expressive model.

    And more importantly, the bagged model ends up being very close to the standard linear regression trained on the original data.
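Why the average is still a line can be written out in one step. As a sketch, write the line fitted on bootstrap sample b as α_b + β_b x, with B samples in total (this notation is introduced here for illustration and does not come from the workbook):

```latex
\hat{y}_{\text{bag}}(x)
  = \frac{1}{B}\sum_{b=1}^{B}\left(\alpha_b + \beta_b x\right)
  = \underbrace{\frac{1}{B}\sum_{b=1}^{B}\alpha_b}_{\bar{\alpha}}
  + \underbrace{\frac{1}{B}\sum_{b=1}^{B}\beta_b}_{\bar{\beta}}\, x
```

The bagged model is therefore a single line whose intercept and slope are the averages of the bootstrap coefficients.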

    We can even push the example further by using a dataset with a clearly non-linear structure. In this case, each linear regression fitted on a bootstrap sample struggles in its own way. Some lines tilt slightly upward, others downward, depending on which observations were duplicated or missing in the sample.

    Bagging of linear regression – picture by writer

    Bootstrap confidence intervals

    From a prediction performance perspective, bagging linear regression isn’t very useful.

    However, bootstrapping remains extremely useful for one important statistical notion: estimating the confidence interval of the predictions.

    Instead of looking only at the average prediction, we can look at the distribution of predictions produced by all the bootstrapped models. For each input value, we now have many predicted values, one from each bootstrap sample.

    A simple and intuitive way to quantify uncertainty is to compute the standard deviation of these predictions. This standard deviation tells us how sensitive the prediction is to changes in the data. A small value means the prediction is stable. A large value means it is uncertain.

    This idea works naturally in Excel. Once you have all the predictions from the bootstrapped models, computing their standard deviation is straightforward. The result can be interpreted as a confidence band around the prediction.
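A minimal sketch of this computation follows. The toy data, the 50 bootstrap samples, and the helper names `fit_line` and `band` are assumptions made for illustration. The two query points are chosen so that one sits in the middle of the data and the other lies well outside it, where the bootstrapped lines are expected to disagree more.

```python
import random
from statistics import mean, stdev

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical toy data, roughly y = 2x + 1 with Gaussian noise.
rng = random.Random(7)
xs = [float(i) for i in range(1, 21)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 1.5) for x in xs]

# One fitted line per bootstrap sample.
models = []
for _ in range(50):
    idx = [rng.randrange(len(xs)) for _ in xs]
    models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))

def band(x):
    """Mean and standard deviation of the bootstrapped predictions at x."""
    preds = [slope * x + intercept for slope, intercept in models]
    return mean(preds), stdev(preds)

center_mean, center_sd = band(10.5)  # middle of the observed x range
edge_mean, edge_sd = band(30.0)      # far outside the observed x range

print(f"x=10.5: {center_mean:.2f} +/- {center_sd:.2f}")
print(f"x=30.0: {edge_mean:.2f} +/- {edge_sd:.2f}")
```

Plotting the mean plus and minus a multiple of the standard deviation for each x gives a band of the kind described here. The multiple to use is a modelling choice, not something fixed by the method.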

    This is clearly visible in the plot below. The interpretation is simple: in regions where the training data is sparse or highly dispersed, the confidence interval becomes wide, as predictions vary significantly across bootstrap samples.

    Conversely, where the data is dense, predictions are more stable and the confidence interval narrows.

    Now, when we apply this to non-linear data, something becomes very clear. In regions where the linear model struggles to fit the data, the predictions from different bootstrap samples spread out much more. The confidence interval becomes wider.

    This is an important insight. Even when bagging doesn’t improve prediction accuracy, it provides valuable information about uncertainty. It tells us where the model is reliable and where it is not.

    Seeing these confidence intervals emerge directly from bootstrap samples in Excel makes this statistical concept very concrete and intuitive.

    Bagging decision trees: from weak learners to a strong model

    Now we move to decision trees.

    The principle of bagging remains exactly the same. We generate multiple bootstrap samples, train one model on each of them, and then aggregate their predictions.

    I improved the Excel implementation to make the splitting process more automatic. To keep things manageable in Excel, we restrict the trees to a single split. Building deeper trees is possible, but it quickly becomes cumbersome in a spreadsheet.

    Below, you can see two of the bootstrapped trees. In total, I built eight of them by simply copying and pasting formulas, which makes the process straightforward and easy to reproduce.

    Since decision trees are highly non-linear models and their predictions are piecewise constant, averaging their outputs has a smoothing effect.

    As a result, bagging naturally smooths the predictions. Instead of sharp jumps created by individual trees, the aggregated model produces more gradual transitions.

    In Excel, this effect is very easy to observe. The bagged predictions are clearly smoother than the predictions of any single tree.

    Some of you may have already heard of decision stumps, which are decision trees with a maximum depth of 1. That’s exactly what we use here. Each model is very simple. On its own, a stump is a weak learner.
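The smoothing effect can be sketched in a few lines. The code below is an illustration, not the author's Excel implementation: the toy dataset and the helper names are hypothetical, and the split criterion (minimum sum of squared errors) is an assumption, since the article does not state which criterion the workbook uses. Eight stumps are bagged, matching the eight trees mentioned above.

```python
import random

def fit_stump(xs, ys):
    """Return (threshold, left_mean, right_mean) for the best single split."""
    pairs = sorted(zip(xs, ys))
    best = None
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # cannot split between two identical x values
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]

def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

# Hypothetical one-feature dataset with a smooth, non-linear trend.
rng = random.Random(3)
xs = [float(i) for i in range(1, 21)]
ys = [x ** 2 / 20.0 + rng.gauss(0, 0.5) for x in xs]

stumps = []
for _ in range(8):
    idx = [rng.randrange(len(xs)) for _ in xs]  # bootstrap row indices
    stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))

def predict_bagged(x):
    """Average the predictions of all bootstrapped stumps."""
    return sum(predict_stump(s, x) for s in stumps) / len(stumps)

grid = [x / 2 for x in range(2, 41)]  # 1.0, 1.5, ..., 20.0
single_levels = {predict_stump(stumps[0], x) for x in grid}
bagged_levels = {predict_bagged(x) for x in grid}
print(len(single_levels), len(bagged_levels))
```

A single stump can only output two values. When the bootstrapped stumps choose different thresholds, their average takes more distinct values, so the one big jump is typically replaced by a staircase of smaller steps.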

    The question here is:
    is a set of decision stumps sufficient when combined with bagging?

    We’ll come back to this later in my Machine Learning “Advent Calendar”.

    Random Forest: extending bagging

    What about Random Forest?

    This is probably one of the favorite models among data scientists.

    So why not talk about it here, even in Excel?

    In fact, what we’ve just built is already very close to a Random Forest!

    To understand why, recall that Random Forest introduces two sources of randomness.

    • The first one is the bootstrap of the dataset. This is exactly what we’ve already done with bagging.
    • The second is randomness in the splitting process. At each split, only a random subset of features is considered.

    In our case, however, we only have one feature. That means there is nothing to select from. Feature randomness simply doesn’t apply.

    As a result, what we obtain here can be seen as a simplified Random Forest.

    Once this concept is clear, extending the idea to multiple features is just an additional layer of randomness, not a new concept.
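The second source of randomness is small enough to sketch on its own. The helper below is hypothetical and only shows the selection step: before each split, a random subset of feature indices is drawn, and the split search is then restricted to those features. The parameter name `max_features` is an assumption for this sketch.

```python
import random

def candidate_features(n_features, max_features, rng):
    """Pick the feature indices that one split is allowed to consider."""
    k = min(max_features, n_features)
    return sorted(rng.sample(range(n_features), k))

rng = random.Random(1)

# With a single feature there is nothing to choose from.
one_feature = candidate_features(1, 1, rng)

# With five features and two allowed per split, each call may pick a different pair.
subsets = [candidate_features(5, 2, rng) for _ in range(3)]

print(one_feature)
print(subsets)
```

With one feature the subset is always the same, which is why the bagged stumps built here already behave like a simplified Random Forest.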

    And you may even ask: can we apply this principle to Linear Regression, and do a Random

    Conclusion

    Ensemble learning is less about complex models and more about managing instability.

    Simple voting is rarely effective. Bagging linear regression changes little and remains mostly pedagogical, though it is useful for estimating uncertainty. With decision trees, however, bagging truly matters: averaging unstable models leads to smoother and more robust predictions.

    Random Forest naturally extends this idea by adding extra randomness, without changing the core principle. Seen in Excel, ensemble methods stop being black boxes and become a logical next step.

    Additional Studying

    Thank you for your support for my Machine Learning “Advent Calendar“.

    People usually talk a lot about supervised learning, but unsupervised learning is sometimes overlooked, even though it can reveal structure that no label could ever show.
    If you want to explore these ideas further, here are three articles that dive into powerful unsupervised models.

    Gaussian Mixture Model

    An improved and more flexible version of k-means.

    Unlike k-means, GMM allows clusters to stretch, rotate, and adapt to the true shape of the data.

    But when do k-means and GMM actually produce different results?

    Take a look at this article to see concrete examples and visual comparisons.

    Local Outlier Factor (LOF)
    A clever method that compares each point’s local density to that of its neighbors to detect anomalies.


    All the Excel files are available through this Kofi link. Your support means a lot to me. The price will increase during the month, so early supporters get the best value.

    All Excel/Google Sheets files for ML and DL


