
    Explainable Anomaly Detection with RuleFit: An Intuitive Guide

    By Editor Times Featured, July 5, 2025 (14 min read)


    When you present your anomaly detection results to your stakeholders, the immediate next question is always “why?”.

    In practice, simply flagging an anomaly isn’t enough. Understanding what went wrong is essential to determining the best next action.

    Yet most machine learning-based anomaly detection methods stop at producing an anomaly score. They’re black-box in nature, which makes it painful to make sense of their outputs: why does this sample have a higher anomaly score than its neighbors?

    To tackle this explainability challenge, you have likely already resorted to popular eXplainable AI (XAI) techniques. Maybe you are calculating feature importance to identify which variables are driving the abnormality, or you are running counterfactual analysis to see how close a case was to normal.

    These are useful, but what if you could do more? What if you could derive a set of interpretable IF-THEN rules that characterize the identified anomalies?

    This is exactly what the RuleFit algorithm [1] promises.

    In this post, we’ll explore how the RuleFit algorithm works intuitively, how it can be applied to explain detected anomalies, and walk through a concrete case study.


    1. How Does It Work?

    Before diving into the technical details, let’s first clarify what we aim to have after applying the algorithm: we want a set of IF-THEN rules that quantitatively characterize the abnormal samples, as well as the importance of those rules.

    To get there, we need to answer two questions:

    (1) How do we generate meaningful IF-THEN conditions from the data?

    (2) How do we calculate the rule importance score to determine which ones actually matter?

    The RuleFit algorithm addresses these questions by splitting the work into two complementary parts, the “Rule” and the “Fit”.

    1.1 The “Rule” in RuleFit

    In RuleFit, a rule looks like this:

    IF x1 < 10 AND x2 > 5 THEN 1 ELSE 0

    Would this structure look a bit more familiar if we visualize it like this:

    Figure 1. A rule is just one specific path through a decision tree. (Image by author)

    Yes, it’s a decision tree! The rule here is just traversing one specific path through the tree, from the root node to the leaf node.

    In RuleFit, the rule generation process relies heavily on building decision trees, which predict the target outcome given the input features. Once the tree is built, any path from the root to a node in the tree can be converted to a decision rule, as we have just seen in the example above.

    To ensure the rules are diverse, RuleFit doesn’t just fit one decision tree. Instead, it leverages tree ensemble algorithms (e.g., random forest, gradient boosting trees, etc.) to generate many different decision trees.

    Also, the depths of those trees are, in general, different. This brings the benefit of generating rules with variable lengths, further enhancing the diversity.

    Here, we should note that although the ensemble trees are built with predicting the target outcome in mind, the RuleFit algorithm does not really care about the end prediction results. It merely uses this tree-building exercise as the vehicle to extract meaningful, quantitative rules.

    Effectively, this means that we’ll discard the predicted value in each node and only keep the conditions that lead us to a node. These conditions produce the rules we care about.
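The path-to-rule idea can be sketched in a few lines. This is only an illustration, assuming scikit-learn trees and a small random forest as the ensemble; the helper `extract_rules` and the short feature names are hypothetical, and actual RuleFit implementations differ in how they grow trees and encode rules.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier


def extract_rules(fitted_tree, feature_names):
    """Return one rule per non-root node: the split conditions on the path to it."""
    t = fitted_tree.tree_
    rules = []

    def walk(node, conditions):
        if conditions:  # every node below the root defines a candidate rule
            rules.append(" AND ".join(conditions))
        if t.children_left[node] == -1:  # -1 marks a leaf in scikit-learn trees
            return
        name = feature_names[t.feature[node]]
        thr = t.threshold[node]
        walk(t.children_left[node], conditions + [f"{name} <= {thr:.2f}"])
        walk(t.children_right[node], conditions + [f"{name} > {thr:.2f}"])

    walk(0, [])
    return rules


iris = load_iris()
names = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

# A tiny ensemble; the predictions are never used, only the split conditions
forest = RandomForestClassifier(n_estimators=3, max_depth=2, random_state=0)
forest.fit(iris.data, iris.target)

candidate_rules = []
for tree in forest.estimators_:
    candidate_rules.extend(extract_rules(tree, names))

for rule in sorted(set(candidate_rules)):
    print(f"IF {rule} THEN 1 ELSE 0")
```

Note that the predicted class stored in each node is ignored entirely; only the chain of conditions is kept, which mirrors the "discard the predicted value" point above.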

    Okay, we can now wrap up the first processing step in the RuleFit algorithm: the rule building. The outcome of this step is a pool of candidate rules that could potentially explain the specific data behavior.

    But out of all these rules, which ones actually deserve our attention?

    Well, this is where the second step of RuleFit comes in. We “fit” to rank.

    1.2 The “Fit” in RuleFit

    Essentially, RuleFit uncovers the most important rules via feature selection.

    First, RuleFit treats each rule as a new binary feature: if the rule is satisfied for a specific sample, the sample gets a value of 1 for this binary feature; otherwise, its value is 0.

    Then, RuleFit performs sparse linear regression with Lasso, using all the “raw” features from the original dataset as well as the newly engineered binary features derived from the rules, to predict the target outcome. This way, each feature (raw features + binary rule features) gets a coefficient.

    One key characteristic of Lasso is that its loss function forces the coefficients of unimportant features to be exactly zero. This effectively means those unimportant features are removed from the model.

    As a result, by simply examining which binary rule features survived the Lasso analysis, we immediately know which rules are important for getting accurate predictions of the target outcome. In addition, by looking at the coefficient magnitudes associated with the rule features, we are able to rank the importance of the rules.
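To make the "fit" step concrete, here is a minimal sketch on synthetic data, assuming scikit-learn's `Lasso`. The three candidate rules, the target, and the `alpha` value are all made up for illustration, and the raw linear features are left out for brevity.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n = 500
x1 = rng.uniform(0, 20, n)
x2 = rng.uniform(0, 10, n)

# Candidate rules encoded as 0/1 columns (the rules themselves are illustrative)
rule_features = {
    "x1 < 10 AND x2 > 5": ((x1 < 10) & (x2 > 5)).astype(float),
    "x1 > 15": (x1 > 15).astype(float),
    "x2 < 2": (x2 < 2).astype(float),
}

# Synthetic target that depends only on the first rule, plus a little noise
y = 3.0 * rule_features["x1 < 10 AND x2 > 5"] + rng.normal(0, 0.1, n)

X = np.column_stack(list(rule_features.values()))
lasso = Lasso(alpha=0.05).fit(X, y)

# Rules whose coefficient is shrunk to zero drop out of the model
for name, coef in zip(rule_features, lasso.coef_):
    print(f"{name:20s} coef = {coef:+.3f}")
```

In this toy setup only the rule that actually drives the target should keep a non-zero coefficient, which is the selection behavior the section describes.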

    1.3 Recap

    We have just covered the essential theory behind the RuleFit algorithm. To summarize, we can view this approach as a two-step solution for providing explainability:

    (1) It first extracts the rules by training an ensemble of decision trees. That’s the “Rule” part.

    (2) It then cleverly converts those rules into binary features and performs standard feature selection by using sparse linear regression (Lasso). That’s the “Fit” part.

    Finally, the surviving rules with non-zero coefficients are the important ones that are worth our attention.

    At this point, you may have noticed that “predicting the target outcome” pops up at both the “Rule” and “Fit” steps. If we are dealing with a regression or classification problem, it’s easy to see that the “target outcome” is the numerical value or the label we want to predict, and the rules can be interpreted as patterns that drive the prediction.

    But what about anomaly detection, which is largely an unsupervised task? How can we apply RuleFit there?


    2. Anomaly Explanation with RuleFit

    2.1 Application Pattern

    First of all, we need to transform the unsupervised explainability problem into a supervised one. Here’s how.

    Once we have our anomaly detection results (it doesn’t matter which algorithm we used), we can create binary labels, i.e., 1 for an identified anomaly and 0 for a normal data point, as our “target outcome.” This way, we have exactly what RuleFit needs: the raw features, and the target outcome to predict.

    Then, RuleFit can work its magic to generate a pool of candidate rules and fit a sparse linear model to retain only the important rules. The coefficients of the resulting model then indicate how much each rule contributes to the log-odds of an instance being classified as an anomaly. To put it another way, they tell us which rule combinations most strongly push a sample toward being labeled as anomalous.

    Note that you can, in theory, also use the anomaly score (produced by the primary anomaly detection model) as the “target outcome”. This changes the application of RuleFit from a classification setting to a regression setting.

    Both approaches are valid, but they answer slightly different questions: with the binary label classification setting, RuleFit uncovers “What makes something an anomaly?”; with the anomaly score regression setting, RuleFit uncovers “What drives the severity of an anomaly?”.

    In practice, the rules generated by both approaches will probably be very similar. However, using a binary anomaly label as the target for RuleFit is the more common choice for explaining detected anomalies. It’s straightforward to interpret and directly applicable to creating business rules for flagging future anomalies.
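The two target choices differ only in what is handed to RuleFit as the outcome. A minimal sketch with made-up detector scores and an arbitrary cut-off (both are assumptions for illustration):

```python
import numpy as np

# Hypothetical anomaly scores from any detector (higher = more anomalous)
scores = np.array([0.9, 1.1, 1.0, 3.2, 0.8, 2.7])
threshold = 2.0  # illustrative cut-off, not a recommended value

# Classification target: "What makes something an anomaly?"
y_label = (scores > threshold).astype(int)

# Regression target: "What drives the severity of an anomaly?"
y_score = scores.copy()

print(y_label)  # [0 0 0 1 0 1]
```

The binary vector would go to a rule-based classifier and the raw scores to a rule-based regressor, with the same feature matrix in both cases.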

    2.2 Case Study

    Let’s walk through a concrete example to see how RuleFit works in action. Here, we’ll create an anomaly detection scenario using the Iris dataset [2] (licensed CC BY 4.0), where each sample consists of 4 features (sepal_length, sepal_width, petal_length, petal_width) and is labeled as one of the following three categories: Setosa, Versicolor, and Virginica.

    Step 1: Data Setup

    First, we’ll use all Setosa samples (50) and all Versicolor samples (50) as the “normal” samples. For the “abnormal” samples, we’ll use a subset of Virginica samples (10).

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    from sklearn.datasets import load_iris
    from sklearn.preprocessing import StandardScaler
    from sklearn.metrics import classification_report, confusion_matrix
    np.random.seed(42)

    # Load the Iris dataset
    iris = load_iris()
    X = pd.DataFrame(iris.data, columns=iris.feature_names)
    y_true = iris.target

    # Get normal samples (Setosa + Versicolor)
    normal_mask = (y_true == 0) | (y_true == 1)
    X_normal_all = X[normal_mask].copy()

    # Get Virginica samples
    virginica_mask = (y_true == 2)
    X_virginica = X[virginica_mask].copy()

    # Randomly select 10 Virginica samples to act as anomalies
    anomaly_indices = np.random.choice(len(X_virginica), size=10, replace=False)
    X_anomalies = X_virginica.iloc[anomaly_indices].copy()

    To make the scenario more realistic, we create a separate training set and test set. The training set contains only “normal” samples (80 of them), while the test set consists of the remaining 20 “normal” samples and the 10 “abnormal” samples.

    # Randomly pick 80 normal samples for training; the remaining 20 go to the test set
    train_indices = np.random.choice(len(X_normal_all), size=80, replace=False)
    test_indices = np.setdiff1d(np.arange(len(X_normal_all)), train_indices)

    X_train = X_normal_all.iloc[train_indices].copy()
    X_normal_test = X_normal_all.iloc[test_indices].copy()

    # Create test set (20 normal + 10 anomalous)
    X_test = pd.concat([X_normal_test, X_anomalies], ignore_index=True)
    y_test_true = np.concatenate([
        np.zeros(len(X_normal_test)),   # 0 = normal
        np.ones(len(X_anomalies))       # 1 = anomaly
    ])

    Step 2: Anomaly Detection

    Next, we perform anomaly detection. Here, we pretend we don’t know the actual labels. In this case study, we apply Local Outlier Factor (LOF) as the anomaly detection algorithm, which locates anomalies by measuring how isolated a data point is compared to the density of its local neighbors. Of course, you can also try other anomaly detection algorithms, such as Gaussian Mixture Models (GMM), K-Nearest Neighbors (KNN), and Autoencoders, among others. However, keep in mind that the intention here is only to get the detection results; our main focus is the anomaly explanation in Step 3.

    Specifically, we’ll use the pyOD library to train the model and make inferences:

    # Install the pyOD library
    #!pip install pyod

    from pyod.models.lof import LOF

    # Standardize features (fit the scaler on the training set only)
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)

    # Local Outlier Factor
    lof = LOF(n_neighbors=3)
    lof.fit(X_train_scaled)

    # Anomaly scores: higher means more anomalous
    train_scores = lof.decision_function(X_train_scaled)
    test_scores = lof.decision_function(X_test_scaled)
    threshold = np.percentile(train_scores, 99)
    y_pred = (test_scores > threshold).astype(int)

    Notice that we have used the 99% quantile of the anomaly scores obtained on the training set as the threshold. For an individual test sample, if its anomaly score is higher than the threshold, the sample is labeled as an “anomaly”. Otherwise, the sample is considered “normal”.
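The thresholding logic can be checked in isolation. The sketch below uses synthetic scores (an assumption, not the LOF output from the case study) to show that a 99th-percentile cut-off flags roughly 1% of the training points by construction:

```python
import numpy as np

rng = np.random.default_rng(0)
train_scores = rng.normal(loc=1.0, scale=0.1, size=1000)  # synthetic "normal" scores
test_scores = np.array([0.95, 1.05, 1.80, 2.40])           # synthetic test scores

threshold = np.percentile(train_scores, 99)
y_pred = (test_scores > threshold).astype(int)

# Fraction of training points that the threshold itself would flag
train_flag_rate = (train_scores > threshold).mean()

print("threshold:", round(float(threshold), 3))
print("test predictions:", y_pred)
print("training flag rate:", train_flag_rate)
```

The percentile is therefore a knob for the tolerated false-alarm rate on data assumed to be normal; lowering it flags more test samples at the cost of more false positives.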

    At this stage, we can quickly check the detection performance with:

    print(classification_report(y_test_true, y_pred, target_names=['Normal', 'Anomaly']))
    (Image by author)

    The results are not great. Out of 10 true anomalies, only 5 are caught. However, the good news is that LOF didn’t produce any false positives. You can further improve the performance by tuning the LOF model hyperparameters, adjusting the threshold, or even considering ensemble learning strategies. But keep in mind: our goal here is not to get the best detection accuracy. Instead, we aim to see if RuleFit can properly generate rules to explain the anomalies detected by the LOF model.

    Step 3: Anomaly Explanation

    Now we’re getting to the core topic. To apply RuleFit, let’s first install the imodels library, which is a sklearn-compatible, interpretable ML package for concise, transparent, and accurate predictive modeling:

    pip install imodels

    In this case, we’ll consider a binary label classification setting, where the abnormal samples (in the test set) flagged by the LOF model are labeled as 1, and the other un-flagged normal samples (also in the test set) are labeled as 0. Note that we’re labeling based on LOF’s detection results, not the actual ground truth, which we pretend we don’t know.

    To instantiate the RuleFit model:

    from imodels import RuleFitClassifier
    
    rf = RuleFitClassifier(                 
            max_rules = 30,           
            lin_standardise=True,           
            include_linear=True,           
            random_state = 42
    )

    We can then proceed with fitting the RuleFit model:

    rf.fit(
        X_test,
        y_pred,
        feature_names=X_test.columns
    )

    In practice, it’s usually good practice to do a quick sanity check to evaluate how well the RuleFit model’s predictions align with the anomaly labels determined by the LOF algorithm:

    from sklearn.metrics import accuracy_score, roc_auc_score
    
    y_label = rf.predict(X_test)               
    y_prob  = rf.predict_proba(X_test)[:, 1]   
    
    print("accuracy:", accuracy_score(y_pred, y_label))
    print("roc-auc:", roc_auc_score(y_pred, y_prob))

    For our case, we see that both printouts are 1. This confirms that the RuleFit model has successfully learned the patterns that LOF used to identify anomalies. For your own problems, if you observe values much lower than 1, you would need to fine-tune your RuleFit hyperparameters.

    Now let’s examine the rules:

    rules = rf._get_rules()
    rules = rules[rules.coef != 0]                         # keep rules that survived Lasso
    rules = rules[~rules.type.str.contains('linear')]      # drop the raw linear terms
    rules['abs_coef'] = rules['coef'].abs()
    rules = rules.sort_values('importance', ascending=False)

    The RuleFit algorithm returns a total of 24 rules. A snapshot is shown below:

    (Image by author)

    Let’s first clarify the meaning of the result columns:

    • The “rule” column and the “abs_coef” column are self-explanatory.
    • The “type” column has two unique values: “linear” and “rule”. “linear” denotes the original input features, while “rule” denotes the IF-THEN conditions generated from decision trees.
    • The “coef” column represents the coefficients produced by the Lasso regression analysis. A positive value indicates that if the rule applies, the log-odds of being classified as the abnormal class increases. A larger magnitude indicates a stronger influence of that rule on the prediction.
    • The “support” column records the fraction of data samples where the rule applies.
    • The “importance” column is calculated as the absolute value of the coefficient multiplied by the standard deviation of the binary (0 or 1) values that the rule takes on. So why this calculation? As we have just discussed, a larger absolute coefficient means a stronger direct impact on the log-odds. That’s clear. The standard deviation term effectively measures the “discriminative power” of the rule. For example, if a rule is almost always TRUE (very small standard deviation), it doesn’t split your data effectively. The same holds if the rule is almost always FALSE. In other words, the rule can’t explain much of the variation in the target variable. Therefore, the importance score combines both the strength of the rule’s impact (coefficient magnitude) and how well it discriminates between different samples (standard deviation).
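The importance calculation described in the list above can be written out directly. For a 0/1 column with support s, the (population) standard deviation is sqrt(s * (1 - s)), so the score needs only the coefficient and the support. The helper name `rule_importance` is hypothetical, and this is a sketch of the description in the text rather than the library's own code.

```python
import numpy as np


def rule_importance(coef, support):
    """|coefficient| times the standard deviation of a 0/1 rule column."""
    return abs(coef) * np.sqrt(support * (1.0 - support))


# A 0/1 rule column that fires on 5 of 30 samples
r = np.array([1] * 5 + [0] * 25)
support = r.mean()

# Sanity check: the closed form matches the empirical standard deviation
assert np.isclose(r.std(), np.sqrt(support * (1 - support)))

coef = 4.448999
print("fires on 5/30 samples:", rule_importance(coef, support))
print("fires on 99% of samples:", rule_importance(coef, 0.99))
```

With the same coefficient, the rule that is almost always TRUE scores far lower, which reflects the "discriminative power" argument.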

    For our specific case, we see only one high-impact rule (Rule #24):

    If a flower’s petal is longer than 5.45 cm and wider than 2 cm, the odds that LOF classifies it as “anomalous” increase 85-fold. (Note that exp(4.448999) ~= 85)
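The 85-fold figure is simply the exponential of the coefficient, since the coefficient acts on the log-odds. The baseline probability below is an arbitrary assumption, included only to show how an odds multiplier translates into a probability:

```python
import math

coef = 4.448999
odds_ratio = math.exp(coef)
print(round(odds_ratio, 1))  # 85.5

# Illustrative only: effect on a sample with a 1% baseline probability
p0 = 0.01
odds1 = (p0 / (1 - p0)) * odds_ratio
p1 = odds1 / (1 + odds1)
print(round(p1, 3))
```

An 85-fold increase in odds is not an 85-fold increase in probability; the resulting probability depends on the baseline set by the intercept and the other terms in the model.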

    Rules #26 and #27 are nested inside Rule #24. This is common in practice, as RuleFit often produces “families” of similar rules because they come from neighbouring tree splits. Therefore, the only rule that truly matters for characterizing the LOF-identified anomalies is Rule #24.

    Also, we see that the support for Rule #24 is 0.1667 (5/30). This effectively means that all 5 LOF-identified anomalies can be explained by this rule. We can see that more clearly in the figure below:

    There you have it: the rule to describe the identified anomalies!


    3. Conclusion

    In this blog post, we explored the RuleFit algorithm as a powerful solution for explainable anomaly detection. We discussed:

    • How it works: A two-step approach where decision trees are first fitted to derive meaningful rules, followed by a sparse linear regression to rank the rule importance.
    • How to apply it to anomaly explanation: Use the detection results as pseudo labels and treat them as the “target outcome” for the RuleFit model.

    With RuleFit in your modeling toolkit, the next time stakeholders ask “Why is this an anomaly?”, you’ll have concrete IF-THEN rules that they can understand and act upon.

    References

    [1] Jerome H. Friedman, Bogdan E. Popescu, Predictive learning via rule ensembles, arXiv, 2008.

    [2] Fisher, R. A., Iris [Data set]. UCI Machine Learning Repository, 1936.


