    5 Ways to Implement Variable Discretization



Though continuous variables in real-world datasets carry detailed information, they are not always the most effective form for modelling and interpretation. This is where variable discretization comes into play.

Understanding variable discretization is essential for data science students building strong ML foundations and for AI engineers designing interpretable systems.

Early in my data science journey, I focused mainly on tuning hyperparameters, experimenting with different algorithms, and optimising performance metrics.

When I experimented with variable discretization methods, I noticed that certain ML models became more stable and interpretable. So, I decided to explain these methods in this article.

What is variable discretization?

Some machine learning models work better with discrete variables. For example, if we want to train a decision tree model on a dataset with continuous variables, it is often better to transform those variables into discrete ones to reduce the model training time.

Variable discretization is the process of transforming continuous variables into discrete variables by creating bins, which are a set of contiguous intervals.

Advantages of variable discretization

• Decision trees and naive Bayes models work better with discrete variables.
• Discrete features are easy to understand and interpret.
• Discretization can reduce the impact of skewed variables and outliers in the data.

In summary, discretization simplifies the data and allows models to train faster.

    Disadvantages of variable discretization

The main drawback of variable discretization is the loss of information caused by the creation of bins. We need to find the minimum number of bins that avoids a significant loss of information. The algorithm cannot find this number by itself; the user needs to supply the number of bins as a model hyperparameter. The algorithm then finds the cut points that match the requested number of bins.

    Supervised and unsupervised discretization

The main categories of discretization methods are supervised and unsupervised. Unsupervised methods determine the boundaries of the bins using only the underlying distribution of the variable, whereas supervised methods use the ground-truth target values to determine those boundaries.

Types of variable discretization

We will discuss the following types of variable discretization.

• Equal-width discretization
• Equal-frequency discretization
• Arbitrary-interval discretization
• K-means clustering-based discretization
• Decision tree-based discretization

    Equal-width discretization

As the name suggests, this method creates bins of equal width. The width of a bin is calculated by dividing the range of the values of a variable, X, by the number of bins, k.

Width = (Max(X) − Min(X)) / k

Here, k is a hyperparameter defined by the user.

For example, if the values of X range between 0 and 50 and k=5, the bin width is 10 and the bins are 0–10, 10–20, 20–30, 30–40 and 40–50. If k=2, the bin width is 25 and the bins are 0–25 and 25–50. So, the bin width depends on the value of the k hyperparameter. Equal-width discretization may assign a different number of data points to each bin; only the bin widths are the same.
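As a quick sanity check of the formula, here is a minimal sketch in plain NumPy, using the hypothetical 0–50, k=5 example above:

import numpy as np

x_min, x_max, k = 0, 50, 5                # example range and number of bins
width = (x_max - x_min) / k               # (50 - 0) / 5 = 10
edges = np.linspace(x_min, x_max, k + 1)  # bin boundaries

print(width)  # 10.0
print(edges)  # [ 0. 10. 20. 30. 40. 50.]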

Let’s implement equal-width discretization using the Iris dataset. strategy='uniform' in KBinsDiscretizer() creates bins of equal width.

# Import libraries
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.preprocessing import KBinsDiscretizer

# Load the Iris dataset
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Select one feature
feature = 'sepal length (cm)'
X = df[[feature]]

# Initialize the discretizer: 15 bins of equal width
equal_width = KBinsDiscretizer(
    n_bins=15,
    encode='ordinal',
    strategy='uniform'
)

bins_equal_width = equal_width.fit_transform(X)

plt.hist(bins_equal_width, bins=15)
plt.title("Equal-Width Discretization")
plt.xlabel(feature)
plt.ylabel("Count")
plt.show()
Equal-width discretization (Image by author)

The histogram shows bins that span equal ranges of the feature.
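We can also verify this numerically, assuming the fitted equal_width discretizer from the snippet above is still in scope: the learned boundaries (stored in its bin_edges_ attribute) are evenly spaced, while the per-bin counts differ.

import numpy as np

# Evenly spaced cut points learned from the data
print(equal_width.bin_edges_[0])

# Counts differ from bin to bin; widths do not
bin_ids, counts = np.unique(bins_equal_width, return_counts=True)
print(dict(zip(bin_ids.astype(int), counts)))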

    Equal-frequency discretization

This method allocates the values of the variable to bins that contain a similar number of data points. The bin widths are not the same; the boundaries are determined by quantiles, which divide the data into equal-sized groups. Here too, the number of bins is defined by the user as a hyperparameter.

The major drawback of equal-frequency discretization is that, if the distribution of the data points is skewed, there can be empty bins or bins with only a few data points. This can result in a significant loss of information.

Let’s implement equal-frequency discretization using the Iris dataset. strategy='quantile' in KBinsDiscretizer() creates balanced bins: each bin holds (roughly) an equal number of data points.

# Import libraries
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.preprocessing import KBinsDiscretizer

# Load the Iris dataset
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Select one feature
feature = 'sepal length (cm)'
X = df[[feature]]

# Initialize the discretizer: 3 bins with (roughly) equal counts
equal_freq = KBinsDiscretizer(
    n_bins=3,
    encode='ordinal',
    strategy='quantile'
)

bins_equal_freq = equal_freq.fit_transform(X)
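A quick way to see the equal-frequency property is to count how many rows fall into each bin; with n_bins=3 on the 150-row Iris dataset, each bin should hold roughly 50 points (ties in the data can shift the counts slightly):

import numpy as np

bin_ids, counts = np.unique(bins_equal_freq, return_counts=True)
print(dict(zip(bin_ids.astype(int), counts)))  # roughly 50 points per bin

# The quantile-based cut points are unevenly spaced
print(equal_freq.bin_edges_[0])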

    Arbitrary-interval discretization

In this method, the user allocates the data points of a variable to bins in whatever way makes sense for the problem (hence "arbitrary"). For example, you could allocate the values of a temperature variable to bins representing "cold", "normal" and "hot". Priority is given to common sense and domain knowledge. There is no need for equal bin widths or an equal number of data points per bin.

Here, we manually define the bin boundaries based on domain knowledge.

# Import libraries
import pandas as pd
from sklearn.datasets import load_iris

# Load the Iris dataset
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Select one feature
feature = 'sepal length (cm)'
X = df[[feature]]

# Define custom bin boundaries based on domain knowledge
custom_bins = [4, 5.5, 6.5, 8]

# Assign each value to one of the three hand-picked intervals
df['arbitrary'] = pd.cut(
    df[feature],
    bins=custom_bins,
    labels=[0, 1, 2]
)
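Since the boundaries are hand-picked, it is worth checking how the data actually falls into them; mapping the integer codes to readable labels also helps interpretation. A short follow-up (the label names below are purely illustrative):

# How many flowers land in each hand-picked interval?
print(df['arbitrary'].value_counts().sort_index())

# Optional: readable labels instead of integer codes
df['arbitrary_named'] = pd.cut(
    df[feature],
    bins=custom_bins,
    labels=['short', 'medium', 'long']
)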

K-means clustering-based discretization

K-means clustering groups similar data points into clusters, and this property can be used for variable discretization: the bins are the clusters identified by the k-means algorithm. Here too, we need to define the number of clusters, k, as a model hyperparameter. There are several methods to determine the optimal value of k. Read this article to learn about them.

Here, we use the KMeans algorithm to create groups that act as discretized categories.

# Import libraries
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Load the Iris dataset
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Select one feature
feature = 'sepal length (cm)'
X = df[[feature]]

# Cluster the feature values; each cluster acts as a bin
kmeans = KMeans(n_clusters=3, random_state=42)

df['kmeans'] = kmeans.fit_predict(X)
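One caveat: raw k-means labels are arbitrary (cluster 0 is not necessarily the lowest value range), so if you need ordered bins you can re-map the labels by sorting the cluster centres. Note also that KBinsDiscretizer offers strategy='kmeans' as a built-in alternative. A sketch of the re-mapping:

import numpy as np

# Sort cluster centres so that bin 0 = lowest values, bin k-1 = highest
order = np.argsort(kmeans.cluster_centers_.ravel())
label_map = {old: new for new, old in enumerate(order)}
df['kmeans_ordered'] = df['kmeans'].map(label_map)

print(np.sort(kmeans.cluster_centers_.ravel()))  # centres in ascending order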

Decision tree-based discretization

The decision tree-based discretization process uses a decision tree to find the boundaries of the bins. Unlike the other methods, it finds the optimal cut points automatically, so the user does not need to hand-pick the boundaries; the number of bins can still be capped through tree hyperparameters such as max_leaf_nodes.

The discretization methods that we discussed so far are unsupervised methods. This one, however, is a supervised method, meaning that we also use the target values, y, to determine the boundaries.

# Import libraries
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Load the Iris dataset
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Select one feature
feature = 'sepal length (cm)'
X = df[[feature]]

# Get the target values
y = iris.target

# Cap the number of bins via the number of leaves
tree = DecisionTreeClassifier(
    max_leaf_nodes=3,
    random_state=42
)

tree.fit(X, y)

# Get the leaf node (i.e. the bin) for each sample
df['decision_tree'] = tree.apply(X)
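Since the bins here are the leaves of the fitted tree, the actual cut points are the split thresholds the tree learned. We can read them straight off the fitted model (in scikit-learn's tree structure, leaf nodes store a threshold of -2, which we filter out):

import numpy as np

# Internal nodes store a real threshold; leaves store -2
thresholds = tree.tree_.threshold
cut_points = np.sort(thresholds[thresholds != -2])
print(cut_points)  # the learned bin boundaries for sepal length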

That is an overview of variable discretization methods. The implementation of each method will be discussed in more detail in separate articles.

That is the end of today’s article.

Please let me know if you have any questions or feedback.


See you in the next article. Happy learning to you!

Iris dataset information

• Citation: Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
• Source: https://archive.ics.uci.edu/ml/datasets/iris
• License: R.A. Fisher holds the copyright of this dataset. Michael Marshall donated this dataset to the public under the Creative Commons Public Domain Dedication (CC0) license. You can learn more about different dataset license types here.

    Designed and written by: 
    Rukshan Pramoditha

    2025–03–04


