
    Beyond ROC-AUC and KS: The Gini Coefficient, Explained Simply

    By Editor Times Featured | October 1, 2025 | 9 Mins Read


    We discussed classification metrics such as ROC-AUC and the Kolmogorov-Smirnov (KS) statistic in earlier blogs.

    In this blog, we'll explore another important classification metric called the Gini coefficient.


    Why do we have multiple classification metrics?

    Each classification metric describes model performance from a different angle. We know that ROC-AUC gives us the overall ranking ability of a model, while the KS statistic shows us where the maximum gap between the two groups occurs.

    The Gini coefficient tells us how much better our model is than random guessing at ranking the positives higher than the negatives.


    First, let’s see how the Gini Coefficient is calculated.

    For this, we once more use the German Credit Dataset.

    Let's use the same sample data that we used to understand the calculation of the Kolmogorov-Smirnov (KS) statistic.

    Image by Author

    This sample data was obtained by applying logistic regression to the German Credit dataset.

    Since the model outputs probabilities, we selected a sample of 10 points from these probabilities to demonstrate the calculation of the Gini coefficient.

    Calculation

    Step 1: Sort the data by predicted probabilities.

    The sample data is already sorted in descending order of predicted probability.

    Step 2: Compute Cumulative Population and Cumulative Positives.

    Cumulative Population: The cumulative number of records considered up to that row.

    Cumulative Population (%): The percentage of the total population covered so far.

    Cumulative Positives: How many actual positives (class 2) we have seen up to this point.

    Cumulative Positives (%): The percentage of positives captured so far.

    Image by Author
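
The four columns defined in Step 2 can be reproduced with a few lines of Python. This is a minimal sketch, not the author's code: the helper name `cumulative_table` is hypothetical, and the ten records are the ones that appear in the ROC-AUC snippet later in this post.

```python
# Sample data as listed in the ROC-AUC snippet later in the post
actual = [2, 2, 2, 1, 2, 1, 1, 1, 1, 1]
pred_prob_class2 = [0.92, 0.63, 0.51, 0.39, 0.29, 0.20, 0.13, 0.10, 0.05, 0.01]


def cumulative_table(actual, scores, positive_label=2):
    """Hypothetical helper: one row per record, highest score first.

    Each row is (cum_population, cum_population_pct,
                 cum_positives, cum_positives_pct).
    """
    # Step 1: sort by predicted probability, descending
    ordered = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    total = len(ordered)
    total_pos = sum(1 for _, label in ordered if label == positive_label)

    rows = []
    seen_pos = 0
    for i, (_, label) in enumerate(ordered, start=1):
        if label == positive_label:
            seen_pos += 1
        rows.append((i, i / total, seen_pos, seen_pos / total_pos))
    return rows


rows = cumulative_table(actual, pred_prob_class2)
for row in rows:
    print(row)
```

The second and fourth entries of each row are the X and Y values used in the next step, with the origin (0, 0) added in front.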

    Step 3: Plot X and Y values

    X = Cumulative Population (%)

    Y = Cumulative Positives (%)

    Here, let's use Python to plot these X and Y values.

    Code:

    import matplotlib.pyplot as plt

    # Cumulative Population (%) and Cumulative Positives (%), starting at the origin
    X = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
    Y = [0.0, 0.25, 0.50, 0.75, 0.75, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00]

    # Plot the model's curve against the diagonal of a random model
    plt.figure(figsize=(6,6))
    plt.plot(X, Y, marker='o', color="cornflowerblue", label="Model Lorenz Curve")
    plt.plot([0,1], [0,1], linestyle="--", color="grey", label="Random Model (Diagonal)")
    plt.title("Lorenz Curve from Sample Data", fontsize=14)
    plt.xlabel("Cumulative Population % (X)", fontsize=12)
    plt.ylabel("Cumulative Positives % (Y)", fontsize=12)
    plt.legend()
    plt.grid(True)
    plt.show()

    Plot:

    Image by Author

    The curve we get when we plot Cumulative Population (%) against Cumulative Positives (%) is called the Lorenz curve.

    Step 4: Calculate the area under the Lorenz curve.

    When we discussed ROC-AUC, we found the area under the curve using the trapezoid formula.

    Each region between two points was treated as a trapezoid, its area was calculated, and then all the areas were added together to get the final value.

    The same method is applied here to calculate the area under the Lorenz curve.

    Area under the Lorenz curve

    Area of Trapezoid:

    $$
    \text{Area} = \frac{1}{2} \times (y_1 + y_2) \times (x_2 - x_1)
    $$

    From (0.0, 0.0) to (0.1, 0.25):
    \[
    A_1 = \frac{1}{2}(0+0.25)(0.1-0.0) = 0.0125
    \]

    From (0.1, 0.25) to (0.2, 0.50):
    \[
    A_2 = \frac{1}{2}(0.25+0.50)(0.2-0.1) = 0.0375
    \]

    From (0.2, 0.50) to (0.3, 0.75):
    \[
    A_3 = \frac{1}{2}(0.50+0.75)(0.3-0.2) = 0.0625
    \]

    From (0.3, 0.75) to (0.4, 0.75):
    \[
    A_4 = \frac{1}{2}(0.75+0.75)(0.4-0.3) = 0.075
    \]

    From (0.4, 0.75) to (0.5, 1.00):
    \[
    A_5 = \frac{1}{2}(0.75+1.00)(0.5-0.4) = 0.0875
    \]

    From (0.5, 1.00) to (0.6, 1.00):
    \[
    A_6 = \frac{1}{2}(1.00+1.00)(0.6-0.5) = 0.100
    \]

    From (0.6, 1.00) to (0.7, 1.00):
    \[
    A_7 = \frac{1}{2}(1.00+1.00)(0.7-0.6) = 0.100
    \]

    From (0.7, 1.00) to (0.8, 1.00):
    \[
    A_8 = \frac{1}{2}(1.00+1.00)(0.8-0.7) = 0.100
    \]

    From (0.8, 1.00) to (0.9, 1.00):
    \[
    A_9 = \frac{1}{2}(1.00+1.00)(0.9-0.8) = 0.100
    \]

    From (0.9, 1.00) to (1.0, 1.00):
    \[
    A_{10} = \frac{1}{2}(1.00+1.00)(1.0-0.9) = 0.100
    \]

    Total Area Under Lorenz Curve:
    \[
    A = 0.0125+0.0375+0.0625+0.075+0.0875+0.100+0.100+0.100+0.100+0.100 = 0.775
    \]

    We calculated the area under the Lorenz curve, which is 0.775.
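
The ten trapezoid areas can also be summed in a short loop. This is a minimal sketch under the same X and Y values as the plotting code; the function name `trapezoid_area` is an illustrative choice, and the loop is written out by hand so that it mirrors the formula rather than relying on a library routine.

```python
def trapezoid_area(xs, ys):
    """Sum 0.5 * (y1 + y2) * (x2 - x1) over consecutive points."""
    area = 0.0
    points = list(zip(xs, ys))
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        area += 0.5 * (y1 + y2) * (x2 - x1)
    return area


X = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
Y = [0.0, 0.25, 0.50, 0.75, 0.75, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00]

area_model = trapezoid_area(X, Y)
print(round(area_model, 4))  # 0.775
```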

    Here, we plotted Cumulative Population (%) against Cumulative Positives (%), and we can observe that the area under this curve shows how quickly the positives (class 2) are captured as we move down the sorted list.

    In our sample dataset, we have 4 positives (class 2) and 6 negatives (class 1).

    For a perfect model, by the time we reach 40% of the population, it has captured 100% of the positives.

    The curve looks like this for a perfect model.

    Image by Author

    Area under the Lorenz curve for the perfect model:

    \[
    \begin{aligned}
    \text{Perfect Area} &= \text{Triangle (0,0 to 0.4,1)} + \text{Rectangle (0.4,1 to 1,1)} \\[6pt]
    &= \frac{1}{2} \times 0.4 \times 1 \;+\; 0.6 \times 1 \\[6pt]
    &= 0.2 + 0.6 \\[6pt]
    &= 0.8
    \end{aligned}
    \]

    There is also another way to calculate the area under the curve for the perfect model.

    \[
    \text{Let } \pi \text{ be the proportion of positives in the dataset.}
    \]

    \[
    \text{Perfect Area} = \frac{1}{2}\pi \cdot 1 + (1-\pi)\cdot 1
    \]
    \[
    = \frac{\pi}{2} + (1-\pi)
    \]
    \[
    = 1 - \frac{\pi}{2}
    \]

    For our dataset:

    Here, we have 4 positives out of 10 records, so: π = 4/10 = 0.4.

    \[
    \text{Perfect Area} = 1 - \frac{0.4}{2} = 1 - 0.2 = 0.8
    \]
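
As a quick check, the closed form and the triangle-plus-rectangle calculation can be compared in code. This is a small sketch; `perfect_area` is a hypothetical helper name, and `pi` here is the proportion of positives, not the mathematical constant.

```python
def perfect_area(pi):
    """Area under the perfect model's Lorenz curve: 1 - pi / 2.

    `pi` is the proportion of positives in the dataset (0 < pi <= 1).
    """
    if not 0 < pi <= 1:
        raise ValueError("pi must be in the interval (0, 1]")
    return 1 - pi / 2


pi = 4 / 10  # 4 positives out of 10 records

area_perfect = perfect_area(pi)

# Same area built from its two pieces: triangle up to x = pi, then rectangle
geometric = 0.5 * pi * 1 + (1 - pi) * 1

print(round(area_perfect, 4), round(geometric, 4))  # 0.8 0.8
```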

    We calculated the area under the Lorenz curve for our sample dataset and also for the perfect model with the same number of positives and negatives.

    Now, if we go through the dataset without sorting, the positives are evenly spread out. This means the rate at which we accumulate positives is the same as the rate at which we move through the population.

    This is the random model. Its curve is the diagonal, which always gives an area under the curve of 0.5.

    Image by Author
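
The 0.5 figure can be checked by brute force on a dataset of this size. The sketch below assumes 10 records with 4 positives, as in our sample, and enumerates every possible placement of the positives in the list. Any single ordering can land above or below the diagonal, but the average over all orderings is exactly 0.5. The helper name `lorenz_area` is hypothetical.

```python
from itertools import combinations


def lorenz_area(positive_positions, n):
    """Trapezoid area under the curve when positives sit at the given row indices."""
    positions = set(positive_positions)
    total_pos = len(positions)
    area, prev_y, seen = 0.0, 0.0, 0
    for k in range(n):
        if k in positions:
            seen += 1
        y = seen / total_pos
        area += 0.5 * (prev_y + y) * (1 / n)
        prev_y = y
    return area


n, n_pos = 10, 4

# Every way of placing 4 positives among 10 rows (210 orderings)
areas = [lorenz_area(c, n) for c in combinations(range(n), n_pos)]
mean_area = sum(areas) / len(areas)

print(len(areas), round(mean_area, 4), round(max(areas), 4), round(min(areas), 4))
```

The largest area in the list corresponds to the perfect ordering (all positives first) and the smallest to the worst one (all positives last).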

    Step 5: Calculate the Gini Coefficient

    \[
    A_{\text{model}} = 0.775
    \]

    \[
    A_{\text{random}} = 0.5
    \]
    \[
    A_{\text{perfect}} = 0.8
    \]
    \[
    \text{Gini} = \frac{A_{\text{model}} - A_{\text{random}}}{A_{\text{perfect}} - A_{\text{random}}}
    \]
    \[
    = \frac{0.775 - 0.5}{0.8 - 0.5}
    \]
    \[
    = \frac{0.275}{0.3}
    \]
    \[
    \approx 0.92
    \]

    We got Gini = 0.92, which means almost all the positives are concentrated at the top of the sorted list. This shows that the model does a good job of separating positives from negatives, coming close to perfect.
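
Step 5 is a one-line ratio, shown here as a small function so the three areas are explicit. This is an illustrative sketch; `gini_from_areas` is a hypothetical name, and the default of 0.5 for the random model comes from the diagonal discussed above.

```python
def gini_from_areas(area_model, area_perfect, area_random=0.5):
    """Gini = (A_model - A_random) / (A_perfect - A_random)."""
    denominator = area_perfect - area_random
    if denominator <= 0:
        raise ValueError("area_perfect must be larger than area_random")
    return (area_model - area_random) / denominator


gini = gini_from_areas(area_model=0.775, area_perfect=0.8)
print(round(gini, 4))  # 0.9167
```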


    Now that we have seen how the Gini coefficient is calculated, let's look at what we actually did during the calculation.

    We considered a sample of 10 points consisting of output probabilities from logistic regression.

    We sorted the probabilities in descending order.

    Next, we calculated Cumulative Population (%) and Cumulative Positives (%) and then plotted them.

    We got a curve called the Lorenz curve, and we calculated the area under it, which is 0.775.

    Now, what does 0.775 mean?

    Our sample consists of 4 positives (class 2) and 6 negatives (class 1).

    The output probabilities are for class 2, which means the higher the probability, the more likely the customer belongs to class 2.

    In our sample data, all the positives are captured within the first 50% of the population, which means they are ranked near the top.

    If the model were perfect, the positives would be captured within the first 4 rows, i.e., within the first 40% of the population, and the area under the curve for the perfect model is 0.8.

    We got an area under the Lorenz curve of 0.775, which is close to the perfect value of 0.8.

    Here, we are measuring how well the model ranks. If more positives are concentrated at the top, the model is good at separating positives from negatives.

    Next, we calculated the Gini coefficient, which is 0.92.

    \[
    \text{Gini} = \frac{A_{\text{model}} - A_{\text{random}}}{A_{\text{perfect}} - A_{\text{random}}}
    \]

    The numerator tells us how much better our model is than random guessing.

    The denominator tells us the maximum possible improvement over random.

    The ratio puts these two together, so the Gini coefficient falls between 0 (random) and 1 (perfect) for any model that ranks at least as well as random. A model that ranks worse than random gets a negative value.

    Gini measures how close the model is to being perfect in separating the positive and negative classes.

    But we may wonder why we calculated Gini and did not stop at 0.775.

    0.775 is the area under the Lorenz curve for our model. On its own, it does not tell us how close the model is to being perfect; we have to compare it to 0.8, which is the area for the perfect model.

    So, we calculate Gini to standardize the result so that it falls between 0 and 1, which makes it easy to compare models.


    Banks also use the Gini coefficient to evaluate credit risk models alongside ROC-AUC and the KS statistic. Together, these measures give a complete picture of model performance.


    Now, let's calculate ROC-AUC for our sample data.

    import pandas as pd
    from sklearn.metrics import roc_auc_score

    # Sample data
    data = {
        "Actual": [2, 2, 2, 1, 2, 1, 1, 1, 1, 1],
        "Pred_Prob_Class2": [0.92, 0.63, 0.51, 0.39, 0.29, 0.20, 0.13, 0.10, 0.05, 0.01]
    }

    df = pd.DataFrame(data)

    # Convert Actual: class 2 -> 1 (positive), class 1 -> 0 (negative)
    y_true = (df["Actual"] == 2).astype(int)
    y_score = df["Pred_Prob_Class2"]

    # Calculate ROC-AUC
    roc_auc = roc_auc_score(y_true, y_score)
    roc_auc

    We got AUC = 0.9583
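
One way to see where 0.9583 comes from is the pairwise reading of ROC-AUC: the share of (positive, negative) pairs in which the positive receives the higher score. The sketch below counts those pairs directly for the sample data. `pairwise_auc` is a hypothetical helper, and counting ties as half a win is the usual convention (the sample has no ties).

```python
actual = [2, 2, 2, 1, 2, 1, 1, 1, 1, 1]
scores = [0.92, 0.63, 0.51, 0.39, 0.29, 0.20, 0.13, 0.10, 0.05, 0.01]

pos = [s for s, a in zip(scores, actual) if a == 2]  # 4 positives (class 2)
neg = [s for s, a in zip(scores, actual) if a == 1]  # 6 negatives (class 1)


def pairwise_auc(pos_scores, neg_scores):
    """Fraction of positive-negative pairs ranked correctly (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


auc = pairwise_auc(pos, neg)
print(round(auc, 4))  # 0.9583
```

Of the 4 × 6 = 24 pairs, only one is ranked the wrong way round (the positive scored 0.29 against the negative scored 0.39), so AUC = 23/24.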

    Now, Gini = (2 * AUC) - 1 = (2 * 0.9583) - 1 = 0.92

    This is the relation between Gini and ROC-AUC, and it matches the value we calculated from the Lorenz curve.
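
The relation is not a coincidence of this sample. A short derivation, sketched here in terms of the true positive rate (TPR) and false positive rate (FPR) from the ROC curve, shows why. At any threshold, the share of the population selected is a mix of the two rates, and the share of positives captured is the TPR:

\[
x = \pi \cdot \text{TPR} + (1-\pi)\cdot \text{FPR}, \qquad y = \text{TPR}
\]

Integrating y over x gives the area under the Lorenz curve:

\[
A_{\text{model}} = \pi \int \text{TPR}\, d\,\text{TPR} + (1-\pi) \int \text{TPR}\, d\,\text{FPR} = \frac{\pi}{2} + (1-\pi)\,\text{AUC}
\]

Substituting this together with \(A_{\text{random}} = \frac{1}{2}\) and \(A_{\text{perfect}} = 1 - \frac{\pi}{2}\) into the Gini formula:

\[
\text{Gini} = \frac{\frac{\pi}{2} + (1-\pi)\,\text{AUC} - \frac{1}{2}}{1 - \frac{\pi}{2} - \frac{1}{2}} = \frac{(1-\pi)\left(\text{AUC} - \frac{1}{2}\right)}{\frac{1}{2}(1-\pi)} = 2\,\text{AUC} - 1
\]

For our sample, π = 0.4 and AUC = 0.9583, so the middle equation gives 0.2 + 0.6 × 0.9583 = 0.775, the same area we found with the trapezoids.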


    Now let's calculate the Gini coefficient on the full dataset.

    Code:

    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_auc_score

    # Load dataset
    file_path = "C:/german.data"
    data = pd.read_csv(file_path, sep=" ", header=None)

    # Rename columns
    columns = [f"col_{i}" for i in range(1, 21)] + ["target"]
    data.columns = columns

    # Features and target
    X = pd.get_dummies(data.drop(columns=["target"]), drop_first=True)
    y = data["target"]

    # Convert target to binary: class 2 (bad credit) -> 1, class 1 (good credit) -> 0
    y = (y == 2).astype(int)

    # Train-test split
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42, stratify=y
    )

    # Train logistic regression
    model = LogisticRegression(max_iter=10000)
    model.fit(X_train, y_train)

    # Predicted probabilities for the positive class (class 2)
    y_pred_proba = model.predict_proba(X_test)[:, 1]

    # Calculate ROC-AUC
    auc = roc_auc_score(y_test, y_pred_proba)

    # Calculate Gini
    gini = 2 * auc - 1

    auc, gini
    

    We got Gini = 0.60
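
The Lorenz-curve route from Steps 1 to 5 also works on arrays of any length, so it could be used to cross-check the `2 * auc - 1` value on the test set. The function below is a sketch, not the author's code: `gini_from_scores` is a hypothetical name, it expects 0/1 labels, and it assumes there are no tied scores (with ties, the result depends on how the tied rows happen to be ordered). It is demonstrated on the 10-point sample.

```python
import numpy as np


def gini_from_scores(y_true, y_score):
    """Gini via the Lorenz curve: sort, accumulate positives, integrate, normalize."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    n = y_true.size
    n_pos = int(y_true.sum())
    if n_pos == 0 or n_pos == n:
        raise ValueError("y_true must contain both classes")

    # Step 1: sort by score, highest first
    order = np.argsort(-y_score, kind="stable")

    # Steps 2-3: cumulative population (%) and cumulative positives (%), from the origin
    x = np.concatenate(([0.0], np.arange(1, n + 1) / n))
    y = np.concatenate(([0.0], np.cumsum(y_true[order]) / n_pos))

    # Step 4: trapezoid area under the curve
    area_model = float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

    # Step 5: normalize against the random (0.5) and perfect (1 - pi/2) areas
    area_perfect = 1 - (n_pos / n) / 2
    return (area_model - 0.5) / (area_perfect - 0.5)


sample_true = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
sample_score = [0.92, 0.63, 0.51, 0.39, 0.29, 0.20, 0.13, 0.10, 0.05, 0.01]

sample_gini = gini_from_scores(sample_true, sample_score)
print(round(sample_gini, 4))  # 0.9167
```

Calling `gini_from_scores(y_test, y_pred_proba)` on the split above should agree with `gini` whenever the predicted probabilities are all distinct.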

    Interpretation:

    Gini > 0.5: acceptable.

    Gini = 0.6–0.7: good model.

    Gini = 0.8+: excellent, rarely achieved.


    Dataset

    The dataset used in this blog is the German Credit dataset, which is publicly available on the UCI Machine Learning Repository. It is provided under the Creative Commons Attribution 4.0 International (CC BY 4.0) License. This means it can be freely used and shared with proper attribution.


    I hope you found this blog useful.

    If you enjoyed reading, consider sharing it with your network, and feel free to share your thoughts.

    If you haven't read my earlier blogs on ROC-AUC and the Kolmogorov-Smirnov statistic, you can check them out here.

    Thanks for reading!


