
    Spearman Correlation Coefficient for When Pearson Isn’t Enough

    By Editor Times Featured, November 14, 2025 (8 min read)


    In the previous post on the Pearson correlation coefficient, we discussed how it is used to measure the strength of the linear relationship between two variables (years of experience and salary).

    Not all relationships between variables are linear, and Pearson correlation works best when the relationship follows a straight-line pattern.

    When the relationship is not linear but still moves consistently in one direction, we use the Spearman correlation coefficient to capture that pattern.

    To understand the Spearman correlation coefficient, let’s consider the Fish Market dataset.

    This dataset includes physical attributes of each fish, such as:

    • Weight – the weight of the fish in grams (this will be our target variable)
    • Length1, Length2, Length3 – various length measurements (in cm)
    • Height – the height of the fish (in cm)
    • Width – the diagonal width of the fish body (in cm)

    We need to predict the weight of the fish based on the various length measurements, height, and width.

    This is the same example we used to understand the math behind multiple linear regression in an earlier blog post, where we first used only height and width as independent variables to derive the individual equations for the slopes and intercept.

    Here we are trying to fit a multiple linear regression model with five independent variables and one target variable.

    Now let’s calculate the Pearson correlation coefficient between each independent variable and the target variable.

    Code:

    import pandas as pd

    # Load the Fish Market dataset
    df = pd.read_csv("C:/Fish.csv")

    # Drop the categorical 'Species' column
    if 'Species' in df.columns:
        df_numeric = df.drop(columns=['Species'])
    else:
        df_numeric = df.copy()

    # Calculate Pearson correlation between each independent variable and the target (Weight)
    target = 'Weight'
    pearson_corr = df_numeric.corr(method='pearson')[target].drop(target)  # drop self-correlation

    # Show the coefficients from strongest to weakest
    print(pearson_corr.sort_values(ascending=False))

    The Pearson correlation coefficient between Weight and

    • Length3 is 0.923044
    • Length2 is 0.918618
    • Length1 is 0.915712
    • Width is 0.886507
    • Height is 0.724345

    Among all the variables, Height has the weakest Pearson correlation coefficient, and we might think that we should drop this variable before applying the multiple linear regression model.

    But before that, is it correct to drop an independent variable based on the Pearson correlation coefficient?

    No.

    First, let’s look at the scatter plot between Height and Weight.

    Scatter plot of Height vs. Weight (Image by Author)

    From the scatter plot we can observe that as height increases, weight also increases, but the relationship is not linear.

    At smaller heights, the weight increases slowly. At larger heights, it increases more quickly.

    Here the trend is non-linear but still monotonic, because it moves in one direction.

    Since the Pearson correlation coefficient assumes a straight-line relationship (linearity), it gives a lower value here.
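This effect is easy to reproduce on a toy example. The sketch below uses made-up data (x from 1 to 10 and y = x cubed, not values from the Fish Market dataset) and a small hand-written helper, `pearson`, which is an illustrative name rather than a library function. Even though y rises at every step, the curve pulls the Pearson coefficient below 1.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)


# Made-up data (assumption): y increases with every step in x, but along a curve
x = list(range(1, 11))
y = [v ** 3 for v in x]

# Check that the relationship really is monotonic increasing
is_monotonic = all(a < b for a, b in zip(y, y[1:]))
r = pearson(x, y)

print(f"Monotonic increasing: {is_monotonic}")
print(f"Pearson r: {r:.3f}")
```

The coefficient comes out at roughly 0.93 rather than 1, which is the same kind of understatement seen for Height and Weight.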

    This is where the Spearman correlation coefficient comes in.

    Now let’s calculate the Spearman correlation coefficient between Height and Weight.

    Code:

    import pandas as pd
    from scipy.stats import spearmanr

    # Load the dataset
    df = pd.read_csv("C:/Fish.csv")

    # Calculate Spearman correlation coefficient between Height and Weight
    # (spearmanr returns the coefficient first, then the p-value)
    spearman_corr = spearmanr(df["Height"], df["Weight"])[0]

    print(f"Spearman Correlation Coefficient: {spearman_corr:.4f}")

    The Spearman correlation coefficient is 0.8586, which indicates a strong positive relationship between Height and Weight.

    This means that as the height of the fish increases, the weight also tends to increase.

    Earlier, we got a Pearson correlation coefficient of 0.72 between Height and Weight, which underestimates the actual relationship between these variables.

    If we select features based only on the Pearson correlation and remove the Height feature, we might lose an important variable that actually has a strong relationship with the target, leading to less accurate predictions.

    This is where the Spearman correlation coefficient helps, as it captures non-linear but monotonic trends.

    By using the Spearman correlation, we can also decide the next steps, such as applying transformations like log or lag values, or considering algorithms like decision trees or random forests that can handle both linear and non-linear relationships.
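As a rough illustration of the transformation idea, the sketch below builds a hypothetical exponential relationship (made-up numbers, not the fish data) and compares the Pearson coefficient before and after taking the log of the target. The `pearson` helper is hand-written for this example. Whether a log transform suits a real feature depends on the shape of its curve, so treat this as a sketch under that assumption.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical exponential growth (assumption): y = 2 * exp(0.5 * x)
x = list(range(1, 11))
y = [2.0 * math.exp(0.5 * v) for v in x]

# Pearson on the raw values is pulled down by the curvature
r_raw = pearson(x, y)

# log(y) = log(2) + 0.5 * x is a straight line in x, so Pearson recovers
r_log = pearson(x, [math.log(v) for v in y])

print(f"Pearson on raw y:  {r_raw:.3f}")
print(f"Pearson on log(y): {r_log:.3f}")
```

A monotonic transform such as the log does not change the ranks, so the Spearman coefficient stays the same before and after. A high Spearman value paired with a lower Pearson value is therefore a hint that some transformation of this kind may help a linear model.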


    Now that we have understood the significance of the Spearman correlation coefficient, it’s time to understand the math behind it.

    How is the Spearman correlation coefficient calculated so that it captures the relationship even when the data is non-linear but monotonic?

    To understand this, let’s consider a 10-point sample from the dataset.

    10-point sample of Height and Weight (Image by Author)

    Now, we sort the values in ascending order in each column and then assign ranks.

    Sorted values with their assigned ranks (Image by Author)

    Now that we have given ranks to both Height and Weight, we don’t keep them in the sorted order.

    Each value needs to go back to its original position in the dataset so that every fish’s height rank is matched with its own weight rank.

    We sort the columns only to assign ranks. After that, we place the ranks back in their original order and then calculate the Spearman correlation using these two sets of ranks.

    Ranks placed back in the original order (Image by Author)

    Here, while assigning ranks after sorting the values in ascending order in the Weight column, we encountered a tie at ranks 5 and 6, so we assigned both values the average rank of 5.5.

    Similarly, we found another tie across ranks 7, 8, 9, and 10, so we assigned all of them the average rank of 8.5.
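The ranking procedure described above (sort, assign ranks, average the ranks of tied values, then return each rank to its original position) can be sketched in a few lines. The function name `average_ranks` and the weight values are hypothetical: the numbers are made up so that they produce ties at ranks 5 and 6 and at ranks 7 to 10, and they are not the actual sample shown in the images.

```python
def average_ranks(values):
    """Rank values in ascending order (1-based); tied values share the mean of their ranks.

    The returned ranks are in the original order of `values`.
    """
    # Indices of the values, sorted by value (this is the "sort only to rank" step)
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)

    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value equals the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Sorted positions start..end hold ranks start+1..end+1; share their average
        shared_rank = (start + 1 + end + 1) / 2
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared_rank  # write back at the original position
        start = end + 1
    return ranks


# Hypothetical weights (assumption), with one two-way tie and one four-way tie
weights = [40, 55, 78, 120, 300, 70, 120, 300, 300, 300]
print(average_ranks(weights))
```

The two values of 120 occupy ranks 5 and 6 and both receive 5.5, while the four values of 300 occupy ranks 7 to 10 and all receive 8.5.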

    Now, we calculate the Spearman correlation coefficient, which is simply the Pearson correlation applied to the ranks.

    We already know the formula for calculating the Pearson correlation coefficient.

    \[
    r = \frac{\text{Cov}(X, Y)}{s_X \cdot s_Y}
    = \frac{\frac{1}{n - 1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}
    {\sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (X_i - \bar{X})^2} \cdot \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}
    \]

    \[
    = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}
    {\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2} \cdot \sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}
    \]

    Now, the formula for the Spearman correlation coefficient is:

    \[
    \rho_s =
    \frac{
    \sum_{i=1}^{n}
    \underbrace{(R_{X_i} - \bar{R}_X)}_{\text{Rank deviation of } X_i}
    \cdot
    \underbrace{(R_{Y_i} - \bar{R}_Y)}_{\text{Rank deviation of } Y_i}
    }{
    \sqrt{
    \sum_{i=1}^{n}
    \underbrace{(R_{X_i} - \bar{R}_X)^2}_{\text{Squared rank deviations of } X}
    }
    \cdot
    \sqrt{
    \sum_{i=1}^{n}
    \underbrace{(R_{Y_i} - \bar{R}_Y)^2}_{\text{Squared rank deviations of } Y}
    }
    }
    \]

    \[
    \begin{aligned}
    & \text{Where:} \\
    R_{X_i} & = \text{ rank of the } i^\text{th} \text{ value in variable } X \\
    R_{Y_i} & = \text{ rank of the } i^\text{th} \text{ value in variable } Y \\
    \bar{R}_X & = \text{ mean of all ranks in } X \\
    \bar{R}_Y & = \text{ mean of all ranks in } Y
    \end{aligned}
    \]

    Now, let’s calculate the Spearman correlation coefficient for the sample data.

    \[
    \textbf{Step 1: Ranks from the original data}
    \]

    \[
    \begin{array}{c|cccccccccc}
    R_{x_i} & 3 & 1 & 2 & 5 & 8 & 4 & 7 & 9 & 10 & 6 \\[2pt]
    R_{y_i} & 1 & 2 & 4 & 5.5 & 8.5 & 3 & 5.5 & 8.5 & 8.5 & 8.5
    \end{array}
    \]

    \[
    \textbf{Step 2: Formula of Spearman’s correlation (Pearson on ranks)}
    \]

    \[
    \rho_s =
    \frac{\sum_{i=1}^{n}\bigl(R_{x_i}-\bar{R_x}\bigr)\bigl(R_{y_i}-\bar{R_y}\bigr)}
    {\sqrt{\sum_{i=1}^{n}\bigl(R_{x_i}-\bar{R_x}\bigr)^2} \;
    \sqrt{\sum_{i=1}^{n}\bigl(R_{y_i}-\bar{R_y}\bigr)^2}},
    \qquad n = 10
    \]

    \[
    \textbf{Step 3: Mean of rank variables}
    \]

    \[
    \bar{R_x} = \frac{3+1+2+5+8+4+7+9+10+6}{10} = \frac{55}{10} = 5.5
    \]

    \[
    \bar{R_y} = \frac{1+2+4+5.5+8.5+3+5.5+8.5+8.5+8.5}{10}
    = \frac{55}{10} = 5.5
    \]

    \[
    \textbf{Step 4: Deviations and cross-products}
    \]

    \[
    \begin{array}{c|c|c|c}
    i & R_{x_i}-\bar{R_x} & R_{y_i}-\bar{R_y} & (R_{x_i}-\bar{R_x})(R_{y_i}-\bar{R_y}) \\ \hline
    1 & -2.5 & -4.5 & 11.25 \\
    2 & -4.5 & -3.5 & 15.75 \\
    3 & -3.5 & -1.5 & 5.25 \\
    4 & -0.5 & 0.0 & 0.00 \\
    5 & 2.5 & 3.0 & 7.50 \\
    6 & -1.5 & -2.5 & 3.75 \\
    7 & 1.5 & 0.0 & 0.00 \\
    8 & 3.5 & 3.0 & 10.50 \\
    9 & 4.5 & 3.0 & 13.50 \\
    10 & 0.5 & 3.0 & 1.50
    \end{array}
    \]

    \[
    \sum (R_{x_i}-\bar{R_x})(R_{y_i}-\bar{R_y}) = 69.0
    \]

    \[
    \textbf{Step 5: Sum of squares for each rank variable}
    \]

    \[
    \sum (R_{x_i}-\bar{R_x})^2 = 82.5,
    \qquad
    \sum (R_{y_i}-\bar{R_y})^2 = 77.0
    \]

    \[
    \textbf{Step 6: Substitute into the formula}
    \]

    \[
    \rho_s
    = \frac{69.0}{\sqrt{(82.5)(77.0)}}
    = \frac{69.0}{79.70}
    \approx 0.866
    \]

    \[
    \textbf{Step 7: Interpretation}
    \]

    \[
    \rho_s \approx 0.866
    \]

    A value of \( \rho_s \approx 0.866 \) shows a strong positive monotonic relationship between Height and Weight: as height increases, weight also tends to increase.

    This is how we calculate the Spearman correlation coefficient.
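For readers who want to check the arithmetic, here is a minimal sketch that applies the Pearson-on-ranks formula to the two rank rows from Step 1. The variable names are chosen for this example only. It prints each intermediate quantity so the hand calculation can be followed step by step.

```python
import math

# Ranks from Step 1, in the original order of the sample
rank_x = [3, 1, 2, 5, 8, 4, 7, 9, 10, 6]                  # Height ranks
rank_y = [1, 2, 4, 5.5, 8.5, 3, 5.5, 8.5, 8.5, 8.5]       # Weight ranks (with ties)

n = len(rank_x)

# Step 3: mean of each rank variable
mean_x = sum(rank_x) / n
mean_y = sum(rank_y) / n

# Step 4: sum of the cross-products of the rank deviations
cross = sum((rx - mean_x) * (ry - mean_y) for rx, ry in zip(rank_x, rank_y))

# Step 5: sum of squared rank deviations for each variable
ss_x = sum((rx - mean_x) ** 2 for rx in rank_x)
ss_y = sum((ry - mean_y) ** 2 for ry in rank_y)

# Step 6: substitute into the formula
rho = cross / math.sqrt(ss_x * ss_y)

print(f"Mean ranks: {mean_x}, {mean_y}")
print(f"Sum of cross-products: {cross}")
print(f"Sums of squares: {ss_x}, {ss_y}")
print(f"Spearman rho: {rho:.4f}")
```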

    There is also another formula for calculating the Spearman correlation coefficient, but it is used only when there are no tied ranks.

    \[
    \rho_s = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}
    \]

    where:

    \[
    \begin{aligned}
    \rho_s & : \text{ Spearman correlation coefficient} \\[4pt]
    d_i & : \text{ difference between the ranks of each observation, } (R_{x_i} - R_{y_i}) \\[4pt]
    n & : \text{ total number of paired observations}
    \end{aligned}
    \]

    If ties are present, the rank differences no longer represent the exact distances between positions, and we instead calculate ρ using the ‘Pearson correlation on ranks’ formula.
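The difference between the two formulas can be seen numerically. The sketch below defines two hand-written helpers, `pearson` and `spearman_shortcut` (illustrative names, not library functions), and compares them on a small made-up pair of rank lists without ties and on the tied ranks from Step 1.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)


def spearman_shortcut(rank_x, rank_y):
    """Shortcut formula 1 - 6 * sum(d^2) / (n * (n^2 - 1)); exact only without ties."""
    n = len(rank_x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical ranks without ties (assumption): the two formulas agree
no_tie_x = [1, 2, 3, 4, 5]
no_tie_y = [2, 1, 4, 3, 5]
print(pearson(no_tie_x, no_tie_y), spearman_shortcut(no_tie_x, no_tie_y))

# Ranks from Step 1, which contain ties: the two formulas no longer agree
tied_x = [3, 1, 2, 5, 8, 4, 7, 9, 10, 6]
tied_y = [1, 2, 4, 5.5, 8.5, 3, 5.5, 8.5, 8.5, 8.5]
print(round(pearson(tied_x, tied_y), 4), round(spearman_shortcut(tied_x, tied_y), 4))
```

Without ties both functions give 0.8. With the tied ranks the shortcut gives about 0.870 while Pearson on the ranks gives about 0.866, which is why the Pearson form is the safer default.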


    Dataset

    The dataset used in this blog is the Fish Market dataset, which contains measurements of fish species sold in markets, including attributes like weight, height, and width.

    It is publicly available on Kaggle and is licensed under the Creative Commons Zero (CC0 Public Domain) license. This means it can be freely used, modified, and shared for both non-commercial and commercial purposes without restriction.


    Spearman’s correlation coefficient helps us understand how two variables move together when the relationship is not perfectly linear.

    By converting the data into ranks, it shows how consistently one variable increases as the other increases, capturing any upward or downward pattern.

    It is very helpful when the data has outliers, is not normally distributed, or when the relationship is monotonic but curved.

    I hope this post helped you see not just how to calculate the Spearman correlation coefficient, but also when to use it and why it is an important tool in data analysis.

    Thanks for reading!


