    The Pearson Correlation Coefficient, Explained Simply

    By Editor Times Featured · November 1, 2025 · 8 Mins Read


    To build a regression model, which means fitting a straight line to the data to predict future values, we first visualize our data to get an idea of what it looks like and to spot patterns and relationships.

    The data may appear to show a positive linear relationship, but we confirm it by calculating the Pearson correlation coefficient, which tells us how close our data is to linear.

    Let's consider a simple Salary Dataset to understand the Pearson correlation coefficient.

    The dataset consists of two columns:

    YearsExperience: the number of years a person has been working

    Salary (target): the corresponding annual salary in US dollars

    Now we need to build a model that predicts salary based on years of experience.

    We can see that this calls for a simple linear regression model, because we have just one predictor and a continuous target variable.

    But can we directly apply the simple linear regression algorithm just like that?

    No.

    Linear regression comes with several assumptions, and one of them is linearity.

    We need to check for linearity, and for that, we calculate the correlation coefficient.


    But what is linearity?

    Let's understand this with an example.

    Image by Author

    From the table above, we can see that for every one-year increase in experience, there is a $5,000 increase in salary.

    The change is constant, and when we plot these values, we get a straight line.

    Such a relationship is called a linear relationship.
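A quick way to see this numerically: for data that rises by a fixed amount per unit, like the $5,000-per-year pattern above, the Pearson correlation is exactly 1. (The $30,000 base salary here is an illustrative number, not from the dataset.)

```python
import numpy as np

# Toy data in the spirit of the table above: each extra year of
# experience adds exactly $5,000 (the $30,000 base is illustrative)
years = np.array([1, 2, 3, 4, 5])
salary = 30000 + 5000 * years

# np.corrcoef returns a 2x2 correlation matrix; entry [0, 1] is Pearson's r
r = np.corrcoef(years, salary)[0, 1]
print(round(r, 4))  # 1.0 - a perfectly linear relationship
```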


    Now, in simple linear regression, we already know that we fit a regression line to the data to predict future values, and this is effective only when the data has a linear relationship.

    So, we need to check for linearity in our data.

    For that, let's calculate the correlation coefficient.

    Before that, we first visualize the data with a scatter plot to get an idea of the relationship between the two variables.

    import matplotlib.pyplot as plt
    import seaborn as sns
    import pandas as pd
    
    # Load the dataset
    df = pd.read_csv("C:/Salary_dataset.csv")
    
    # Set plot style
    sns.set(style="whitegrid")
    
    # Create scatter plot
    plt.figure(figsize=(8, 5))
    sns.scatterplot(x='YearsExperience', y='Salary', data=df, color='blue', s=60)
    
    plt.title("Scatter Plot: Years of Experience vs Salary")
    plt.xlabel("Years of Experience")
    plt.ylabel("Salary (USD)")
    plt.tight_layout()
    plt.show()
    Image by Author

    We can observe from the scatter plot that as years of experience increases, salary also tends to increase.

    Although the points don't form a perfect straight line, the relationship appears strong and linear.

    To confirm this, let's now calculate the Pearson correlation coefficient.

    import pandas as pd
    
    # Load the dataset
    df = pd.read_csv("C:/Salary_dataset.csv")
    
    # Calculate Pearson correlation
    pearson_corr = df['YearsExperience'].corr(df['Salary'], method='pearson')
    
    print(f"Pearson correlation coefficient: {pearson_corr:.4f}")

    The Pearson correlation coefficient is 0.9782.

    The correlation coefficient always lies between -1 and +1.

    If it is...
    close to +1: strong positive linear relationship
    close to 0: no linear relationship
    close to -1: strong negative linear relationship

    Here, we obtained a correlation coefficient of 0.9782, which means the data mostly follows a straight-line pattern and there is a very strong positive relationship between the variables.

    From this, we can conclude that simple linear regression is well suited to modeling this relationship.


    But how do we calculate this Pearson correlation coefficient?

    Let's consider a 10-point sample from our dataset.

    Image by Author

    Now, let's calculate the Pearson correlation coefficient.

    When both X and Y increase together, the correlation is said to be positive. On the other hand, if one increases while the other decreases, the correlation is negative.
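A small illustration of the sign of r, using made-up numbers: one series tends to rise with x, the other tends to fall.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_up = np.array([2.0, 4.0, 5.0, 4.5, 7.0])    # tends to increase with x
y_down = np.array([9.0, 7.5, 6.0, 6.5, 3.0])  # tends to decrease as x increases

r_up = np.corrcoef(x, y_up)[0, 1]
r_down = np.corrcoef(x, y_down)[0, 1]
print(r_up > 0, r_down < 0)  # True True
```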

    First, let's calculate the variance of each variable.

    Variance tells us how far the values are spread from the mean.

    We'll start by calculating the variance of X (Years of Experience).
    To do that, we first need to compute the mean of X.

    \[
    \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i
    \]

    \[
    = \frac{1.2 + 3.3 + 3.8 + 4.1 + 5.0 + 5.4 + 8.3 + 8.8 + 9.7 + 10.4}{10}
    \]
    \[
    = \frac{60.0}{10}
    \]
    \[
    = 6.0
    \]

    Next, we subtract the mean from each value and then square the result so the negatives don't cancel out.

    Image by Author

    We've calculated the squared deviation of each value from the mean.
    Now we can find the variance of X by averaging these squared deviations.

    \[
    \text{Sample Variance of } X = \frac{1}{n - 1} \sum_{i=1}^{n} (X_i - \bar{X})^2
    \]

    \[
    = \frac{23.04 + 7.29 + 4.84 + 3.61 + 1.00 + 0.36 + 5.29 + 7.84 + 13.69 + 19.36}{10 - 1}
    \]
    \[
    = \frac{86.32}{9} \approx 9.59
    \]

    Here we divided by n - 1 because we are working with a sample, and using n - 1 gives an unbiased estimate of the variance.

    The sample variance of X is 9.59, which measures the spread of the Years of Experience values around their mean, but in squared units (years squared), so it is hard to interpret directly.

    Since variance is a squared quantity, we take the square root to express it in the same unit as the original data.

    This is called the Standard Deviation.

    \[
    s_X = \sqrt{\text{Sample Variance}} = \sqrt{9.59} \approx 3.10
    \]

    The standard deviation of X is about 3.10, which means the values of Years of Experience typically fall about 3.10 years above or below the mean.
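These sample statistics are easy to verify with Python's statistics module, which uses the n - 1 (sample) denominator:

```python
from statistics import mean, variance, stdev

x = [1.2, 3.3, 3.8, 4.1, 5.0, 5.4, 8.3, 8.8, 9.7, 10.4]

print(round(mean(x), 2))      # 6.0
print(round(variance(x), 2))  # 9.59  (sample variance, n - 1 denominator)
print(round(stdev(x), 2))     # 3.1
```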


    In the same way, we calculate the variance and standard deviation of Y.

    \[
    \bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i
    \]

    \[
    = \frac{39344 + 64446 + 57190 + 56958 + 67939 + 83089 + 113813 + 109432 + 112636 + 122392}{10}
    \]
    \[
    = \frac{827239}{10}
    \]
    \[
    = 82,\!723.90
    \]
    \[
    \text{Sample Variance of } Y = \frac{1}{n - 1} \sum (Y_i - \bar{Y})^2
    \]
    \[
    = \frac{7,\!898,\!632,\!198.90}{9} \approx 877,\!625,\!799.88
    \]
    \[
    \text{Standard Deviation of } Y \text{ is } s_Y = \sqrt{877,\!625,\!799.88} \approx 29,\!624.75
    \]
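Again, the statistics module confirms the hand calculation:

```python
from statistics import mean, stdev

y = [39344, 64446, 57190, 56958, 67939, 83089, 113813, 109432, 112636, 122392]

print(mean(y))             # 82723.9
print(round(stdev(y), 2))  # 29624.75
```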

    We have now calculated the variance and standard deviation of X and Y.

    The next step is to calculate the covariance between X and Y.

    We already have the means of X and Y, as well as the deviations of each value from its respective mean.

    Now, we multiply these deviations to see how the two variables vary together.

    Image by Author

    By multiplying the deviations, we are trying to capture how X and Y move together.

    If both X and Y are above their means, both deviations are positive, so the product is positive.

    If both X and Y are below their means, both deviations are negative, but since a negative times a negative is positive, the product is again positive.

    If one is above its mean and the other is below, the product is negative.

    This product tells us whether the two variables tend to move in the same direction (both increasing or both decreasing) or in opposite directions.

    Using the sum of the products of deviations, we now calculate the sample covariance.

    \[
    \text{Sample Covariance} = \frac{1}{n - 1} \sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})
    \]

    \[
    = \frac{808771.5}{10 - 1}
    \]
    \[
    = \frac{808771.5}{9} = 89,\!863.5
    \]

    We obtained a sample covariance of 89,863.5. Since it is positive, it indicates that as experience increases, salary also tends to increase.

    But the magnitude of covariance depends on the units of the variables (years × dollars), so it is not directly interpretable.

    This value only tells us the direction of the relationship.
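The covariance calculation can be written out directly from its definition (Python 3.10+ also ships statistics.covariance, but the explicit sum mirrors the math above):

```python
from statistics import mean

x = [1.2, 3.3, 3.8, 4.1, 5.0, 5.4, 8.3, 8.8, 9.7, 10.4]
y = [39344, 64446, 57190, 56958, 67939, 83089, 113813, 109432, 112636, 122392]

x_bar, y_bar = mean(x), mean(y)
n = len(x)

# Sample covariance: sum of deviation products over n - 1
cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
print(round(cov, 1))  # 89863.5
```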

    Now we divide the covariance by the product of the standard deviations of X and Y.

    This gives us the Pearson correlation coefficient, which can be thought of as a normalized version of covariance.

    Since the standard deviation of X is in years and that of Y is in dollars, their product is in years × dollars.

    These units cancel when we divide, so the Pearson correlation coefficient is unitless.

    But the main reason we divide covariance by the standard deviations is to normalize it, so the result is easier to interpret and can be compared across different datasets.

    \[
    r = \frac{\text{Cov}(X, Y)}{s_X \cdot s_Y}
    = \frac{89,\!863.5}{3.0970 \times 29,\!624.75}
    \approx \frac{89,\!863.5}{91,\!747.85} \approx 0.9795
    \]

    So, the Pearson correlation coefficient (r) for this 10-point sample is approximately 0.9795.

    This tells us there is a very strong positive linear relationship between years of experience and salary, in line with the 0.9782 we obtained from the full dataset.
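As a sanity check on the hand calculation, NumPy computes Pearson's r on the ten sample points directly:

```python
import numpy as np

x = [1.2, 3.3, 3.8, 4.1, 5.0, 5.4, 8.3, 8.8, 9.7, 10.4]
y = [39344, 64446, 57190, 56958, 67939, 83089, 113813, 109432, 112636, 122392]

# np.corrcoef returns the 2x2 correlation matrix; entry [0, 1] is r
r = np.corrcoef(x, y)[0, 1]
print(round(r, 4))  # 0.9795
```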

    This is how we find the Pearson correlation coefficient.

    The formula for the Pearson correlation coefficient is:

    \[
    r = \frac{\text{Cov}(X, Y)}{s_X \cdot s_Y}
    = \frac{\frac{1}{n - 1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}
    {\sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (X_i - \bar{X})^2} \cdot \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}
    \]

    \[
    = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}
    {\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2} \cdot \sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}
    \]
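The final form of the formula translates to Python almost line for line (pearson_r is a hypothetical helper name for illustration, not a library function):

```python
import math

def pearson_r(xs, ys):
    """Pearson's r via the deviation-product formula: the sum of
    deviation products divided by the square roots of the summed
    squared deviations (the 1/(n-1) factors cancel)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = (math.sqrt(sum((x - x_bar) ** 2 for x in xs))
           * math.sqrt(sum((y - y_bar) ** 2 for y in ys)))
    return num / den

# Perfectly linear toy data: r should be exactly 1
print(round(pearson_r([1, 2, 3, 4], [5.0, 7.0, 9.0, 11.0]), 4))  # 1.0
```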


    We need to make sure certain conditions are met before relying on the Pearson correlation coefficient:

    • The relationship between the variables should be linear.
    • Both variables should be continuous and numeric.
    • There should be no strong outliers.
    • The data should be approximately normally distributed.
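As one example, the outlier condition can be screened with a simple z-score rule (the 3-standard-deviation threshold here is a common convention, not a strict requirement):

```python
from statistics import mean, stdev

def strong_outliers(values, threshold=3.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) / s > threshold]

x = [1.2, 3.3, 3.8, 4.1, 5.0, 5.4, 8.3, 8.8, 9.7, 10.4]
print(strong_outliers(x))  # [] - no strong outliers in the sample
```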

    Dataset

    The dataset used in this blog is the Salary dataset.

    It is publicly available on Kaggle and is licensed under the Creative Commons Zero (CC0 Public Domain) license, which means it can be freely used, modified, and shared for both non-commercial and commercial purposes without restriction.


    I hope this gave you a clear understanding of how the Pearson correlation coefficient is calculated and when it is used.

    Thanks for reading!


