
    A Survival Analysis Guide with Python: Using Time-To-Event Models to Forecast Customer Lifetime

    By Editor Times Featured | April 9, 2026 | 14 Mins Read


    Statistics applies to many areas of knowledge, helping us deal with uncertainty, calculate probabilities, and support decisions along the way.

    One of the areas that relies heavily on statistics is the medical industry, which uses tools like t-tests, A/B tests, and survival analysis. The last one is the subject of this article.

    Survival analysis originated in the medical and biological sciences, where researchers were trying to model, as their main event, the death of a patient or organism. That is the reason for the name.

    However, statisticians understood that such analysis was so powerful that it could be applied to many other areas of life, and so it spread to the business domain, even more so after the surge of Data Science.

    Let’s learn more about it.

    Survival Analysis

    Survival Analysis [SA] is a branch of statistics used to predict the amount of time it takes for a specific event to occur.[1]

    Also known as time-to-event analysis, it can estimate how long it will take for something to happen while accounting for the fact that some events have not happened yet by the time the data is collected.

    The examples are not only in the medical and biological sciences, but everywhere.

    • Time until a machine fails
    • Time until a customer cancels a subscription
    • Time until a customer buys again

    Now, given that we are trying to estimate a number, rather than a group or class, we are dealing with a kind of regression problem. So why can’t we go with OLS Linear Regression?

    Why Use Survival Analysis?

    Standard regression models like OLS or Logistic Regression struggle with survival data because they are designed to handle completed events, not “ongoing” stories.

    Imagine you want to predict finishing times for a 10-mile race, but the race is still happening. It is 2 hours in, and you want to use the data you have so far to estimate something.

    The regular regression algorithms will fail because:

    • OLS: You only have finishing times from those who have already finished the race. Using only their data will create a huge bias toward faster runners.
    • Logistic Regression: It can tell whether someone finished the race, probably, but it treats those who finished in 30 minutes the same as those who finished in 8 hours.
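To make the OLS bias concrete, here is a minimal simulation sketch. The exponential lifetimes, the 24-month true mean, and the 12-month cutoff are illustrative assumptions, not values from the dataset used later in this article.

```python
import random

random.seed(42)

# Assumption: true customer lifetimes are exponential with a 24-month mean.
TRUE_MEAN = 24.0
CUTOFF = 12.0  # we only get to observe the first 12 months

lifetimes = [random.expovariate(1 / TRUE_MEAN) for _ in range(10_000)]

# Customers whose event happened before the cutoff are "completed".
completed = [t for t in lifetimes if t <= CUTOFF]
# Everyone else is still active at the cutoff (right-censored).
censored_count = len(lifetimes) - len(completed)

# Naive approach: average only the completed lifetimes.
naive_mean = sum(completed) / len(completed)

print(f"True mean lifetime:           {TRUE_MEAN:.1f} months")
print(f"Naive mean (completed only):  {naive_mean:.1f} months")
print(f"Share still active at cutoff: {censored_count / len(lifetimes):.0%}")
```

The naive mean can never exceed the cutoff, no matter how long customers really stay, because every long-lived customer is thrown away. Survival models keep those customers in the data as censored observations instead.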

    The Basics of Survival Analysis

    Let us go over a few important concepts for understanding Survival Analysis.

    First, we must understand the birth and death of a data point.

    • Birth: The moment we started to measure that data point. For example, the moment a patient is diagnosed with cancer, or the day a person is hired by a company. Notice that the observations don’t all need to start at the same time.
    • Death: It happens at the occurrence of the event of interest. For example, the day the employee left the company.

    Now, the interesting thing about SA is that the study or the observation can end before the event happens. In this case, we have another important concept: the censored data point.

    • Censoring (Non-death): If the study ends or a subject drops out before the event happens, the data is “censored,” meaning we only know they survived at least until that point.

    Data can be censored in different ways, though.

    • Right Censoring: Most common. The event occurs after the observation period ends or the subject drops out.
    Data point C is right-censored. Image by the author.
    • Left Censoring: The event happened before the study started, so we only know that it occurred before a certain time.
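In practice, birth, death, and right censoring boil down to two columns per subject: a duration and an event flag. The sketch below shows one way to derive them from start and end dates. The employee records, the `STUDY_END` date, and the helper `to_duration_event` are all hypothetical names made up for illustration.

```python
from datetime import date

STUDY_END = date(2024, 12, 31)  # hypothetical end of the observation window

# Hypothetical employees: (name, hire date, leave date or None if still employed)
records = [
    ("A", date(2022, 1, 10), date(2023, 6, 1)),
    ("B", date(2023, 3, 15), date(2024, 2, 20)),
    ("C", date(2024, 5, 1), None),  # still employed when the study ends
]

def to_duration_event(start, end, study_end=STUDY_END):
    """Return (duration_in_days, event_observed) for one subject."""
    if end is None or end > study_end:
        # Right-censored: we only know the subject survived until study_end.
        return (study_end - start).days, 0
    # Event observed inside the window: the duration is the full lifetime.
    return (end - start).days, 1

rows = {name: to_duration_event(start, end) for name, start, end in records}
for name, (duration, event) in rows.items():
    print(f"{name}: duration={duration} days, event_observed={event}")
```

Notice that each subject has its own birth date, and that subject C contributes a duration even though the event was never observed.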

    Great. It is important to note that survival analysis is a way to estimate the probability of an event happening as a function of time. By treating survival as a function of time, we can answer questions that a single probability score can’t, such as: “At what specific month does the risk of a customer churning peak?”

    Now that we know the basics, let’s learn more about the functions involved in SA.

    Survival Function

    The survival function S(t) expresses the probability of the event not having happened by time t. It naturally decreases as time passes, since more and more individuals will experience the event.

    So, applying it to our employee churn example, we would see the probability that an employee is still in the company after N years.

    Survival Function. Image by the author.

    Hazard Function

    The hazard function indicates the risk of the event occurring at a given point in time, among those who have not experienced it yet. It complements the survival function: it represents the risk of churn at that moment (instead of the probability of still being in the company).

    In other words, it answers: of the employees who have not churned until now, how likely are they to do so at this point in time?

    Hazard Function. Image by the author.
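For readers who prefer notation, the two functions can be written as follows. This is the standard textbook formulation, where T (a symbol not used elsewhere in this article) stands for the random time at which the event occurs.

```latex
S(t) = P(T > t), \qquad
h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}, \qquad
S(t) = \exp\!\left(-\int_0^t h(u)\,du\right)
```

The last identity shows how the two are tied together: the more hazard accumulates over time, the lower the survival probability gets.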

    Choosing Your Model for Survival Analysis

    As you can see, SA is a topic that can get deep and dense real quick. But let’s try to keep it simple.

    There are two main models used when performing survival analysis. One is Kaplan-Meier, which is simpler but doesn’t consider the effect of additional predictor variables, and it requires a few assumptions to work.

    The other one is the Cox Proportional Hazard model, which is the industry standard because it can take other variables into the model, it is more stable mathematically, and it tends to work well even when some assumptions are mildly violated.

    Let’s learn more about them.

    Kaplan-Meier

    • Works well with right-censored data (remember? when the event occurs after the observation period ends)
    • Intuitive model
    • Non-parametric: doesn’t assume any distribution
    • Assumptions are required: dropouts are not related to the event; entry time doesn’t affect survival risk; and event times are known exactly.
    • Returns a survival function that looks like a staircase

    When to use:

    • Simple survival analysis without other covariates or predictors.
    • Great for quick visualizations.
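To see where the staircase comes from, here is a minimal hand-rolled sketch of the Kaplan-Meier product-limit estimate in plain Python. The function name `kaplan_meier` and the toy data are assumptions for illustration only; for real work, a tested library implementation is the safer choice.

```python
from collections import Counter

def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each time with an observed event."""
    deaths = Counter(t for t, e in zip(durations, events) if e == 1)
    exits = Counter(durations)  # events and censorings both leave the risk set

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d > 0:
            # Product-limit step: multiply by the share that survives time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Everyone observed at time t (event or censored) leaves the risk set.
        at_risk -= exits[t]
    return curve

# Toy data (hypothetical): months observed and whether churn was seen (1) or not (0).
durations = [2, 3, 3, 5, 6, 8]
events = [1, 1, 0, 1, 0, 1]

curve = kaplan_meier(durations, events)
for t, s in curve:
    print(f"t={t}: S(t)={s:.3f}")
```

The curve only steps down at times where an event is observed. Censored subjects never cause a step, but they shrink the risk set, which makes later steps larger.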

    Cox Proportional Hazard

    • Industry standard
    • Accepts additional predictors or covariates
    • Works well even when some assumptions are violated
    • Estimates a hazard function, which tends to be more stable than a survival function

    When to use:

    • Estimates on data with multiple predictor (covariate) variables.
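For reference, the Cox model is usually written as below. This is the standard textbook form rather than anything specific to this article: h_0(t) is the baseline hazard shared by everyone, x_1 ... x_p are the covariates, and the betas are the coefficients the model estimates.

```latex
h(t \mid x) = h_0(t)\, \exp\!\left(\beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p\right),
\qquad
\mathrm{HR}_j = \exp(\beta_j)
```

The "proportional" in the name comes from this structure: a one-unit change in covariate x_j multiplies the hazard by the same factor exp(beta_j) at every point in time.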

    Next, let’s get our hands on some code.

    Code

    In this section, we will learn how to model an SA using both models previously presented.

    The dataset chosen for this exercise is the Telco Customer Churn, which you can find in the UCI Machine Learning Repository under the Creative Commons license.

    View of the dataset. Image by the author.

    Next, let’s import the packages needed.

    # Data
    from ucimlrepo import fetch_ucirepo

    # Data Wrangling
    import pandas as pd
    import numpy as np

    # DataViz
    import matplotlib.pyplot as plt
    import seaborn as sns

    # Lifelines Survival Analysis
    from lifelines import KaplanMeierFitter
    from lifelines import CoxPHFitter

    # Fetch dataset
    telco_churn = fetch_ucirepo(id=563)

    # Data (as pandas dataframes)
    X = telco_churn.data.features
    y = telco_churn.data.targets

    # Pandas df
    df = pd.concat([X, y], axis=1)
    df.head(3)

    Implementing Kaplan-Meier

    Now, as mentioned, the Kaplan-Meier [KM] model is really simple and easy to use, being a good choice for visualizations. All we need are two variables: a duration and an event indicator.

    Then, we can instantiate the KM model and fit it to the data, using Subscription Length (total months of subscription) as the duration, and Churn as the event observed.

    # Instantiate K-M
    kmf = KaplanMeierFitter()

    # Fit the model
    kmf.fit(df['Subscription  Length'],
            event_observed=df['Churn'],
            label='Customer Churn')

    Done. Next, we can visualize the survival function.

    # Plot survival curve
    plt.figure(figsize=(12, 5))
    kmf.plot_survival_function()
    plt.title('Kaplan-Meier Survival Curve: Telco Customer Lifetime')
    plt.xlabel('Time (months)')
    plt.ylabel('Probability of Remaining Subscribed')
    plt.grid(True)
    plt.show()

    This is great! We can see that around 90% of the customers stay with the Telecom company for about 35 months.

    Kaplan-Meier model is great for visualizations. Image by the author.

    If we want to confirm, we can easily code that and learn that it is actually at 34 months that 90% are still with the company.

    # Checking survival rate at 34 months
    kmf.survival_function_at_times(34)
    Customer Churn
    34	0.900613

    If we want to know the median time at which people churn, we can use KM’s attribute .median_survival_time_. This is the point in time (t) where the survival probability drops to 50%. In our case, it will be inf because the survival function never drops under 0.5. But if the result was 24 (for example), it would mean that half of your customers will have churned by month 24.

    # Time (t) when Survival drops under 50%
    median_survival = kmf.median_survival_time_
    print(f"Median Customer Lifetime: {median_survival} months")
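To see why the median can come back as inf, here is a small sketch of the idea: walk along the step curve and return the first time at which survival is at or below 50%. The helper `median_survival_time` and both toy curves are hypothetical, written only to illustrate the logic.

```python
import math

def median_survival_time(curve):
    """curve: list of (time, survival_probability) pairs, sorted by time."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return math.inf  # the curve never reaches 50% within the observed window

# Hypothetical step curves for illustration.
drops_to_half = [(6, 0.90), (12, 0.75), (24, 0.50), (36, 0.40)]
stays_above_half = [(6, 0.97), (12, 0.94), (34, 0.90), (45, 0.85)]

median_a = median_survival_time(drops_to_half)
median_b = median_survival_time(stays_above_half)

print(f"Curve that reaches 50%: median = {median_a}")
print(f"Curve that stays above 50%: median = {median_b}")
```

An infinite median does not mean customers never churn. It only says that fewer than half of them did so within the observed period.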

    We can also perform other analyses, such as comparisons between groups. Imagine that this Telco company classifies its customers into two groups:

    1. Heavy users: Frequency of Use > median
    2. Soft users: Frequency of Use <= median

    We can compare the survival functions of these two groups.

    # Column Groups
    df['Heavy_User'] = np.where(df['Frequency of use'] > df['Frequency of use'].median(), 1, 0)
    df.head()

    plt.figure(figsize=(12, 5))
    plt.title('Kaplan-Meier Survival Curve: Telco Customer Lifetime')
    plt.xlabel('Time (months)')
    plt.ylabel('Probability of Remaining Subscribed')

    # Fit the model for Soft users and plot
    kmf.fit(df[df.Heavy_User == 0]['Subscription  Length'], df[df.Heavy_User == 0]['Churn'], label='Soft User')
    ax = kmf.plot_survival_function()

    # Fit the model for Heavy users and plot
    kmf.fit(df[df.Heavy_User == 1]['Subscription  Length'], df[df.Heavy_User == 1]['Churn'], label='Heavy User')
    ax = kmf.plot_survival_function(ax=ax)

    plt.show()

    And there it is. While heavy users stay steady with the company throughout the whole timeframe, the soft users churn quickly after the 30th month. Their median survival time is 40 months.

    Survival comparison between groups. Image by the author.

    When comparing groups, you must make sure that the difference is statistically significant. For that, the package lifelines has the log-rank test implemented. It is a hypothesis test:

    • Ho (null hypothesis): The survival curves of two populations don’t differ.
    • Ha (alternative hypothesis): The survival curves of two populations are different.

    from lifelines.statistics import logrank_test

    # 3. Perform the Log-Rank Test
    results = logrank_test(df[df.Heavy_User == 0]['Subscription  Length'],
                           df[df.Heavy_User == 1]['Subscription  Length'],
                           event_observed_A=df[df.Heavy_User == 0]['Churn'],
                           event_observed_B=df[df.Heavy_User == 1]['Churn'])

    # 4. Print Results
    print(f"P-value: {results.p_value}")
    print(f"Test Statistic: {results.test_statistic}")

    if results.p_value < 0.05:
        print("Result: Statistically significant difference between groups.")
    else:
        print("Result: No significant difference detected.")
    P-value: 7.23487469906141e-103
    Test Statistic: 463.7794219211866
    Result: Statistically significant difference between groups.

    Implementing Cox Proportional Hazard

    The first cool thing that you can do with the Cox Proportional Hazard [CPH] Model is check how other variables influence the survival of your observed individual.

    Let’s break it down.

    1. We start by choosing some covariates
    2. We filter the dataset
    3. Instantiate the model
    4. Fit the model

    # 1. Prepare the data
    # Selecting the time, the event, and our chosen covariates
    cols_to_use = [
        'Subscription  Length', # Time (t)
        'Churn',                 # Event (E)
        'Charge  Amount',        # Covariate 1
        'Complains',             # Covariate 2
        'Frequency of use'       # Covariate 3
    ]

    # Dropping any missing values for the model
    df_model = df[cols_to_use].dropna()

    # 2. Initialize and fit the Cox model
    # Use the penalizer to stabilize the math if not converging.
    cph = CoxPHFitter(penalizer=0.1)
    cph.fit(df_model,
            duration_col='Subscription  Length',
            event_col='Churn')

    # 3. Display the results
    cph.print_summary()

    # 4. Visualize the impact of covariates
    cph.plot()

    This is our beautiful result.

    CPH model. Image by the author.

    How do we interpret this?

    The dashed vertical line at 0.0 is the neutral point.

    • If a variable’s point sits at 0, it has no effect on churn.
    • To the Right (> 0): Increases the hazard (makes churn happen sooner).
    • To the Left (< 0): Decreases the hazard (makes the customer stay longer).
    • On the table, the most important column for business stakeholders is the Hazard Ratio, exp(coef). It tells us the multiplier effect on the risk of churn.

    [TABLE] Complains (5.36): A customer who complains is 5.36 times as likely (436% more likely) to churn at any given time than a customer who doesn’t complain. This is a massive effect.

    [GRAPHIC] Complains (High Hazard): This is our strongest predictor. Customers with complaints are roughly 5.4 times as likely to churn at any given moment compared to those without.

    [TABLE] Frequency of use (0.99): While the p-value says this is technically significant, an HR of 0.99 is effectively 1. It means the impact on churn is negligible (only a 1% change per unit).

    [GRAPHIC] Frequency of Use (Neutral): The square is sitting almost exactly on the 0.0 line. In this specific model, how often a customer uses the service doesn’t meaningfully change when they churn.

    [TABLE] Charge Amount (0.83): For every one-unit increase in charge, the risk of churn drops by 17% (1 - 0.83 = 0.17). Higher-paying customers are more stable.

    [GRAPHIC] Charge Amount (Protective Factor): The square is to the left of the zero line. Higher charges are associated with a lower risk of churn.
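The arithmetic behind these readings is simple enough to sketch. The helper `hazard_ratio_summary` below is a hypothetical name, and the hazard ratios are the ones reported in the table above. The coefficient is just the natural log of the hazard ratio, which is why a ratio near 1 lands near 0 on the plot.

```python
import math

def hazard_ratio_summary(hr, units=1):
    """Translate a hazard ratio into a coefficient and a percent change in hazard."""
    coef = math.log(hr)            # position on the coefficient plot
    hr_for_units = hr ** units     # hazard multiplier for an increase of `units`
    pct_change = (hr_for_units - 1) * 100
    return coef, hr_for_units, pct_change

# Hazard ratios as reported in the summary table.
reported = [("Complains", 5.36), ("Frequency of use", 0.99), ("Charge Amount", 0.83)]
for name, hr in reported:
    coef, _, pct = hazard_ratio_summary(hr)
    print(f"{name}: coef={coef:+.2f}, hazard change per unit={pct:+.0f}%")

# Effects compound multiplicatively: a 3-unit increase in Charge Amount.
_, hr3, pct3 = hazard_ratio_summary(0.83, units=3)
print(f"Charge Amount +3 units: HR={hr3:.2f} ({pct3:+.0f}%)")
```

Note that a 3-unit increase in charge gives roughly a 43% drop in hazard, not 3 x 17% = 51%, because hazard ratios multiply instead of adding up.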

    We can also take a look at both the Survival and the Hazard functions for this model.

    Survival and Hazard functions from the CPH model. Image by the author.

    The curve is similar to the KM model. Let’s compare the survival probability at the same 34th month.

    # Extract the baseline survival probability at time 34
    survival_at_34 = cph.baseline_survival_.loc[34]
    print(f"Baseline Survival Probability at period 34: {survival_at_34.values[0]:.4f}")
    Baseline Survival Probability at period 34: 0.9294

    It is almost 3 percentage points higher, at ~93%.

    And to close this article, let’s pick two different customers, one without complaints and the other with complaints, and compare their survival probabilities at the 34th month.

    # 1. Pick a customer (or predict for a new one)
    individual = df_model.iloc[[110,111]]

    # 2. Predict their full survival curve
    pred_survival = cph.predict_survival_function(individual)

    # 3. Get the value at time 34
    prob110_at_34 = pred_survival.loc[34].values[0]
    prob111_at_34 = pred_survival.loc[34].values[1]

    print(f"Customer 110 (no complaints) Probability of 'Surviving' to period 34: {prob110_at_34:.2%}")
    print(f"Customer 111 (yes complaints) Probability of 'Surviving' to period 34: {prob111_at_34:.2%}")
    Customer 110 (no complaints) Probability of 'Surviving' to period 34: 93.94%
    Customer 111 (yes complaints) Probability of 'Surviving' to period 34: 61.68%

    Big difference, huh? More than 30 percentage points. And we can finally calculate the time in months when each customer is expected to churn.

    # Time Until Churn (Expected life) by customer
    pred_churn = cph.predict_expectation(df_model.iloc[[110,111]])

    # Get the values in months
    prob110_churn = pred_churn.loc[110]
    prob111_churn = pred_churn.loc[111]

    print(f"Customer 110 (no complaints) expected churn at: {prob110_churn:.0f} months")
    print(f"Customer 111 (yes complaints) expected churn at: {prob111_churn:.0f} months")
    Customer 110 (no complaints) expected churn at: 41 months
    Customer 111 (yes complaints) expected churn at: 31 months

    Definitely, complaints make a difference in churn for this Telco company.
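The size of that gap follows from how the Cox model scales survival. Under proportional hazards, multiplying the hazard by a ratio HR raises the survival probability to the power HR. The sketch below applies that rule to the numbers printed earlier (baseline survival of 0.9294 at month 34 and a hazard ratio of 5.36 for Complains). The function name is hypothetical, and treating the baseline as a no-complaint customer is a simplifying assumption.

```python
def survival_with_hazard_ratio(baseline_survival, hazard_ratio):
    """Under proportional hazards, S(t | x) = S0(t) ** hazard_ratio."""
    return baseline_survival ** hazard_ratio

baseline_at_34 = 0.9294  # baseline survival at month 34, as printed earlier
hr_complains = 5.36      # hazard ratio for Complains, from the summary table

# Assumption: the baseline stands in for an otherwise identical customer
# without complaints, which is a simplification.
with_complaint = survival_with_hazard_ratio(baseline_at_34, hr_complains)

print(f"Without complaint: {baseline_at_34:.1%}")
print(f"With complaint:    {with_complaint:.1%}")
```

The result will not match the per-customer predictions exactly, since those also account for each customer's charge amount and frequency of use, but it lands in the same neighborhood and shows why a single complaint moves the curve so much.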

    Before You Go

    Well, survival analysis is much more than just a statistical function. Companies can use it to understand customer behavior.

    The Kaplan-Meier and Cox Proportional Hazard models provide actionable insights into subscriber longevity. We’ve seen how variables like customer value and service complaints directly affect churn, allowing decision makers to pursue more targeted retention strategies.

    Data professionals who understand these models can build a powerful tool for companies to improve their relationship with their customer base. Use these tools to stay ahead of the curve. Literally.

    If you liked this content, find me on my website.

    https://gustavorsantos.me

    GitHub Repository

    https://github.com/gurezende/Survival-Analysis

    References

    [1] Survival Analysis Definition. https://en.wikipedia.org/wiki/Survival_analysis

    [2] The Complete Introduction to Survival Analysis in Python. https://medium.com/data-science/the-complete-introduction-to-survival-analysis-in-python-7523e17737e6

    [3] Introduction to Customer Survival Analysis: Understanding Customer Lifetimes. https://medium.com/@slavyolov/introduction-to-customer-survival-analysis-understanding-customer-lifetimes-6e4ba41d7724

    [4] Ultimate Guide to Survival Analysis. https://www.graphpad.com/guides/survival-analysis

    [5] What is the difference between Kaplan-Meier (KM) and Cox Proportional Hazards (CPH) ratio? https://www.droracle.ai/articles/218904/what-is-the-difference-between-kaplan-meier-km-and-cox

    [6] Lifelines Documentation. https://lifelines.readthedocs.io/en/latest/

    [7] Survival Analysis in R For Beginners. https://www.datacamp.com/tutorial/survival-analysis-R


