    EDA in Public (Part 3): RFM Analysis for Customer Segmentation in Pandas

    By Editor Times Featured | January 1, 2026 | 14 Mins Read


    If you’ve been following along, we’ve come a long way. In Part 1, we did the “dirty work” of cleaning and prepping.

    In Part 2, we zoomed out to a high-altitude view of NovaShop’s world, spotting the big storms (high-revenue countries) and the seasonal patterns (the big Q4 rush).

    But here’s the thing: a business doesn’t actually sell to “months” or “countries.” It sells to human beings.

    If you treat every customer exactly the same, you’re making two very expensive mistakes:

    • Over-discounting: Giving a “20% off” coupon to someone who was already reaching for their wallet.
    • Ignoring the “Quiet” Ones: Failing to notice when a formerly loyal customer stops visiting, until they’ve been gone for six months and it’s too late to win them back.

    The Solution? Behavioural Segmentation.

    Instead of guessing, we’re going to use the data to let the customers tell us who they are. We do this using the gold standard of retail analytics: RFM Analysis.

    • Recency (R): How recently did they buy? (Are they still engaged with us?)
    • Frequency (F): How often do they buy? (Are they loyal, or was it a one-off?)
    • Monetary (M): How much do they spend? (What’s their total business impact?)

    By the end of this part, we’ll move beyond “Top 10 Products” and actually assign a specific, actionable label to every single customer in NovaShop’s database.

    Data Preparation: The “Missing ID” Pivot

    Before we can start scoring, we have to address a decision we made back in Part 1.

    If you remember our Initial Inspection, we noticed that about 25% of our rows were missing a CustomerID. At the time, we made a strategic business decision to keep these rows. We needed them to calculate the accurate total revenue and see which products were popular overall.

    For RFM analysis, the rules change. You cannot track behaviour without a consistent identity. We can’t know how “frequent” a customer is if we don’t know who they are!

    So, our first step in Part 3 is to isolate our “Trackable Universe” by filtering for rows where a CustomerID exists.
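The filtering step is not shown in the article, so here is a minimal sketch of one way it could look. The toy dataframe and the names `toy` and `trackable` are assumptions for illustration; only the CustomerID column name comes from the text.

```python
import pandas as pd

# Toy stand-in for the cleaned Part 1 dataframe (all values are made up).
toy = pd.DataFrame({
    "InvoiceNo": ["A1", "A2", "A3", "A4"],
    "CustomerID": [12346.0, None, 12347.0, None],
    "Revenue": [10.0, 5.0, 7.5, 2.5],
})

# Share of rows that cannot be tied to a known customer.
missing_share = toy["CustomerID"].isna().mean()

# Keep only rows with a CustomerID; copy so later edits don't touch the original.
trackable = toy.dropna(subset=["CustomerID"]).copy()

print(f"Kept {len(trackable)} of {len(toy)} rows ({missing_share:.0%} had no CustomerID)")
```

Keeping the unfiltered dataframe around means the revenue totals from the earlier parts stay reproducible, while the filtered copy feeds the RFM steps.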

    Engineering the RFM Metrics

    Now that we have a dataset where every row is linked to a specific person, we need to aggregate all their individual transactions into three summary numbers: Recency, Frequency, and Monetary.

    Defining the Snapshot Date

    Before calculating RFM, we need a reference point in time, commonly called the snapshot date.

    Here, we take the most recent transaction date in the dataset and add one day. This snapshot date represents the moment at which we’re evaluating customer behaviour.

    import datetime as dt
    snapshot_date = df['InvoiceDate'].max() + dt.timedelta(days=1)

    We add one day so that customers who bought on the most recent date still have a Recency value of 1 day, not 0. This keeps the metric intuitive and avoids edge-case issues.
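To make the one-day offset concrete, here is a small sketch on made-up timestamps. The dates and the variable names are assumptions for illustration, not values from the NovaShop data.

```python
import pandas as pd

# Made-up purchase timestamps.
purchase_dates = pd.Series(pd.to_datetime([
    "2011-12-09 12:50",  # latest transaction in the toy data
    "2011-12-09 08:15",  # earlier on the same day
    "2011-11-29 09:00",
]))

toy_snapshot = purchase_dates.max() + pd.Timedelta(days=1)

# Whole days elapsed between each purchase and the snapshot date.
recency_days = (toy_snapshot - purchase_dates).dt.days
print(recency_days.tolist())  # [1, 1, 11]
```

Both purchases on the latest day get a Recency of 1, so no customer ends up with 0.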

    Aggregating Transactions at the Customer Level

    rfm = df.groupby('CustomerID').agg({
        'InvoiceDate': lambda x: (snapshot_date - x.max()).days,  # days since last purchase
        'InvoiceNo': 'nunique',                                   # number of distinct invoices
        'Revenue': 'sum'                                          # total spend
    })

    Each row in our dataset represents a single transaction. To calculate RFM, we need to collapse these transactions into one row per customer.

    We do this by grouping the data by CustomerID and applying different aggregation functions:

    • Recency: For each customer, we find their most recent purchase date and calculate how many days have passed since then.
    • Frequency: We count the number of unique invoices associated with each customer. This tells us how often they’ve made purchases.
    • Monetary: We sum the total revenue generated by each customer across all transactions.

    Renaming Columns for Clarity

    rfm.rename(columns={
        'InvoiceDate': 'Recency',
        'InvoiceNo': 'Frequency',
        'Revenue': 'Monetary'
    }, inplace=True)

    The aggregation step keeps the original column names, which can be confusing. Renaming them makes the dataframe immediately readable and aligns it with standard RFM terminology.

    Now each column clearly answers a business question:

    • Recency → How recently did the customer purchase?
    • Frequency → How often do they purchase?
    • Monetary → How much revenue do they generate?
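As an aside, the aggregate-then-rename pattern can also be written in one step with pandas named aggregation. This is a sketch on invented transactions, not the author’s code; the `toy` frame, its values, and the Revenue column name are assumptions.

```python
import pandas as pd

# Invented transactions: customer 1 has two invoices, customer 2 has one.
toy = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2],
    "InvoiceNo": ["A1", "A1", "A2", "B1"],
    "InvoiceDate": pd.to_datetime(
        ["2011-12-01", "2011-12-01", "2011-12-08", "2011-06-01"]
    ),
    "Revenue": [20.0, 5.0, 10.0, 500.0],
})

toy_snapshot = toy["InvoiceDate"].max() + pd.Timedelta(days=1)

# Named aggregation: output column = (input column, aggregation function).
toy_rfm = toy.groupby("CustomerID").agg(
    Recency=("InvoiceDate", lambda x: (toy_snapshot - x.max()).days),
    Frequency=("InvoiceNo", "nunique"),
    Monetary=("Revenue", "sum"),
)
print(toy_rfm)
```

Customer 1 has three rows but only two distinct invoices, which is why Frequency uses `nunique` rather than a plain row count.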

    Inspecting the Result

    print(rfm.head())

    The final rfm dataframe contains one row per customer, with three intuitive metrics summarizing their behaviour.

    Output:

    Let’s walk through this the way we would with NovaShop in a real conversation.

    “When was the last time this customer bought from us?”

    That’s exactly what Recency answers.

    Take Customer 12347:

    • Recency = 2
    • Translation: “This customer bought something just two days ago.”

    They’re fresh. They remember the brand. They’re still engaged.

    Now compare that to Customer 12346:

    • Recency = 326
    • Translation: “They haven’t bought anything in almost a year.”

    Even though this customer spent a lot in the past, they’re currently silent.

    From NovaShop’s perspective: Recency tells us who’s still listening and who might need a nudge (or a wake-up call).

    “Is this a one-time buyer or someone who keeps coming back?”

    That’s where Frequency comes in.

    Look again at Customer 12347:

    • Frequency = 7
    • They didn’t just buy once; they came back again and again.

    Now look at several others:

    • Frequency = 1
    • One purchase, then gone.

    From a business perspective, frequency separates casual shoppers from loyal customers.

    “Who actually brings in the money?”

    That’s the Monetary column.
    And this is where things get interesting.

    Customer 12346:

    • Monetary = £77,183.60
    • Frequency = 1
    • Recency = 326

    This tells a very specific story:

    A single, very large order… a long time ago… and nothing since.

    Now compare that to Customer 12347:

    • Lower total spend
    • Multiple purchases
    • Very recent activity

    Important insight for NovaShop: A “high-value” customer in the past isn’t necessarily a valuable customer today.

    Why This View Changes the Conversation

    If NovaShop only looked at total revenue, they might focus all their attention on customers like 12346.

    But RFM shows us that:

    • Some customers spent a lot once and disappeared
    • Some spend less but stay loyal
    • Some are active right now and ready to be engaged

    This output helps NovaShop stop guessing and start prioritizing:

    • Who should get retention emails?
    • Who needs reactivation campaigns?
    • Who’s already loyal and should be rewarded?

    Right now, these are still raw numbers.

    In the next step, we’ll rank and score these customers, so NovaShop doesn’t have to interpret rows manually. Instead, they’ll see clear segments like:

    • Champions
    • Loyal Customers
    • At-Risk
    • Lost

    That’s where this becomes a real decision-making tool, not just a dataframe.

    Turning RFM Numbers Into Meaningful Customer Segments

    At this stage, NovaShop has a table full of numbers. Useful, but not exactly decision-friendly.

    A marketing team can’t realistically scan hundreds or thousands of rows asking:

    • Is a Recency of 19 good or bad?
    • Is Frequency = 2 impressive?
    • How much Monetary value is “high”?

    Our goal is to rank customers relative to one another and turn raw values into scores.

    Step 1: Ranking Customers by Each RFM Metric

    Instead of treating Recency, Frequency, and Monetary as absolute values, we look at where each customer stands compared to everyone else.

    • Customers with more recent purchases should score higher
    • Customers who buy more often should score higher
    • Customers who spend more should score higher

    In practice, we do this by splitting each metric into quantiles (usually 4 or 5 buckets).

    However, there’s a small real-world wrinkle. This is something I came across while working on this project.

    In transactional datasets, it’s common to see:

    • Many customers with the same Frequency (e.g. one-time buyers)
    • Highly skewed Monetary values
    • Small samples where quantile binning can fail

    To keep things robust and readable, we’ll wrap the scoring logic in a small helper function.

    def rfm_score(series, ascending=True, n_bins=5):
        # Rank the values to ensure uniqueness
        ranked = series.rank(method='first', ascending=ascending)

        # Use pd.qcut on the ranks to assign bins
        return pd.qcut(
            ranked,
            q=n_bins,
            labels=range(1, n_bins + 1)
        ).astype(int)

    To explain what’s going on here:

    • We’re creating a helper function that turns a raw numeric column into a clean RFM score using quantile-based binning.
    • First, the values are ranked. Instead of binning the raw values directly, we rank them first. This step guarantees a unique ordering, even when many customers share the same value (a common situation in RFM data).
    • The ascending flag lets us flip the logic depending on the metric: lower recency is better, while higher frequency and monetary values are better.
    • Next, we apply quantile-based binning. qcut splits the ranked values into n_bins equally sized groups. Each customer is assigned a score from 1 to 5 (by default), where the score represents their relative position within the distribution.
    • Finally, the results are converted to integers for easy use in analysis and segmentation.

    In short, this function provides a robust and reusable way to score RFM metrics without running into duplicate bin edge errors, and without overcomplicating the logic.
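The duplicate bin edge problem is easy to reproduce. The sketch below uses a made-up Frequency column dominated by one-time buyers and compares direct binning with the rank-first helper; the data and variable names are assumptions for illustration.

```python
import pandas as pd

def rfm_score(series, ascending=True, n_bins=5):
    # Same logic as the helper above: rank first, then bin the ranks.
    ranked = series.rank(method="first", ascending=ascending)
    return pd.qcut(ranked, q=n_bins, labels=list(range(1, n_bins + 1))).astype(int)

# Made-up Frequency values: seven one-time buyers and three repeat buyers.
frequency = pd.Series([1, 1, 1, 1, 1, 1, 1, 2, 3, 7])

# Binning the raw values fails because several quantile edges are equal to 1.
try:
    pd.qcut(frequency, q=5, labels=list(range(1, 6)))
    direct_binning_failed = False
except ValueError as err:
    direct_binning_failed = True
    print("Direct qcut failed:", err)

# Ranking first gives every row a distinct position, so the edges are unique.
scores = rfm_score(frequency)
print(scores.tolist())  # [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
```

There is a trade-off here: `method="first"` breaks ties by row order, so the seven one-time buyers in this toy example receive scores from 1 to 4 even though their Frequency is identical. That keeps the buckets equally sized, but tied customers are not guaranteed the same score.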

    Step 2: Applying the Scores

    Now we can score each metric cleanly and consistently:

    # Assign R, F, M scores
    rfm['R_Score'] = rfm_score(rfm['Recency'], ascending=False)  # Recent purchases = high score
    rfm['F_Score'] = rfm_score(rfm['Frequency'])  # More frequent = high score
    rfm['M_Score'] = rfm_score(rfm['Monetary'])  # Higher spend = high score

    The only special case here is Recency:

    • Lower values mean more recent activity
    • So we reverse the ranking with ascending=False
    • Everything else follows the natural “higher is better” rule.

    What This Means for NovaShop

    Instead of seeing this:

    Recency = 326
    Frequency = 1
    Monetary = 77,183.60

    NovaShop now sees something like:

    R = 1, F = 1, M = 5

    That’s instantly more interpretable:

    • Not recent
    • Not frequent
    • High spender (historically)

    Step 3: Creating a Combined RFM Score

    Now we combine these three scores into a single RFM code:

    rfm['RFM_Score'] = (
        rfm['R_Score'].astype(str) +
        rfm['F_Score'].astype(str) +
        rfm['M_Score'].astype(str)
    )

    This produces values like:

    • 555 → Best customers
    • 155 → High spenders who haven’t returned
    • 111 → Customers who are likely gone

    Each customer now carries a compact behavioural fingerprint. And we’re not done yet.

    Translating RFM Scores Into Customer Segments

    Raw scores are great, but let’s be honest: no marketing manager wants to look at 555, 154, or 311 all day.

    NovaShop needs labels that make sense at a glance. That’s where RFM segments come in.

    Step 1: Defining Segments

    Using RFM scores, we can classify customers into meaningful categories. Here’s a common approach:

    • Champions: Top Recency, top Frequency, top Monetary (555), your best customers
    • Loyal Customers: Regular buyers, may not be spending the most, but keep coming back
    • Big Spenders: High Monetary, but not necessarily recent or frequent
    • At-Risk: Used to buy, but haven’t returned recently
    • Lost: Low scores in all three metrics, likely disengaged
    • Promising / New: Recent customers with lower frequency or monetary spend

    This transforms abstract numbers into a narrative that marketing and management can act on.

    Step 2: Mapping Scores to Segments

    Here’s an example using simple conditional logic:

    def rfm_segment(row):
        # Rules are checked top to bottom; the first match wins
        if row['R_Score'] >= 4 and row['F_Score'] >= 4 and row['M_Score'] >= 4:
            return 'Champions'
        elif row['F_Score'] >= 4:
            return 'Loyal Customers'
        elif row['M_Score'] >= 4:
            return 'Big Spenders'
        elif row['R_Score'] <= 2:
            return 'At-Risk'
        else:
            return 'Others'

    rfm['Segment'] = rfm.apply(rfm_segment, axis=1)

    Now each customer has a human-readable label, making it immediately actionable.
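The conditional logic above does not assign the “Lost” or “Promising / New” labels from the segment list, so a customer scoring 111 would land in At-Risk. One possible extension is sketched below with `np.select`. The two extra thresholds are my own assumptions rather than rules from the article, and `toy_scores` is an invented table.

```python
import numpy as np
import pandas as pd

# Invented score table for illustration.
toy_scores = pd.DataFrame(
    {
        "R_Score": [5, 2, 1, 1, 5, 3, 2],
        "F_Score": [5, 5, 1, 1, 1, 3, 3],
        "M_Score": [5, 3, 5, 1, 2, 3, 3],
    },
    index=pd.Index([101, 102, 103, 104, 105, 106, 107], name="CustomerID"),
)
r, f, m = toy_scores["R_Score"], toy_scores["F_Score"], toy_scores["M_Score"]

# Conditions are evaluated in order and the first True wins, like an if/elif chain.
conditions = [
    (r >= 4) & (f >= 4) & (m >= 4),
    f >= 4,
    m >= 4,
    (r <= 2) & (f <= 2) & (m <= 2),  # assumed rule for "Lost"
    r <= 2,
    (r >= 4) & (f <= 2),             # assumed rule for "Promising / New"
]
labels = [
    "Champions", "Loyal Customers", "Big Spenders",
    "Lost", "At-Risk", "Promising / New",
]

toy_scores["Segment"] = np.select(conditions, labels, default="Others")
print(toy_scores)
```

The “Lost” rule has to come before the broader At-Risk rule, otherwise it would never match. Any real thresholds should be agreed with whoever will act on the segments.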

    Let’s review our results using rfm.head()

    Step 3: Turning Segments into Strategy

    With labelled segments, NovaShop can:

    • Reward Champions → Exclusive deals, loyalty points
    • Re-engage Big Spenders & At-Risk customers → Personalised emails or discounts
    • Focus marketing wisely → Don’t waste effort on customers who are truly lost

    This is the moment where data becomes strategy.
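Before deciding where to spend effort, it helps to know how large each segment is and how much revenue it represents. The sketch below builds such a summary from a made-up labelled table; the numbers and the `segment_summary` name are assumptions for illustration.

```python
import pandas as pd

# Made-up labelled RFM table.
toy_labelled = pd.DataFrame({
    "Recency": [3, 6, 300, 200, 40],
    "Frequency": [8, 6, 1, 1, 2],
    "Monetary": [500.0, 700.0, 5000.0, 150.0, 600.0],
    "Segment": ["Champions", "Champions", "Big Spenders", "At-Risk", "Others"],
})

segment_summary = (
    toy_labelled.groupby("Segment")
    .agg(
        Customers=("Recency", "size"),
        Avg_Recency=("Recency", "mean"),
        Avg_Frequency=("Frequency", "mean"),
        Total_Monetary=("Monetary", "sum"),
    )
    .sort_values("Total_Monetary", ascending=False)
)

# Share of total revenue contributed by each segment.
segment_summary["Revenue_Share"] = (
    segment_summary["Total_Monetary"] / segment_summary["Total_Monetary"].sum()
)
print(segment_summary.round(2))
```

A table like this makes it easier to see, for example, whether a small group of Big Spenders accounts for a large share of revenue.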

    What NovaShop Should Do Next (Key Takeaways & Recommendations)

    At the start of this analysis, NovaShop had a familiar problem:
    Lots of transactional data, but limited clarity on customer behaviour.

    By applying the RFM framework, we’ve turned raw purchase history into a clear, structured view of who NovaShop’s customers are, and how they behave.

    Now let’s talk about what to actually do with it.

    1. Protect and Reward Your Best Customers

    Champions and Loyal Customers are already doing what every business wants:

    • They buy recently
    • They buy often
    • They generate consistent revenue

    These customers don’t need heavy discounts; they need recognition.

    Recommended actions:

    • Early access to sales
    • Loyalty points or VIP tiers
    • Personalised thank-you emails

    The goal here isn’t acquisition, it’s retention.

    2. Re-Engage High-Value Customers Before They’re Lost

    The most dangerous segment for NovaShop isn’t “Lost” customers.
    It’s At-Risk and Big Spenders.

    These customers:

    • Have shown clear value in the past
    • But haven’t purchased recently
    • Are one step away from churning completely

    Recommended actions:

    • Targeted win-back campaigns
    • Personalised offers (not blanket discounts)
    • Reminder emails tied to past purchase behaviour

    Winning back an existing customer is almost always cheaper than acquiring a new one.

    3. Don’t Over-Invest in Truly Lost Customers

    Some customers will inevitably churn. RFM helps NovaShop identify these customers early and avoid spending ad budget, discounts and marketing effort on users who are unlikely to return. This isn’t about being cold; it’s about being efficient.

    4. Use RFM as a Living Framework, Not a One-Off Analysis

    The real power of RFM comes when it’s:

    • Recomputed monthly or quarterly
    • Integrated into dashboards
    • Used to track movement between segments over time

    For NovaShop, this means asking questions like:

    • How many At-Risk customers became Loyal this month?
    • Are Champions growing or shrinking?
    • Which campaigns actually move customers up the ladder?

    RFM turns customer behaviour into something measurable and trackable.
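Tracking movement between segments can be sketched as a transition table built from two snapshots. The customer IDs and labels below are hypothetical, and `pd.crosstab` is just one reasonable way to do it.

```python
import pandas as pd

# Hypothetical segment labels for the same five customers at two snapshot dates.
customer_ids = [101, 102, 103, 104, 105]
previous = pd.Series(
    ["At-Risk", "At-Risk", "Champions", "Others", "Loyal Customers"],
    index=customer_ids, name="Previous",
)
current = pd.Series(
    ["Loyal Customers", "At-Risk", "Champions", "At-Risk", "Champions"],
    index=customer_ids, name="Current",
)

# Rows: segment last period. Columns: segment this period.
transitions = pd.crosstab(previous, current)
print(transitions)

# Example question: how many At-Risk customers became Loyal?
at_risk_to_loyal = transitions.loc["At-Risk", "Loyal Customers"]
print(at_risk_to_loyal)  # 1
```

Each row of the table shows where last period’s customers ended up, which gives a direct answer to the questions listed above.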

    Final Thoughts: Closing the EDA in Public Series

    When I started this EDA in Public series, I wasn’t trying to build the perfect analysis or demonstrate advanced techniques. I wanted to slow down and share how I actually think when working with real data. Not the polished version, but the messy, iterative process that usually stays hidden.

    This project began with a noisy CSV and a lot of open questions. Along the way, there were small issues that only surfaced once I paid closer attention: dates stored as strings, assumptions that didn’t quite hold up, metrics that needed context before they made sense. Working through these moments in public was uncomfortable at times, but also genuinely valuable. Each correction made the analysis stronger and more honest.

    One thing this process reinforced for me is that most meaningful insights don’t come from complexity. They come from slowing down, structuring the data properly, and asking better questions. By the time I reached the RFM analysis, the value wasn’t in the formulas themselves; it was in what they forced me to confront. A customer who spent a lot once isn’t necessarily valuable today. Recency matters. Frequency matters. And none of these metrics mean much in isolation.

    Ending the series with RFM felt deliberate. It sits at the point where technical work meets business thinking, where tables turn into conversations and numbers turn into decisions. It’s also where exploratory analysis stops being purely descriptive and starts becoming practical. At that stage, the goal is not just to understand the data, but to decide what to do next.

    Doing this work in public changed how I approach analysis. Writing things out forced me to explain my reasoning, question my assumptions, and be comfortable showing imperfect work. It reminded me that EDA isn’t a checklist you rush through; it’s a dialogue with the data. Sharing that dialogue makes you more thoughtful and more accountable.

    This may be the final part of the EDA in Public series, but it doesn’t feel like an endpoint. Everything here could evolve into dashboards, automated pipelines, or deeper customer analysis.

    And if you’re a founder, analyst, or team working with customer or sales data and trying to make sense of it, this kind of exploratory work is often where the biggest clarity comes from. These are exactly the kinds of problems I enjoy working through: slowly, thoughtfully, and with the business context in mind.

    If you’re documenting your own analyses, I’d love to see how you approach it. And if you’re wrestling with similar questions in your data and want to talk through them, feel free to reach out on any of the platforms below. Good data conversations usually start there.

    Thanks for following along!

    Medium

    LinkedIn

    Twitter

    YouTube


