
    EDA in Public (Part 2): Product Deep Dive & Time-Series Analysis in Pandas

    By Editor Times Featured, December 20, 2025 (10 Mins Read)


    Welcome back to the “EDA in Public” series! This is Part 2 of the series; if you haven’t seen Part 1 yet, read it here. Here’s a recap of what we conquered.

    In Part 1, we took a messy, half-million-row sales file and got it into shape. Specifically, we:

    • Sampled the data for faster processing.
    • Fixed missing values and standardised text entries (like ‘EIRE’ to ‘Ireland’).
    • Filtered out all the noise: returns, cancellations, and transactions with a zero price.
    • Engineered our most critical feature: the Revenue column.
    • Achieved our first business insight: the Top 10 Revenue-Generating Countries.

    We now have a clean, revenue-ready Pandas DataFrame ready to be mined for insights! By the end of this series, I want to have mastered exploratory data analysis using Pandas. Feel free to follow along if you’re a data enthusiast.

    Now we shift our focus to delivering truly impactful business intelligence for NovaShop. Our overall goal for Part 2 is to answer two fundamental questions: Which products are the best performers, and when is the best time to sell them?

    This part is all about analysis using powerful Pandas aggregation techniques (.groupby()) and feature engineering on the datetime column (the .dt accessor). We’ll break our analysis down into two key areas:

    • Product Deep Dive: Identifying the products that move the most units versus those that bring in the most cash.
    • Sales Timing (Time-Series): Uncovering the seasonality of sales (monthly peaks) and the operational trends (busiest day and hour).

    Without further ado, let’s get to it.

    Loading the clean data

    To improve performance, let’s export our cleaned dataset and re-import it for analysis:

    df.to_csv('online_retail_clean.csv', index=False)

    Now I can create a new project and start afresh.

    When loading CSVs that contain date columns, I like to use the parse_dates argument in pd.read_csv(). This saves me the step of manually converting the column type later and ensures the date features are correct from the start.

    import pandas as pd

    # parse_dates converts InvoiceDate to a datetime column while reading
    df = pd.read_csv('online_retail_clean.csv', parse_dates=['InvoiceDate'])
    print(f"Successfully loaded clean data. Total rows: {len(df)}")

    Output:

    Successfully loaded clean data. Total rows: 52933
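    If you want to confirm that parse_dates actually did its job before leaning on the .dt accessor later, checking the column’s dtype is enough. Here’s a minimal, self-contained sketch that reads a two-row in-memory CSV (the rows are made up for illustration, not taken from online_retail_clean.csv):

```python
import io

import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Tiny made-up stand-in for online_retail_clean.csv
csv_text = """InvoiceDate,Description,Quantity,Revenue
2011-06-01 12:05:00,POPCORN HOLDER,12,10.20
2011-05-27 17:14:00,BROCADE RING PURSE,24,30.00
"""

# With parse_dates, InvoiceDate arrives as a datetime column
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["InvoiceDate"])
# Without it, the same column stays as plain text
unparsed = pd.read_csv(io.StringIO(csv_text))

print(is_datetime64_any_dtype(parsed["InvoiceDate"]))    # True
print(is_datetime64_any_dtype(unparsed["InvoiceDate"]))  # False
```

    Only the parsed version supports calls such as parsed["InvoiceDate"].dt.hour; on the unparsed column, .dt raises an AttributeError.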

    Product Deep Dive: Who Are the True Money Makers?

    With the clean data loaded, let’s jump straight into product performance. Products can be considered “best-sellers” in two different ways: by volume (units sold) or by value (revenue generated).

    Insight 2: Top 10 Bestsellers (by Quantity)

    The quantity list tells us which products are popular and in high demand, even if their price is low. This is important for warehouse stocking and inventory management.

    We’ll group by the product Description, sum the Quantity, and sort the totals in descending order.

    # Top 10 Bestsellers (by Quantity)
    df.groupby('Description')['Quantity'].sum().sort_values(ascending=False).head(10)

    Output:

    Description
    MEDIUM CERAMIC TOP STORAGE JAR       74698
    WORLD WAR 2 GLIDERS ASSTD DESIGNS     9419
    GIN + TONIC DIET METAL SIGN           5016
    ASSORTED COLOURS SILK FAN             4684
    JUMBO BAG RED RETROSPOT               4671
    PACK OF 72 RETROSPOT CAKE CASES       3561
    MINI PAINT SET VINTAGE                3285
    ASSORTED COLOUR BIRD ORNAMENT         2783
    BROCADE RING PURSE                    2525
    POPCORN HOLDER                        2508
    Name: Quantity, dtype: int64
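    Before treating one product as the all-round winner, it’s worth checking whether its total comes from many orders or from just one. Part 1 removed cancellation rows, but that does not remove the original orders they cancelled. The product names suggest this is the public UCI Online Retail dataset; if so, it contains a single 74,215-unit order for MEDIUM CERAMIC TOP STORAGE JAR, placed on Tuesday, 18 January 2011, in the 10 AM hour and cancelled the same day. That one line would account for most of the 74,698 units above and would also lift the January, Tuesday and 10 AM totals later on. Here’s a minimal concentration check on made-up rows (the values and the top_line_share column are illustrative, not from the article):

```python
import pandas as pd

# Made-up order lines: one product owes almost its whole total to a single line
lines = pd.DataFrame({
    "Description": ["STORAGE JAR", "STORAGE JAR", "STORAGE JAR",
                    "CAKE CASES", "CAKE CASES", "CAKE CASES"],
    "Quantity": [9000, 12, 24, 120, 96, 144],
})

# Total units, biggest single line and number of lines per product (named aggregation)
summary = lines.groupby("Description").agg(
    total_qty=("Quantity", "sum"),
    max_line_qty=("Quantity", "max"),
    n_lines=("Quantity", "size"),
)

# Share of each product's volume that comes from its biggest single line
summary["top_line_share"] = summary["max_line_qty"] / summary["total_qty"]

print(summary.sort_values("total_qty", ascending=False))
```

    On the real data, running the same agg on df instead of lines flags any product whose top_line_share sits close to 1.0 as resting on a single order.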

    Insight 3: Top 10 Revenue Drivers (The True Money-Makers)

    This is what NovaShop really wants to know: which products bring in the most money. They might not sell as many units as the items in the previous list, but their higher prices or larger bulk orders make them bigger earners.

    We follow the same process, but this time we aggregate the Revenue column we engineered in Part 1.

    # Top 10 Revenue Drivers (The True Money-Makers)
    df.groupby('Description')['Revenue'].sum().sort_values(ascending=False).head(10)

    Output:

    Description
    MEDIUM CERAMIC TOP STORAGE JAR    77785.74
    DOTCOM POSTAGE                    18456.13
    REGENCY CAKESTAND 3 TIER          15093.38
    AMAZON FEE                        13541.33
    PARTY BUNTING                      9491.60
    GIN + TONIC DIET METAL SIGN        9480.51
    JUMBO BAG RED RETROSPOT            8996.65
    Manual                             8966.96
    CHILLI LIGHTS                      8395.20
    POSTAGE                            7194.51
    Name: Revenue, dtype: float64
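    Both rankings can also come out of a single groupby using named aggregation, which makes it easy to list the products that reach the revenue top N without reaching the quantity top N. Here’s a sketch on made-up rows (the product names and numbers are illustrative; top_n is 2 only because the toy data is tiny):

```python
import pandas as pd

# Made-up stand-in for df with the two columns the rankings use
df = pd.DataFrame({
    "Description": ["CAKESTAND", "CAKE CASES", "CAKE CASES", "PURSE", "CAKESTAND", "PURSE"],
    "Quantity": [4, 500, 300, 200, 2, 150],
    "Revenue": [90.0, 250.0, 150.0, 60.0, 60.0, 45.0],
})

# One groupby computes both totals per product
per_product = df.groupby("Description").agg(
    units=("Quantity", "sum"),
    revenue=("Revenue", "sum"),
)

top_n = 2  # the article uses 10
top_by_units = set(per_product.nlargest(top_n, "units").index)
top_by_revenue = set(per_product.nlargest(top_n, "revenue").index)

# Products that rank on value rather than volume
print(sorted(top_by_revenue - top_by_units))  # ['CAKESTAND']
```

    On the real data, swap the toy frame for the article’s df and set top_n = 10; the set difference then lists the items that appear only in the revenue ranking.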

    Discussion: What These Lists Tell NovaShop

    This is where the power of EDA really shines, because these two lists tell a very different story:

    • The All-Star Product: The “MEDIUM CERAMIC TOP STORAGE JAR” is clearly the winner, dominating both lists by a wide margin. It’s high volume and high value. This item should be NovaShop’s main focus for inventory, marketing, and expansion.
    • A Surprising Revenue Source: While exploring the data, I noticed items like “DOTCOM POSTAGE,” “AMAZON FEE,” “Manual,” and “POSTAGE.” These aren’t actual products; they’re service fees and manual adjustments. Yet together they bring in just over £48,000 in revenue. NovaShop should double-check how these fees are treated in its revenue reporting. It’s an easy thing to overlook, but it could be a quiet and meaningful source of income hiding in plain sight.
    • The High-Value Physical Goods: Items like the “REGENCY CAKESTAND 3 TIER” and “CHILLI LIGHTS” make the revenue list but not the quantity list. This tells us they’re high-price items that contribute significantly to overall cash flow, and they’re worth targeted marketing efforts.
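    One way to act on the fee observation is to split those lines out before ranking products. The sketch below assumes that the four descriptions from the revenue list are the only non-product entries. That is an assumption, and the full data may contain other codes. The rows are made up:

```python
import pandas as pd

# Assumption: the non-product descriptions seen in the top-10 revenue list.
# The full data may contain further fee or adjustment codes.
NON_PRODUCT = {"DOTCOM POSTAGE", "AMAZON FEE", "Manual", "POSTAGE"}

# Made-up rows standing in for df
df = pd.DataFrame({
    "Description": ["POSTAGE", "REGENCY CAKESTAND 3 TIER", "Manual", "CHILLI LIGHTS"],
    "Revenue": [18.0, 12.75, 5.0, 9.95],
})

# Flag fee/adjustment lines, total them, and keep only physical products for ranking
is_fee = df["Description"].isin(NON_PRODUCT)
fee_revenue = df.loc[is_fee, "Revenue"].sum()
products_only = df[~is_fee]

print(f"Fee/adjustment revenue: {fee_revenue:.2f}")
print(products_only.groupby("Description")["Revenue"].sum().sort_values(ascending=False))
```

    Applied to the four totals in the revenue output, the fee lines add up to 18,456.13 + 13,541.33 + 8,966.96 + 7,194.51 = £48,158.93.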

    We now have a complete picture of what NovaShop sells best. Time to figure out when!

    Preparing for Time-Series Analysis: Feature Engineering

    We know what sold well. Now, let’s work out when sales peak. Our goal is to break the InvoiceDate column down into components that let us group and analyze sales by year, month, day, and even hour.

    This is a classic feature engineering move. Since we already made sure InvoiceDate is a proper Pandas datetime column (thanks to parse_dates in our loading step!), we can use the extremely useful .dt accessor.

    Extracting Time Components

    To understand seasonality (monthly) and operational efficiency (hourly), we’ll create four new columns:

    • Year: for long-term comparisons.
    • Month: for identifying seasonal peaks (e.g., the Q4 rush).
    • DayName: to find the busiest day of the week.
    • Hour: to determine peak operational hours.

    Let’s generate these columns:

    print("\n--- Time Series Feature Extraction ---")

    # Extract Year, Month, DayName, and Hour from the InvoiceDate column
    df['Year'] = df['InvoiceDate'].dt.year
    df['Month'] = df['InvoiceDate'].dt.month
    df['DayName'] = df['InvoiceDate'].dt.day_name()
    df['Hour'] = df['InvoiceDate'].dt.hour
    print("New time features added: Year, Month, DayName, Hour.")
    print(df[['InvoiceDate', 'Year', 'Month', 'DayName', 'Hour']].head())

    Output:

    --- Time Series Feature Extraction ---
    New time features added: Year, Month, DayName, Hour.
              InvoiceDate  Year  Month    DayName  Hour
    0 2011-06-01 12:05:00  2011      6  Wednesday    12
    1 2011-05-27 17:14:00  2011      5     Friday    17
    2 2011-04-21 17:05:00  2011      4   Thursday    17
    3 2011-11-16 10:39:00  2011     11  Wednesday    10
    4 2011-03-10 08:40:00  2011      3   Thursday     8

    Great! We can move right into mapping the sales patterns in the next section.

    Mapping Sales Patterns (Time-Series Insights)

    With our time features ready, we can now ask and answer our questions about when sales happen. We’ll start big (monthly trends) and drill down small (hourly trends).

    Insight 4: Sales Trends by Month (Seasonality)

    Monthly aggregation is crucial for spotting seasonality: the predictable pattern of sales that repeats every year. This data is used for financial forecasting and for planning inventory levels ahead of the peak season.

    We’ll group by the Month column and calculate the total revenue for each.

    print("\n--- Insight 4: Sales Trends by Month (Seasonality) ---")
    # Group by month, sum the revenue, and sort from highest to lowest
    monthly_revenue = df.groupby('Month')['Revenue'].sum().sort_values(ascending=False)
    # Print the results, which should show the seasonal peaks
    print(monthly_revenue)

    Output:

    --- Insight 4: Sales Trends by Month (Seasonality) ---
    Month
    11    143576.360
    1     142013.420
    12    133178.980
    10    119533.540
    9     103929.420
    3      72968.270
    8      71302.890
    5      70424.510
    6      68722.520
    7      68713.831
    4      51882.010
    2      50178.730
    Name: Revenue, dtype: float64
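    Grouping by Month alone adds together the same calendar month from different years. If the data covers more than twelve months, a single Month value can mix two years (for instance, if this is the UCI Online Retail data, which runs from December 2010 to December 2011, Month 12 would combine parts of two Decembers). Grouping by Year and Month, or by a monthly period, keeps them apart. A sketch with made-up rows:

```python
import pandas as pd

# Made-up rows spanning two Decembers
df = pd.DataFrame({
    "InvoiceDate": pd.to_datetime([
        "2010-12-01 09:00", "2010-12-15 10:30",
        "2011-11-20 11:00", "2011-12-05 14:00",
    ]),
    "Revenue": [100.0, 50.0, 80.0, 30.0],
})
df["Year"] = df["InvoiceDate"].dt.year
df["Month"] = df["InvoiceDate"].dt.month

# Month alone: December 2010 and December 2011 are summed together
by_month = df.groupby("Month")["Revenue"].sum()

# Year + Month: each calendar month stays separate
by_year_month = df.groupby(["Year", "Month"])["Revenue"].sum()

# Same idea with a single key: a monthly Period such as 2010-12
by_period = df.groupby(df["InvoiceDate"].dt.to_period("M"))["Revenue"].sum()

print(by_month)
print(by_year_month)
print(by_period)
```

    The Year column created earlier is what makes the two-key version possible on the real DataFrame.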

    Interpretation: The Big Q4 and the January Surprise

    The data clearly shows that NovaShop’s sales are heavily skewed toward the end of the year, consistent with the typical retail Q4 (October, November, December) rush.

    • Peak Season: November is the peak month, narrowly ahead of January, with December third. This is the time to stock up on Medium Ceramic Top Storage Jars and run targeted ads.
    • The January Surge: The second-highest month is January! This is a great insight for NovaShop. It suggests customers may be spending Christmas gift money or taking advantage of post-holiday sales. This period shouldn’t be treated as a slump but as a secondary high-volume sales opportunity.
    • Low Season: February and April are the weakest months, which helps NovaShop plan for lower staffing and marketing budgets during these periods.
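    To put a number on “heavily skewed,” the monthly totals printed above can be turned into a Q4 share. This small sketch hard-codes those totals so it runs without the DataFrame:

```python
import pandas as pd

# Monthly revenue totals copied from the Insight 4 output (month number -> revenue)
monthly_revenue = pd.Series({
    1: 142013.420, 2: 50178.730, 3: 72968.270, 4: 51882.010,
    5: 70424.510, 6: 68722.520, 7: 68713.831, 8: 71302.890,
    9: 103929.420, 10: 119533.540, 11: 143576.360, 12: 133178.980,
})

# Share of total revenue that falls in October, November and December
q4_share = monthly_revenue.loc[[10, 11, 12]].sum() / monthly_revenue.sum()
print(f"Q4 share of revenue: {q4_share:.1%}")  # Q4 share of revenue: 36.1%
```

    Three months account for roughly 36% of revenue, against the 25% an even spread across the year would give.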

    Insight 5: Sales Trends by Day of Week and Hour (Operational Planning)

    While seasonality is about finance and inventory, daily and hourly trends are about operations. NovaShop can use them to schedule warehouse staff, optimize ad spend, and time email campaigns.

    We’ll run two separate aggregations here: one for the day of the week and one for the hour of the day.

    Revenue by Day of Week

    # 1. Sales by Day of Week
    # Note: groupby sorts the day names alphabetically, not in calendar order
    daily_revenue = df.groupby('DayName')['Revenue'].sum()
    print("\n--- Revenue by Day of Week ---")
    print(daily_revenue)

    Output:

    --- Revenue by Day of Week ---
    DayName
    Friday       163861.320
    Monday       171026.230
    Sunday        83125.890
    Thursday     219342.980
    Tuesday      282796.741
    Wednesday    176271.320
    Name: Revenue, dtype: float64
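    Because groupby sorts the day names alphabetically, the table above is easier to read once it’s put back into calendar order. Reindexing also makes the missing Saturday visible as an explicit zero rather than a silently absent row. A sketch using the totals printed above:

```python
import pandas as pd

# Day-of-week totals copied from the output above (alphabetical, as groupby returns them)
daily_revenue = pd.Series({
    "Friday": 163861.320, "Monday": 171026.230, "Sunday": 83125.890,
    "Thursday": 219342.980, "Tuesday": 282796.741, "Wednesday": 176271.320,
})

# Calendar order; Saturday has no sales, so fill it with 0 instead of dropping it
week = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
daily_in_order = daily_revenue.reindex(week, fill_value=0.0)
print(daily_in_order)

# How far the busiest day leads the runner-up
top_two = daily_revenue.nlargest(2)
lead = top_two.iloc[0] / top_two.iloc[1] - 1
print(f"{top_two.index[0]} leads {top_two.index[1]} by {lead:.1%}")  # Tuesday leads Thursday by 28.9%
```

    On the real DataFrame, df.groupby('DayName')['Revenue'].sum().reindex(week, fill_value=0) does the same in one chain.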

    Interpretation: Tuesday is the Power Day

    The data reveals a clear and actionable pattern for the work week:

    • Peak Sales: Tuesday absolutely dominates the week, generating almost 30% more revenue than the next busiest day, Thursday. This is a prime day for launching new products or running high-impact flash sales.
    • Slump Day: Unsurprisingly, sales drop dramatically on Sunday, to roughly half of Friday’s total (the quietest weekday). NovaShop might consider this a low-priority day for email marketing or customer service staffing.
    • Missing Day: The data contains no transactions for Saturday in our sample (and perhaps none in the full set either, which is common in B2B datasets).

    Revenue by Hour of Day

    # 2. Sales by Hour of Day
    hourly_revenue = df.groupby('Hour')['Revenue'].sum().sort_index()
    print("\n--- Revenue by Hour of Day ---")
    print(hourly_revenue)

    Output:

    --- Revenue by Hour of Day ---
    Hour
    7       2830.910
    8      26597.800
    9      76768.750
    10    209809.971
    11    115233.600
    12    142474.460
    13    130348.290
    14    119241.520
    15    145178.980
    16     70620.550
    17     36148.030
    18     14022.790
    19      4397.130
    20      2751.700
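    Hours with no sales simply don’t appear in a groupby result, so it can help to reindex to the full 0 to 23 range before reading off the peak and each hour’s share. A sketch using the hourly totals printed above:

```python
import pandas as pd

# Hourly totals copied from the output above (hour -> revenue)
hourly_revenue = pd.Series({
    7: 2830.910, 8: 26597.800, 9: 76768.750, 10: 209809.971,
    11: 115233.600, 12: 142474.460, 13: 130348.290, 14: 119241.520,
    15: 145178.980, 16: 70620.550, 17: 36148.030, 18: 14022.790,
    19: 4397.130, 20: 2751.700,
})

# Make every hour of the day explicit, with 0 where nothing was sold
full_day = hourly_revenue.reindex(range(24), fill_value=0.0)

peak_hour = full_day.idxmax()
share = full_day / full_day.sum()
print(f"Peak hour: {peak_hour}:00, {share[peak_hour]:.1%} of revenue")
# .loc slicing on integer labels is inclusive, so this covers 10:00 to 15:59
print(f"10:00-15:59 share: {share.loc[10:15].sum():.1%}")
```

    With these totals, the 10 AM hour alone carries about 19% of revenue, and the six hours from 10 AM to 4 PM carry close to 79%.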

    Interpretation: The Late Morning Boom

    The hourly data points directly to operational needs:

    • Big Peak: Sales spike at 10 AM (Hour 10), generating roughly 1.4 to 1.8 times the revenue of each hour from 11 AM to 3 PM. This is the optimal time for NovaShop to launch email marketing and social media campaigns.
    • Sustained Activity: Sales stay high from 11 AM through 3 PM, suggesting a strong midday and post-lunch buying window.
    • End of Day: Activity falls off sharply from 4 PM onward, with very little revenue generated in the evening (7 PM onward). This suggests NovaShop’s customer base is mainly desk-based, buying during traditional working hours.
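    The day and hour totals are computed separately, so on their own they can’t show whether the 10 AM peak holds on every weekday. A pivot table that crosses the two columns can. Here’s a sketch on made-up rows (in the article, DayName and Hour come from the .dt step):

```python
import pandas as pd

# Made-up rows standing in for df after the time features were added
df = pd.DataFrame({
    "DayName": ["Tuesday", "Tuesday", "Thursday", "Thursday", "Monday"],
    "Hour": [10, 14, 10, 15, 10],
    "Revenue": [120.0, 40.0, 60.0, 80.0, 30.0],
})

# Rows = weekday, columns = hour, cells = summed revenue (0 where nothing sold)
grid = df.pivot_table(index="DayName", columns="Hour", values="Revenue",
                      aggfunc="sum", fill_value=0)

# Busiest hour for each weekday
print(grid.idxmax(axis=1))
```

    In this toy data, Thursday peaks at 15:00 rather than 10:00, which is exactly the kind of difference a single hourly total would hide.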

    Conclusion: Setting the Stage for Customer Segmentation

    We have officially finished the heavy-duty EDA! We know what sells best (the Ceramic Jar and fees), where sales are strongest (the UK), and when the sales peaks happen (Q4, Tuesdays, and 10 AM).

    However, we’re still missing the most valuable piece of the puzzle: the Who. The best companies don’t just know what sells; they know who their most valuable customers are and how to keep them.

    In Part 3, we’ll dive into the most advanced analysis technique in this series: RFM (Recency, Frequency, Monetary) Customer Segmentation. This will let us classify customers into tiers like “Champions,” “Loyal Customers,” and “At-Risk,” giving NovaShop a real roadmap for personalised marketing.

    I hope you found this article helpful. Feel free to say hi to me on any of these platforms. I really appreciate your feedback.

    Medium

    LinkedIn

    Twitter

    YouTube


