
    Write Pandas Like a Pro With Method Chaining Pipelines

    By Editor Times Featured | April 12, 2026 | 17 min read


    I recently opened one of my old notebooks and instantly closed it.

    Not because it was wrong. The code worked. The numbers checked out.

    But I had no idea what was going on.

    There were variables everywhere. df1, df2, final_df, final_final. Each step made sense in isolation, but as a whole it felt like I was tracing a maze. I had to read line by line just to understand what I had already done.

    And the funny thing is, this is how most of us start with Pandas.

    You learn a few operations. You filter here, create a column there, group and aggregate. It gets the job done. But over time, your code starts to feel harder to trust, harder to revisit, and definitely harder to share.

    That was the point I realized something.

    The gap between beginner and intermediate Pandas users isn’t about knowing more functions. It’s about how you structure your transformations.

    There’s a pattern that quietly changes everything once you see it. Your code becomes easier to read. Easier to debug. Easier to build on.

    It’s called method chaining.

    In this article, I’ll walk through how I started using method chaining properly, along with assign() and pipe(), and how it changed the way I write Pandas code. If you have ever felt like your notebooks are getting messy as they grow, this will probably click for you.

    The Shift: What Intermediate Pandas Users Do Differently

    At first, I thought getting better at Pandas meant learning more functions.

    More tricks. More syntax. More ways to manipulate data.

    But the more I built, the more I noticed something. The people who were actually good at Pandas weren’t necessarily using more functions than I was. Their code just looked… different.

    Cleaner. More intentional. Easier to follow.

    Instead of writing step-by-step code with lots of intermediate variables, they wrote transformations that flowed into each other. You could read their code from top to bottom and understand exactly what was happening to the data at each stage.

    It almost felt like reading a story.

    That’s when it clicked for me. The real upgrade isn’t about what you use. It’s about how you structure it.

    Instead of thinking:

    “What do I do next to this DataFrame?”

    You start thinking:

    “What transformation comes next?”

    That small shift changes everything.

    And this is where method chaining comes in.

    Method chaining is not just a cleaner way to write Pandas. It’s a different way to think about working with data. Each step takes your DataFrame, transforms it, and passes it along. No unnecessary variables. No jumping around.

    Just a clear, readable flow from raw data to final result.

    In the next section, I’ll show you exactly what this looks like using a real example.

    The “Before”: How Most of Us Write Pandas

    To make this concrete, let’s say we want to answer a simple question:

    Which product categories are generating the most revenue each month?

    I pulled a small sales dataset with order details, product categories, prices, and dates. Nothing fancy.

    import pandas as pd
    df = pd.read_csv("sales.csv")
    print(df.head())

    Output

       order_id customer_id     product     category  quantity  price  order_date
    0      1001        C001      Laptop  Electronics         1   1200  2023-01-05
    1      1002        C002  Headphones  Electronics         2    150  2023-01-07
    2      1003        C003    Sneakers      Fashion         1     80  2023-01-10
    3      1004        C001     T-Shirt      Fashion         3     25  2023-01-12
    4      1005        C004     Blender         Home         1     60  2023-01-15

    Now, here is how I would have written this not too long ago:

    # Create a new column for revenue
    df["revenue"] = df["quantity"] * df["price"]

    # Filter for orders from 2023 onwards
    # (.copy() so the column assignment below does not target a slice of df)
    df_filtered = df[df["order_date"] >= "2023-01-01"].copy()

    # Convert order_date to datetime and extract month
    df_filtered["month"] = pd.to_datetime(df_filtered["order_date"]).dt.to_period("M")

    # Group by category and month, then sum revenue
    grouped = df_filtered.groupby(["category", "month"])["revenue"].sum()

    # Convert Series back to DataFrame
    result = grouped.reset_index()

    # Sort by revenue descending
    result = result.sort_values(by="revenue", ascending=False)
    print(result)

    This works. You get your answer.

          category    month  revenue
    1  Electronics  2023-02     2050
    2  Electronics  2023-03     1590
    0  Electronics  2023-01     1500
    8         Home  2023-03      225
    6         Home  2023-01      210
    5      Fashion  2023-03      205
    7         Home  2023-02      180
    4      Fashion  2023-02      165
    3      Fashion  2023-01      155

    But there are a few problems that start to show up as your analysis grows.

    First, the flow is hard to follow. You have to keep track of df, df_filtered, grouped, and result. Each variable represents a slightly different state of the data.

    Second, the logic is scattered. The transformation is happening step by step, but not in a way that feels connected. You are mentally stitching things together as you read.

    Third, it’s harder to reuse or test. If you want to tweak one part of the logic, you now have to trace where everything is being modified.

    This is the kind of code that works fine today… but becomes painful when you come back to it a week later.

    Now compare that to how the same logic looks when you start thinking in transformations instead of steps.

    The “After”: When Everything Clicks

    Now let’s solve the exact same problem again.

    Same dataset. Same goal.

    Which product categories are generating the most revenue each month?

    Here’s what it looks like when you start thinking in transformations:

    result = (
        pd.read_csv("sales.csv")  # Start with raw data
        .assign(
            # Create revenue column
            revenue=lambda df: df["quantity"] * df["price"],

            # Convert order_date to datetime
            order_date=lambda df: pd.to_datetime(df["order_date"]),

            # Extract month from order_date
            month=lambda df: df["order_date"].dt.to_period("M")
        )
        # Filter for orders from 2023 onwards
        .loc[lambda df: df["order_date"] >= "2023-01-01"]

        # Group by category and month, then sum revenue
        .groupby(["category", "month"], as_index=False)["revenue"]
        .sum()

        # Sort by revenue descending
        .sort_values(by="revenue", ascending=False)
    )
    print(result)

    Same output. Completely different feel.

          category    month  revenue
    1  Electronics  2023-02     2050
    2  Electronics  2023-03     1590
    0  Electronics  2023-01     1500
    8         Home  2023-03      225
    6         Home  2023-01      210
    5      Fashion  2023-03      205
    7         Home  2023-02      180
    4      Fashion  2023-02      165
    3      Fashion  2023-01      155

    The first thing you notice is that everything flows. There is no jumping between variables or trying to remember what df_filtered or grouped meant.

    Each step builds on the last one.

    You start with the raw data, then:

    • create revenue
    • convert dates
    • extract the month
    • filter
    • group
    • aggregate
    • sort

    All in one continuous pipeline.

    You can read it top to bottom and understand exactly what is happening to the data at each stage.

    That’s the part that surprised me the most.

    It’s not just shorter code. It’s clearer code.

    And once you get used to this, going back to the old way feels… uncomfortable.

    There are a couple of things happening here that make this work so well.

    We’re not just chaining methods. We’re using a few specific tools that make chaining actually practical.

    In the next section, let’s break these down.

    Breaking Down the Pattern

    When I first saw this style of Pandas code, it looked a bit intimidating.

    Everything was chained together. No intermediate variables. A lot happening in a small space.

    But once I slowed down and broke it into pieces, it started to make sense.

    There are really just three ideas carrying everything here:

    • method chaining
    • assign()
    • pipe()

    Let’s go through them one by one.

    Method Chaining (The Foundation)

    At its core, method chaining is simple. Each step takes a DataFrame, applies a transformation, and returns a new DataFrame. That new DataFrame is immediately passed into the next step.

    So instead of this:

    df = step1(df)

    df = step2(df)

    df = step3(df)

    You do this:

    df = step1(df).step2().step3()

    That’s really it.

    But the impact is bigger than it looks.

    It forces you to think in terms of flow. Each line becomes one transformation. You are no longer jumping around or storing temporary states. You are just moving forward.

    That is why the code starts to feel more readable. You can follow the transformation from start to finish without holding multiple versions of the data in your head.
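As a minimal sketch of that equivalence, the snippet below runs the same three transformations twice: once with named intermediates and once as a single chain. The inline `sales` frame is an assumption built from the first five sample rows shown earlier, so it runs without a CSV file.

```python
import pandas as pd

# Assumed stand-in for the CSV: the first five sample rows, built inline
sales = pd.DataFrame({
    "category": ["Electronics", "Electronics", "Fashion", "Fashion", "Home"],
    "quantity": [1, 2, 1, 3, 1],
    "price": [1200, 150, 80, 25, 60],
})

# Step-by-step: every intermediate state gets its own name
with_revenue = sales.assign(revenue=sales["quantity"] * sales["price"])
totals = with_revenue.groupby("category", as_index=False)["revenue"].sum()
stepwise = totals.sort_values(by="revenue", ascending=False)

# Chained: each method returns a new DataFrame that feeds the next call
chained = (
    sales
    .assign(revenue=lambda df: df["quantity"] * df["price"])
    .groupby("category", as_index=False)["revenue"]
    .sum()
    .sort_values(by="revenue", ascending=False)
)

print(chained)
print(stepwise.equals(chained))
```

Both versions do the same work; the chain just removes the names you would otherwise have to keep track of.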

    assign(): Keeping Everything in the Flow

    This is the one that really unlocked chaining for me.

    Before this, anytime I wanted to create a new column, I would break the flow:

    df["revenue"] = df["quantity"] * df["price"]

    That works, but it interrupts the pipeline.

    assign() lets you do the same thing without breaking the chain:

    .assign(revenue=lambda df: df["quantity"] * df["price"])

    At first, the lambda df: part felt weird.

    But the idea is simple. You are saying:

    “Take the current DataFrame, and use it to define this new column.”

    The key benefit is that everything stays in one place. You can see where the column is created and how it’s used, all within the same flow.

    It also encourages a cleaner style where transformations are grouped logically instead of scattered across the notebook.
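Two details of assign() are worth seeing in isolation: it returns a new DataFrame rather than modifying the original, and keyword arguments are evaluated in order, so a later lambda can use a column defined earlier in the same call. A small sketch, using an assumed three-row `orders` frame:

```python
import pandas as pd

# Assumed sample rows, built inline for illustration
orders = pd.DataFrame({
    "quantity": [1, 2, 1],
    "price": [1200, 150, 80],
    "order_date": ["2023-01-05", "2023-01-07", "2023-01-10"],
})

enriched = orders.assign(
    revenue=lambda df: df["quantity"] * df["price"],
    order_date=lambda df: pd.to_datetime(df["order_date"]),
    # This lambda already sees the converted order_date from the line above
    month=lambda df: df["order_date"].dt.to_period("M"),
)

print(enriched.columns.tolist())
print(orders.columns.tolist())  # the original frame is left unchanged
```

This ordering is what makes it safe to convert order_date and extract the month in a single assign() call.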

    pipe(): Where Things Start to Feel Powerful

    pipe() is the one I ignored at first.

    I thought, “I can already chain methods, why do I need this?”

    Then I ran into a problem.

    Some transformations are just too complex to fit neatly into a chain.

    You either:

    • write messy inline logic
    • or break the chain completely

    That’s where pipe() comes in.

    It lets you pass your DataFrame into a custom function without breaking the flow.

    For example:

    def filter_high_value_orders(df):
        return df[df["revenue"] > 500]

    df = (
        pd.read_csv("sales.csv")
        .assign(revenue=lambda df: df["quantity"] * df["price"])
        .pipe(filter_high_value_orders)
    )

    Now your logic is cleaner, reusable, and easier to test.
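pipe() also forwards extra positional and keyword arguments to the function, which lets one helper cover several thresholds. The sketch below assumes a hypothetical helper named `filter_min_revenue` and a small inline frame; neither comes from the original dataset.

```python
import pandas as pd

def filter_min_revenue(df, min_revenue=500):
    """Keep rows whose revenue is strictly above min_revenue (hypothetical helper)."""
    return df[df["revenue"] > min_revenue]

# Assumed sample rows, built inline for illustration
orders = pd.DataFrame({
    "product": ["Laptop", "Headphones", "Sneakers"],
    "quantity": [1, 2, 1],
    "price": [1200, 150, 80],
})

high_value = (
    orders
    .assign(revenue=lambda df: df["quantity"] * df["price"])
    # Extra keyword arguments given to pipe() are passed on to the function
    .pipe(filter_min_revenue, min_revenue=250)
)

print(high_value)
```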

    This is the point where things started to feel different for me.

    Instead of writing long scripts, I was starting to build small, reusable transformation steps.

    And that’s when it clicked.

    This isn’t just about writing cleaner Pandas code. It’s about writing code that scales as your analysis gets more complex.

    In the next section, I want to show how this changes the way you think about working with data entirely.

    Thinking in Pipelines (The Real Upgrade)

    Up until this point, it might feel like we just made the code look nicer.

    But something deeper is happening here.

    When you start using method chaining consistently, the way you think about working with data starts to change.

    Before, my approach was very step-by-step.

    I would look at a DataFrame and think:

    “What do I do next?”

    • Filter it.
    • Modify it.
    • Store it.
    • Move on.

    Each step felt a little disconnected from the last.

    But with method chaining, that question changes.

    Now it becomes:

    “What transformation comes next?”

    That shift is small, but it changes how you structure everything.

    You stop thinking in terms of isolated steps and start thinking in terms of a flow. A pipeline. Data comes in, gets transformed stage by stage, and produces an output.

    And the code reflects that.

    Each line is not just doing something. It’s part of a sequence. A clear progression from raw data to insight.

    This also makes your code easier to reason about.

    If something breaks, you do not have to scan your entire notebook. You can look at the pipeline and ask:

    • which transformation might be wrong?
    • where did the data change in an unexpected way?

    It becomes easier to debug because the logic is linear and visible.
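One way to answer those two questions without breaking the chain is a pass-through step that reports the frame's shape. The `log_shape` helper below is hypothetical (not a pandas function), and the inline data is an assumed stand-in for the sales file.

```python
import pandas as pd

def log_shape(df, label):
    """Print the frame's shape and return it unchanged (hypothetical helper)."""
    print(f"{label}: {df.shape[0]} rows x {df.shape[1]} columns")
    return df

# Assumed sample rows, built inline for illustration
orders = pd.DataFrame({
    "category": ["Electronics", "Electronics", "Fashion", "Fashion", "Home"],
    "quantity": [1, 2, 1, 3, 1],
    "price": [1200, 150, 80, 25, 60],
})

result = (
    orders
    .pipe(log_shape, "raw")
    .assign(revenue=lambda df: df["quantity"] * df["price"])
    .loc[lambda df: df["revenue"] > 100]
    .pipe(log_shape, "after revenue filter")
    .groupby("category", as_index=False)["revenue"]
    .sum()
)

print(result)
```

Because the helper returns its input untouched, it can be dropped between any two steps and removed again once the problem is found.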

    Another thing I noticed is that it naturally pushes you toward better habits.

    • You start writing smaller transformations.
    • You start naming things more clearly.
    • You start thinking about reuse without even trying.

    And that’s where it starts to feel less like “just Pandas” and more like building actual data workflows.

    At this point, you are not just analyzing data.

    You are designing how data flows.

    Real-World Refactor: From Messy to Clean

    Let me show you how this actually plays out.

    Instead of jumping straight from messy code to a perfect chain, I want to walk through how I would refactor this step by step. This is usually how it happens in real life anyway.

    Step 1: The Starting Point (Messy but Works)

    df = pd.read_csv("sales.csv")  # Load dataset

    # Create revenue column
    df["revenue"] = df["quantity"] * df["price"]

    # Filter orders from 2023 onwards
    # (.copy() so the column assignment below does not target a slice of df)
    df_filtered = df[df["order_date"] >= "2023-01-01"].copy()

    # Convert order_date and extract month
    df_filtered["month"] = pd.to_datetime(df_filtered["order_date"]).dt.to_period("M")

    # Group by category and month, then sum revenue
    grouped = df_filtered.groupby(["category", "month"])["revenue"].sum()

    # Convert to DataFrame
    result = grouped.reset_index()

    # Sort results
    result = result.sort_values(by="revenue", ascending=False)

    Nothing wrong here. This is how most of us start.

    But we can already see:

    • too many intermediate variables
    • transformations are scattered
    • harder to follow as it grows

    Step 2: Reduce Unnecessary Variables

    First, remove variables that aren’t really needed.

    df = pd.read_csv("sales.csv")  # Load dataset

    # Create new columns upfront
    df["revenue"] = df["quantity"] * df["price"]
    df["month"] = pd.to_datetime(df["order_date"]).dt.to_period("M")

    result = (
        # Filter relevant rows
        df[df["order_date"] >= "2023-01-01"]

        # Aggregate revenue by category and month
        .groupby(["category", "month"])["revenue"]
        .sum()

        # Convert to DataFrame
        .reset_index()

        # Sort results
        .sort_values(by="revenue", ascending=False)
    )

    Already better. There are fewer moving parts, and some flow is starting to appear.

    Step 3: Introduce Basic Chaining

    Now we start chaining more deliberately.

    result = (
        pd.read_csv("sales.csv")  # Start with raw data

        .assign(
            # Create revenue column
            revenue=lambda df: df["quantity"] * df["price"],

            # Extract month from order_date
            month=lambda df: pd.to_datetime(df["order_date"]).dt.to_period("M")
        )

        # Filter for recent orders
        .loc[lambda df: df["order_date"] >= "2023-01-01"]

        # Group and aggregate
        .groupby(["category", "month"])["revenue"]
        .sum()

        # Convert to DataFrame
        .reset_index()

        # Sort results
        .sort_values(by="revenue", ascending=False)
    )

    At this point, the flow is clear, transformations are grouped logically, and we’re no longer jumping between variables.

    Step 4: Clean It Up Further

    Small tweaks make a big difference.

    result = (
        pd.read_csv("sales.csv")  # Load data

        .assign(
            # Create revenue
            revenue=lambda df: df["quantity"] * df["price"],

            # Ensure order_date is datetime
            order_date=lambda df: pd.to_datetime(df["order_date"]),

            # Extract month from order_date
            month=lambda df: df["order_date"].dt.to_period("M")
        )

        # Filter relevant time range
        .loc[lambda df: df["order_date"] >= "2023-01-01"]

        # Aggregate revenue
        .groupby(["category", "month"], as_index=False)["revenue"]
        .sum()

        # Sort results
        .sort_values(by="revenue", ascending=False)
    )

    Now there are no redundant conversions, the grouping is cleaner, and the structure is more consistent.
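The "cleaner grouping" comes from passing as_index=False to groupby(), which keeps the group keys as regular columns so the separate reset_index() call is no longer needed. A short check on assumed inline data suggests the two spellings produce the same frame:

```python
import pandas as pd

# Assumed pre-computed rows, built inline for illustration
orders = pd.DataFrame({
    "category": ["Electronics", "Electronics", "Fashion"],
    "month": ["2023-01", "2023-01", "2023-01"],
    "revenue": [1200, 300, 80],
})

# Group keys become the index, then get moved back into columns
via_reset = orders.groupby(["category", "month"])["revenue"].sum().reset_index()

# Group keys stay as columns from the start
via_flag = orders.groupby(["category", "month"], as_index=False)["revenue"].sum()

print(via_flag)
print(via_reset.equals(via_flag))
```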

    Step 5: When pipe() Becomes Useful

    Let’s say the logic grows. Maybe we only care about high-revenue rows.

    Instead of stuffing that logic into the chain, we extract it:

    def filter_high_revenue(df):
        # Keep only rows where revenue is above threshold
        return df[df["revenue"] > 500]

    Now we plug it into the pipeline:

    result = (
        pd.read_csv("sales.csv")  # Load data

        .assign(
            # Create revenue
            revenue=lambda df: df["quantity"] * df["price"],

            # Convert and extract time features
            order_date=lambda df: pd.to_datetime(df["order_date"]),
            month=lambda df: df["order_date"].dt.to_period("M")
        )

        # Apply custom transformation
        .pipe(filter_high_revenue)

        # Filter by date
        .loc[lambda df: df["order_date"] >= "2023-01-01"]

        # Aggregate results
        .groupby(["category", "month"], as_index=False)["revenue"]
        .sum()

        # Sort output
        .sort_values(by="revenue", ascending=False)
    )

    This is where it starts to feel different. Your code is not just a script. Now, it’s a sequence of reusable transformations.
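Taken one step further, every stage can become a named function, so the pipeline itself reads as a list of steps. This is only a sketch of that idea: the function names `add_revenue`, `add_month`, and `monthly_revenue` are hypothetical, and the inline frame stands in for the sales file.

```python
import pandas as pd

def add_revenue(df):
    """Add a revenue column (quantity * price)."""
    return df.assign(revenue=lambda d: d["quantity"] * d["price"])

def add_month(df):
    """Parse order_date and derive a monthly period column."""
    return df.assign(
        order_date=lambda d: pd.to_datetime(d["order_date"]),
        month=lambda d: d["order_date"].dt.to_period("M"),
    )

def monthly_revenue(df):
    """Sum revenue per category and month, largest first."""
    return (
        df.groupby(["category", "month"], as_index=False)["revenue"]
        .sum()
        .sort_values(by="revenue", ascending=False)
    )

# Assumed sample rows, built inline for illustration
orders = pd.DataFrame({
    "category": ["Electronics", "Electronics", "Fashion", "Fashion", "Home"],
    "quantity": [1, 2, 1, 3, 1],
    "price": [1200, 150, 80, 25, 60],
    "order_date": ["2023-01-05", "2023-01-07", "2023-01-10", "2023-01-12", "2023-01-15"],
})

result = orders.pipe(add_revenue).pipe(add_month).pipe(monthly_revenue)
print(result)
```

Each function can be imported into another notebook or tested on its own, which is harder to do with a block of inline lambdas.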

    What I like about this process is that you do not need to jump straight to the final version.

    You can evolve your code gradually.

    • Start messy.
    • Reduce variables.
    • Introduce chaining.
    • Extract logic when needed.

    That’s how this pattern actually sticks.

    Next, let’s talk about a few mistakes I made while learning this so you don’t run into the same issues.

    Common Mistakes (I Made Most of These)

    When I started using method chaining, I definitely overdid it.

    Everything felt cleaner, so I tried to force everything into a chain. That led to some… questionable code.

    Here are a few mistakes I ran into so you do not have to.

    1. Over-Chaining Everything

    At some point, I thought longer chains = better code.

    Not true.

    # This gets hard to read very quickly
    df = (
        df
        .assign(...)
        .loc[...]
        .groupby(...)
        .agg(...)
        .reset_index()
        .rename(...)
        .sort_values(...)
        .query(...)
    )

    Yes, it’s technically clean. But now it’s doing too much in one place.

    Fix:

    • Break your chain when it starts to feel dense.
    • Group related transformations together.
    • Split logically different steps.
    • Think readability first, not cleverness.
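One possible way to apply that fix is to split a dense chain at a natural boundary, for example between row-level preparation and aggregation, and give each stage a name. The stage names, the threshold of 70, and the inline data below are all assumptions for illustration.

```python
import pandas as pd

# Assumed sample rows, built inline for illustration
orders = pd.DataFrame({
    "category": ["Electronics", "Electronics", "Fashion", "Fashion", "Home"],
    "quantity": [1, 2, 1, 3, 1],
    "price": [1200, 150, 80, 25, 60],
})

# Stage 1: row-level preparation
prepared = (
    orders
    .assign(revenue=lambda df: df["quantity"] * df["price"])
    .loc[lambda df: df["revenue"] > 70]
)

# Stage 2: aggregation and presentation
summary = (
    prepared
    .groupby("category", as_index=False)
    .agg(total_revenue=("revenue", "sum"))
    .sort_values(by="total_revenue", ascending=False)
)

print(summary)
```

Two short chains with meaningful names are still pipelines; the one intermediate variable marks a real change in what the data represents.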

    2. Forcing Logic Into One Line

    I used to cram complex logic into assign() or loc[] just to keep the chain going.

    That usually makes things worse.

    .assign(
        revenue_flag=lambda df: np.where(
            (df["quantity"] * df["price"] > 500) & (df["category"] == "Electronics"),
            "High",
            "Low",
        )
    )

    This works, but it isn’t very readable.

    Fix:

    If the logic is complex, extract it.

    import numpy as np

    def add_revenue_flag(df):
        # Flag high-revenue Electronics orders; assign() returns a new frame
        # instead of modifying the caller's DataFrame in place
        is_high = (df["quantity"] * df["price"] > 500) & (df["category"] == "Electronics")
        return df.assign(revenue_flag=np.where(is_high, "High", "Low"))

    df = df.pipe(add_revenue_flag)

    Cleaner. Easier to test. Easier to reuse.
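"Easier to test" can be taken literally: an extracted step can be checked against a tiny hand-built frame. The sketch below uses a variant of add_revenue_flag that returns a new frame via assign() (an assumption made so the input is not modified) and a plain assert-based check rather than any particular test framework.

```python
import numpy as np
import pandas as pd

def add_revenue_flag(df):
    """Return a copy of df with a High/Low revenue_flag column."""
    is_high = (df["quantity"] * df["price"] > 500) & (df["category"] == "Electronics")
    return df.assign(revenue_flag=np.where(is_high, "High", "Low"))

def test_add_revenue_flag():
    # Three rows chosen to hit each branch of the condition
    tiny = pd.DataFrame({
        "category": ["Electronics", "Electronics", "Fashion"],
        "quantity": [1, 2, 10],
        "price": [1200, 150, 80],
    })
    flagged = add_revenue_flag(tiny)
    assert flagged["revenue_flag"].tolist() == ["High", "Low", "Low"]
    # The input frame must not gain the new column
    assert "revenue_flag" not in tiny.columns

test_add_revenue_flag()
print("add_revenue_flag behaves as expected")
```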

    3. Ignoring pipe() for Too Long

    I avoided pipe() at first because it felt unnecessary. But without it, you hit a ceiling.

    You either:

    • break your chain
    • or write messy inline logic

    Fix:

    • Use pipe() as soon as your logic stops being simple.
    • It’s what turns your code from a script into something modular.

    4. Losing Readability With Poor Naming

    When you start using custom functions with pipe(), naming matters a lot.

    Bad:
    def transform(df): ...

    Better:
    def filter_high_revenue(df): ...

    Now your pipeline reads like a story:
    .pipe(filter_high_revenue)

    That small change makes a big difference.

    5. Thinking This Is About Shorter Code

    This one took me a while to realize. Method chaining isn’t about writing fewer lines. It’s about writing code that’s easier to read, reason about, and come back to later.

    Sometimes the chained version is longer. That’s fine. If it is clearer, it’s better.

    Let’s wrap this up and tie it back to the “intermediate” idea.

    Conclusion: Leveling Up Your Pandas Game

    If you’ve followed along, you’ve seen a small shift with a big impact.

    By thinking in transformations instead of steps, and by using method chaining, assign(), and pipe(), your code stops being just a collection of lines and becomes a clear, readable flow.

    Here’s what changes when you internalize this pattern:

    • You can read your code top to bottom without getting lost.
    • You can reuse transformations easily, making your notebooks more modular.
    • You can debug and test without tracing dozens of intermediate variables.
    • You start thinking in pipelines, not just steps.

    This is exactly what separates a beginner from an intermediate Pandas user.

    You’re not just “making it work.” You’re designing your analysis in a way that scales, is maintainable, and looks good to anyone who reads it, even future you.

    Try It Yourself

    Pick a messy notebook you’ve been working on and refactor just one part using method chaining.

    • Start with assign() for new columns
    • Use loc[] to filter
    • Introduce pipe() for any custom logic

    You’ll be surprised how much clearer your notebook becomes, almost immediately.

    That’s it. You’ve just unlocked intermediate Pandas.

    The next step? Keep practicing, build your own pipelines, and see how your thinking about data transforms alongside your code.


