
    Stop Writing Messy Boolean Masks: 10 Elegant Ways to Filter Pandas DataFrames

    By Editor Times Featured | January 23, 2026 | 8 Mins Read


    In a previous article, I showed how to create your first DataFrame using Pandas. I mentioned that the first thing you need to grasp is data structures and arrays before moving on to data analysis with Python.

    Pandas is an excellent library for data manipulation and retrieval. Combine it with NumPy and Seaborn, and you've got yourself a powerhouse for data analysis.

    In this article, I'll walk you through practical ways to filter data in pandas, starting with simple conditions and moving on to powerful methods like .isin(), .str.startswith(), and .query(). By the end, you'll have a toolkit of filtering techniques you can apply to any dataset.

    Without further ado, let's get into it!

    Importing our data

    To start, I'll import the pandas library.

    # importing the pandas library
    import pandas as pd

    That's the only library I'll need for this use case.

    Next, I'll import the dataset. The dataset was generated by ChatGPT, by the way. It consists of basic sales transaction records. Let's take a look at our dataset.

    # checking out our data
    df_sales = pd.read_csv('sales_data.csv')
    df_sales

    Here's a preview of the data

    It consists of basic sales records with columns OrderId, Customer, Product, Category, Quantity, Price, OrderDate and Region.
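
The sales_data.csv file itself isn't reproduced here, so for experimenting, a small stand-in DataFrame with the same column names can be built by hand. This is only a sketch: every value below is invented for illustration and is not the author's data.

```python
import pandas as pd

# Hypothetical stand-in for sales_data.csv: same column names, made-up values.
df_sales = pd.DataFrame({
    "OrderId": [1001, 1002, 1003, 1004, 1005, 1006],
    "Customer": ["Alice", "Bob", "Charlie", "Diana", "Alice", "James"],
    "Product": ["Laptop", "Desk", "Phone", "Chair", "Monitor", "Notebook"],
    "Category": ["Electronics", "Furniture", "Electronics",
                 "Furniture", "Electronics", "Stationery"],
    "Quantity": [1, 2, 3, 4, 2, 10],
    "Price": [1200.0, 650.0, 800.0, 150.0, 300.0, 5.0],
    "OrderDate": ["2023-01-05", "2023-01-07", "2023-01-09",
                  "2023-01-10", "2023-01-12", "2023-01-15"],
    "Region": ["North", "East", "South", "West", "North", "East"],
})

print(df_sales.head())   # first rows, similar to a preview
print(df_sales.dtypes)   # OrderDate is still text here, not a datetime
```

Having a tiny table like this makes it easy to check by eye that each filter returns the rows you expect.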

    Alright, let's begin filtering!

    Filtering by a single condition

    Let's try to select all records from a particular category. For instance, I want to know how many unique orders were made in the Electronics category. To do that, it's pretty straightforward.

    # Filter by a single condition
    # Example: All orders from the "Electronics" category.
    df_sales['Category'] == 'Electronics'

    In Python, you need to distinguish between the = operator and the == operator.

    = is used to assign a value to a variable.

    For instance

    x = 10 # Assigns the value 10 to the variable x

    ==, on the other hand, is used to compare two values. For instance

    a = 3
    b = 3
    print(a == b) # Output: True
    
    c = 5
    d = 10
    print(c == d) # Output: False

    With that said, let's apply the same notion to the filtering I did above

    # Filter by a single condition
    # Example: All orders from the "Electronics" category.
    df_sales['Category'] == 'Electronics'

    Here, I'm basically telling Python to look through every record and check whether its category is Electronics. For each row, it returns a Boolean result, True or False. Here's the result

    As you can see, we're getting a Boolean output. True means the row's category is Electronics, while False means it isn't. This is okay and all, but it can become confusing if you're dealing with a large number of records. Let's fix that.

    # Filter by a single condition
    # Example: All orders from the "Electronics" category.
    df_sales[df_sales['Category'] == 'Electronics']

    Here, I just wrapped the condition in the DataFrame, so only the rows where the condition is True are kept. And with that, we get this output
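
To make the two steps concrete, here is a minimal sketch on a tiny made-up table (the values are assumptions, not the article's dataset). It shows the Boolean mask on its own, the filtered rows, and one way to answer the "how many unique orders" question using nunique().

```python
import pandas as pd

# Made-up data for illustration only.
df_sales = pd.DataFrame({
    "OrderId": [1001, 1002, 1003, 1004],
    "Category": ["Electronics", "Furniture", "Electronics", "Furniture"],
})

# Step 1: the comparison alone yields one True/False per row.
mask = df_sales["Category"] == "Electronics"
print(mask.tolist())   # [True, False, True, False]

# Step 2: indexing the DataFrame with the mask keeps only the True rows.
electronics = df_sales[mask]
print(electronics)

# Count distinct orders in the filtered result.
n_orders = electronics["OrderId"].nunique()
print(n_orders)        # 2
```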

    Much better, right? Let's move on.

    Filter rows by numeric condition

    Let's try to retrieve records where the order quantity is greater than 2. It's pretty straightforward.

    # Filter rows by numeric condition
    # Example: Orders where Quantity > 2
    df_sales['Quantity'] > 2

    Here, I'm using the greater than (>) operator. Similar to our output above, we're going to get a Boolean result with True and False values. Let's fix it up real quick.
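
The fix is the same wrapping used for the category filter. A minimal sketch, using made-up values rather than the article's data:

```python
import pandas as pd

# Made-up data for illustration only.
df_sales = pd.DataFrame({
    "OrderId": [1001, 1002, 1003, 1004],
    "Quantity": [1, 2, 3, 4],
})

# Wrap the condition in the DataFrame to get rows instead of True/False values.
bulk_orders = df_sales[df_sales["Quantity"] > 2]
print(bulk_orders)

# .loc accepts the same Boolean mask and returns the same rows.
bulk_orders_loc = df_sales.loc[df_sales["Quantity"] > 2]
```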

    And there we go!

    Filter by date condition

    Filtering by date is straightforward. For instance.

    # Filter by date condition
    # Example: Orders placed after "2023-01-08"
    df_sales[df_sales["OrderDate"] > "2023-01-08"]

    This checks for orders placed after January 8, 2023. And here's the output.

    If OrderDate is already a datetime column, Pandas parses the string you compare against as a date automatically. If the column was read in as plain text, the comparison is done character by character, which only happens to give the right answer for dates written as YYYY-MM-DD. If you get an error or odd results, convert the column to dates before filtering using the to_datetime() function. Here's an example

    df_sales["OrderDate"] = pd.to_datetime(df_sales["OrderDate"])

    This converts our OrderDate column to a datetime data type. Let's kick things up a notch.
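
As a minimal sketch of the convert-then-filter pattern (made-up dates, and an explicit pd.Timestamp cutoff chosen here for clarity), the following also shows why comparing dates as plain text can mislead:

```python
import pandas as pd

# Made-up data for illustration only; dates start out as text.
df_sales = pd.DataFrame({
    "OrderId": [1001, 1002, 1003, 1004],
    "OrderDate": ["2023-01-05", "2023-01-07", "2023-01-09", "2023-01-12"],
})

# Convert the text column to real datetimes.
df_sales["OrderDate"] = pd.to_datetime(df_sales["OrderDate"])

# Compare against an explicit Timestamp so both sides are dates.
cutoff = pd.Timestamp("2023-01-08")
recent = df_sales[df_sales["OrderDate"] > cutoff]
print(recent)

# Plain-text comparison is alphabetical, not chronological:
# a 2022 date written as "9/1/2022" sorts after "2023-01-08".
print("9/1/2022" > "2023-01-08")   # True, because "9" sorts after "2"
```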

    Filtering by Multiple Conditions (AND, OR, NOT)

    Pandas enables us to filter on multiple conditions using logical operators. However, these operators are different from Python's built-in operators (and, or, not). Here are the logical operators you'll be working with the most

    & (Logical AND)

    The ampersand (&) symbol represents AND in pandas. We use this when we're trying to fulfil two conditions. In this case, both conditions have to be true. For instance, let's retrieve orders from the "Furniture" category where Price > 500.

    # Multiple conditions (AND)
    # Example: Orders from "Furniture" where Price > 500
    df_sales[(df_sales["Category"] == "Furniture") & (df_sales["Price"] > 500)]

    Let's break this down. Here, we have two conditions. One that retrieves orders in the Furniture category and another that filters for prices > 500. Using the &, we're able to combine both conditions. Each condition is wrapped in its own parentheses because & binds more tightly than comparison operators like == and >.
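
A minimal sketch of the AND filter on made-up data, with each condition stored in its own variable so the combination is easier to read. It also shows what happens when Python's built-in and is used on two Series instead of &.

```python
import pandas as pd

# Made-up data for illustration only.
df_sales = pd.DataFrame({
    "OrderId": [1001, 1002, 1003, 1004],
    "Category": ["Electronics", "Furniture", "Electronics", "Furniture"],
    "Price": [1200.0, 650.0, 800.0, 150.0],
})

# Name each condition, then combine them element-wise with &.
is_furniture = df_sales["Category"] == "Furniture"
is_expensive = df_sales["Price"] > 500
expensive_furniture = df_sales[is_furniture & is_expensive]
print(expensive_furniture)

# The built-in `and` tries to reduce a whole Series to a single True/False,
# which pandas refuses to do, so it raises a ValueError.
try:
    df_sales[is_furniture and is_expensive]
    used_and_ok = True
except ValueError:
    used_and_ok = False
print(used_and_ok)   # False
```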

    Here's the result.

    One record was retrieved. Looking at it, it meets our condition. Let's do the same for OR

    | (Logical OR)

    The vertical bar symbol (|) is used to represent OR in pandas. In this case, at least one of the conditions has to be True. For instance, let's retrieve records with orders from the "North" region OR "East" region.

    # Multiple conditions (OR)
    # Example: Orders from "North" region OR "East" region.
    df_sales[(df_sales["Region"] == "North") | (df_sales["Region"] == "East")]

    Here's the output
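
The section title also mentions NOT. In pandas, the element-wise NOT operator is the tilde (~), which flips every True to False and vice versa. A minimal sketch of both OR and NOT on made-up data:

```python
import pandas as pd

# Made-up data for illustration only.
df_sales = pd.DataFrame({
    "OrderId": [1001, 1002, 1003, 1004],
    "Region": ["North", "East", "South", "West"],
})

# OR: keep rows where at least one condition is True.
north_or_east = df_sales[(df_sales["Region"] == "North") | (df_sales["Region"] == "East")]
print(north_or_east)

# NOT: ~ inverts the mask, so this keeps every region except North.
not_north = df_sales[~(df_sales["Region"] == "North")]
print(not_north)
```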

    Filter with isin()

    Let's say I want to retrieve orders from multiple customers. I could always use the | operator. For instance

    df_sales[(df_sales['Customer'] == 'Alice') | (df_sales['Customer'] == 'Charlie')]

    Output:

    Nothing wrong with that. But there's a better and easier way to do this. That's by using the isin() function. Here's how it works

    # Orders from customers ["Alice", "Diana", "James"].
    df_sales[df_sales["Customer"].isin(["Alice", "Diana", "James"])]

    Output:

    The code is much easier and cleaner. Using the isin() function, I can add as many values to the list as I want. Let's move on to some more advanced filtering.
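
To check that isin() really is a shorthand for a chain of | comparisons, here is a minimal sketch on made-up data. It also combines isin() with ~ to exclude a list of customers, which is an addition for illustration rather than something the article covers.

```python
import pandas as pd

# Made-up data for illustration only.
df_sales = pd.DataFrame({
    "OrderId": [1001, 1002, 1003, 1004, 1005],
    "Customer": ["Alice", "Bob", "Charlie", "Diana", "James"],
})

wanted = ["Alice", "Diana", "James"]

# One isin() call...
with_isin = df_sales[df_sales["Customer"].isin(wanted)]

# ...returns the same rows as three comparisons joined with |.
customer = df_sales["Customer"]
with_or = df_sales[(customer == "Alice") | (customer == "Diana") | (customer == "James")]
print(with_isin.equals(with_or))   # True

# Prefix with ~ to keep everyone who is NOT in the list.
everyone_else = df_sales[~df_sales["Customer"].isin(wanted)]
print(everyone_else)
```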

    Filter using string matching

    One of Pandas' powerful but underused features is string matching. It helps a ton in data cleaning tasks when you're trying to search for patterns in the records in your DataFrame. It's similar to the LIKE operator in SQL. For instance, let's retrieve customers whose name starts with "A".

    # Customers whose name starts with "A".
    df_sales[df_sales["Customer"].str.startswith("A")]

    Output:

    Pandas gives you the .str accessor to use string functions. Here's another example

    # Products ending with "top" (e.g., Laptop).
    df_sales[df_sales["Product"].str.endswith("top")]

    Output:
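
Here is a minimal sketch of the .str accessor on made-up data. One row deliberately has a missing customer name: passing na=False tells pandas to treat missing values as non-matches, which avoids problems when the mask is used for filtering. The str.contains() call is an extra example, not something from the article.

```python
import pandas as pd

# Made-up data for illustration only; one customer name is missing.
df_sales = pd.DataFrame({
    "Customer": ["Alice", "Bob", None, "Amara"],
    "Product": ["Laptop", "Desk", "Desktop", "Notebook"],
})

# Names starting with "A"; missing names count as "no match".
starts_with_a = df_sales[df_sales["Customer"].str.startswith("A", na=False)]
print(starts_with_a)

# Products ending with "top".
ends_with_top = df_sales[df_sales["Product"].str.endswith("top")]
print(ends_with_top)

# Substring search anywhere in the text, ignoring case (like SQL's LIKE '%book%').
has_book = df_sales[df_sales["Product"].str.contains("book", case=False)]
print(has_book)
```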

    Filter using query() method

    If you're coming from a SQL background, this method will be very helpful for you. Let's try to retrieve orders from the Electronics category where the quantity is at least 2. It can always go like this.

    df_sales[(df_sales["Category"] == "Electronics") & (df_sales["Quantity"] >= 2)]

    Output:


    But if you're someone trying to bring in your SQL sauce, this will work for you instead

    df_sales.query("Category == 'Electronics' and Quantity >= 2")

    You'll get the same output as above. Pretty similar to SQL if you ask me, and you'll be able to ditch the & symbol. I'm going to be using this method quite often.
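
A minimal sketch on made-up data confirming that the mask version and the query() version return the same rows. It also shows the @ prefix, which lets a query string refer to a Python variable (min_qty here is a name introduced only for this example).

```python
import pandas as pd

# Made-up data for illustration only.
df_sales = pd.DataFrame({
    "OrderId": [1001, 1002, 1003, 1004],
    "Category": ["Electronics", "Furniture", "Electronics", "Electronics"],
    "Quantity": [1, 2, 3, 2],
})

# Boolean-mask version.
via_mask = df_sales[(df_sales["Category"] == "Electronics") & (df_sales["Quantity"] >= 2)]

# query() version: column names are written directly inside the string.
via_query = df_sales.query("Category == 'Electronics' and Quantity >= 2")
print(via_mask.equals(via_query))   # True

# Use @ to pull in a Python variable instead of hard-coding the number.
min_qty = 2
via_variable = df_sales.query("Category == 'Electronics' and Quantity >= @min_qty")
print(via_variable)
```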

    Filter by column values in a range

    Pandas allows you to retrieve a range of values. For instance, orders where the Price is between 50 and 500 would go like this

    # Orders where the Price is between 50 and 500
    df_sales[df_sales["Price"].between(50, 500)]

    Output:

    Pretty straightforward. Note that between() includes both endpoints by default, so prices of exactly 50 or 500 are kept.
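
A minimal sketch of how between() treats the endpoints, on made-up prices that sit exactly on the boundaries. The inclusive="neither" option shown here assumes pandas 1.3 or newer.

```python
import pandas as pd

# Made-up data for illustration only; two prices sit exactly on the bounds.
df_sales = pd.DataFrame({
    "OrderId": [1001, 1002, 1003, 1004, 1005],
    "Price": [1200.0, 500.0, 150.0, 50.0, 5.0],
})

# Default: both 50 and 500 are included.
in_range = df_sales[df_sales["Price"].between(50, 500)]
print(in_range)

# Exclude both endpoints to get a strict 50 < Price < 500.
strictly_inside = df_sales[df_sales["Price"].between(50, 500, inclusive="neither")]
print(strictly_inside)
```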

    Filter missing values (NaN)

    This is probably the most helpful function because, as a data analyst, one of the data cleaning tasks you'll be working on the most is filtering out missing values. Doing this in Pandas is straightforward: use the notna() function. Let's filter rows where Price is not null.

    # filter rows where Price is not null.
    df_sales[df_sales["Price"].notna()]

    Output:

    And there you go. I don't really notice a difference in the output, though, which suggests this dataset has no missing prices.
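
To see notna() actually change something, here is a minimal sketch on made-up data that does contain missing prices. It counts the gaps first with isna(), then filters them out, and compares the result with dropna(), which is another common way to do the same thing.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration only; two prices are missing.
df_sales = pd.DataFrame({
    "OrderId": [1001, 1002, 1003, 1004],
    "Price": [1200.0, np.nan, 800.0, np.nan],
})

# How many prices are missing?
n_missing = df_sales["Price"].isna().sum()
print(n_missing)   # 2

# Keep only rows that have a price.
with_price = df_sales[df_sales["Price"].notna()]
print(with_price)

# The opposite filter: inspect the rows with a missing price.
missing_price = df_sales[df_sales["Price"].isna()]

# dropna(subset=...) gives the same rows as the notna() filter.
same_as_dropna = with_price.equals(df_sales.dropna(subset=["Price"]))
print(same_as_dropna)   # True
```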

    Conclusion

    The next time you open a messy CSV and wonder "Where do I even start?", try filtering first. It's the fastest way to cut through the noise and find the story hidden in your data.

    The transition to Python for data analysis used to feel like a huge step, coming from a SQL background. But for some reason, Pandas seems way easier and less time-consuming for me when filtering data.
    The cool part is that these same techniques work no matter the dataset: sales numbers, survey responses, web analytics, you name it.

    I hope you found this article helpful.

    I write these articles as a way to test and strengthen my own understanding of technical concepts, and to share what I'm learning with others who might be on the same path. Feel free to share with others. Let's learn and grow together. Cheers!

    Feel free to say hi on any of these platforms

    Medium

    LinkedIn

    Twitter

    YouTube


