    Pandas Isn’t Going Anywhere: Why It’s Still My Go-To for Data Wrangling

    By Editor Times Featured · May 17, 2026 · 6 Mins Read


    When I started studying data science in 2020, Pandas was one of the most popular tools. Although newer tools focus on improving Pandas’ weaknesses in handling very large datasets, I still use Pandas for most of my data cleaning, processing, and analysis tasks. Sure, Pandas gives me a hard time when working with billions of rows, but it is more than enough for anything below that.

    I see Pandas being used not only for EDA or in notebooks but also in production systems.

    In this article, I’ll go over some data cleaning and processing operations to demonstrate how capable Pandas is.

    Let’s start with the dataset, which contains stock keeping units (SKUs) and search API responses for those SKUs.

    import pandas as pd
    
    search_results = pd.read_csv("search_results.csv")
    
    search_results.head()
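    If you don’t have a search_results.csv at hand, a minimal hypothetical stand-in (the SKU values and file contents here are invented, mirroring the format described in this article) lets the later snippets run:

```python
import pandas as pd

# Hypothetical stand-in for search_results.csv: each row holds an SKU and
# its raw search_result string -- a stringified list of dicts followed by
# a trailing "..." note, as in the real data described in the article.
raw = (
    "[{'my_id': 'HBCV00007F5Y2B', 'distance': 1.0, 'entity': {}}, "
    "{'my_id': 'HBCV00006U3ZYB', 'distance': 0.8961254358291626, 'entity': {}}] "
    "... and 5 entities remaining"
)
search_results = pd.DataFrame(
    {"sku": ["SKU001", "SKU002"], "search_result": [raw, raw]}
)
print(search_results.shape)  # (2, 2)
```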

    The search result is a list of dictionaries and looks like this:

    search_results.loc[0, "search_result"]
    
    "[{'my_id': 'HBCV00007F5Y2B', 'distance': 1.0, 'entity': {}}, 
    {'my_id': 'HBCV00007UPQBM', 'distance': 1.0, 'entity': {}}, 
    {'my_id': 'HBCV00008I29IH', 'distance': 1.0, 'entity': {}}, 
    {'my_id': 'HBCV00006U3ZYB', 'distance': 0.8961254358291626, 'entity': {}}, 
    {'my_id': 'HBCV0000AFA4H6', 'distance': 0.8702399730682373, 'entity': {}}, 
    {'my_id': 'HBCV00009CDGD4', 'distance': 0.86175537109375, 'entity': {}}, 
    {'my_id': 'HBCV000046336T', 'distance': 0.8594968318939209, 'entity': {}}, 
    {'my_id': 'HBCV00009QDZRT', 'distance': 0.8572311997413635, 'entity': {}}, 
    {'my_id': 'HBCV00008E11P3', 'distance': 0.8553324937820435, 'entity': {}}, 
    {'my_id': 'HBV00000C4IY6', 'distance': 0.8539167642593384, 'entity': {}}] 
    ... and 5 entities remaining"

    As we can see in the output, it’s not a proper list-of-dictionaries format because of the last part (“… and 5 entities remaining”). Also, it’s stored as a single string.

    To make better use of it, we need to convert it to a proper list of dictionaries. The following line of code removes the last part by splitting the string at “…” and taking the first split.

    search_results.loc[0, "search_result"].split("...")[0].strip()

    However, the output is still a single string. We can use Python’s built-in ast module to convert it to a list:

    import ast
    
    res = ast.literal_eval(search_results.loc[0, "search_result"].split("...")[0].strip())
    
    res
    
    [{'my_id': 'HBCV00007F5Y2B', 'distance': 1.0, 'entity': {}},
     {'my_id': 'HBCV00007UPQBM', 'distance': 1.0, 'entity': {}},
     {'my_id': 'HBCV00008I29IH', 'distance': 1.0, 'entity': {}},
     {'my_id': 'HBCV00006U3ZYB', 'distance': 0.8961254358291626, 'entity': {}},
     {'my_id': 'HBCV0000AFA4H6', 'distance': 0.8702399730682373, 'entity': {}},
     {'my_id': 'HBCV00009CDGD4', 'distance': 0.86175537109375, 'entity': {}},
     {'my_id': 'HBCV000046336T', 'distance': 0.8594968318939209, 'entity': {}},
     {'my_id': 'HBCV00009QDZRT', 'distance': 0.8572311997413635, 'entity': {}},
     {'my_id': 'HBCV00008E11P3', 'distance': 0.8553324937820435, 'entity': {}},
     {'my_id': 'HBV00000C4IY6', 'distance': 0.8539167642593384, 'entity': {}}]

    We now have the search results as a proper list of dictionaries. This was only for a single row. We need to apply the same operation to all SKUs (i.e., the entire column).

    One option is to go over all the rows in a for loop and perform the same operation. However, this isn’t the best option. We should prefer vectorized operations when we can. A vectorized operation basically means executing the code on all rows at once.
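    As a quick illustration on a toy Series (not the real data), the loop and the vectorized version produce identical results; the vectorized one just does the work inside Pandas instead of a Python-level loop:

```python
import pandas as pd

s = pd.Series(["a", "b", "c"])

# Python-level loop: one function call per row
looped = pd.Series([x.upper() for x in s])

# Vectorized: a single .str call over the whole Series
print(looped.equals(s.str.upper()))  # True
```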

    On a single row, I used splitting to get rid of the last part of the string, but it didn’t work as a vectorized operation. A more robust option is to use a regex.

    search_results.loc[:, 'search_result'] = search_results['search_result'].str.replace(r"\.\.\..*", "", regex=True).str.strip()

    This code matches “…” and everything that comes after it and replaces the match with nothing. In other words, it removes the “… and 5 entities remaining” part.
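    One subtlety worth calling out: in a regex, a bare dot matches any character, so the three dots of the ellipsis must be escaped to match literally. A quick sketch on a toy string:

```python
import pandas as pd

s = pd.Series(["abc ... and 5 entities remaining"])

# Escaped dots match a literal "..." and drop everything after it:
print(s.str.replace(r"\.\.\..*", "", regex=True).str.strip().iloc[0])  # abc

# Unescaped dots match ANY three characters, wiping out the whole string:
print(s.str.replace(r"....*", "", regex=True).str.strip().iloc[0] == "")  # True
```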

    We now have all the rows in the search results column as a proper list of dictionaries.

    search_results.loc[10, "search_result"]
    
    "[{'my_id': 'HBCV00007F5Y2B', 'distance': 1.0, 'entity': {}},
     {'my_id': 'HBCV00007UPQBM', 'distance': 1.0, 'entity': {}},
     {'my_id': 'HBCV00008I29IH', 'distance': 1.0, 'entity': {}},
     {'my_id': 'HBCV00006U3ZYB', 'distance': 0.8961254358291626, 'entity': {}},
     {'my_id': 'HBCV0000AFA4H6', 'distance': 0.8702399730682373, 'entity': {}},
     {'my_id': 'HBCV00009CDGD4', 'distance': 0.86175537109375, 'entity': {}},
     {'my_id': 'HBCV000046336T', 'distance': 0.8594968318939209, 'entity': {}},
     {'my_id': 'HBCV00009QDZRT', 'distance': 0.8572311997413635, 'entity': {}},
     {'my_id': 'HBCV00008E11P3', 'distance': 0.8553324937820435, 'entity': {}},
     {'my_id': 'HBV00000C4IY6', 'distance': 0.8539167642593384, 'entity': {}}]"

    They are still stored as strings, but I can easily convert them to lists using the ast module, which I’ll do in the next step.

    What I’m interested in is the SKUs returned in the search results. I’ll create a new column by extracting the SKUs from the dictionaries. I can access them using the “my_id” key of the dictionary.

    There are 3 parts to this operation:

    • Convert the search result string to a list using the literal_eval function
    • Extract the SKU from the my_id key of the dictionary
    • Do this in a list comprehension to get SKUs from all the dictionaries in the list

    We can do all of these operations by applying a lambda function to all rows as follows:

    search_results.loc[:, "result_skus"] = search_results["search_result"].apply(
        lambda x: [item["my_id"] for item in ast.literal_eval(x)]
    )
    
    search_results.head()

    Each row in the result_skus column contains a list of 10 SKUs. Let’s say I need to have these 10 SKUs in separate rows. For each row in the sku column, there will be 10 rows created from the list in the result_skus column. There is a very simple way of doing this in Pandas: the explode function.

    data = search_results[["sku", "result_skus"]].explode("result_skus", ignore_index=True)
    
    data.head()

    We created a new dataframe with sku and result_skus columns. The drawing below demonstrates what the explode function does:
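    The same idea in code form, on a toy frame (invented values, lists of two and one element for brevity):

```python
import pandas as pd

# Each row holds a list; explode turns each list element into its own row,
# repeating the sku value alongside it.
df = pd.DataFrame({"sku": ["A", "B"], "result_skus": [["x", "y"], ["z"]]})
exploded = df.explode("result_skus", ignore_index=True)
print(exploded["sku"].tolist())          # ['A', 'A', 'B']
print(exploded["result_skus"].tolist())  # ['x', 'y', 'z']
```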

    Consider the opposite. We have a dataframe as shown above but want to have all the results for an sku in a single row.

    We can use the groupby function to group the rows by sku and then apply the list function to the result_skus column:

    new_data = data.groupby("sku", as_index=False)["result_skus"].apply(list)
    
    new_data.head()

    This gets us back to the previous step:
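    On the same toy frame, the inversion looks like this (I use agg(list), which behaves like apply(list) for this case and is the more idiomatic spelling):

```python
import pandas as pd

# Toy frame in "exploded" form; grouping by sku and aggregating with list
# collapses the rows back into one list per sku.
df = pd.DataFrame({"sku": ["A", "A", "B"], "result_skus": ["x", "y", "z"]})
regrouped = df.groupby("sku")["result_skus"].agg(list).reset_index()
print(regrouped["result_skus"].tolist())  # [['x', 'y'], ['z']]
```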

    Using the explode function, we created a dataframe with a separate row for each sku in the result_skus column. What if we need them separated into different columns instead of rows?

    One option is to apply the pd.Series constructor to the result_skus column and concatenate the resulting columns to the original dataframe.

    new_cols = new_data["result_skus"].apply(pd.Series)
    
    new_data = pd.concat([new_data, new_cols], axis=1)
    
    new_data.head()

    Columns 0 to 9 contain the ten SKUs from the result_skus column. This code, using the apply function, is not a vectorized operation.

    We have another option, which is vectorized and much faster.

    new_cols = pd.DataFrame(new_data["result_skus"].tolist())
    
    new_data = pd.concat([new_data, new_cols], axis=1)

    This code gives us the same dataframe as above, but much faster.
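    A rough way to see the gap yourself (timings vary by machine; this is a sketch with invented data, not a benchmark):

```python
import time

import pandas as pd

# 5,000 rows of two-element lists
lists = pd.Series([[f"a{i}", f"b{i}"] for i in range(5_000)])

t0 = time.perf_counter()
slow = lists.apply(pd.Series)        # builds one pd.Series per row
t1 = time.perf_counter()
fast = pd.DataFrame(lists.tolist())  # builds the frame in one shot
t2 = time.perf_counter()

print(slow.equals(fast))  # True -- same result either way
print(f"apply: {t1 - t0:.3f}s, tolist: {t2 - t1:.3f}s")
```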

    I demonstrated a typical data cleaning and processing task that a data scientist or analyst might encounter in their job. I’ve been in the field for over 5 years, and Pandas has always been enough for what I need, except when working with very large datasets (e.g., billions of rows).

    The tools that are a better fit for such large datasets have similar syntax to Pandas. For example, PySpark is kind of a mix of Pandas and SQL. Polars is very similar to Pandas in terms of syntax. Thus, learning and practicing Pandas is still a highly valuable skill for anyone working in the data science and AI field.

    Thanks for studying.



