Why Is My Code So Slow? A Guide to Py-Spy Python Profiling

Some of the most frustrating issues to debug in data science code aren’t syntax errors or logic errors. Rather, they come from code that does exactly what it’s supposed to do, but takes its sweet time doing it.

Functional but inefficient code can be a massive bottleneck in a data science workflow. In this article, I’ll provide a brief introduction and walk-through of py-spy, a powerful tool designed to profile your Python code. It can pinpoint exactly where your program is spending the most time so inefficiencies can be identified and corrected.

Example Problem

Let’s set up a simple research question to write some code for:

“For all flights going between US states and territories, which departing airport has the longest flights on average?”

Below is a simple Python script to answer this research question, using data retrieved from the Bureau of Transportation Statistics (BTS). The dataset consists of data from every flight within US states and territories between January and June of 2025, with information on the origin and destination airports. It’s roughly 3.5 million rows.

It calculates the Haversine distance, the shortest distance between two points on a sphere, for each flight. Then, it groups the results by departing airport to find the average distance and reports the top 5.
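For reference, the formula that the script's haversine() function evaluates can be written as follows. The symbols are notation chosen here for illustration and do not appear in the script: φ is latitude and λ is longitude, both in radians, and R is the Earth's radius (taken as 6371 km in the script).

```latex
d = 2R \arcsin\!\left(
      \sqrt{\sin^2\!\left(\frac{\Delta\varphi}{2}\right)
            + \cos\varphi_1 \cos\varphi_2 \,
              \sin^2\!\left(\frac{\Delta\lambda}{2}\right)}
    \right),
\qquad
\Delta\varphi = \varphi_2 - \varphi_1, \quad
\Delta\lambda = \lambda_2 - \lambda_1
```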

import pandas as pd
import math
import time


def haversine(lat_1, lon_1, lat_2, lon_2):
    """Calculate the Haversine distance (km) between two latitude/longitude points"""
    lat_1_rad = math.radians(lat_1)
    lon_1_rad = math.radians(lon_1)
    lat_2_rad = math.radians(lat_2)
    lon_2_rad = math.radians(lon_2)

    delta_lat = lat_2_rad - lat_1_rad
    delta_lon = lon_2_rad - lon_1_rad

    R = 6371  # Radius of the earth in km

    return 2*R*math.asin(math.sqrt(math.sin(delta_lat/2)**2 + math.cos(lat_1_rad)*math.cos(lat_2_rad)*(math.sin(delta_lon/2))**2))


if __name__ == '__main__':
    # Load the flight data into a dataframe
    flight_data_file = r"./data/2025_flight_data.csv"
    flights_df = pd.read_csv(flight_data_file)

    # Start timer to see how long the analysis takes
    start = time.time()

    # Calculate the haversine distance between each flight's start and end airport
    haversine_dists = []
    for i, row in flights_df.iterrows():
        haversine_dists.append(haversine(lat_1=row["LATITUDE_ORIGIN"],
                                         lon_1=row["LONGITUDE_ORIGIN"],
                                         lat_2=row["LATITUDE_DEST"],
                                         lon_2=row["LONGITUDE_DEST"]))

    flights_df["Distance"] = haversine_dists

    # Get the result by grouping by origin airport, taking the average flight distance, and printing the top 5
    result = (
        flights_df
        .groupby('DISPLAY_AIRPORT_NAME_ORIGIN').agg(avg_dist=('Distance', 'mean'))
        .sort_values('avg_dist', ascending=False)
    )

    print(result.head(5))

    # End timer and print analysis time
    end = time.time()
    print(f"Took {end - start} s")

Running this code gives the following output:

                                        avg_dist
DISPLAY_AIRPORT_NAME_ORIGIN
Pago Pago International              4202.493567
Guam International                   3142.363005
Luis Munoz Marin International       2386.141780
Ted Stevens Anchorage International  2246.530036
Daniel K Inouye International        2211.857407
Took 169.8935534954071 s

These results make sense, as the airports listed are in American Samoa, Guam, Puerto Rico, Alaska, and Hawaii, respectively. These are all locations outside of the contiguous United States where one would expect long average flight distances.

The problem here isn’t the results, which are valid, but the execution time: almost three minutes! While three minutes might be tolerable for a one-off run, it becomes a productivity killer during development. Imagine this as part of a longer data pipeline. Every time a parameter is tweaked, a bug is fixed, or a cell is re-run, you’re forced to sit idle while the program runs. That friction breaks your flow and turns a quick analysis into an all-afternoon affair.

Now let’s see how py-spy can help us diagnose exactly which lines are taking so long.

What Is Py-Spy?

To understand what py-spy is doing and the benefits of using it, it helps to compare py-spy to the built-in Python profiler cProfile.

cProfile: This is a Tracing Profiler, working much like a stopwatch on each function call. The time between each function call and return is measured and reported. While highly accurate, this adds significant overhead, as the profiler has to constantly pause and record data, which can slow down the script considerably.
py-spy: This is a Sampling Profiler, working much like a high-speed camera looking at the whole program at once. py-spy sits completely outside the running Python script and takes high-frequency snapshots of the program’s state. It looks at the entire “Call Stack” to see exactly what line of code is being run and what function called it, all the way up to the top level.
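To make the tracing approach concrete, here is a minimal sketch of profiling a small function with cProfile from the standard library. The slow_sum() workload is a hypothetical stand-in and is not part of the flight script; the point is only to show that a tracing profiler records every call made while it is enabled.

```python
import cProfile
import io
import math
import pstats


def slow_sum(n):
    """Hypothetical workload: sum the square roots of 0..n-1 in a plain loop."""
    total = 0.0
    for i in range(n):
        total += math.sqrt(i)
    return total


# Trace every function call made while the profiler is enabled
profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(100_000)
profiler.disable()

# Render the entries with the largest cumulative time into a string
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)
report = stream.getvalue()
print(report)
```

The report lists one row per function, including a call count for math.sqrt, which illustrates the per-call bookkeeping that makes tracing accurate but comparatively expensive.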

Running Py-spy

In order to run py-spy on a Python script, the py-spy library must be installed in the Python environment.

pip install py-spy

Once the py-spy library is installed, our script can be profiled by running the following command in the terminal:

py-spy record -o profile.svg -r 100 -- python main.py

Here’s what each part of this command is actually doing:

py-spy: Calls the tool.
record: This tells py-spy to use its “record” mode, which will continuously monitor the program while it runs and save the data.
-o profile.svg: This specifies the output filename. With py-spy’s default flame graph format, the results are written as an SVG file called profile.svg.
-r 100: This specifies the sampling rate, setting it to 100 times per second. This means that py-spy will check what the program is doing 100 times per second.
--: This separates the py-spy command from the Python script command. It tells py-spy that everything following this flag is the command to run, not arguments for py-spy itself.
python main.py: This is the command to run the Python script to be profiled with py-spy, in this case running main.py.

Note: If running on Linux, sudo privileges are often required to run py-spy for security reasons, particularly when attaching to a process that is already running.

After this command has finished running, an output file profile.svg will appear, which will allow us to dig deeper into which parts of the code are taking the longest.
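If profiling needs to be repeated from a script rather than typed by hand, the same command can be assembled in Python. The sketch below is only an illustration: the helper names build_py_spy_command() and profile_script() are hypothetical, and it assumes py-spy is installed and on the PATH when profile_script() is actually called.

```python
import shlex
import subprocess


def build_py_spy_command(script, output="profile.svg", rate=100, python="python"):
    """Build the argument list for `py-spy record` (hypothetical helper)."""
    return [
        "py-spy", "record",
        "-o", output,      # output file for the graph
        "-r", str(rate),   # samples per second
        "--",              # everything after this is the command to profile
        python, script,
    ]


def profile_script(script, output="profile.svg", rate=100):
    """Run py-spy on the script; raises CalledProcessError if the command fails."""
    subprocess.run(build_py_spy_command(script, output, rate), check=True)
    return output


# Show the command that would be run, without executing it
command = build_py_spy_command("main.py")
print(shlex.join(command))
```

Passing the arguments as a list avoids shell quoting issues if the script path contains spaces.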

Py-spy Output

Icicle Graph output from py-spy

Opening up the output profile.svg reveals the visualization that py-spy has created for how much time our program spent in different parts of the code. This is known as an Icicle Graph (or sometimes a Flame Graph if the y-axis is inverted) and is interpreted as follows:

Bars: Each colored bar represents a specific function that was called during the execution of the program.
X-axis (Population): The horizontal axis represents the collection of all samples taken during the profiling. They are grouped so that the width of a particular bar represents the proportion of the total samples in which the program was in the function represented by that bar. Note: This is not a timeline; the ordering doesn’t represent when the function was called, only the total amount of time spent.
Y-axis (Stack Depth): The vertical axis represents the call stack. The top bar labeled “all” represents the entire program, and the bars beneath it represent functions called from “all”. This continues down recursively, with each bar broken down into the functions that were called during its execution. The very bottom bar shows the function that was actually running on the CPU when the sample was taken.

Interacting with the Graph

While the image above is static, the actual .svg file generated by py-spy is fully interactive. When you open it in a web browser, you can:

Search (Ctrl+F): Highlight specific functions to see where they appear in the stack.
Zoom: Click on any bar to zoom in on that specific function and its children, allowing you to isolate complex parts of the call stack.
Hover: Hovering over any bar displays the specific function name, file path, line number, and the exact percentage of time it consumed.

The most critical rule for reading the icicle graph is simply: the wider the bar, the more often the function appeared in the samples. If a function bar spans 50% of the graph’s width, it means that the program was executing that function, or something it called, for roughly 50% of the total runtime.
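The relationship between samples and bar width can be sketched in a few lines of Python. The stacks below are made-up data for illustration, not real py-spy output, and bar_widths() is a hypothetical helper: it counts in how many samples each function appears anywhere on the stack, which corresponds to the width of that function's bar.

```python
from collections import Counter

# Made-up samples: each tuple is one captured call stack, outermost call first
samples = (
    [("main", "iterrows")] * 4
    + [("main", "iterrows", "Series.__init__")] * 3
    + [("main", "haversine")] * 2
    + [("main", "groupby")] * 1
)


def bar_widths(samples):
    """Return each function's share of all samples (its bar width as a fraction)."""
    counts = Counter()
    for stack in samples:
        # Count a function once per sample, even if it appears twice via recursion
        for function in set(stack):
            counts[function] += 1
    return {function: count / len(samples) for function, count in counts.items()}


widths = bar_widths(samples)
for function, width in sorted(widths.items(), key=lambda item: -item[1]):
    print(f"{function:<16} {width:.0%}")
```

In this toy data, "main" is on every stack and so spans the full width, while "iterrows" appears in seven of the ten samples and would be drawn at 70% of the width.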

Diagnosis

From the icicle graph above, we can see that the bar representing the Pandas iterrows() function is noticeably wide. Hovering over that bar when viewing the profile.svg file reveals that the true proportion for this function was 68.36%. So over 2/3 of the runtime was spent in the iterrows() function. Intuitively this bottleneck makes sense, as iterrows() creates a Pandas Series object for every single row in the loop, causing massive overhead. This reveals a clear target for optimizing the runtime of the script.

Optimizing The Script

The clearest path to optimizing this script, based on what was learned from py-spy, is to stop using iterrows() to loop over every row to calculate the haversine distance. Instead, it should be replaced with a vectorized calculation using NumPy that can do the calculation for every row with just one function call. So the changes to be made are:

Rewrite the haversine() function to use vectorized and efficient C-level NumPy operations that allow whole arrays to be passed in rather than one set of coordinates at a time.
Replace the iterrows() loop with a single call to this newly vectorized haversine() function.

import pandas as pd
import numpy as np
import time


def haversine(lat_1, lon_1, lat_2, lon_2):
    """Calculate the Haversine distance (km) between two arrays of latitude/longitude points"""
    lat_1_rad = np.radians(lat_1)
    lon_1_rad = np.radians(lon_1)
    lat_2_rad = np.radians(lat_2)
    lon_2_rad = np.radians(lon_2)

    delta_lat = lat_2_rad - lat_1_rad
    delta_lon = lon_2_rad - lon_1_rad

    R = 6371  # Radius of the earth in km

    # np.arcsin works on all NumPy versions; the np.asin alias only exists in NumPy 2.0+
    return 2*R*np.arcsin(np.sqrt(np.sin(delta_lat/2)**2 + np.cos(lat_1_rad)*np.cos(lat_2_rad)*(np.sin(delta_lon/2))**2))


if __name__ == '__main__':
    # Load the flight data into a dataframe
    flight_data_file = r"./data/2025_flight_data.csv"
    flights_df = pd.read_csv(flight_data_file)

    # Start timer to see how long the analysis takes
    start = time.time()

    # Calculate the haversine distance for every flight in one vectorized call
    flights_df["Distance"] = haversine(lat_1=flights_df["LATITUDE_ORIGIN"],
                                       lon_1=flights_df["LONGITUDE_ORIGIN"],
                                       lat_2=flights_df["LATITUDE_DEST"],
                                       lon_2=flights_df["LONGITUDE_DEST"])

    # Get the result by grouping by origin airport, taking the average flight distance, and printing the top 5
    result = (
        flights_df
        .groupby('DISPLAY_AIRPORT_NAME_ORIGIN').agg(avg_dist=('Distance', 'mean'))
        .sort_values('avg_dist', ascending=False)
    )

    print(result.head(5))

    # End timer and print analysis time
    end = time.time()
    print(f"Took {end - start} s")

Running this code gives the following output:

                                        avg_dist
DISPLAY_AIRPORT_NAME_ORIGIN
Pago Pago International              4202.493567
Guam International                   3142.363005
Luis Munoz Marin International       2386.141780
Ted Stevens Anchorage International  2246.530036
Daniel K Inouye International        2211.857407
Took 0.5649983882904053 s

These results are identical to the results from before the code was optimized, but instead of taking nearly three minutes to process, it took just over half a second!
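When a loop is swapped for a vectorized version, it can be reassuring to check the two implementations against each other on a few inputs before trusting the speedup. The sketch below does that for the two haversine variants. The function names and the coordinates are made up for illustration and are not taken from the BTS data, and NumPy is assumed to be installed.

```python
import math

import numpy as np

R = 6371  # Radius of the earth in km


def haversine_scalar(lat_1, lon_1, lat_2, lon_2):
    """One pair of points at a time, using the math module."""
    lat_1, lon_1, lat_2, lon_2 = map(math.radians, (lat_1, lon_1, lat_2, lon_2))
    a = (math.sin((lat_2 - lat_1) / 2) ** 2
         + math.cos(lat_1) * math.cos(lat_2) * math.sin((lon_2 - lon_1) / 2) ** 2)
    return 2 * R * math.asin(math.sqrt(a))


def haversine_vectorized(lat_1, lon_1, lat_2, lon_2):
    """Whole arrays at once, using NumPy."""
    lat_1, lon_1, lat_2, lon_2 = (np.radians(np.asarray(x, dtype=float))
                                  for x in (lat_1, lon_1, lat_2, lon_2))
    a = (np.sin((lat_2 - lat_1) / 2) ** 2
         + np.cos(lat_1) * np.cos(lat_2) * np.sin((lon_2 - lon_1) / 2) ** 2)
    return 2 * R * np.arcsin(np.sqrt(a))


# Illustrative (latitude, longitude) pairs in degrees
origins = np.array([[21.3, -157.9], [61.2, -150.0], [18.4, -66.0]])
dests = np.array([[33.9, -118.4], [47.4, -122.3], [25.8, -80.3]])

scalar = np.array([haversine_scalar(o[0], o[1], d[0], d[1])
                   for o, d in zip(origins, dests)])
vectorized = haversine_vectorized(origins[:, 0], origins[:, 1],
                                  dests[:, 0], dests[:, 1])

# The two implementations should agree up to floating-point rounding
print(np.allclose(scalar, vectorized))
```

np.allclose is used rather than exact equality because the math and NumPy routines may differ in the last few bits of floating-point precision.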

Looking Ahead

If you’re reading this from the future (late 2026 or beyond), check if you’re running Python 3.15 or newer. Python 3.15 is expected to introduce a native sampling profiler in the standard library, offering similar functionality to py-spy without requiring external installation. For anyone on Python 3.14 or older, py-spy remains the gold standard.

This article explored a tool for tackling a common frustration in data science: a script that functions as intended, but is inefficiently written and takes a long time to run. An example script was provided to learn which US departure airports have the longest average flight distance according to the Haversine distance. This script worked as expected, but took almost three minutes to run.

Using the py-spy Python profiler, we were able to learn that the cause of the inefficiency was the use of the iterrows() function. By replacing iterrows() with a more efficient vectorized calculation of the Haversine distance, the runtime was reduced from almost three minutes to just over half a second.

See my GitHub Repository for the code from this article, including the preprocessing of the raw data from BTS.

Thanks for reading!

Data Sources

Data from the Bureau of Transportation Statistics (BTS) is a work of the U.S. Federal Government and is in the public domain under 17 U.S.C. § 105. It’s free to use, share, and adapt without copyright restriction.
