
    Liberating Performance with Immutable DataFrames in Free-Threaded Python

By Editor Times Featured · July 9, 2025 · 7 min read


Applying a function to every row of a DataFrame is a common operation. Such operations are embarrassingly parallel: each row can be processed independently. With a multi-core CPU, many rows can be processed at once.

Until recently, exploiting this opportunity in Python was not possible. Multi-threaded function application, being CPU-bound, was throttled by the Global Interpreter Lock (GIL).

Python now offers a solution: with the "experimental free-threading build" of Python 3.13, the GIL is removed, and true multi-threaded concurrency of CPU-bound operations is possible.
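To see which variant you are running, CPython exposes a build-time flag and (since 3.13) a runtime check. A minimal sketch, which falls back gracefully on older interpreters where these do not exist:

```python
import sys
import sysconfig

# Build-time flag: 1 on a free-threaded ("t") build, 0 or None otherwise.
build_flag = sysconfig.get_config_var("Py_GIL_DISABLED")

# Runtime check, added in CPython 3.13; absent on older interpreters.
check = getattr(sys, "_is_gil_enabled", None)
gil_enabled = check() if check is not None else True

print(f"free-threaded build: {bool(build_flag)}; GIL currently enabled: {gil_enabled}")
```

Note that on a free-threaded build the GIL can still be re-enabled at runtime (e.g., via `PYTHON_GIL=1`), which is why the build flag and the runtime check are reported separately.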

The performance benefits are extraordinary. Leveraging free-threaded Python, StaticFrame 3.2 can perform row-wise function application on a DataFrame at least twice as fast as single-threaded execution.

For example, for each row of a square DataFrame of one million integers, we can calculate the sum of all even values with lambda s: s.loc[s % 2 == 0].sum(). When using Python 3.13t (the "t" denotes the free-threaded variant), the duration (measured with ipython %timeit) drops by more than 60%, from 21.3 ms to 7.89 ms:

# Python 3.13.5 experimental free-threading build (main, Jun 11 2025, 15:36:57) [Clang 16.0.0 (clang-1600.0.26.6)] on darwin
>>> import numpy as np; import static_frame as sf

>>> f = sf.Frame(np.arange(1_000_000).reshape(1000, 1000))
>>> func = lambda s: s.loc[s % 2 == 0].sum()

>>> %timeit f.iter_series(axis=1).apply(func)
21.3 ms ± 77.1 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)

>>> %timeit f.iter_series(axis=1).apply_pool(func, use_threads=True, max_workers=4)
7.89 ms ± 60.1 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Row-wise function application in StaticFrame uses the iter_series(axis=1) interface, followed by either apply() (for single-threaded application) or apply_pool() for multi-threaded (use_threads=True) or multi-processed (use_threads=False) application.

The benefits of using free-threaded Python are robust: the outperformance is consistent across a wide range of DataFrame shapes and compositions, is proportionally similar on both macOS and Linux, and scales positively with DataFrame size.

When using standard Python with the GIL enabled, multi-threaded processing of CPU-bound work generally degrades performance. As shown below, the duration of the same operation in standard Python increases from 17.7 ms with a single thread to almost 40 ms with multi-threading:

# Python 3.13.5 (main, Jun 11 2025, 15:36:57) [Clang 16.0.0 (clang-1600.0.26.6)]
>>> import numpy as np; import static_frame as sf

>>> f = sf.Frame(np.arange(1_000_000).reshape(1000, 1000))
>>> func = lambda s: s.loc[s % 2 == 0].sum()

>>> %timeit f.iter_series(axis=1).apply(func)
17.7 ms ± 144 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

>>> %timeit f.iter_series(axis=1).apply_pool(func, use_threads=True, max_workers=4)
39.9 ms ± 354 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

There are trade-offs when using free-threaded Python: as is apparent in these examples, single-threaded processing is slower (21.3 ms on 3.13t compared to 17.7 ms on 3.13). Free-threaded Python, in general, incurs performance overhead. This is an active area of CPython development, and improvements are expected in 3.14t and beyond.

Further, while many C-extension packages like NumPy now offer pre-compiled binary wheels for 3.13t, risks such as thread contention or data races still exist.

StaticFrame avoids these risks by enforcing immutability: thread safety is implicit, eliminating the need for locks or defensive copies. StaticFrame does this by using immutable NumPy arrays (with flags.writeable set to False) and forbidding in-place mutation.
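The mechanism can be illustrated with plain NumPy alone. A minimal sketch of the same idea StaticFrame applies internally, freezing an array via its writeable flag:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
a.flags.writeable = False  # freeze the array, as StaticFrame does internally

try:
    a[0, 0] = 99  # any in-place mutation now raises
except ValueError as err:
    print("mutation blocked:", err)
```

Because no thread can write to such an array, it can be shared across threads without locks or defensive copies.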

Extended DataFrame Performance Tests

Evaluating the performance characteristics of a complex data structure like a DataFrame requires testing many kinds of DataFrames. The following performance panels perform row-wise function application on nine different DataFrame types, testing all combinations of three shapes and three levels of type homogeneity.

For a fixed number of elements (e.g., 1 million), three shapes are tested: tall (10,000 by 100), square (1,000 by 1,000), and wide (100 by 10,000). To vary type homogeneity, three categories of synthetic data are defined: columnar (no adjacent columns have the same type), mixed (groups of four adjacent columns share the same type), and uniform (all columns are the same type). StaticFrame permits adjacent columns of the same type to be represented as two-dimensional NumPy arrays, reducing the costs of column traversal and row formation. At the uniform extreme, an entire DataFrame can be represented by one two-dimensional array. Synthetic data is produced with the frame-fixtures package.
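The three shapes, all holding the same fixed element count, can be sketched with NumPy's reshape (a simplified stand-in for the frame-fixtures data, which additionally varies column types):

```python
import numpy as np

n = 1_000_000  # fixed element count across all three shapes
shapes = {"tall": (10_000, 100), "square": (1_000, 1_000), "wide": (100, 10_000)}

arrays = {name: np.arange(n).reshape(shape) for name, shape in shapes.items()}
for name, arr in arrays.items():
    print(name, arr.shape)
```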

The same function is used: lambda s: s.loc[s % 2 == 0].sum(). While a more efficient implementation is possible using NumPy directly, this function approximates common applications where many intermediate Series are created.
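For comparison, one possible NumPy-direct implementation of the same computation, operating on the whole array at once with no intermediate Series objects (a sketch, not the benchmark code used in the study):

```python
import numpy as np

arr = np.arange(1_000_000).reshape(1_000, 1_000)

# Zero out odd values, then sum along each row -- fully vectorized,
# with no per-row Python-level function calls.
row_even_sums = np.where(arr % 2 == 0, arr, 0).sum(axis=1)

print(row_even_sums[:2])  # sums of even values in rows 0 and 1
```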

Figure legends document the concurrency configuration. When use_threads=True, multi-threading is used; when use_threads=False, multi-processing is used. StaticFrame uses the ThreadPoolExecutor and ProcessPoolExecutor interfaces from the standard library and exposes their parameters: the max_workers parameter defines the maximum number of threads or processes used. A chunksize parameter is also available, but is not varied in this study.
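The pattern behind these executor-based interfaces can be approximated with the standard library alone. A minimal sketch applying a per-row function across a plain NumPy array with ThreadPoolExecutor (the array, pool size, and chunksize here are illustrative, not the study's values):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

arr = np.arange(10_000).reshape(100, 100)

def even_sum(row):
    # Sum of the even values in one row.
    return int(row[row % 2 == 0].sum())

# Executor.map fans the rows out across worker threads; max_workers caps
# the pool size, and chunksize batches work items (it only affects
# ProcessPoolExecutor, though both executors accept it).
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(even_sum, arr, chunksize=25))

print(results[:2])
```

Under standard Python this threading yields no CPU-bound speedup; under a free-threaded build the rows can genuinely run in parallel.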

Multi-Threaded Function Application with Free-Threaded Python 3.13t

As shown below, the performance benefits of multi-threaded processing in 3.13t are consistent across all DataFrame types tested: processing time is reduced by at least 50%, and in some cases by over 80%. The optimal number of threads (the max_workers parameter) is smaller for tall DataFrames, as the quicker processing of smaller rows means that additional thread overhead actually degrades performance.

Figure by Author.

Scaling to DataFrames of 100 million elements (1e8), the outperformance improves. Processing time is reduced by over 70% for all but two DataFrame types.

Figure by Author.

The overhead of multi-threading can vary greatly between platforms. In all cases, the outperformance of free-threaded Python is proportionally consistent between macOS and Linux, though macOS shows marginally greater benefits. The processing of 100 million elements on Linux shows similar relative outperformance:

Figure by Author.

Surprisingly, even small DataFrames of only ten thousand elements (1e4) can benefit from multi-threaded processing in 3.13t. While no benefit is found for wide DataFrames, the processing time of tall and square DataFrames can be cut in half.

Figure by Author.

Multi-Threaded Function Application with Standard Python 3.13

Prior to free-threaded Python, multi-threaded processing of CPU-bound applications resulted in degraded performance. This is made clear below, where the same tests are performed with standard Python 3.13.

Figure by Author.

Multi-Processed Function Application with Standard Python 3.13

Prior to free-threaded Python, multi-processing was the only option for CPU-bound concurrency. Multi-processing, however, only delivered benefits if the amount of per-process work was sufficient to offset the high cost of creating an interpreter per process and copying data between processes.

As shown here, multi-processed row-wise function application significantly degrades performance, with processing time growing from two to ten times the single-threaded duration. Each unit of work is too small to make up for the multi-processing overhead.

Figure by Author.

The Status of Free-Threaded Python

PEP 703, "Making the Global Interpreter Lock Optional in CPython", was accepted by the Python Steering Council in July of 2023 with the guidance that, in the first phase (for Python 3.13), it is experimental and non-default; in the second phase, it becomes non-experimental and officially supported; and in the third phase, it becomes the default Python implementation.

After significant CPython development, and support from critical packages like NumPy, PEP 779, "Criteria for supported status for free-threaded Python", was accepted by the Python Steering Council in June of 2025. In Python 3.14, free-threaded Python enters the second phase: non-experimental and officially supported. While it is not yet certain when free-threaded Python will become the default, the trajectory is clearly set.

    Conclusion

Row-wise function application is only the beginning: group-by operations, windowed function application, and many other operations on immutable DataFrames are equally well-suited to concurrent execution and are likely to show similar performance gains.

The work to make CPython faster has had success: Python 3.14 is said to be 20% to 40% faster than Python 3.10. Unfortunately, these performance benefits have not been realized by many working with DataFrames, where performance is largely bound within C-extensions (be it NumPy, Arrow, or other libraries).

As shown here, free-threaded Python enables efficient parallel execution using lightweight, memory-efficient threads, delivering a 50% to 90% reduction in processing time, even when performance is primarily bound in C-extension libraries like NumPy. With the ability to safely share immutable data structures across threads, opportunities for substantial performance improvements are now abundant.


