    Scaling ML Inference on Databricks: Liquid or Partitioned? Salted or Not?

    By Editor Times Featured | March 1, 2026 | 12 Mins Read


    Introduction

    The ML models predict a continuous variable for four different products. The machine learning pipeline was built in Databricks and consists of two major components:

    1. Feature preparation in SQL with serverless compute.
    2. Inference on an ensemble of several hundred models, using job clusters to retain control over compute power.

    In our first try, a 420-core cluster spent almost 10 hours processing simply 18 partitions.

    The objective is to tune the data flow to maximize cluster utilization and ensure scalability. Inference is done on four sets of ML models, one set per product. However, we will focus on how the data is stored, since that determines how much parallelism we can leverage for inference. We will not cover the inner workings of the inference itself.

    If there are too few file partitions, the cluster will spend a long time scanning large files, and at that point, unless the data is repartitioned (which adds network latency and data shuffling), each partition may also contain a very large set of rows to run inference on, again resulting in long run times.
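    To make this concrete, here is a back-of-the-envelope check (our own helper, not part of the pipeline) of how much of a cluster the first wave of tasks can keep busy when there are fewer partitions than cores:

```python
def cluster_utilization(num_partitions: int, total_cores: int) -> float:
    """Fraction of cores that can work in parallel on the first wave of tasks."""
    return min(num_partitions, total_cores) / total_cores

# Our first attempt: 18 partitions on a 420-core cluster.
print(round(cluster_utilization(18, 420), 3))  # 0.043 -> only ~4% of the cluster is busy
```

    With more partitions than cores, utilization saturates at 1.0, which is the situation the salting below aims for.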

    Fig 1. Don’t be afraid to add a little salt to your data if you need to. Photo by Faran Raufi on Unsplash

    However, the business has limited patience for shipping ML pipelines with a direct impact on the org, so tests were limited.

    In this article, we will review our feature data landscape, then give an overview of the ML inference, and present the results and discussion of inference performance for four dataset treatment scenarios:

    1. Partitioned table, no salt, no row limit in partitions (non-salted and partitioned)
    2. Partitioned table, salted, with a 1M row limit (salty and partitioned)
    3. Liquid-clustered table, no salt, no row limit in partitions (non-salted and liquid)
    4. Liquid-clustered table, salted, with a 1M row limit (salty and liquid)

    Data Landscape

    The dataset contains the features that the set of ML models uses for inference. It has ~550M rows and covers four products, identified by the attribute ProductLine:

    • Product A: ~10.45M (1.9%)
    • Product B: ~4.4M (0.8%)
    • Product C: ~100M (17.6%)
    • Product D: ~354M (79.7%)

    It also has another low-cardinality attribute, AttrB, which contains only two distinct values and is used as a filter to extract subsets of the dataset for each part of the ML system.

    Moreover, RunDate logs the date when the features were generated; they are append-only. Finally, the dataset is read using the following query:

    SELECT
      Id,
      ProductLine,
      AttrB,
      AttrC,
      RunDate,
      {model_features}
    FROM
      catalog.schema.FeatureStore
    WHERE
      ProductLine = :product AND
      AttrB = :attributeB AND
      RunDate = :RunDate

    Salt Implementation

    The salting here is generated dynamically. Its purpose is to distribute the data according to the volumes: large products receive more buckets and smaller products receive fewer. For instance, Product D should receive around 80% of the buckets, given the proportions in the data landscape.

    We do this so we can have predictable inference run times and maximize cluster utilization.
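    The bucket-allocation rule at the heart of the PySpark code below can be sketched in plain Python (the helper name is ours; the proportions come from the data landscape above):

```python
def allocate_buckets(percent: float, product_count: int,
                     min_buckets: int = 10, max_buckets: int = 1160) -> int:
    """Number of salt buckets for a product, proportional to its share of rows."""
    base = int(min_buckets + percent * (max_buckets - min_buckets))
    # Add one extra bucket when the row count is not evenly divisible.
    return base + 1 if product_count % base != 0 else base

# Product D: ~79.7% of the data, ~354M rows
print(allocate_buckets(0.797, 354_000_000))  # 927
```

    Large products therefore get close to the maximum bucket count, while small products stay near the minimum, which is what keeps per-task row counts comparable.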

    from pyspark.sql import functions as F

    # Calculate the share of each (ProductLine, AttrB) based on row counts
    brand_cat_counts = df_demand_price_grid_load.groupBy(
       "ProductLine", "AttrB"
    ).count()
    total_count = df_demand_price_grid_load.count()
    brand_cat_percents = brand_cat_counts.withColumn(
       "percent", F.col("count") / F.lit(total_count)
    )
    
    # Collect percentages as a dict with string keys (this will later determine
    # the number of salt buckets each product receives)
    brand_cat_percent_dict = {
       f"{row['ProductLine']}|{row['AttrB']}": row['percent']
       for row in brand_cat_percents.collect()
    }
    
    # Collect counts as a dict with string keys (this will help
    # add an extra bucket when counts are not divisible by the number of
    # buckets for the product)
    brand_cat_count_dict = {
       f"{row['ProductLine']}|{row['AttrB']}": row['count']
       for row in brand_cat_percents.collect()
    }
    
    # Helper to flatten key-value pairs for create_map
    def dict_to_map_expr(d):
       expr = []
       for k, v in d.items():
           expr.append(F.lit(k))
           expr.append(F.lit(v))
       return expr
    
    percent_case = F.create_map(*dict_to_map_expr(brand_cat_percent_dict))
    count_case = F.create_map(*dict_to_map_expr(brand_cat_count_dict))
    
    # Add the string key column in PySpark
    df_demand_price_grid_load = df_demand_price_grid_load.withColumn(
       "product_cat_key",
       F.concat_ws("|", F.col("ProductLine"), F.col("AttrB"))
    )
    
    df_demand_price_grid_load = df_demand_price_grid_load.withColumn(
       "percent", percent_case.getItem(F.col("product_cat_key"))
    ).withColumn(
       "product_count", count_case.getItem(F.col("product_cat_key"))
    )
    
    # Set min/max buckets
    min_buckets = 10
    max_buckets = 1160
    
    # Calculate buckets per row based on the (ProductLine, AttrB) share
    df_demand_price_grid_load = df_demand_price_grid_load.withColumn(
       "buckets_base",
       (F.lit(min_buckets) + (F.col("percent") * (max_buckets - min_buckets))).cast("int")
    )
    
    # Add an extra bucket if product_count is not divisible by buckets_base
    df_demand_price_grid_load = df_demand_price_grid_load.withColumn(
       "buckets",
       F.when(
           (F.col("product_count") % F.col("buckets_base")) != 0,
           F.col("buckets_base") + 1
       ).otherwise(F.col("buckets_base"))
    )
    
    # Generate a salt per row based on the (ProductLine, AttrB) bucket count
    df_demand_price_grid_load = df_demand_price_grid_load.withColumn(
       "salt",
       (F.rand(seed=42) * F.col("buckets")).cast("int")
    )
    
    # Perform the repartition using the core attributes and the salt column
    df_demand_price_grid_load = df_demand_price_grid_load.repartition(
       1200, "AttrB", "ProductLine", "salt"
    ).drop("product_cat_key", "percent", "product_count", "buckets_base", "buckets", "salt")

    Finally, we save our dataset to the feature table and cap the number of rows per file. This prevents Spark from producing partitions with too many rows, which it can do even after we have computed the salt.

    Why do we enforce 1M rows? The primary focus is on model inference time, not so much on file size. After several tests with 1M, 1.5M, and 2M, the first yielded the best performance in our case. Again, this project was very budget- and time-constrained, so we had to make the most of our resources.

    (df_demand_price_grid_load.write
       .mode("overwrite")
       .option("replaceWhere", f"RunDate = '{params['RunDate']}'")
       .option("maxRecordsPerFile", 1_000_000)
       .partitionBy("RunDate", "AttrB", "ProductLine")
       .saveAsTable(f"{params['catalog_revauto']}.{params['schema_revenueautomation']}.demand_features_price_grid"))
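    For the liquid-clustered scenarios, the table is created with Databricks’ CLUSTER BY instead of partitionBy. A minimal sketch, assuming the column names from the read query above (the staged_features source name is a placeholder):

```sql
-- Liquid clustering replaces hive-style partitioning; the clustering keys
-- should match the read filters (RunDate, ProductLine, AttrB).
CREATE TABLE catalog.schema.FeatureStore
CLUSTER BY (RunDate, ProductLine, AttrB)
AS SELECT * FROM staged_features;

-- Re-cluster incrementally as new RunDates are appended.
OPTIMIZE catalog.schema.FeatureStore;
```

    The salted DataFrame is then appended as usual; the salt still controls file sizes via maxRecordsPerFile, while clustering handles data layout.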

    Why not just rely on Spark’s Adaptive Query Execution (AQE)?

    Recall that the primary focus is on inference times, not on metrics tuned for general Spark SQL queries, such as file size. Using only AQE was actually our initial attempt. As you will see in the results, the run times were very undesirable and did not maximize cluster utilization given our data proportions.
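    For reference, these are the AQE settings in play; the values shown are the stock Spark 3.x defaults, not tuning on our part. Note that they target shuffle-partition sizes in bytes, which is why they don’t cap per-task row counts for inference:

```properties
spark.sql.adaptive.enabled=true
spark.sql.adaptive.coalescePartitions.enabled=true
spark.sql.adaptive.advisoryPartitionSizeInBytes=64MB
```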

    Machine Learning Inference

    There is a pipeline with four tasks, one per product. Each task performs the following general steps:

    • Loads the features for the corresponding product
    • Loads the subset of ML models for the corresponding product
    • Performs inference on one half of the subset, sliced by AttrB
    • Performs inference on the other half, sliced by AttrB
    • Saves the data to the results table
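    The steps above can be sketched as a small driver function; all the helper callables here are hypothetical stand-ins for the real Spark and model-loading calls:

```python
def run_product_task(product, attr_b_values, load_features, load_models, predict, save):
    """One per-product pipeline task: load models once, infer per AttrB slice."""
    models = load_models(product)          # the subset of the ensemble for this product
    for attr_b in attr_b_values:           # AttrB has exactly two distinct values
        features = load_features(product, attr_b)
        save(product, attr_b, predict(models, features))
```

    Keeping the two AttrB slices as separate inference stages is what makes the per-stage partition counts (e.g. 430 of 860 for Product D) meaningful later.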

    We will focus on one of the inference stages so as not to overwhelm this article with numbers, although the other stage is very similar in structure and results. You can see the DAG for the inference stage in Fig. 2.

    Fig 2. DAG for the ML inference Spark stage. Own authorship.

    It looks very straightforward, but the run times can fluctuate depending on how your data is stored and the size of your cluster.

    Cluster configuration

    For the inference stage we are analyzing, there is one cluster per product, tuned for the infrastructure limitations of the project as well as the distribution of the data:

    • Product A: 35 workers (Standard_DS14v2, 420 cores)
    • Product B: 5 workers (Standard_DS14v2, 70 cores)
    • Product C: 1 worker (Standard_DS14v2, 14 cores)
    • Product D: 1 worker (Standard_DS14v2, 14 cores)

    In addition, Adaptive Query Execution is enabled by default, which lets Spark decide how best to lay out the data given the context you provide.

    Results and discussion

    For each scenario, you will see an overview of the number of file partitions per product and the average number of rows per partition, to give an indication of how many rows the ML system runs inference on per Spark task. Additionally, we present Spark UI metrics to monitor run-time performance and inspect the distribution of data at inference time. We show the Spark UI portion only for Product D, the largest product, to avoid an excess of information. In addition, depending on the scenario, inference on Product D becomes the run-time bottleneck, another reason why it was the primary focus of the results.

    Non-Salted and Partitioned

    You can see in Fig. 3 that the average file partition has tens of millions of rows, which implies considerable run time for a single executor. The largest on average is Product C, with more than 45M rows in a single partition. The smallest is Product B, with roughly 12M rows on average.

    Fig 3. Average row count in a partition vs the products.

    Fig. 4 depicts the number of partitions per product, 26 in total. For Product D, 18 partitions fall very short of the 420 cores we have available, and on average each partition will perform inference on ~40M rows.

    Fig 4. Total number of file partitions per product

    Take a look at Fig 5. In total, the cluster spent 9.9 hours and still wasn’t finished; we had to kill the job, as it was becoming expensive and blocking other people’s tests.

    Fig 5. Summary of the inference stage for the partitioned, non-salted dataset for Product D.

    From the summary statistics in Fig. 6 for the tasks that did finish, we can see heavy skew in the partitions for Product D. The maximum input size was ~56M rows and the runtime was 7.8h.

    Fig 6. Summary statistics for the executors’ inference on the partitioned and non-salted dataset.

    Non-salted and Liquid

    In this scenario, we observe very similar results in terms of average rows per file partition and partitions per product, as seen in Fig. 7 and Fig. 8, respectively.

    Fig 7. Average row count in a partition vs the products

    Product D has 19 file partitions, still very short of 420 cores.

    Fig 8. Total number of file partitions per product

    We could already anticipate that this experiment was going to be very expensive, so I decided to skip the inference test for this scenario. In an ideal situation we would carry it out, but there is a backlog of tickets on my board.

    Salty and Partitioned

    After applying the salting and repartition process, we end up with ~2.5M average records per partition for Products A and B, and ~1M for Products C and D, as depicted in Fig 9.

    Fig 9. Average row count in a partition vs the products

    Moreover, we can see in Fig. 10 that the number of file partitions increased to roughly 860 for Product D, which gives 430 for each inference stage.

    Fig 10. Total number of file partitions per product

    This results in a run time of 3h for inferencing Product D with 360 tasks, as seen in Fig 11.

    Fig 11. Summary of the inference stage for the partitioned and salted dataset

    Checking the summary statistics in Fig. 12, the distribution looks balanced, with run times around 1.7h, but with a maximum task taking 3h, which is worth investigating further in the future.

    Fig 12. Summary statistics for the executors’ inference on the partitioned and salted dataset.

    One great benefit is that the salt distributes the data according to the proportions of the products. If we had more resources available, we could increase the number of shuffle partitions in repartition() and add workers according to the proportions of the data. This ensures that our process scales predictably.
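    A quick sketch of that proportional scaling (our own helper; worker budget and minimum are assumptions, not project numbers):

```python
def workers_per_product(proportions: dict, total_workers: int, min_workers: int = 1) -> dict:
    """Split a worker budget across products by their share of the data."""
    return {p: max(min_workers, round(total_workers * share))
            for p, share in proportions.items()}

# Shares from the data landscape, with a hypothetical budget of 40 workers.
print(workers_per_product({"A": 0.019, "B": 0.008, "C": 0.176, "D": 0.797}, 40))
# {'A': 1, 'B': 1, 'C': 7, 'D': 32}
```

    Because the salt buckets follow the same proportions, task counts and worker counts grow together, which is what keeps run times predictable.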

    Salty and Liquid

    This scenario combines the two strongest levers we have explored so far: salting to control file size and parallelism, and liquid clustering to keep related data colocated without rigid partition boundaries.

    After applying the same salting strategy and a 1M row limit per partition, the liquid-clustered table shows a very similar average partition size to the salted and partitioned case, as shown in Fig 13. Products C and D remain close to the 1M row target, while Products A and B settle slightly above that threshold.

    Fig 13. Average row count in a partition vs the products

    However, the main difference appears in how these partitions are distributed and consumed by Spark. As shown in Fig. 14, Product D again reaches a high number of file partitions, providing enough parallelism to saturate the available cores during inference.

    Fig 14. Total number of file partitions per product.

    Unlike its partitioned counterpart, liquid clustering allows Spark to adapt the file layout over time while still benefiting from the salt. This results in a more even distribution of work across executors, with fewer extreme outliers in both input size and task duration.

    From the summary statistics in Fig. 15, we observe that the majority of tasks complete within a tight runtime window, and the maximum task duration is lower than in the salty and partitioned scenario. This indicates reduced skew and better load balancing across the cluster.

    Fig 15. Summary of the inference stage for the liquid-clustered and salted dataset
    Fig 16. Summary statistics for the executors’ inference on the liquid-clustered and salted dataset.

    An important side effect is that liquid clustering preserves data locality for the filtered columns without enforcing strict partition boundaries. This lets Spark still benefit from data skipping, while the salt ensures that no single executor is overwhelmed with tens of millions of rows.

    Overall, salty and liquid emerges as the most robust setup: it maximizes parallelism, minimizes skew, and reduces operational risk when inference workloads grow or cluster configurations change.

    Key Takeaways

    • Inference scalability is often limited by data layout, not model complexity. Poorly sized file partitions can leave hundreds of cores idle while a few executors process tens of millions of rows.
    • Partitioning alone is not enough for large-scale inference. Without controlling file size, partitioned tables can still produce massive partitions that lead to long-running, skewed tasks.
    • Salting is an effective tool to unlock parallelism. Introducing a salt key and enforcing a row limit per partition dramatically increases the number of runnable tasks and stabilizes runtimes.
    • Liquid clustering complements salting by reducing skew without rigid boundaries. It allows Spark to adapt the file layout over time, making the system more resilient as data grows.


