Why You Should Not Replace Blanks with 0 in Power BI

I was watching Jeffrey Wang as a live stream guest with Reid Havens, and one of the dozen wonderful things that Jeffrey shared with the audience was the list of optimizations that the DAX engine performs when creating an optimal query plan for our measures.

And the one that caught my attention was related to the so-called “Sparse measures”:

Screenshot from the live stream on YouTube

To keep it simple: when you define a measure, the Formula Engine will add an implicit NonEmpty filter to the query, which should enable the optimizer to avoid a full cross-join of dimension tables and scan only those rows where records for the combination of your dimension attributes really exist. For folks coming from the MDX world, the NonEmpty function may look familiar, but let’s see how it works in DAX.

The thing that resonated with me most was when Jeffrey advised against replacing BLANKs with zeroes (or any other explicit values) in Power BI calculations. I’ve already written about how you can handle BLANKs and replace them with zeroes, but in this article, I want to focus on the possible performance implications of this decision.

Setting the stage

Before we start, one important disclaimer: the recommendation not to replace BLANK with 0 is just that, a recommendation. If the business request is to display 0 instead of BLANK, it doesn’t necessarily mean that you should refuse to do it. In most scenarios, you’ll probably not even notice a performance decrease, but it will depend on multiple different factors…

Let’s start by writing our simple DAX measure:

Sales Amt 364 Products =
CALCULATE (
    [Sales Amt],
    FILTER ( ALL ( 'Product'[ProductKey] ), 'Product'[ProductKey] = 364 )
)

Using this measure, I want to calculate the total sales amount for the product with ProductKey = 364. If I put the value of this measure in a Card visual and turn on Performance Analyzer to check the times for handling this query, I get the following results:

The DAX query took only 11 ms to execute, and once I switched to DAX Studio, the xmSQL generated by the Formula Engine was quite simple:

And if I take a look at the query plan (physical), I can see that the Storage Engine found only one existing combination of values to return our data:

Adding more ingredients…

However, let’s say that the business request is to analyze data for ProductKey 364 on a daily level. Let’s go and add dates to our report:

This was again very fast! I’ll now check the metrics within DAX Studio:

This time, the query was expanded to include the Dates table, which affected the work the Storage Engine needed to do: instead of finding only one row, the number is now different:

Of course, you won’t notice any difference in performance between these two scenarios, as the difference is only a few milliseconds.

But this is just the beginning; we’re only warming up our DAX engine. In both of these cases, as you can see, only “filled” values appear: the combinations of rows where both of our requirements are satisfied, meaning the product key is 364 and the date is one on which we had sales for this product. If you look carefully at the illustration above, the dates are not contiguous and some are missing, such as January 12th, January 14th to January 21st, and so on.

This is because the Formula Engine was smart enough to eliminate the dates where product 364 had no sales using the NonEmpty filter, and that’s why the number of records is 58: we have 58 distinct dates where sales of product 364 were not blank:
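To see the same elimination outside the visual, one could run a query along these lines in DAX Studio. This is only a sketch: the 'Dates'[Date] column name is an assumption about the model, and the query Power BI generates for the visual will look different. SUMMARIZECOLUMNS leaves out any row for which every added measure expression returns BLANK, so only the dates with sales for product 364 should come back.

```dax
// Assumption: the date table is named 'Dates' and exposes a [Date] column.
// Rows where the measure evaluates to BLANK are not returned.
EVALUATE
SUMMARIZECOLUMNS (
    'Dates'[Date],
    "Sales Amt 364", [Sales Amt 364 Products]
)
ORDER BY 'Dates'[Date]
```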

Now, let’s say that business users also want to see the dates in between, where product 364 hadn’t made any sales. So, the idea is to display a $0 amount for all these dates. As already described in the previous article, there are multiple ways to replace BLANKs with zeroes, and I’ll use the COALESCE() function:

Sales Amt 364 Products with 0 = COALESCE ( [Sales Amt 364 Products], 0 )

Basically, the COALESCE function evaluates the arguments provided in order (in my case, there are two: the measure and the value 0) and returns the first one that is not BLANK. Simply said, it will check whether the value of Sales Amt 364 Products is BLANK. If not, it will display the calculated value; otherwise, it will return 0 instead of BLANK.
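For readers who prefer to see the logic spelled out, the same behavior can be written with IF and ISBLANK. The measure name below is hypothetical and exists only for illustration; it is not part of the original report.

```dax
// Hypothetical measure name; same result as the COALESCE version above.
Sales Amt 364 Products with 0 v2 =
VAR SalesAmt = [Sales Amt 364 Products]
RETURN
    -- Return 0 when the base measure is BLANK, otherwise the calculated value
    IF ( ISBLANK ( SalesAmt ), 0, SalesAmt )
```

Whichever form is used, the measure never returns BLANK, and that is what matters for the engine behavior described next.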

Wait, what?! Why am I seeing all the products when I filtered everything out except product 364? Not to mention that my table now took more than 2 seconds to render! Let’s check what happened in the background.

Instead of generating one single query, we now have 3 of them. The first one is exactly the same as in the previous case (58 rows). However, the remaining queries target the Product and Dates tables, pulling all the rows from both tables (the Product table contains 2,517 rows, while the Dates table has 1,826). Not just that, take a look at the query plan:

4.6 million records?! Why on Earth does this happen?! Let me do the math for you: 2,517 × 1,826 = 4,596,042. So, here we had a full cross-join between the Product and Dates tables, forcing every single tuple (date-product combination) to be checked! That happened because we forced the engine to return 0 for every single tuple that would otherwise return blank (and consequently be excluded from scanning)!
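If you want to check the size of that cross-join on your own model, a query like the following sketch could be run in DAX Studio. It assumes that 'Product'[ProductKey] and 'Dates'[Date] are the columns involved and that each holds one distinct value per row of its table; on a different model the counts will of course differ.

```dax
// Assumption: 'Dates'[Date] is the date column used in the visual.
// Counts the distinct values on each side and the tuples in their cross-join.
EVALUATE
ROW (
    "Products", COUNTROWS ( VALUES ( 'Product'[ProductKey] ) ),
    "Dates", COUNTROWS ( VALUES ( 'Dates'[Date] ) ),
    "Candidate tuples",
        COUNTROWS (
            CROSSJOIN (
                VALUES ( 'Product'[ProductKey] ),
                VALUES ( 'Dates'[Date] )
            )
        )
)
```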

This is a simplified overview of what happened:

Believe it or not, there is an elegant solution to show blank values out of the box (but not with 0 instead of BLANK). You can simply click on the Date field and choose Show items with no data:

This will display the blank cells too, but without performing a full cross-join between the Product and Dates tables:

We can now see all the cells (even blanks), and this query took half the time of the previous one! Let’s check the query plan generated by the Formula Engine:

Not all scenarios are catastrophic!

Truth be told, we could have rewritten our measure to exclude some unwanted records, but it would still not be an optimal way for the engine to eliminate empty records.
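One possible shape for such a rewrite is sketched below. It is an assumption about what “excluding unwanted records” could look like, not a measure from the original report, and the measure name is hypothetical. The zero is returned only when product 364 is visible in the current filter context; every other row stays BLANK and can be dropped.

```dax
// Hypothetical measure: substitute 0 only where product 364 is in scope.
Sales Amt 364 Products with 0 Scoped =
IF (
    364 IN VALUES ( 'Product'[ProductKey] ),
    COALESCE ( [Sales Amt 364 Products], 0 )
)
-- With no third argument, IF returns BLANK for all other products.
```

The engine still has to evaluate the condition for the candidate tuples, which is why this remains less efficient than letting it eliminate empty records on its own.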

Additionally, there are certain scenarios in which replacing BLANKs with zero will not cause a significant performance decrease.

Let’s examine the following situation: we are displaying data about the total sales amount for every single brand. And I’ll add my sales amount measure for product 364:

As you might expect, that was quite fast. But what will happen when I add my measure that replaces BLANKs with 0, which caused havoc in the previous scenario:

Hm, it looks like we didn’t have to pay any penalty in terms of performance. Let’s check the query plan for this DAX query:

Conclusion

As Jeffrey Wang suggested, you should avoid replacing blanks with zeroes (or with any other explicit values), as this can significantly affect the query optimizer’s ability to eliminate unnecessary data scanning. However, if for any reason you need to replace a blank with some meaningful value, be careful about when and how you do it.

As usual, it depends on many different aspects. For columns with low cardinality, when you’re not displaying data from multiple different tables (as in our example, where we needed to combine data from the Product and Dates tables), or for visual types that don’t need to display a large number of distinct values (e.g. the Card visual), you can get away without paying the performance price. On the other hand, if you use tables, matrices, or bar charts that show a lot of distinct values, make sure to check the metrics and query plans before you deploy that report to a production environment.

Thanks for reading!
