Product Hierarchy in a Matrix with 8+ measures on Values -- Performance

  • I'm not new to Power BI, but I'm new to this problem. I'm working for a company that's building a Power BI solution on Azure for a retail company. They have a proper star schema: a Sales fact in the middle, and these dimensions: Product (hierarchy, 8 levels), Date, and Store Hierarchy.

    And then they have 8+ measures on this thing, all of which are fairly simple:

    Net Sales = SUM(Sales[SalesNet])

    Sales Units = CALCULATE(SUM(Sales[SalesUnits])) /* yes, that CALCULATE doesn't need to be there! */

    Net £ Margin = CALCULATE(SUM(Sales[NetMargin])*-1)

    Net Margin % = DIVIDE([Net £ Margin],[Net Sales])

    SKU Count with Sales 2 = CALCULATE(COUNTROWS('Products'),FILTER(RELATEDTABLE(Sales),Sales[SalesUnits]>0))

    Linear Space (m) = CALCULATE(DIVIDE(SUM(Space[Position Total Linear Space]),100),FILTER(Space,Space[PLANOGRAM TYPE]="Line List Planogram")) /* see the note just after this list */

    Average Selling Price = DIVIDE([Net Sales],[Sales Units])

    % of SKUs planogrammed = DIVIDE([SKU COUNT Planned with Sales],[SKU Count with Sales])

    Net Sales per LMPW = DIVIDE([Net Sales per LM],[Weeks on Sale])

    Net Margin per LM = CALCULATE(DIVIDE([Net Margin £ Planned],[Linear Space (m)]))

    Net Sales per LM = CALCULATE(DIVIDE([Net Sales Planned],[Linear Space (m)]))
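
    (One thing I'm already eyeing, flagged above next to Linear Space (m): that FILTER over the entire Space table iterates the whole table, whereas a plain column predicate is usually much cheaper for the engine. Something like the following -- untested against my model, so treat it as a sketch and verify it returns the same numbers:)

    Linear Space (m) =
        CALCULATE(
            DIVIDE(SUM(Space[Position Total Linear Space]), 100),
            KEEPFILTERS(Space[PLANOGRAM TYPE] = "Line List Planogram") /* column predicate instead of a whole-table filter */
        )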

    The fact table in Power BI has about 9 million rows, so the combination of that row count and all those measures in the matrix causes a ridiculous number of calculations to be performed when the report opens or when the user drills down into the Product hierarchy - it takes forever to render.

    One way of improving this would be to follow Phil Seamark's blog post about aggregation tables and implement that. In his article he assumes the source data lives in a SQL database, so he can create his aggregation table with simple T-SQL; mine doesn't, but I *guess* I could create the same thing using Power Query. The other option would be to shorten the Product hierarchy - with only 3 or so levels, performance is tolerable.
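
    For what it's worth, if I skip Power Query, I think the same aggregate could be built as a DAX calculated table instead. A rough sketch - the grouping columns here are placeholders, not my real key names:

    Sales Agg =
        SUMMARIZECOLUMNS(
            'Product'[SubcategoryKey], /* placeholder: whatever level the aggregate should sit at */
            'Date'[DateKey],
            'Store'[StoreKey],
            "SalesNet", SUM(Sales[SalesNet]),
            "SalesUnits", SUM(Sales[SalesUnits]),
            "NetMargin", SUM(Sales[NetMargin])
        )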

    What other performance optimization options do I have? I could create drillthroughs in my shortened Product dimension to go to another report, but then I lose the ability to see everything from 30,000 feet, which is definitely helpful.

    Where do I start with speeding this up? Create an aggregation table, and then drill through to the unaggregated (but filtered) "raw" fact table?

    Point me to articles or whatever else to read - a few more won't hurt. I'm just at a loss at this point. The star schema looks okay, and there are no longer any DISTINCTCOUNTs against the fact table. Maybe at some point I could import all the CSV source files in blob storage into a properly indexed DW database (well, if my boss agrees to it).

    Open to any and all suggestions! Thanks for reading. Feel free to ask for clarification. I'm just stymied.

  • I've not set something like that up before, but I'd try taking different things out of the equation. Can you query the data without using Power BI? Or is it slow if you pull the data into Power BI but don't actually display anything?

    What I'm thinking is that my first step with performance issues that jump through multiple applications is to narrow down the bottleneck: is it in the data pull, the data parsing, or the data presentation? Once I know that, I can start tuning the "slow" part of the system.
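
    One way to time the engine with the rendering taken out of the picture (just a sketch, since I don't know your model): Performance Analyzer in Power BI Desktop can copy the actual DAX query behind the matrix, and you can run that in DAX Studio with server timings on. It would be something of this general shape, with made-up column names:

    EVALUATE
    SUMMARIZECOLUMNS(
        'Product'[Level1], /* placeholder hierarchy levels */
        'Product'[Level2],
        "Net Sales", [Net Sales],
        "Sales Units", [Sales Units]
    )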

    My GUESS is that pulling in 9 million rows' worth of data is causing the bottleneck, but it could be the calculations on those 9 million rows, or it could be the presentation layer. I would also watch bandwidth during report execution, as you could be maxing out the bandwidth to your report and may need to reduce the data set.

    Just my 2 cents, mind you... All of my SQL stuff is on premises, so debugging is a lot easier, but I know that bandwidth has been a big bottleneck with reports since people started working from home. A report that executes in a few seconds in Power BI Desktop or Excel on site can take minutes from home because of the amount of data that gets downloaded. Unfortunately, the only fix for that is to reduce the data set, which most end users don't like, so they just need to be patient...


  • Okay, silly me... I completely missed Reza Rad's video on this, "Aggregation to speed up the performance of a Power BI report even if all IMPORTED".

    You basically use GROUPBY in Power Query to create an aggregate table at a higher level of granularity than the fact table, and then you modify the measures so they hit the detail table only when a dimension the aggregate doesn't cover is in play:

    Sales =
        IF(
            ISCROSSFILTERED(DimPromotion[PromotionKey])
                || ISCROSSFILTERED(DimProduct[ProductKey]),
            SUM(FactInternetSales[SalesAmount]), // main (detail) table
            SUM(SalesAggregationTable[SalesAmount]) // aggregation table
        )
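
    For completeness, the Power Query side of that GROUPBY step would look something like this (again a sketch - SalesCsv and the grouping keys are placeholders for the real source query and columns):

    let
        Source = SalesCsv,
        SalesAgg = Table.Group(
            Source,
            {"ProductSubcategoryKey", "DateKey", "StoreKey"},
            {
                {"SalesNet", each List.Sum([SalesNet]), type number},
                {"SalesUnits", each List.Sum([SalesUnits]), type number},
                {"NetMargin", each List.Sum([NetMargin]), type number}
            }
        )
    in
        SalesAgg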

    (This post really needs that snippet from The Simpsons where Lisa explains some conclusion and Marge snaps at her: "Oh Lisa, don't you think we've already figured that out???!")
