February 17, 2023 at 11:01 am
Hello,
We are working with an SSAS tabular model at compatibility level 1500 on a SQL Server 2019 environment, which we fully process every night.
We notice, however, that the processing time can vary wildly from day to day: one day processing finishes in under 2 hours, the next day it takes over 3.5 hours, even though the underlying source data doesn't change that much from day to day.
We ran SQL Profiler against the SSAS instance and managed to log both a “fast” and a “slow” run. When we compare the two traces, the main difference can be traced back to one partition of one of the tables; more precisely, the profiler shows a big difference in the event “Reading data for the 'XXX' table 'YYY' partition.”
We tried to get some more details about this particular event, but looking up the EventId (7) and SubEventId (17) just gives “ReadData”, which doesn't tell us much. We are wondering what exactly is happening: is it the source query reading data from the source database? Is it the output of the source query being loaded into the memory of the tabular model?
Basically, we would like to know exactly what this step is, why there is such a big difference from day to day, and whether it is possible to optimise it.
February 17, 2023 at 5:06 pm
I have no answers, only questions. Is there anything unusual about this partition? Is it the open time-period partition of a fact table, or is it much larger than the other partitions? Have you looked at the database activity while the read is in progress? Is the query for the partition simple, or is it complex enough to occasionally pull a bad execution plan?
I quite often see resource_semaphore waits during cube processing, because the amount of parallel processing exceeds the database resources. This means some partitions wait for others to finish. Could it be that this partition takes 60+ minutes to read data, but sometimes it has to wait for other partitions to complete? If you just process this partition, how long does it take? Do you process everything serially or in parallel?
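If you want to check for that while processing is running, a query along these lines against the source SQL Server shows who is waiting on (or holding) a memory grant; any row where grant_time is NULL is a query queued behind RESOURCE_SEMAPHORE (just a sketch, not specific to your workload):

-- Run on the source SQL Server while the SSAS processing is in flight.
-- Rows with grant_time = NULL are waiting on RESOURCE_SEMAPHORE for a memory grant.
SELECT session_id,
       request_time,
       grant_time,
       requested_memory_kb,
       granted_memory_kb,
       wait_time_ms
FROM sys.dm_exec_query_memory_grants
ORDER BY wait_time_ms DESC;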
How much memory does the tabular model use and how much does it need for processing? Is there anything else competing for memory? My understanding is that during processing the existing partition, or even model, remains in memory until the new partition has finished processing, essentially duplicating the amount of data in memory. This can be reduced by a process clear before the process full, but I think it would simply fail if it ran out of memory rather than taking longer.
Do you need to do a full process every night? I don't know what your model is like, but the majority of our data is static, so we only process a handful of partitions every night.
February 20, 2023 at 5:37 pm
Ed's questions are all spot on and you should look to try and answer all of them as part of your troubleshooting process.
My two cents: I'd start with a trace on the back-end SQL database, as it is likely a source query that's taking a long time. With that trace info you should be able to see which query SSAS is generating, and therefore also which partition it belongs to.
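If you'd rather not leave a full Profiler trace running, a lightweight Extended Events session along these lines captures just the long-running batches arriving at the source database (the session and file names are placeholders, and the 60-second cut-off is arbitrary):

-- Capture batches that take longer than 60 seconds, with the issuing application name.
CREATE EVENT SESSION [ssas_processing_trace] ON SERVER
ADD EVENT sqlserver.sql_batch_completed (
    ACTION (sqlserver.client_app_name, sqlserver.sql_text)
    WHERE duration > 60000000   -- duration is reported in microseconds
)
ADD TARGET package0.event_file (SET filename = N'ssas_processing_trace');
GO
ALTER EVENT SESSION [ssas_processing_trace] ON SERVER STATE = START;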
If you see different execution plans for the long-running query from day to day, it could be that statistics are out of date... but it's difficult to say without all the other information Ed has asked about.
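A quick way to test that theory is to check how stale the statistics on the fact table behind that partition are; something like the following (the table name is obviously a placeholder for yours):

-- How many rows have changed since each statistic on the fact table was last updated?
SELECT s.name AS stats_name,
       sp.last_updated,
       sp.rows,
       sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID(N'dbo.FactSales');   -- replace with your fact table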
February 24, 2023 at 1:39 pm
To answer some of the questions about the model:
Based on the feedback, we decided to look further into the source query. We activated the Query Store, let the daily processing run for a couple of days, and eventually managed to capture a slow run.
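(Enabling it was straightforward, roughly the following against the source database; the database name here is just a placeholder:)

ALTER DATABASE [SourceDW] SET QUERY_STORE = ON;
ALTER DATABASE [SourceDW] SET QUERY_STORE (OPERATION_MODE = READ_WRITE, QUERY_CAPTURE_MODE = AUTO);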
When we compare the Query Store data of the slow run with the normal run, we notice the following:
- The query plan is always the same, so both the slow and the normal runs use the same query plan
- The metrics “Duration (ms)”, “CPU time (ms)”, “Logical Reads (KB)” and “Log memory used (KB)” are noticeably higher for the slow run query
- The metrics “Logical writes (KB)”, “CLR time (ms)”, “DOP”, “Memory consumption (KB)”, “Temp DB Memory used (KB)” and “Wait time (ms)” are noticeably lower for the slow run query
- The other metrics (“Execution count”, “Physical Reads (KB)”, “Row count”) are about the same
We are pretty new to using the Query Store, so we don't know exactly what all of these things mean. What could be the reason the same query with the same query plan produces such different metrics from one run to the next?
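(For reference, the per-plan wait categories can also be pulled from the Query Store catalog views with something like the query below; the plan_id is a placeholder for the plan of the partition's source query. This might show where the slow executions were actually spending their time.)

-- Wait categories per interval for one plan, from the Query Store catalog views.
SELECT rsi.start_time AS interval_start,
       ws.wait_category_desc,
       ws.total_query_wait_time_ms,
       ws.avg_query_wait_time_ms
FROM sys.query_store_wait_stats AS ws
JOIN sys.query_store_runtime_stats_interval AS rsi
     ON rsi.runtime_stats_interval_id = ws.runtime_stats_interval_id
WHERE ws.plan_id = 1234   -- placeholder: the plan_id of the partition's source query
ORDER BY rsi.start_time, ws.total_query_wait_time_ms DESC;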