What are the limits?
What are the limits?
Of course, as a follow-up to my last article you might ask at least two questions:
- As the potential cardinality issues are related to actual HW, where exactly are the limits in my environment? At which size of dimension should I start to care?
- What about the impact of having more than one such dimension? Does performance drop double when using two big dimensions?
Let’s test it
So, we did an experiment. I took the same dimension and made multiple copies of the same dimension containing over 2mio records. But for each copy I merged some of them to one node (Unknown). I did the same operation with key column. Now I had the same fact table connected to similar dimensions with different cardinality.
Here are the results of simple sum query over amount through a whole fact table (200mio records) aggregated through different cardinality into columns with 2 values.
From the chart above, we can see it’s obvious – the golden rule “you should worry about dimension of size 1mio+” works for us.
Now it’s time to answer the second question. What performance drop can I expect when using more than one 2mio size dimension? Let’s look at the test result as well.
From the above we can see there is not such a devastating impact of having more than one dimension as we could expect. We can see performance drop, but this can be influenced by just combining more data together. You could think about the theory behind those numbers.
Conclusion
The golden rule regarding size of big dimension defined by SQLBI guys works.
If you have one large dimension in your query which you cannot eliminate you have a chance you will not see another big performance drop when adding another.
One more hint from real life monitoring. My measurement in this and the previous blog was under special conditions. The real-life measures are usually much more complicated. Therefore, you lose performance at different stages of query evaluation. Our real-life test of performance drop when using big dimension is 5%-20% – depending on the query. Which could be something you can consider as acceptable when compared with other options.