As is the case with many of my topics of late, I came across this one by helping somebody else. In SQL, we should be well aware of Precision and Scale of certain datatypes.
The particular case I was working on was focused on the decimal datatype, and so we will work with that throughout this post explicitly.
What are these attributes?
According to MSDN, these attributes have the following definitions.
Precision – specifies the number of digits an object can hold
Scale – specifies the number of digits to the right of the decimal point that an object can hold.
Based on those definitions, it seems pretty straight forward, right? Well, it is until you start doing a bit of math. Microsoft has formulas for figuring out what the resultant precision and scale will be for various math operations. You can read about that here.
Throughout our example, we will be focusing on multiplication and division. We will demonstrate a few different results and configurations as well.
First, let’s get some formulas out of the way. The formulas for precision and scale, as they show in MSDN at the link above, are as follows:
Multiplication Precision p1 + p2 + 1 Scale s1 + s2 Division Precision p1 - s1 + s2 + max(6, s1 + p2 + 1) Scale max(6, s1 + p2 + 1)
As is described in the MSDN article, p represents precision and s represents scale. The number annotations with p and s represent the corresponding expressions in the mathematical operation. The equation that we will be trying to solve is as follows:
[codesyntax lang=”tsql”]
SELECT ROUND(((@numerator*@multiplier)/@divisor1)*@anothermultiplier,2)
[/codesyntax]
But for the majority of these exercises, we will be focusing on this part of the formula.
[codesyntax lang=”tsql”]
SELECT ((@numerator*@multiplier)/@divisor1)
[/codesyntax]
This will provide us with ample example of the math involved when calculating the resultant precision and scale of a SQL math operation.
Here is an example of the above query with values. This query results in a value that is consistent with such calculators as MS Excel ( ;0) ).
[codesyntax lang=”tsql”]
SELECT ((5000000000.00*2.0250000000)/111883775187.72)
[/codesyntax]
However, if we use variables in lieu of those values, we start to see different results. And thank goodness for that, because there wouldn’t be much to talk about otherwise. So, let’s dump those values into some variables and see what starts happening.
[codesyntax lang=”tsql”]
DECLARE @divisor1DECIMAL(18,6) = 111883775187.72 ,@numeratorDECIMAL(18,2) = 5000000000.00 ,@multiplierDECIMAL(10,3) = 2.0250000000 ,@multiplier1DECIMAL(18,3) = 2.0250000000 ,@stage1DECIMAL(18,2) ,@anothermultiplierDECIMAL(18,2) = 932364.79
[/codesyntax]
And the formula(s). I say formulas, because I will be demonstrating two results here. Notice quickly that I have two similar multiplier variables – they differ only in name and precision.
[codesyntax lang=”tsql”]
SELECT ((@numerator*@multiplier1)/@divisor1) SELECT ((@numerator*@multiplier)/@divisor1)
[/codesyntax]
If you execute those two queries, you should get very similar results. Both should return 0.090xxx, but the second has more scale, extending the decimal out 8 places rather than 6 places. For the second query our result is 0.09049569. When you combine this difference at this point, it could make for some accounting nightmares. Especially given this difference in result occurs early on in the equation.
Notice in my variables there is one called stage. Let’s use that one now and see how using a staging variable plays into this.
[codesyntax lang=”tsql”]
DECLARE @divisor1DECIMAL(18,6) = 111883775187.72 ,@numeratorDECIMAL(18,2) = 5000000000.00 ,@multiplierDECIMAL(10,3) = 2.0250000000 ,@multiplier1DECIMAL(18,3) = 2.0250000000 ,@stage1DECIMAL(18,2) ,@anothermultiplierDECIMAL(18,2) = 932364.79 SELECT @stage1 = @numerator*@multiplier SELECT @stage1/@divisor1 SELECT @stage1 = @numerator*@multiplier1 SELECT @stage1/@divisor1
[/codesyntax]
Do you see what just happened? Both multipliers now produce the same result. How could that be? Let’s look at that. This time, let’s post calculations for precision and scale along-side each of those queries.
[codesyntax lang=”tsql”]
SELECT @stage1 = @numerator*@multiplier--P = 18+10+1 = 29, S = 2+3 = 5 SELECT @stage1/@divisor1--P = 18-2+6+max(6, 2+18+1) = 43, S = max(6, 2+18+1) = 21 --Actual P = 38 S = 16 SELECT @stage1 = @numerator*@multiplier1--P = 18+18+1 = 37, S = 2+3 = 5 SELECT @stage1/@divisor1--P = 18-2+6+max(6, 2+18+1) = 43, S = max(6, 2+18+1) = 21 --Actual P = 38 S = 16
[/codesyntax]
Looking this over, you should be able to quickly pick out some anomalies. Let’s start with the anomalies present in the calculations for the second query. First, you can see that the value for p1 is 18. One might fairly think that it should be the resultant precision of the first query. But, the variable is created as Decimal(18,2) and that precision and scale is used in calculations involving that variable.
The second thing one should notice is that the resultant precision is 43. Then why did I change it to 38 at the end? Max precision is 38. If the resultant precision of a mathematical operation exceeds 38, then it must be reduced to 38. This has an impact on scale – which is the next item of note. In the aforementioned MSDN article, scale is simply reduced by the difference between resultant(p) and final(p). That simple calculation holds true for these particular queries. But, if we look at the following queries, we can clearly see that it is behaving differently.
[codesyntax lang=”tsql”]
DECLARE @divisor1DECIMAL(18,6) = 111883775187.72 ,@numeratorDECIMAL(18,2) = 5000000000.00 ,@multiplierDECIMAL(10,3) = 2.0250000000 ,@multiplier1DECIMAL(18,3) = 2.0250000000 ,@stage1DECIMAL(18,2) ,@anothermultiplierDECIMAL(18,2) = 932364.79 SELECT (@numerator*@multiplier1)/@divisor1 SELECT (@numerator*@multiplier)/@divisor1
[/codesyntax]
And the correlating notes regarding precision and scale calculations.
[codesyntax lang=”tsql”]
SELECT (@numerator*@multiplier1)/@divisor1--P = 37-5+6+max(6, 5+18+1) = 62, S = max(6, 5+18+1) = 24 --Actual P = 38 S = 6 SELECT (@numerator*@multiplier)/@divisor1--P = 29-5+6+max(6, 5+18+1) = 54, S = max(6, 5+18+1) = 24 --Actual P = 38 S = 8
[/codesyntax]
Look at the final(s) for that first query. Scale is actually 6, but that does not match the math. Resultant(p) = 62, Final(p) = 38 and that means the difference is 24. Resultant(s) is 24, from which I subtract 24 and should get 0. Well, there is a part of that formula that needs better explanation maybe in the documentation. The final(s) should actually include the max(6, Resultant(s) – (resultant(p) – final(p))). The final(s) cannot be less than 6, and thus the reason that we see 6 digits to the right of the decimal in the result of that first query.
Now let’s change that divisor scale up a bit. The requirements dictate that the divisor be a Decimal(18,2) – I used 18,6 as one of my test sets. In this case, the only thing that changes is the final(s). And in this particular case (though, I don’t recommend shortcutting – it just works for this case), we can simply add 4 to the final(s) of the second query. The first remains unchanged.
Let’s look at the resulting value now. This difference alone is cause enough for significant differences in the results of the larger formula.
First query = 0.090495, Second query = 0.090495695046. To this point, I have shown why this happens. The calculations performed exceed the limitations for precision which impacts scale – which affects accuracy of the formula.
I showed one method, without saying as much, on how to avoid this. My use of an intermediary step to perform these calculations via variable helped to correct the precision/scale/accuracy problem. Another viable option is to use appropriate precision and scale for the data being used. Changing precision and scale to match expected data can have a significant impact on the resultant accuracy of the calculation.
I used two multipliers to demonstrate that last suggestion. The more accurate result came from the second query which used a more appropriate precision and scale for the data (see the variable @multiplier).