Using EXCEPT

  • Hi guys

    Banging my head off the wall with this one. I have redeveloped a stored proc which generates a ton of select statements and as such I need to verify that my new calculations match the original so I'm dumping all select results into global temp tables and running comparison queries using EXCEPT to determine if both data sets match up.

    I found one data set that didn't match up and started looking into it more closely. There over 100 columns returned so you can imagine the fun I've had trying to determine which fields aren't matching when nothing obvious appears.

    I found the offending columns (there were 2) but the strange thing is they show exactly the same value for each. One pair of columns both return NULL the other pair returns a float value (-0.0013382309285882).

    In regards the float value, I have a feeling that this might be down to how SQL handles float values for calculations rounded off to x amount of decimal places.

    But it doesn't make sense for the NULL values. The data set contains other NULL values which compare properly and they don't show up in this exception.

    Any suggestions are appreciated.

  • It must be the floats, because NULLs are treated as equal by EXCEPT:

    SELECT NULL

    EXCEPT

    SELECT NULL

    returns no results.

    -- Gianluca Sartori

  • Thought so too but unfortunately it still seems to be failing the EXCEPT comparison.

    If I wrap ISNULLS around these values with a 1 for replacement value it works

    If I cast the floats as decimals it works.

    I can allow for the floats as they could probably do with rounding up to fewer decimal places but the NULL values make no sense

  • Forget the NULLs, it was down to the floats.

    While investigating this I started using other queries using joins and comparing one value to the other in a WHERE clause which of course leads to its own problems.

  • Can anyone tell me the 'official' reason why such behaviour occurs on SQL server when comparing float values.

    If I run my code again, the values might match and I get no differences in the EXCEPT query or I could get more results. There's no consistency here.

    Right now its not a viable option for me to redevelop my code to change all float values as decimal's set to a number of decimal places. We're talking changing 10s of thousands of rows of code for a simple comparison.

    If a value is declared the same in one piece of code as the other and the value returned / displayed on screen is the same, then it doesn't make sense that EXCEPT would return a difference between the 2.

  • mitzyturbo (6/15/2015)


    Can anyone tell me the 'official' reason why such behaviour occurs on SQL server when comparing float values.

    If I run my code again, the values might match and I get no differences in the EXCEPT query or I could get more results. There's no consistency here.

    Right now its not a viable option for me to redevelop my code to change all float values as decimal's set to a number of decimal places. We're talking changing 10s of thousands of rows of code for a simple comparison.

    If a value is declared the same in one piece of code as the other and the value returned / displayed on screen is the same, then it doesn't make sense that EXCEPT would return a difference between the 2.

    If you were comparing the declared values, there wouldn't be a problem - but you are not doing that, you are comparing teh results of some computation.

    The default display precisions for FLOATs is generally much less precise than the actual FLOAT data; so if two numbers are displayed the same, that doesn't mean that they actually are the same.

    Here's some code that demonstrates that

    declare @f float(53) = 1, @i int = 5, @s-2 float(53) = sqrt(2.0), @two float(53)= 2.0;

    declare @ns float(53) = 1.4142135623731, @NF float(53) = 1.41421356237309

    while @i > 0

    begin

    set @i-=1

    set @f = (@f+cast(2.0 as float(53))/@f)/cast(2.0 as float(53))

    end

    select @s-2*@s [S*S], @s-2*@s-@two [S*S - 2], @s-2*@f [S*F],@s*@f-@two [S*F - 2], @f*@f [F*F], @f*@f-@two [F*F - 2]

    select @s-2 S, @ns NS, @ns*@ns [NS*NS], @f F, @NF NF, @NF*@nf

    The first row of results shows three quantities each of which is displayed by default as 2, but when we subtract 2 from them we see that the first is bigger than two, the second is 2, and the third is smaller than two - so clearly 3 different quantities are being displayed as 2, and no two of them are actually equal.

    The second row of results shows qantities S and NS which are obviously different (because the first line shows that S*S is displayed as 2 but NS*NS is displayed as something different). Then it shows the same thing for the two quantities F and NF (again relying on F*F in the first line).

    All the differences in values that are showed the same seem to be somewhere below one part in a quadrillion, so teh problem is that an equals comparison gets too close.

    Since the display values are coming out the same, it is likely to be ok to fix this for just the columns affected, by changing the declaration of the columns from FLOAT to the appropriate decimal types; you don't need to do this for all your data - just for the affected columns. Your original post suggests there is only one column affected so changing the declaration of that column may be enough - but of course you will need to test thoroughly to be sure that you don't sometimes get a similar problem in other columns.

    In the long term, what you need to do is either switch from FLOAT to FLOATD (or whatever SQL eventually calls the decimal float type) or change all your arithmetic to work in decimal numeric mode. Probably the latter, as the SQL standards people (as far as I can tell) haven't yet shown any sign of getting up to date on floating point although it's 8 years since the revised standard was finally ratified by IEEE (and 4 years since it was adopted by ISO and published as an ISO standard) so teh former might involve too long a wait.

    (General moan: why oh why is SQL lagging years behind Python in floating point support?)

    Tom

  • When comparing two relatively small resultsets, I usually copy / paste both into separate tabs in a text comparison tool like WinMerge. If sorted the same, the tool will line up matching rows, and missing rows or different blocks of text are highlighted.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Thought it might be something like that, thanks for the clarification Tom

    Unfortunately for me, this is the first of many sp's I'll be changing which all have a number of float values, which all need comparitive testing similar to this. The real pain of it is that if you run the code again for the sam calculations different values will appear as different such is the nature of the calculations returned in the float type.

    I might take a look at WinMerge as Eric stated to see if that works as a better tool for comparing such results.

  • You could also round the float values to avoid the issue.

    -- Gianluca Sartori

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply