SELECT AVG() causing arithmetic overflow?

  • This is happening seemingly randomly on various columns in the DB. Columns are defined as int and smallint.

    Sample query mildly obfuscated:

    SELECT Avg(ACTIVITY_ID) as AvgUsed from ACTIVITY_SUMMARY

    Error message:

    Msg 8115, Level 16, State 2, Line 1

    Arithmetic overflow error converting expression to data type int.

    Max value in this particular table is 8149969. Table has 7770306 rows. But again, it is happening for multiple tables. In one table, the max value for a failed column (different one) was 1007. That particular table has 5251055 rows.

    All have more than 1 million rows but that's about the only definite known.

    Other tables which also have this column return their AVG just fine.

    Saw one posting that it might be an implicit conversion issue. Query plan does not show one.

  • It could be that the SUM of the values is greater than the maximum value for an INT data type. Try casting it as a BIGINT:

    select AVG(cast(ActivityID as bigint)) from dbo.Activity

  • Yeah, that did it. Odd, though. Seems to me that the function should anticipate something like that.

  • pdb DBA (4/30/2009)


    Yeah, that did it. Odd, though. Seems to me that the function should anticipate something like that.

    the AVG returns the same type as entered. Since you were passing an int, it wanted to return an int and couldn't. By casting the the values as BIGINT's, AVG was able to return a BIGINT.

  • Well here's my thinking (which is probably a bit twisted...) on it and why this didn't occur to me at first...

    Sure, I know it *returns* the same value, but that should be *after* all calculations. It just makes much more sense to me. I mean, I can't be the only one out there with a multi-million row table with values on the higher end of the int spectrum (or even the smallint spectrum). The actual *average* is well within the limits of int, but with lots of rows the *sum* could get high very quickly. So the *sum* needs to be/should be bigint "behind the scenes".

  • pdb DBA (4/30/2009)


    Well here's my thinking (which is probably a bit twisted...) on it and why this didn't occur to me at first...

    Sure, I know it *returns* the same value, but that should be *after* all calculations. It just makes much more sense to me. I mean, I can't be the only one out there with a multi-million row table with values on the higher end of the int spectrum (or even the smallint spectrum). The actual *average* is well within the limits of int, but with lots of rows the *sum* could get high very quickly. So the *sum* needs to be/should be bigint "behind the scenes".

    Submit it to CONNECT as an issue. All i can say right now is that is the way it is.

  • Think of it this way:

    A place to store the results of the calculation must exist somewhere in memory prior to the calculation. The work doesn't just take place in the aether. There is no bitbucket.

    SQL defines a location and datatype for that target prior to the start of the calculation. Since it can't know in advance what the final result of the calculation would be, it would have to make all numeric results FLOAT, and all strings NVARCHAR(max). That would be bad. 😉

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • pdb DBA (4/30/2009)


    Well here's my thinking (which is probably a bit twisted...) on it and why this didn't occur to me at first...

    Sure, I know it *returns* the same value, but that should be *after* all calculations. It just makes much more sense to me. I mean, I can't be the only one out there with a multi-million row table with values on the higher end of the int spectrum (or even the smallint spectrum). The actual *average* is well within the limits of int, but with lots of rows the *sum* could get high very quickly. So the *sum* needs to be/should be bigint "behind the scenes".

    The problem is that INT is the most efficient datatype (for AVG). So, it incurs unnecessary overhead (i.e., it's slower) if it upconverts everything to BIGINT when it does not need to. And, it cannot tell ahead of time if it needs to, so the ideal solution would be if there was a way for you to tell it when it should switch to BIGINT.

    As it happens, such a method does already exist: CAST as BIGINT.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • RBarryYoung (4/30/2009)


    pdb DBA (4/30/2009)


    Well here's my thinking (which is probably a bit twisted...) on it and why this didn't occur to me at first...

    Sure, I know it *returns* the same value, but that should be *after* all calculations. It just makes much more sense to me. I mean, I can't be the only one out there with a multi-million row table with values on the higher end of the int spectrum (or even the smallint spectrum). The actual *average* is well within the limits of int, but with lots of rows the *sum* could get high very quickly. So the *sum* needs to be/should be bigint "behind the scenes".

    The problem is that INT is the most efficient datatype (for AVG). So, it incurs unnecessary overhead (i.e., it's slower) if it upconverts everything to BIGINT when it does not need to. And, it cannot tell ahead of time if it needs to, so the ideal solution would be if there was a way for you to tell it when it should switch to BIGINT.

    As it happens, such a method does already exist: CAST as BIGINT.

    I understand, but I think I agree with the OP. If your code has been working for years, and because it has grown so big it now has a problem. Why can't it use an int, and when it gets to the overflow area, up it to a bigint? Then if the final result will fit into an int, downsize it for the result. It can't take that many cpu cycles to do this, and, IMHO, would be better than having code that's been working fine for years now generating an error.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • I see a lot of folks trip over this: the "AVG() Overflow syndrome", as I like to think of it.

    What's interesting about it is no one ever calls me about "SUM() Overflow". Why? "Well duh, I have a million rows, so yeah the SUM went over 2 billion. I don't need to call you to figure that out."

    So why do they call when the same thing happens on AVG() (and for the exact same reason)? It's because, either:

    A) They are only thinking of the end result and forget that the overflow happens while still accumulating and not at the end when it returns. or, ..

    B) They assume that some kind of "rolling division" algorithim is being used, like:

    Declare @Acc as int

    Set @Acc = 0

    ForEachRow:

    Select @Acc = @Acc + (col / TotalRows)

    Return @Acc

    This way each piece is small and there is no danger of overflow, right?

    But if you think about it, this algorithm is entirely unusable because first, it won't have TotalRows until it has already read all of the rows. And secondly, these are integers and dividing a single-row value by a million or more is likely to produce a lot of zeroes.

    The actual algorithm used is sort of like this:

    Declare @Acc as int

    Declare @Cnt as int

    Select @Acc = 0, @Cnt = 0

    ForEachRow:

    Select @Acc = @Acc + col, @Cnt = @Cnt + 1

    Return (@Acc / @Cnt)

    This has the advantage of beng useable (don't have to know TotalRows ahead of time), correct (no integer underflow) and faster to boot (the extra addition per row is much faster than the division).

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • WayneS (4/30/2009)


    RBarryYoung (4/30/2009)


    pdb DBA (4/30/2009)


    Well here's my thinking (which is probably a bit twisted...) on it and why this didn't occur to me at first...

    Sure, I know it *returns* the same value, but that should be *after* all calculations. It just makes much more sense to me. I mean, I can't be the only one out there with a multi-million row table with values on the higher end of the int spectrum (or even the smallint spectrum). The actual *average* is well within the limits of int, but with lots of rows the *sum* could get high very quickly. So the *sum* needs to be/should be bigint "behind the scenes".

    The problem is that INT is the most efficient datatype (for AVG). So, it incurs unnecessary overhead (i.e., it's slower) if it upconverts everything to BIGINT when it does not need to. And, it cannot tell ahead of time if it needs to, so the ideal solution would be if there was a way for you to tell it when it should switch to BIGINT.

    As it happens, such a method does already exist: CAST as BIGINT.

    I understand, but I think I agree with the OP. If your code has been working for years, and because it has grown so big it now has a problem. Why can't it use an int, and when it gets to the overflow area, up it to a bigint? Then if the final result will fit into an int, downsize it for the result. It can't take that many cpu cycles to do this, and, IMHO, would be better than having code that's been working fine for years now generating an error.

    Actually, Wayne, although it seems like a simple fix, there are a lot of little complications, so that it would probably be 5x to 10x times the CPU cost of the current AVG. So much that it would probably be just as fast to upconvert everything to BIGINT.

    The current method has just two integer additions per row, so we will call its CPU cost = 2. But Consider what has to be done for the Switch method, even if it just stays in INT mode. First it has two different modes (INT & BIGINT) that it could be running in so for every row it has to check that mode and take the appropiate branch. this is a fast operation, but so is INT addition, so call it a 1 cost also, giving us a CPU cost = 2 + 1 = 3.

    Next, how do we tell if our addition overflowed? That's not a gimme, not at all. We have to wrap the equivalent of a TRY..CATCH block around the INT addition, for each row and TRY..CATCH instantiation/termination is not a cheap operation when compared to an integer addition. However, it is possible(likely?) that the SQL OS internals have a less sophisticated and more efficient exception-handling mechanism, so lets be generous (extremely) and say that it only costs 1 to set it up and 1 to tear it down. Now our per row CPU cost = 3 + 1 + 1 = 5.

    So just looking at the INT Only path and using the most favorable estimates, the CPU cost is 2.5x higher. That's a lot to pay for something that may hardly ever happen and that can be easily fixed if it does.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Also consider that the routine already is complex. If you pass an INT you get an INT in return; if you pass a DECIMAL you get a DECIMAL in return; if you pass it a FLOAT you get FLOAT in return.

  • RBarryYoung (4/30/2009)


    WayneS (4/30/2009)


    I understand, but I think I agree with the OP. If your code has been working for years, and because it has grown so big it now has a problem. Why can't it use an int, and when it gets to the overflow area, up it to a bigint? Then if the final result will fit into an int, downsize it for the result. It can't take that many cpu cycles to do this, and, IMHO, would be better than having code that's been working fine for years now generating an error.

    Next, how do we tell if our addition overflowed? That's not a gimme, not at all. We have to wrap the equivalent of a TRY..CATCH block around the INT addition, for each row and TRY..CATCH instantiation/termination is not a cheap operation when compared to an integer addition. However, it is possible(likely?) that the SQL OS internals have a less sophisticated and more efficient exception-handling mechanism, so lets be generous (extremely) and say that it only costs 1 to set it up and 1 to tear it down. Now our per row CPU cost = 3 + 1 + 1 = 5.

    So just looking at the INT Only path and using the most favorable estimates, the CPU cost is 2.5x higher. That's a lot to pay for something that may hardly ever happen and that can be easily fixed if it does.

    Well, I guess I'm pretty soundly trounced on this, with pretty irrefutable logic. I didn't think it through. Thanks for the clarification.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • It's a common perception Wayne, and sometimes true. But what's easy to overlook is that with a really super-efficent function like AVG() the primary loop-path is so tight that anything, even simple branching or mode testing starts to multiply the overhead.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • RBarryYoung (4/30/2009)


    I see a lot of folks trip over this: the "AVG() Overflow syndrome", as I like to think of it.

    What's interesting about it is no one ever calls me about "SUM() Overflow". Why? "Well duh, I have a million rows, so yeah the SUM went over 2 billion. I don't need to call you to figure that out."

    So why do they call when the same thing happens on AVG() (and for the exact same reason)? It's because, either:

    A) They are only thinking of the end result and forget that the overflow happens while still accumulating and not at the end when it returns. or, ..

    B) They assume that some kind of "rolling division" algorithim is being used, like:

    Declare @Acc as int

    Set @Acc = 0

    ForEachRow:

    Select @Acc = @Acc + (col / TotalRows)

    Return @Acc

    This way each piece is small and there is no danger of overflow, right?

    But if you think about it, this algorithm is entirely unusable because first, it won't have TotalRows until it has already read all of the rows. And secondly, these are integers and dividing a single-row value by a million or more is likely to produce a lot of zeroes.

    The actual algorithm used is sort of like this:

    Declare @Acc as int

    Declare @Cnt as int

    Select @Acc = 0, @Cnt = 0

    ForEachRow:

    Select @Acc = @Acc + col, @Cnt = @Cnt + 1

    Return (@Acc / @Cnt)

    This has the advantage of beng useable (don't have to know TotalRows ahead of time), correct (no integer underflow) and faster to boot (the extra addition per row is much faster than the division).

    thats an excellent post! i was confused about why SUM was being mentioned here when it was an AVG. never thought of it this way. its so obvious really isnt it? 😀

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply