MIN() and MAX() issues

  • I've run into a curiosity. While creating a script to correlate some data, I was trying to use MIN() and MAX() on a identity column based on values in a datetime column. Using MAX(), I got the results back in six seconds (00:00:06). Using MIN(), I got the results back in 8 minutes 43 seconds (00:08:43). Using the Estimated Execution Plan, I found this:

    The MAX() query first does a Clustered Index SEEK (98%), then a TOP (2%), then a Stream Aggregate (0%).

    The MIN() query first does a Clustered Index SCAN (100%), followed by Parallelism/Gather Stream (0%), then a TOP (2%), then a Stream Aggregate (0%).

    The queries are:

    SELECT MAX(RowID)

    FROM mytable

    WHERE dt_column = '2006-09-06 11:00'

    SELECT MIN(RowID)

    FROM mytable

    WHERE dt_column = '2006-09-06 11:00'

    The RowID column is a clustered index and the dt_column is a non-clustered index.

    I can make the second query faster by changing it to a MAX() and using 2006-09-06 10:59:59 (acceptable for the need), but that doesn't explain why one uses a scan and parallelism/gather stream and the other uses a seek and no parallelism.

    Any thoughts on this?

    (I can't change the schema and that's not the point of this, I'm more interested in why the difference execution between the two.)

    -SQLBill

  • I have similar table with int column clustered index and date column non clusterd index with about 5 million rows. With are without date criteria it returns data in millseconds.

    And both uses clustererd index seek.

    Regards,
    gova

  • My table has 194 million rows. But I don't think the size should be causing MIN() and MAX() to act differently.

    -SQLBill

  • I wonder if it could be caused by a badly fragmented index?   Perhaps because of the physical position of the max records on the disk it is able to find them quicker?

  • The thing about parallellism is that we have no control over if and when it gets used.

    It seems like the MAX() didn't get any and the MIN() got some. Perhaps a coincidence, and perhaps of no importance, but you never now..

    Try to run both with parallellism turned off and see if the plans still are different?

    /Kenneth

  • Out for Justice,

    I had checked the fragmentation and it was slightly fragged. So I did a INDEXDEFRAG and ran it with the results I posted.

    Kenneth,

    It's been repeatable. MAX() doesn't use parallelism, MIN() does. This has been happening for a while, but the result time wasn't enough to 'dig' into it. But this was so large (1 1/2 hours to return less than 20,000 rows) that I started checking it out more in depth.

    If I get a chance I will try it with the parallelism off.

    -SQLBill

  • I ran your original examples on my production box... both execution plans are identical and they both have parallelism. 

    Could it be that someone screwed up and turned of auto-calc of statistics or that no regular maintenance plan is in effect?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Nope, I have auto-statistics on (I also run UPDATEUSAGE frequently) and I do regular maintenance. In fact, I had just finished defragging the indexes and had updated usage and stats.

    -SQLBill

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply