aggregate sql question

  • Given this table:

    CREATE TABLE #T_PEOPLE (

    SSN char (9),

    BIRTHDAY datetime

    )

    insert #T_PEOPLE(ssn, birthday) values('123456789', '1/1/60')

    insert #T_PEOPLE(ssn, birthday) values('111223333', '1/1/50')

     

    Which query is faster to find the ssn of the older person?

    This query:

    select top 1 SSN

    from #T_PEOPLE

    order by birthday

     

     

    or this one:

    select SSN

    from #T_PEOPLE

    where birthday=

    (select min(birthday) from #T_PEOPLE)

  • According to the Query Plan (From toolbar in QA select Query..  Show Execution Plan) Query 1 is more efficient (doesn't perform a TOP).

    However, the TOP cost is 0% it is just an additional step.

    I tried the following query as well (Like the structure better)

    SELECT T1.SSN

      FROM #T_PEOPLE T1

        INNER JOIN (SELECT MIN(birthday) birthday FROM #T_PEOPLE) T2

      ON T1.birthday = T2.birthday

    It gave same plan as #2.

    I guess you just need to test each query with a full set of data and make sure that you properly INDEX the table....



    Good Hunting!

    AJ Ahrens


    webmaster@kritter.net

  • I expected little difference between these two queries, performance-wise. But, being the curious soul I am, I decided to try it out, with the following code:

    DROP TABLE #T_PEOPLE
    GO
    
    CREATE TABLE #T_PEOPLE (
    SSN char (9),
    BIRTHDAY datetime
    )
    
    SET NOCOUNT ON
    DECLARE @counter int
    SET @counter = 100000000
    WHILE @counter < 101000000
    BEGIN
    INSERT INTO #T_PEOPLE (SSN, BIRTHDAY)
      VALUES (@counter, DATEADD(dd, CEILING(36525 * RAND()), '1905-01-01'))
    IF (@counter % 1000 = 0)
    BEGIN
    PRINT CONVERT(varchar,@counter) + ' - ' + CONVERT(varchar,getdate())
    END
    SET @counter = @counter + 1
    END
    
    SET NOCOUNT OFF
    GO
    

    Then I ran the two queries through as a single batch, with Show Execution Plan turned on.

    The "order by" query accounted for 96% of execution; the "select min" plan only 4%.

    Out of curiosity, I added an index (right before SET NOCOUNT OFF above:

    CREATE INDEX [people_birthday] ON #T_PEOPLE (BIRTHDAY) WITH  FILLFACTOR = 90
    

    This time, "order by" only took 66% of the execution, and "select min" 34%.

    The "select min" approach would seem to be a clear winner.

    And, yes, I have no life.


    R David Francis

  • Forgot to add this when I recreated my post (first attempt disappeared when I hit "Preview", probably because of some sort of time-out thing).

    You need "TOP 1" in the "select min" statement, in case there is more than one record with that birthdate; In my 1,000,000 row test table, there were several.


    R David Francis

  • The percentages within a plan is only valid in the context of that particular plan. You can't compare 'order by 75%' from one plan to 'min 25%' from another plan and draw any conclusions from that. What the figures say is just that in the first plan, the 'order by' stood for 75% of that entire plan, and the 'min' stood for 25% of the other plan's total.

    They can't be compared to each other as percentages because they have no correlation between plans. To give some hint about efficiency, you have to drill into the particular statements in each plan and compare the metrics for each, (ie pages, i/o, rows etc)

    /Kenneth

     

  • I ran the two statements together as a batch. Therefore, the plan included both statements. When I say that the "order by" statement had a percentage of 96%, I mean that all the elements of the statement, within the overall plan, added up to 96%. Similarly, all the elements of the "select min" statement added up to 4% of the same plan. Otherwise, it would be a barely credible coincidence that, taken together, they added up to 100%.

    I do agree that there are caveats involved. In particular, comparing the percentages in the first plan (both statements, no index) to those in the second plan (both statements, birthday indexed) is not directly applicable. The first statement took about 2 seconds to execute; the second was virtually instantaneous. Some might take the change in percentage on the "select min" statement to mean that it actually runs less effectively with an index in place. Since we don't have a solid benchmark for comparison, we can't say for certain that it's faster, the same, or slower with the index. We can say that the two statements combined are noticeably faster with the index in place. We can say that it is *likely* that the "select min" statement is not dramatically faster, and thus that the higher percentage in the indexed batch represents the dramatic improvement in the "order by" statement.


    R David Francis

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply