Top and Order By (Why can one change the other?)

  • NOTE : I have a work around for this so I can fix my issue. What I am looking

    for is understanding why this happened.

    I recently ran across a problem where adding a top clause changed the sorting of

    my query. The query was sorting off of a field that is user configurable, but

    may not be configured. And the column defaults to 1. So when sorting, ny sort

    may or may not be sorting against data that is truly sortable, or it may be all

    1's. I wrote the below examples against the pubs DB to demonstrate my question.

    Query 1 (No Top):

    
    
    SELECT au_id,
    au_fname,
    contract
    FROM pubs.dbo.authors
    WHERE au_id IN ('172-32-1176', '213-46-8915')
    ORDER BY contract ASC

    Results :

    quote:


    au_id au_fname contract

    ----------- -------------------- --------

    172-32-1176 Johnson 1

    213-46-8915 Marjorie 1


    Query 2 (Added TOP) :

    
    
    SELECT TOP 10 au_id,
    au_fname,
    contract
    FROM pubs.dbo.authors
    WHERE au_id IN ('172-32-1176', '213-46-8915')
    ORDER BY contract ASC

    Results :

    quote:


    au_id au_fname contract

    ----------- -------------------- --------

    213-46-8915 Marjorie 1

    172-32-1176 Johnson 1


    As you can see, the data clearly changed sort, even though all I did was add a

    Top clause. What happened here? I looked at the execution plan, and they seem

    almost exactly the same, yet the results are not. I then changed the sort method

    on the second query, expecting the results to get flip-flopped back, but noooo....

    Query 3 (TOP with DESC sort) :

    
    
    SELECT TOP 10 au_id,
    au_fname,
    contract
    FROM pubs.dbo.authors
    WHERE au_id IN ('172-32-1176', '213-46-8915')
    ORDER BY contract DESC

    Results :

    quote:


    au_id au_fname contract

    ----------- -------------------- --------

    213-46-8915 Marjorie 1

    172-32-1176 Johnson 1


    I think here we have hit a lack of knowledge on my part in regards to the query

    optimizer. Oh btw, remove the order by's on the first two queries to get an

    interesting result.

    Tim C //Will code for food


    Tim C //Will code for food

  • Not a botttom line answer, but the standard MS line is that unless you provide the order by, you're not guaranteed any particular sort. Group by's used to return sorted as a side affect of how they did the group by internally, when they changed the algorithm it broke that handy side affect.

    Andy

    http://www.sqlservercentral.com/columnists/awarren/

  • I did provide an ORDER BY Andy. The only thing I changed was I added a TOP

    clause to the second query. If you run the queries I posted below using an

    execution plan the only difference you will see is that is does a TopN sort.

    AFAIK TOP did not sort, yet this is showing me that it does? IT is very

    frustrating sometimes not having the source to the query optimizer causing one

    to guess as to how some things work by trial and error.

    Tim C //Will code for food


    Tim C //Will code for food

  • Am I missing something here?

    The two rows returned have the same value (1)for the ORDER BY column (contract), therefore SQL Server cannot guarantee which order they will return in.

    If it is important that rows with the same value are returned in a specific order, then you should order by a second column as well (e.g. CONTRACT, AU_ID)

  • You are not missing anything Ian. The column may or may not have sort

    able data, it is basically a user-definable sort, and without them defining the

    values the sorted column will contain all ones. I used the authors table as an

    example, and one that you could run on your own system. Querying the 1st query

    without the TOP, 100 out of 100 times sorts exactly as I expected. Then adding

    the TOP changes the sort, so I added another column to the ORDER BY. What I was

    looking for is why is a TOP clause performing a sort?

    Try this to see something else interesting, same queries, just added an extra

    clause to the where :

    
    
    SELECT au_id,
    au_fname,
    contract
    FROM pubs.dbo.authors
    WHERE au_id IN ('172-32-1176', '213-46-8915')
    AND contract = 1
    ORDER BY contract ASC

    SELECT TOP 10 au_id,
    au_fname,
    contract
    FROM pubs.dbo.authors
    WHERE au_id IN ('172-32-1176', '213-46-8915')
    AND contract = 1
    ORDER BY contract ASC

    From BOL :

    quote:


    TOP n [PERCENT]

    Specifies that only the first n rows are to be output from the query result set.

    n is an integer between 0 and 4294967295. If PERCENT is also specified, only the

    first n percent of the rows are output from the result set. When specified with

    PERCENT, n must be an integer between 0 and 100.

    If the query includes an ORDER BY clause, the first n rows (or n percent of

    rows) ordered by the ORDER BY clause are output. If the query has no ORDER BY

    clause, the order of the rows is arbitrary.


    This DOES not say it will re-arrange a sort. So whats going on? This is

    more of an intellectual curiousity challenge. Plus knowing this will keep me

    and others from introducing weird logic bugs into their T-SQL.

    Tim C //Will code for food


    Tim C //Will code for food

  • Tim

    The answer lies in the query plan generated by the two queries. One generates a 'Sort', the other generates a 'Sort/TopN Sort'. Obviously the SQL Server code for these is slightly different.

    Your quote from BOL, says that if there is no ORDER BY, then the results are arbitrary.

    In your example, it is correctly sorting on the contract column, but as there is no secondary 'order by' column, then within contract, the results are arbitrary.

    In the examples included in your second post, the query plan doesn't generate a 'Sort' in either query. SQL Server has correctly spotted that there is no need to sort, because you are limiting the result set to a single value of the order by column.

    In your 'TOP' query, SQL Server has correctly returned the data in Contract sequence, as specified by the ORDER BY clause.

  • Ian, I understand what is going on, what I am questioning is WHY? I can find no

    documentation that states that TOP will perform a sort, yet it clearly does,

    EVEN WITH an ORDER BY present, forget the fact that the order by may or

    may not actually have sort-able data. I did give the query an ORDER BY. Yet the

    TOP clearly re-sorted it. I am basically trying to point out a hidden gotcha in

    the optimizer. I can guarantee that if I found this, then others have fallen for

    this as well. They may have meant that is what TOP does in that BOL quote I

    posted before, yet it is very unclear.

    Tim C //Will code for food


    Tim C //Will code for food

  • Tim

    The SELECT TOP is performing a sort because you asked it to by specifying an ORDER BY clause.

    The only difference is that that the combination of TOP and ORDER BY is sorting EQUAL values differently to the ORDER BY without the TOP.

  • Ian, thanks for explaining to me what I have already pointed out. My wife

    typically does this to me, but it is refreshing to have it done in a forum as

    well. Please read the entire post thoroughly, and you will see that we are

    saying basically the same thing, yet not comprehending the other.

    Tim C //Will code for food


    Tim C //Will code for food

  • Tim is right, it is very odd.

    create table x (

    ix int,

    nx varchar(3)

    )

    insert x (ix, nx) values (1, 'cat')

    insert x (ix, nx) values (1, 'dog')

    now try

    select * from x order by ix

    then

    select top 2 * from x order by ix

    then

    select top 2 * from x

    and notice what happens.

    However also try this and notice what doesn't happen.

    select * from (select top 2 * from x order by ix) as z order by ix

    then

    select top 2 * from (select top 2 * from x order by ix) as z order by ix

    I am not sure exactly why Top is affecting the order, but suspect it is the way the data enteres the buffers and then top is performed.

    But that doesn't explain the situation in the later except that Order By 1 could be any order as long as 1,1,1,+ is the order.

  • Let's be patient everyone! Somehow we'll figure out what we mean, or what we meant to mean. Know what I mean?

    Andy

    http://www.sqlservercentral.com/columnists/awarren/

  • In SQL 6.5 with, the I think it was the job history table, when the 1000 rows limit was exceeded the latest rather than oldest was job history was deleted. The effect was that the backup job completed succussfully with dumps files produced but no history in history table. The problem was in the system trigger the developer assumed the data would in a certain sequence and end up deleting the youngest entry.

    For me it reconfirms the rule "never make assumptions about the order the data is returned if not specified explicitly". As far as code robustness what will happen should the database migrate to to a another type of database which might have implemented "Top" differently with regards to affecting after sort sequence.

  • It is really a curiosity. BTW in mssql 7.0 won’t happen this.

    'Sort', and 'Sort/TopN Sort' may be different.

    In mssql 7.0, first is always a SORT proc, then a TOP proc (a FIND proc).

    MSSQL 70: àSORT-àFIND TOP (Always)

    In this new method, they are saving the sort step some how, so (maybe) the sort would be done before only if it is known that the result set would be bigger (alone, by their conditions) than the top sentence. If that is not the case, then:

    MSSQL 2K: àFIND rows TOP à SORT a smaller result set if necessary

    So they choose first FIND the top rows, then SORT them only if necessary (and in this samples they would be).

    (FIND rows a put them in a list, wouldn’t imply original order.)

    quote:


    NOTE : I have a work around for this so I can fix my issue. What I am looking

    for is understanding why this happened.

    I recently ran across a problem where adding a top clause changed the sorting of

    my query. The query was sorting off of a field that is user configurable, but

    may not be configured. And the column defaults to 1. So when sorting, ny sort

    may or may not be sorting against data that is truly sortable, or it may be all

    1's. I wrote the below examples against the pubs DB to demonstrate my question.

    Query 1 (No Top):

    
    
    SELECT au_id,
    au_fname,
    contract
    FROM pubs.dbo.authors
    WHERE au_id IN ('172-32-1176', '213-46-8915')
    ORDER BY contract ASC

    Results :

    quote:


    au_id au_fname contract

    ----------- -------------------- --------

    172-32-1176 Johnson 1

    213-46-8915 Marjorie 1


    Query 2 (Added TOP) :

    
    
    SELECT TOP 10 au_id,
    au_fname,
    contract
    FROM pubs.dbo.authors
    WHERE au_id IN ('172-32-1176', '213-46-8915')
    ORDER BY contract ASC

    Results :

    quote:


    au_id au_fname contract

    ----------- -------------------- --------

    213-46-8915 Marjorie 1

    172-32-1176 Johnson 1


    As you can see, the data clearly changed sort, even though all I did was add a

    Top clause. What happened here? I looked at the execution plan, and they seem

    almost exactly the same, yet the results are not. I then changed the sort method

    on the second query, expecting the results to get flip-flopped back, but noooo....

    Query 3 (TOP with DESC sort) :

    
    
    SELECT TOP 10 au_id,
    au_fname,
    contract
    FROM pubs.dbo.authors
    WHERE au_id IN ('172-32-1176', '213-46-8915')
    ORDER BY contract DESC

    Results :

    quote:


    au_id au_fname contract

    ----------- -------------------- --------

    213-46-8915 Marjorie 1

    172-32-1176 Johnson 1


    I think here we have hit a lack of knowledge on my part in regards to the query

    optimizer. Oh btw, remove the order by's on the first two queries to get an

    interesting result.

    Tim C //Will code for food


  • We all agree (don't we?) that if you don't provide an "ORDER BY" clause, you'll get your results in an arbitrary order.

    I'm now going to rephrase that statement slightly.

    If you don't provide a MEANINGFUL "ORDER BY" clause, you'll get your results in an arbitrary order.

    Your ORDER BY column is Contract, but both values are 1. To me, that means no MEANINGFUL "ORDER BY" clause has been supplied, so the results are returned in an arbitrary order - which is indeed what is occurring. I don't believe the TOP n clause is relevant here - you're simply getting arbitrary results.

  • But what's funny is it is not as arbitrary as one would think. When I tested mine example I always got the same results and it differs between SQL 7 and SQL 2000.

    On SQL 7

    With order By

    1, cat

    1, dog

    With Top 2 and Order By

    1, cat

    1, dog

    With Top 2

    1, cat

    1, dog

    However on SQL 2K

    With order By

    1, cat

    1, dog

    With Top 2 and Order By

    1, dog

    1, cat

    With Top 2

    1, cat

    1, dog

    Notice with Top 2 and Order By it flips the order. And since I tested this on 4 SQL 2K installs and 3 SQL 7 installs and got the exact same results (based on order the results for all the previous were the same in reverse of what typed). So it seems like the condition is related to a SQL 2000 engine change.

    Ok further testing I threw in 1,eel and did same.

    With just order by or top 3 I would get order of data insert sorted on ix column or 1 in this case.

    1,cat

    1,dog

    1,eel

    But with Top3 and Order By

    1,eel

    1,dog

    1,cat

    If I inserted dog, cat, eel I got

    1,eel

    1,cat

    1,dog

    So I tested with two sets

    1,dog

    1,cat

    2,cat

    2,dog

    With Top 4 or Order by I got

    1,dog

    1,cat

    2,cat

    2,dog

    but with Top 4 and Order by I got

    1,cat

    1,dog

    2,dog

    2,cat

    So I had a theory and need further testing to see how it holds up. So I instead tried this with the data.

    insert x (ix, nx) values (6, 'dog')

    insert x (ix, nx) values (1, 'dog')

    insert x (ix, nx) values (1, 'cat')

    insert x (ix, nx) values (3, 'dog')

    insert x (ix, nx) values (5, 'dog')

    insert x (ix, nx) values (2, 'cat')

    insert x (ix, nx) values (3, 'cat')

    insert x (ix, nx) values (2, 'dog')

    insert x (ix, nx) values (4, 'dog')

    insert x (ix, nx) values (5, 'cat')

    insert x (ix, nx) values (4, 'cat')

    insert x (ix, nx) values (6, 'cat')

    then this set

    insert x (ix, nx) values (7, 'dog')

    insert x (ix, nx) values (6, 'dog')

    insert x (ix, nx) values (1, 'dog')

    insert x (ix, nx) values (1, 'cat')

    insert x (ix, nx) values (3, 'dog')

    insert x (ix, nx) values (5, 'dog')

    insert x (ix, nx) values (2, 'cat')

    insert x (ix, nx) values (3, 'cat')

    insert x (ix, nx) values (2, 'dog')

    insert x (ix, nx) values (4, 'dog')

    insert x (ix, nx) values (5, 'cat')

    insert x (ix, nx) values (4, 'cat')

    insert x (ix, nx) values (6, 'cat')

    insert x (ix, nx) values (7, 'cat')

    Curious effect and I can see now they changed the Order By search mechanism with just the first query.

    Try it yourselves and see if you can spot the difference between 7 and 2000.

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply