April 8, 2003 at 10:51 am
NOTE : I have a work around for this so I can fix my issue. What I am looking
for is understanding why this happened.
I recently ran across a problem where adding a top clause changed the sorting of
my query. The query was sorting off of a field that is user configurable, but
may not be configured. And the column defaults to 1. So when sorting, ny sort
may or may not be sorting against data that is truly sortable, or it may be all
1's. I wrote the below examples against the pubs DB to demonstrate my question.
Query 1 (No Top):
SELECT au_id,
au_fname,
contract
FROM pubs.dbo.authors
WHERE au_id IN ('172-32-1176', '213-46-8915')
ORDER BY contract ASC
Results :
quote:
au_id au_fname contract----------- -------------------- --------
172-32-1176 Johnson 1
213-46-8915 Marjorie 1
Query 2 (Added TOP) :
SELECT TOP 10 au_id,
au_fname,
contract
FROM pubs.dbo.authors
WHERE au_id IN ('172-32-1176', '213-46-8915')
ORDER BY contract ASC
Results :
quote:
au_id au_fname contract----------- -------------------- --------
213-46-8915 Marjorie 1
172-32-1176 Johnson 1
As you can see, the data clearly changed sort, even though all I did was add a
Top clause. What happened here? I looked at the execution plan, and they seem
almost exactly the same, yet the results are not. I then changed the sort method
on the second query, expecting the results to get flip-flopped back, but noooo....
Query 3 (TOP with DESC sort) :
SELECT TOP 10 au_id,
au_fname,
contract
FROM pubs.dbo.authors
WHERE au_id IN ('172-32-1176', '213-46-8915')
ORDER BY contract DESC
Results :
quote:
au_id au_fname contract----------- -------------------- --------
213-46-8915 Marjorie 1
172-32-1176 Johnson 1
I think here we have hit a lack of knowledge on my part in regards to the query
optimizer. Oh btw, remove the order by's on the first two queries to get an
interesting result.
Tim C //Will code for food
Tim C //Will code for food
April 8, 2003 at 12:53 pm
Not a botttom line answer, but the standard MS line is that unless you provide the order by, you're not guaranteed any particular sort. Group by's used to return sorted as a side affect of how they did the group by internally, when they changed the algorithm it broke that handy side affect.
Andy
April 8, 2003 at 1:21 pm
I did provide an ORDER BY Andy. The only thing I changed was I added a TOP
clause to the second query. If you run the queries I posted below using an
execution plan the only difference you will see is that is does a TopN sort.
AFAIK TOP did not sort, yet this is showing me that it does? IT is very
frustrating sometimes not having the source to the query optimizer causing one
to guess as to how some things work by trial and error.
Tim C //Will code for food
Tim C //Will code for food
April 9, 2003 at 1:29 am
Am I missing something here?
The two rows returned have the same value (1)for the ORDER BY column (contract), therefore SQL Server cannot guarantee which order they will return in.
If it is important that rows with the same value are returned in a specific order, then you should order by a second column as well (e.g. CONTRACT, AU_ID)
April 9, 2003 at 8:23 am
You are not missing anything Ian. The column may or may not have sort
able data, it is basically a user-definable sort, and without them defining the
values the sorted column will contain all ones. I used the authors table as an
example, and one that you could run on your own system. Querying the 1st query
without the TOP, 100 out of 100 times sorts exactly as I expected. Then adding
the TOP changes the sort, so I added another column to the ORDER BY. What I was
looking for is why is a TOP clause performing a sort?
Try this to see something else interesting, same queries, just added an extra
clause to the where :
SELECT au_id,
au_fname,
contract
FROM pubs.dbo.authors
WHERE au_id IN ('172-32-1176', '213-46-8915')
AND contract = 1
ORDER BY contract ASC
SELECT TOP 10 au_id,
au_fname,
contract
FROM pubs.dbo.authors
WHERE au_id IN ('172-32-1176', '213-46-8915')
AND contract = 1
ORDER BY contract ASC
From BOL :
quote:
TOP n [PERCENT]Specifies that only the first n rows are to be output from the query result set.
n is an integer between 0 and 4294967295. If PERCENT is also specified, only the
first n percent of the rows are output from the result set. When specified with
PERCENT, n must be an integer between 0 and 100.
If the query includes an ORDER BY clause, the first n rows (or n percent of
rows) ordered by the ORDER BY clause are output. If the query has no ORDER BY
clause, the order of the rows is arbitrary.
This DOES not say it will re-arrange a sort. So whats going on? This is
more of an intellectual curiousity challenge. Plus knowing this will keep me
and others from introducing weird logic bugs into their T-SQL.
Tim C //Will code for food
Tim C //Will code for food
April 9, 2003 at 9:11 am
Tim
The answer lies in the query plan generated by the two queries. One generates a 'Sort', the other generates a 'Sort/TopN Sort'. Obviously the SQL Server code for these is slightly different.
Your quote from BOL, says that if there is no ORDER BY, then the results are arbitrary.
In your example, it is correctly sorting on the contract column, but as there is no secondary 'order by' column, then within contract, the results are arbitrary.
In the examples included in your second post, the query plan doesn't generate a 'Sort' in either query. SQL Server has correctly spotted that there is no need to sort, because you are limiting the result set to a single value of the order by column.
In your 'TOP' query, SQL Server has correctly returned the data in Contract sequence, as specified by the ORDER BY clause.
April 9, 2003 at 9:42 am
Ian, I understand what is going on, what I am questioning is WHY? I can find no
documentation that states that TOP will perform a sort, yet it clearly does,
EVEN WITH an ORDER BY present, forget the fact that the order by may or
may not actually have sort-able data. I did give the query an ORDER BY. Yet the
TOP clearly re-sorted it. I am basically trying to point out a hidden gotcha in
the optimizer. I can guarantee that if I found this, then others have fallen for
this as well. They may have meant that is what TOP does in that BOL quote I
posted before, yet it is very unclear.
Tim C //Will code for food
Tim C //Will code for food
April 9, 2003 at 9:59 am
Tim
The SELECT TOP is performing a sort because you asked it to by specifying an ORDER BY clause.
The only difference is that that the combination of TOP and ORDER BY is sorting EQUAL values differently to the ORDER BY without the TOP.
April 9, 2003 at 10:28 am
Ian, thanks for explaining to me what I have already pointed out. My wife
typically does this to me, but it is refreshing to have it done in a forum as
well. Please read the entire post thoroughly, and you will see that we are
saying basically the same thing, yet not comprehending the other.
Tim C //Will code for food
Tim C //Will code for food
April 9, 2003 at 11:07 am
Tim is right, it is very odd.
create table x (
ix int,
nx varchar(3)
)
insert x (ix, nx) values (1, 'cat')
insert x (ix, nx) values (1, 'dog')
now try
select * from x order by ix
then
select top 2 * from x order by ix
then
select top 2 * from x
and notice what happens.
However also try this and notice what doesn't happen.
select * from (select top 2 * from x order by ix) as z order by ix
then
select top 2 * from (select top 2 * from x order by ix) as z order by ix
I am not sure exactly why Top is affecting the order, but suspect it is the way the data enteres the buffers and then top is performed.
But that doesn't explain the situation in the later except that Order By 1 could be any order as long as 1,1,1,+ is the order.
April 9, 2003 at 11:43 am
Let's be patient everyone! Somehow we'll figure out what we mean, or what we meant to mean. Know what I mean?
Andy
April 9, 2003 at 3:55 pm
In SQL 6.5 with, the I think it was the job history table, when the 1000 rows limit was exceeded the latest rather than oldest was job history was deleted. The effect was that the backup job completed succussfully with dumps files produced but no history in history table. The problem was in the system trigger the developer assumed the data would in a certain sequence and end up deleting the youngest entry.
For me it reconfirms the rule "never make assumptions about the order the data is returned if not specified explicitly". As far as code robustness what will happen should the database migrate to to a another type of database which might have implemented "Top" differently with regards to affecting after sort sequence.
April 9, 2003 at 10:44 pm
It is really a curiosity. BTW in mssql 7.0 won’t happen this.
'Sort', and 'Sort/TopN Sort' may be different.
In mssql 7.0, first is always a SORT proc, then a TOP proc (a FIND proc).
MSSQL 70: àSORT-àFIND TOP (Always)
In this new method, they are saving the sort step some how, so (maybe) the sort would be done before only if it is known that the result set would be bigger (alone, by their conditions) than the top sentence. If that is not the case, then:
MSSQL 2K: àFIND rows TOP à SORT a smaller result set if necessary
So they choose first FIND the top rows, then SORT them only if necessary (and in this samples they would be).
(FIND rows a put them in a list, wouldn’t imply original order.)
quote:
NOTE : I have a work around for this so I can fix my issue. What I am lookingfor is understanding why this happened.
I recently ran across a problem where adding a top clause changed the sorting of
my query. The query was sorting off of a field that is user configurable, but
may not be configured. And the column defaults to 1. So when sorting, ny sort
may or may not be sorting against data that is truly sortable, or it may be all
1's. I wrote the below examples against the pubs DB to demonstrate my question.
Query 1 (No Top):
SELECT au_id,
au_fname,
contract
FROM pubs.dbo.authors
WHERE au_id IN ('172-32-1176', '213-46-8915')
ORDER BY contract ASCResults :
quote:
au_id au_fname contract----------- -------------------- --------
172-32-1176 Johnson 1
213-46-8915 Marjorie 1
Query 2 (Added TOP) :
SELECT TOP 10 au_id,
au_fname,
contract
FROM pubs.dbo.authors
WHERE au_id IN ('172-32-1176', '213-46-8915')
ORDER BY contract ASCResults :
quote:
au_id au_fname contract----------- -------------------- --------
213-46-8915 Marjorie 1
172-32-1176 Johnson 1
As you can see, the data clearly changed sort, even though all I did was add a
Top clause. What happened here? I looked at the execution plan, and they seem
almost exactly the same, yet the results are not. I then changed the sort method
on the second query, expecting the results to get flip-flopped back, but noooo....
Query 3 (TOP with DESC sort) :
SELECT TOP 10 au_id,
au_fname,
contract
FROM pubs.dbo.authors
WHERE au_id IN ('172-32-1176', '213-46-8915')
ORDER BY contract DESCResults :
quote:
au_id au_fname contract----------- -------------------- --------
213-46-8915 Marjorie 1
172-32-1176 Johnson 1
I think here we have hit a lack of knowledge on my part in regards to the query
optimizer. Oh btw, remove the order by's on the first two queries to get an
interesting result.
Tim C //Will code for food
April 10, 2003 at 12:50 am
We all agree (don't we?) that if you don't provide an "ORDER BY" clause, you'll get your results in an arbitrary order.
I'm now going to rephrase that statement slightly.
If you don't provide a MEANINGFUL "ORDER BY" clause, you'll get your results in an arbitrary order.
Your ORDER BY column is Contract, but both values are 1. To me, that means no MEANINGFUL "ORDER BY" clause has been supplied, so the results are returned in an arbitrary order - which is indeed what is occurring. I don't believe the TOP n clause is relevant here - you're simply getting arbitrary results.
April 10, 2003 at 4:22 am
But what's funny is it is not as arbitrary as one would think. When I tested mine example I always got the same results and it differs between SQL 7 and SQL 2000.
On SQL 7
With order By
1, cat
1, dog
With Top 2 and Order By
1, cat
1, dog
With Top 2
1, cat
1, dog
However on SQL 2K
With order By
1, cat
1, dog
With Top 2 and Order By
1, dog
1, cat
With Top 2
1, cat
1, dog
Notice with Top 2 and Order By it flips the order. And since I tested this on 4 SQL 2K installs and 3 SQL 7 installs and got the exact same results (based on order the results for all the previous were the same in reverse of what typed). So it seems like the condition is related to a SQL 2000 engine change.
Ok further testing I threw in 1,eel and did same.
With just order by or top 3 I would get order of data insert sorted on ix column or 1 in this case.
1,cat
1,dog
1,eel
But with Top3 and Order By
1,eel
1,dog
1,cat
If I inserted dog, cat, eel I got
1,eel
1,cat
1,dog
So I tested with two sets
1,dog
1,cat
2,cat
2,dog
With Top 4 or Order by I got
1,dog
1,cat
2,cat
2,dog
but with Top 4 and Order by I got
1,cat
1,dog
2,dog
2,cat
So I had a theory and need further testing to see how it holds up. So I instead tried this with the data.
insert x (ix, nx) values (6, 'dog')
insert x (ix, nx) values (1, 'dog')
insert x (ix, nx) values (1, 'cat')
insert x (ix, nx) values (3, 'dog')
insert x (ix, nx) values (5, 'dog')
insert x (ix, nx) values (2, 'cat')
insert x (ix, nx) values (3, 'cat')
insert x (ix, nx) values (2, 'dog')
insert x (ix, nx) values (4, 'dog')
insert x (ix, nx) values (5, 'cat')
insert x (ix, nx) values (4, 'cat')
insert x (ix, nx) values (6, 'cat')
then this set
insert x (ix, nx) values (7, 'dog')
insert x (ix, nx) values (6, 'dog')
insert x (ix, nx) values (1, 'dog')
insert x (ix, nx) values (1, 'cat')
insert x (ix, nx) values (3, 'dog')
insert x (ix, nx) values (5, 'dog')
insert x (ix, nx) values (2, 'cat')
insert x (ix, nx) values (3, 'cat')
insert x (ix, nx) values (2, 'dog')
insert x (ix, nx) values (4, 'dog')
insert x (ix, nx) values (5, 'cat')
insert x (ix, nx) values (4, 'cat')
insert x (ix, nx) values (6, 'cat')
insert x (ix, nx) values (7, 'cat')
Curious effect and I can see now they changed the Order By search mechanism with just the first query.
Try it yourselves and see if you can spot the difference between 7 and 2000.
Viewing 15 posts - 1 through 15 (of 16 total)
You must be logged in to reply to this topic. Login to reply