January 14, 2010 at 11:17 am
I wouldn't waste time on clowns that are obviously not professionals. This site houses professionals helping other professionals. There are professionals with varying levels of experience, and then there's the occasional twit who most likely will not stick around. It does take time to provide the necessary scripts and desired results; however, it is time well spent. There are a lot of people like myself who love the challenge but don't have the time to backpedal.
January 14, 2010 at 11:18 am
jcrawf02 (1/14/2010)
So Gail, something that I wonder occasionally (the rest of the time I wonder about everything else :-P) - if I SELECT TOP 10 and ORDER BY the clustered index, can the optimizer figure out that it just needs to look at the first ten records in the index and work with those? Or is it always the case that it will process the query on the dataset and THEN apply the TOP filter?
Depends on the complexity of the query.
SELECT TOP 10 * FROM SomeTable
This is so simple that the TOP can be pushed right down. SQL will only fetch 10 rows from the table; it knows it doesn't need more.
SELECT TOP 10 * FROM SomeTable WHERE AnIndexedColumn = @somevar
Same here. Simple enough to be an index seek that returns only 10 records.
SELECT TOP 10 SomeColumn
FROM ATable
WHERE AnotherColumn = @var
GROUP BY SomeColumn
HAVING COUNT(*) > 10
ORDER BY SomeColumn
Now, for this one, how many rows are needed from the base table to return 10 at the end? Very difficult to say, so in this case it's likely that all qualifying rows will be fetched, aggregated, filtered again, and only then will the TOP be applied.
The optimiser will try really hard to push the row limitation down as far as possible, and I've seen it do things I wouldn't have expected (including pushing a row limitation down to an index seek when I was filtering on the output of Row_Number).
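One way to see which behaviour you're getting (a sketch only; the table and column names here are made up, not from a real schema) is to compare logical reads, or the actual row counts in the execution plan, between a simple TOP and a TOP over an aggregate:

```sql
-- Sketch: SomeTable, ClusteredKeyCol, SomeColumn and AnotherColumn are
-- illustrative names only
DECLARE @var int = 1;

SET STATISTICS IO ON;

-- Simple TOP: the scan/seek itself stops after 10 rows,
-- so logical reads stay small however big the table is
SELECT TOP 10 * FROM SomeTable ORDER BY ClusteredKeyCol;

-- TOP over an aggregate: all qualifying rows must be read and grouped
-- before the TOP can apply, so reads grow with the qualifying row count
SELECT TOP 10 SomeColumn
FROM SomeTable
WHERE AnotherColumn = @var
GROUP BY SomeColumn
HAVING COUNT(*) > 10
ORDER BY SomeColumn;
```

In the first query the "Actual Number of Rows" on the scan or seek operator should show 10; in the second it will show every row that matched the WHERE clause.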
I think this is worthy of a blog post, it's complex.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
January 14, 2010 at 12:05 pm
david.tyler (1/14/2010)
I wouldn't waste time on clowns that are obviously not professionals. This site houses professionals helping other professionals. There are professionals with varying levels of experience, and then there's the occasional twit who most likely will not stick around. It does take time to provide the necessary scripts and desired results; however, it is time well spent. There are a lot of people like myself who love the challenge but don't have the time to backpedal.
If you are so 'professional', why do you resort to name calling?
January 14, 2010 at 12:10 pm
Ifila, thanks for posting the information that was requested. This topic has shown that SSC can and will help when challenged. It is good that your issue is resolved, and I hope that keeps you involved at SSC.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw
Learn Extended Events
January 14, 2010 at 12:19 pm
GilaMonster (1/14/2010)
I think this is worthy of a blog post, it's complex.
woohoo! 😀
---------------------------------------------------------
How best to post your question
How to post performance problems
Tally Table: What it is and how it replaces a loop
stewsterl 80804 (10/16/2009)
I guess when you stop and try to understand the solution provided, you not only learn but save yourself some headaches when you need to make any slight changes.
January 14, 2010 at 12:20 pm
After 25 years of dealing with IT people, I guess sometimes one can get cantankerous.
Thanks to all the constructive contributors!
January 14, 2010 at 12:46 pm
ifila (1/14/2010)
david.tyler (1/14/2010)
I wouldn't waste time on clowns that are obviously not professionals. This site houses professionals helping other professionals. There are professionals with varying levels of experience, and then there's the occasional twit who most likely will not stick around. It does take time to provide the necessary scripts and desired results; however, it is time well spent. There are a lot of people like myself who love the challenge but don't have the time to backpedal.
If you are so 'professional', why do you resort to name calling?
Careful now, you didn't start out so well either, but we persevered in an attempt to help you solve your particular issue, even if some of us may have gotten a bit sarcastic at times.
Bottom line, we took a challenge, developed a solution, and provided a recommendation on indexes as well. And it didn't cost you any $$.
January 14, 2010 at 5:41 pm
GilaMonster (1/14/2010)
I think this is worthy of a blog post, it's complex.
In the meantime, anyone interested in how TOP and OPTION (FAST n) affect the choice of plan made by the optimizer could look at the following two blogs by the SQL Server Query Optimization Team:
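As a quick illustration of how the OPTION (FAST n) hint is written (a sketch; the table and column names are hypothetical), it asks the optimizer to favour a plan that streams the first n rows quickly, even if the complete result set ends up costing more overall:

```sql
-- Prefer a plan optimized for returning the first 100 rows fast;
-- the query still returns the full result set
SELECT OrderID, OrderDate
FROM dbo.SomeOrders
ORDER BY OrderDate DESC
OPTION (FAST 100);
```

Unlike TOP, the hint does not limit the rows returned; it only biases the plan choice toward low startup cost.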
Paul
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
January 14, 2010 at 6:20 pm
Thanks Paul.
Gail - can't wait to see that Blog post.
Jason...AKA CirqueDeSQLeil
January 15, 2010 at 2:15 am
ifila (1/14/2010)
After 25 years of dealing with IT people, I guess sometimes one can get cantankerous.
Thanks to all the constructive contributors!
Guilty of cantankerism as charged - but hey, it didn't work out so bad in the end.
Ifila, have you considered putting the row restriction on the innermost part of the query? Using TOP 1000 will give you the top 1000 resumes, which will of course yield perhaps twice as many email addresses, but it does mean that the slowest, most expensive part of the query has a limiter on it. You could use a variable for your limit value and apply it both to the innermost query, to improve performance, and to your outermost query, i.e. the pivot, to return a consistent rowcount.
Cheers
ChrisM
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
January 15, 2010 at 8:43 am
Chris Morris-439714 (1/15/2010)
ifila (1/14/2010)
After 25 years of dealing with IT people, I guess sometimes one can get cantankerous.
Thanks to all the constructive contributors!
Guilty of cantankerism as charged - but hey, it didn't work out so bad in the end.
Ifila, have you considered putting the row restriction on the innermost part of the query? Using TOP 1000 will give you the top 1000 resumes, which will of course yield perhaps twice as many email addresses, but it does mean that the slowest, most expensive part of the query has a limiter on it. You could use a variable for your limit value and apply it both to the innermost query, to improve performance, and to your outermost query, i.e. the pivot, to return a consistent rowcount.
Cheers
ChrisM
Thanks for the suggestion, but I don't understand what you mean :unsure:
January 15, 2010 at 8:55 am
This bit...
SELECT Email1,Email2,Email3, Email4,Email5,Email6,Email7,Email8,Email9,Email10 FROM outputresume3
WHERE ((contains (originalresume, '"j2ee" and "java"'))
AND (currentdateout BETWEEN '2000-01-01' AND '2010-01-06'))
change it to
SELECT top 1000 Email1,Email2,Email3, Email4,Email5,Email6,Email7,Email8,Email9,Email10 FROM outputresume3
WHERE ((contains (originalresume, '"j2ee" and "java"'))
AND (currentdateout BETWEEN '2000-01-01' AND '2010-01-06'))
You will want to put an ORDER BY on that. If you want it random, you could use ORDER BY NEWID(), but you might want it ordered by recency.
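Combining that with the variable suggestion from earlier (a sketch; @limit is a made-up variable name, while the table and columns come from the query above):

```sql
DECLARE @limit int = 1000;

SELECT TOP (@limit)  -- TOP with a variable needs the parentheses
       Email1, Email2, Email3, Email4, Email5,
       Email6, Email7, Email8, Email9, Email10
FROM outputresume3
WHERE CONTAINS(originalresume, '"j2ee" AND "java"')
  AND currentdateout BETWEEN '2000-01-01' AND '2010-01-06'
ORDER BY currentdateout DESC;  -- most recent first; ORDER BY NEWID() for random
```

The same @limit can then be reused on the outermost (pivot) query so both levels stay in step.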
January 15, 2010 at 9:16 am
Chris Morris-439714 (1/15/2010)
This bit...
SELECT Email1,Email2,Email3, Email4,Email5,Email6,Email7,Email8,Email9,Email10 FROM outputresume3
WHERE ((contains (originalresume, '"j2ee" and "java"'))
AND (currentdateout BETWEEN '2000-01-01' AND '2010-01-06'))
change it to
SELECT top 1000 Email1,Email2,Email3, Email4,Email5,Email6,Email7,Email8,Email9,Email10 FROM outputresume3
WHERE ((contains (originalresume, '"j2ee" and "java"'))
AND (currentdateout BETWEEN '2000-01-01' AND '2010-01-06'))
You will want to put an ORDER BY on that. If you want it random, you could use ORDER BY NEWID(), but you might want it ordered by recency.
Excluding the ORDER BY, that is the way I had it working before my post on SSC.
Thanks for the suggestion!
January 15, 2010 at 9:35 am
Yeah but...kinda.
This time round, you know exactly why you would put it there, and why you will (almost) always get more results in your final output than you specified in the row limit. You know that you can select the first 1000 matching resumes, most recent first, then extract all of the email addresses from them, then isolate 1000 email addresses.