Top command question

  • I have a query that joins 8 tables together and will return 50 records. When I use the "top 100" command the query will return all 50 records immediately. If I omit this command the query will run indefinitely (I canceled the query after 2 minutes). Any ideas why the query performs so poorly without the top command?

  • soochkoam (11/28/2011)


    I have a query that joins 8 tables together and will return 50 records. When I use the "top 100" command the query will return all 50 records immediately. If I omit this command the query will run indefinitely (I canceled the query after 2 minutes). Any ideas why the query performs so poorly without the top command?

    Offhand: No. Definitely not enough information up front.

    Now, if we can see the actual execution plan of the query that works and the estimated plan of the one that runs indefinitely, we might have a shot; if you can let the full query run to completion and get us the actual plan, even better (not sure of your environment). The query itself would help too, along with the schema (and indexes) of the underlying structures.

    From there, I would have to ask how often you repeated your testing, what the typical concurrent load on your environment is, and whether there was any significant difference in activity between the two test runs.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • I don't know how to copy the execution plan here, but after viewing the plans there is definitely a difference between the two queries. The main thing that stands out is an operator called "Parallelism", which accounts for the bulk of the slow query's cost, followed by a "Hash Match" operator. The plan for the query with the "Top" command contains neither.

    Below is the code for the slow performing query.

    Thanks in advance,

    Sooch

    select d.LinkedDebtId, cct.CardNumber, trn2.TransactionId
    from PaymentProcessing.App.[Transaction] trn
    join PaymentProcessing.App.[Status] st on trn.StatusId = st.StatusId
    join Debt.App.LinkedDebt ld on ld.LinkedDebtId = SUBSTRING(trn.ObjectCode,2,8)
    join Debt.App.Debt d on ld.LinkedDebtId = d.LinkedDebtId
    join Debt.App.DebtLineItem dli on dli.DebtId = d.DebtId
    join Debt.App.DebtTransaction dtrn on dtrn.DebtLineItemId = dli.DebtLineItemId
    join Debt.App.[Transaction] trn2 on trn2.TransactionId = dtrn.TransactionId
    join PaymentProcessing.App.CCTransaction cct on cct.TransactionId = trn.TransactionId
    where st.StatusName = 'open'
    and trn2.PaymentProcessingTransactionId is null

  • soochkoam (11/28/2011)


    I don't know how to copy the execution plan here, but after viewing the plans there is definitely a difference between the two queries.

    Check out the bottom left link in my signature, part of that will show you how to save the execution plan in .sqlplan format.

    The main thing that stands out is an operator called "Parallelism", which accounts for the bulk of the slow query's cost, followed by a "Hash Match" operator. The plan for the query with the "Top" command contains neither.

    These... usually... aren't a problem. However, test without the TOP 100 but with OPTION (MAXDOP 1) at the tail of the query to turn off parallelism for this particular statement.
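    As a minimal sketch against the query below, the hint goes at the very end of the statement:

    ```sql
    -- Sketch: same query, parallelism disabled for this statement only
    select d.LinkedDebtId, cct.CardNumber, trn2.TransactionId
    from PaymentProcessing.App.[Transaction] trn
    join PaymentProcessing.App.[Status] st on trn.StatusId = st.StatusId
    -- ... remaining joins exactly as in the original query ...
    where st.StatusName = 'open'
    and trn2.PaymentProcessingTransactionId is null
    option (maxdop 1); -- force a serial plan for this query only
    ```

    If the serial plan finishes quickly, that narrows the problem to the parallel plan shape rather than the joins themselves.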

    Below is the code for the slow performing query.

    Thanks in advance,

    Sooch

    select d.LinkedDebtId, cct.CardNumber, trn2.TransactionId
    from PaymentProcessing.App.[Transaction] trn
    join PaymentProcessing.App.[Status] st on trn.StatusId = st.StatusId
    join Debt.App.LinkedDebt ld on ld.LinkedDebtId = SUBSTRING(trn.ObjectCode,2,8)
    join Debt.App.Debt d on ld.LinkedDebtId = d.LinkedDebtId
    join Debt.App.DebtLineItem dli on dli.DebtId = d.DebtId
    join Debt.App.DebtTransaction dtrn on dtrn.DebtLineItemId = dli.DebtLineItemId
    join Debt.App.[Transaction] trn2 on trn2.TransactionId = dtrn.TransactionId
    join PaymentProcessing.App.CCTransaction cct on cct.TransactionId = trn.TransactionId
    where st.StatusName = 'open'
    and trn2.PaymentProcessingTransactionId is null

    Of note: the SUBSTRING on trn.ObjectCode is non-sargable, which can force a scan on that table. Whether it actually hurts depends on the order in which the optimizer applies the predicates, but under most circumstances it's not going to help. You look a bit stuck in this case, though. A persisted computed column on Transaction that stores that substring, with an index on it, could help here, but I'd really need to dig into your data and the plan, and this might be hyper-optimizing anyway.
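    A minimal sketch of that persisted-computed-column idea, assuming you can alter the table (the column and index names here are hypothetical):

    ```sql
    -- Hypothetical: persist the extracted key so the join can seek an index
    -- instead of computing SUBSTRING per row
    alter table PaymentProcessing.App.[Transaction]
        add LinkedDebtIdFromObjectCode as SUBSTRING(ObjectCode, 2, 8) persisted;

    create nonclustered index IX_Transaction_LinkedDebtIdFromObjectCode
        on PaymentProcessing.App.[Transaction] (LinkedDebtIdFromObjectCode);
    ```

    The join condition would then become `ld.LinkedDebtId = trn.LinkedDebtIdFromObjectCode`; watch for data-type mismatches between the two columns, since an implicit conversion would defeat the index.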


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • Without getting into the specifics of the two execution plans, I can say this has to do with how the query optimizer handles the TOP command. TOP 100 (or 1,000, or whatever) basically tells the optimizer to stop after the first 100 (or whatever) rows. If your output is ordered, you get the top 100 results; if it is not ordered, you get an arbitrary first 100.

    This can impact performance because it allows the optimizer to make additional assumptions about your data and how best to process it. In this particular instance, for example, it appears that you have a large and complex set of data. (You are joining 8 tables.) If you are going to process all of that data, it might benefit from parallel processing. (This time the query only returns 50 rows, but next time it might return 5,000.) If, however, you tell the optimizer that you will never want more than 100 results, then the optimizer can make different assumptions.

    Another example of the same phenomenon is adding a UNIQUE constraint to an index. An index might have only unique data, but if it isn't defined and constrained as UNIQUE, then the optimizer can't make the assumption that the data will always be unique. (Remember that a query plan has to be 100% safe for re-use.) Rob Farley has some excellent blog posts on the performance benefits of UNIQUEness.
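    As a sketch of what that declaration looks like (the table and column names are made up for illustration):

    ```sql
    -- Hypothetical: declaring uniqueness lets the optimizer assume at most
    -- one matching row per Email, which can simplify joins and aggregates
    create unique nonclustered index UX_Customer_Email
        on dbo.Customer (Email);
    ```

    With only a plain (non-unique) index on the same column, the optimizer has to plan for duplicates even if none exist in the data today.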

    The more you can tell the optimizer about your data, the better job it can do of ... uh ... optimizing. If you know, when you write a query, that you only want (and expect) 1 record in the result set, then including TOP 1 in the SELECT clause can help the optimizer find the most efficient query plan possible.
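    For instance (a hypothetical lookup, pattern only):

    ```sql
    -- Hypothetical: TOP (1) plus ORDER BY returns the latest order
    -- deterministically, and tells the optimizer to stop after one row
    select top (1) OrderId, OrderDate
    from dbo.[Order]
    where CustomerId = 42
    order by OrderDate desc;
    ```

    The ORDER BY matters: TOP without it is free to return whichever row the plan reaches first.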
