Top command question

  • I have a query that joins 8 tables together and will return 50 records. When I use the "top 100" command the query will return all 50 records immediately. If I omit this command the query will run indefinitely (I canceled the query after 2 minutes). Any ideas why the query performs so poorly without the top command?

  • soochkoam (11/28/2011)


    I have a query that joins 8 tables together and will return 50 records. When I use the "top 100" command the query will return all 50 records immediately. If I omit this command the query will run indefinitely (I canceled the query after 2 minutes). Any ideas why the query performs so poorly without the top command?

    Offhand: No. Definitely not enough information up front.

    Now, if we can see the actual execution plan of the query that works and the estimated plan of the one that runs indefinitely, we might have a shot; if you can let the full query run to completion and get us the actual plan, even better (not sure of your environment). The query itself would help too, along with the schema (and indexes) of the underlying structures.

    From there, I would have to ask how often you repeated your testing, what the typical concurrent load on your environment is, and whether there was any significant difference in activity between the two test runs.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • I don't know how to copy the execution plan here, but after viewing the plans there is definitely a difference between the two queries. The main thing that stands out is an operator called "Parallelism", which accounts for the bulk of the slow query's cost, followed by a "Hash Match" operator. The plan for the query with the "Top" command contains neither.

    Below is the code for the slow performing query.

    Thanks in advance,

    Sooch

    select d.LinkedDebtId, cct.CardNumber, trn2.TransactionId
    from PaymentProcessing.App.[Transaction] trn
    join PaymentProcessing.App.[Status] st on trn.StatusId = st.StatusId
    join Debt.App.LinkedDebt ld on ld.LinkedDebtId = SUBSTRING(trn.ObjectCode,2,8)
    join Debt.App.Debt d on ld.LinkedDebtId = d.LinkedDebtId
    join Debt.App.DebtLineItem dli on dli.DebtId = d.DebtId
    join Debt.App.DebtTransaction dtrn on dtrn.DebtLineItemId = dli.DebtLineItemId
    join Debt.App.[Transaction] trn2 on trn2.TransactionId = dtrn.TransactionId
    join PaymentProcessing.App.CCTransaction cct on cct.TransactionId = trn.TransactionId
    where st.StatusName = 'open'
    and trn2.PaymentProcessingTransactionId is null

  • soochkoam (11/28/2011)


    I don't know how to copy the execution plan here, but after viewing the plans there is definitely a difference between the two queries.

    Check out the bottom left link in my signature, part of that will show you how to save the execution plan in .sqlplan format.

    The main thing that stands out is an operator called "Parallelism", which accounts for the bulk of the slow query's cost, followed by a "Hash Match" operator. The plan for the query with the "Top" command contains neither.

    These... usually... aren't a problem. However, test without the TOP 100 but with OPTION (MAXDOP 1) at the tail of the query to turn off parallelism for this particular statement.
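    As a minimal sketch against the query below, the hint goes at the very end of the statement:

    ```sql
    -- Sketch: same query, parallelism disabled for this statement only
    select d.LinkedDebtId, cct.CardNumber, trn2.TransactionId
    from PaymentProcessing.App.[Transaction] trn
    join PaymentProcessing.App.[Status] st on trn.StatusId = st.StatusId
    -- ... remaining joins exactly as in the original query ...
    where st.StatusName = 'open'
    and trn2.PaymentProcessingTransactionId is null
    option (maxdop 1); -- force a serial plan for this query only
    ```

    If the serial plan finishes quickly, that narrows the problem to the parallel plan shape rather than the joins themselves.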

    Below is the code for the slow performing query.

    Thanks in advance,

    Sooch

    select d.LinkedDebtId, cct.CardNumber, trn2.TransactionId
    from PaymentProcessing.App.[Transaction] trn
    join PaymentProcessing.App.[Status] st on trn.StatusId = st.StatusId
    join Debt.App.LinkedDebt ld on ld.LinkedDebtId = SUBSTRING(trn.ObjectCode,2,8)
    join Debt.App.Debt d on ld.LinkedDebtId = d.LinkedDebtId
    join Debt.App.DebtLineItem dli on dli.DebtId = d.DebtId
    join Debt.App.DebtTransaction dtrn on dtrn.DebtLineItemId = dli.DebtLineItemId
    join Debt.App.[Transaction] trn2 on trn2.TransactionId = dtrn.TransactionId
    join PaymentProcessing.App.CCTransaction cct on cct.TransactionId = trn.TransactionId
    where st.StatusName = 'open'
    and trn2.PaymentProcessingTransactionId is null

    Of note: the SUBSTRING on trn.ObjectCode is non-sargable, which can force a scan on that table. Whether it actually hurts depends on the order in which the optimizer applies the predicates, but under most circumstances it's not going to help. You look a bit stuck in this case, though. A persisted computed column on Transaction that stores that substring, with an index on it, could help here, but I'd really need to dig into your data and the plan, and this might be hyper-optimizing anyway.
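    A minimal sketch of that persisted-computed-column idea, assuming you can alter the table (the column and index names here are hypothetical):

    ```sql
    -- Hypothetical: persist the extracted key so the join can seek an index
    -- instead of computing SUBSTRING per row
    alter table PaymentProcessing.App.[Transaction]
        add LinkedDebtIdFromObjectCode as SUBSTRING(ObjectCode, 2, 8) persisted;

    create nonclustered index IX_Transaction_LinkedDebtIdFromObjectCode
        on PaymentProcessing.App.[Transaction] (LinkedDebtIdFromObjectCode);
    ```

    The join condition would then become `ld.LinkedDebtId = trn.LinkedDebtIdFromObjectCode`; watch for data-type mismatches between the two columns, since an implicit conversion would defeat the index.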


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • Without getting into the specifics of the two execution plans, I can say this has to do with how the query optimizer handles the TOP command. TOP 100 (or 1,000, or whatever) basically tells the optimizer to stop after the first 100 (or whatever) rows. If your output is ordered, you get the top 100 results; if it is not ordered, you get an arbitrary first 100.

    This can impact performance because it allows the optimizer to make additional assumptions about your data and how best to process it. In this particular instance, for example, it appears that you have a large and complex set of data. (You are joining 8 tables.) If you are going to process all of that data, it might benefit from parallel processing. (This time the query only returns 50 rows, but next time it might return 5,000.) If, however, you tell the optimizer that you will never want more than 100 results, then the optimizer can make different assumptions.

    Another example of the same phenomenon is adding a UNIQUE constraint to an index. An index might have only unique data, but if it isn't defined and constrained as UNIQUE, then the optimizer can't make the assumption that the data will always be unique. (Remember that a query plan has to be 100% safe for re-use.) Rob Farley has some excellent blog posts on the performance benefits of UNIQUEness.
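    As a sketch of what that declaration looks like (the table and column names are made up for illustration):

    ```sql
    -- Hypothetical: declaring uniqueness lets the optimizer assume at most
    -- one matching row per Email, which can simplify joins and aggregates
    create unique nonclustered index UX_Customer_Email
        on dbo.Customer (Email);
    ```

    With only a plain (non-unique) index on the same column, the optimizer has to plan for duplicates even if none exist in the data today.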

    The more you can tell the optimizer about your data, the better job it can do of ... uh ... optimizing. If you know, when you write a query, that you only want (and expect) 1 record in the result set, then including TOP 1 in the SELECT clause can help the optimizer find the most efficient query plan possible.
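    For instance (a hypothetical lookup, pattern only):

    ```sql
    -- Hypothetical: TOP (1) plus ORDER BY returns the latest order
    -- deterministically, and tells the optimizer to stop after one row
    select top (1) OrderId, OrderDate
    from dbo.[Order]
    where CustomerId = 42
    order by OrderDate desc;
    ```

    The ORDER BY matters: TOP without it is free to return whichever row the plan reaches first.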
