Why do these similar queries run at such a different speed?

  • The fact that you need a DISTINCT operation in there suggests you might have a structural or data problem you need to concentrate on first.

    Also, why are you returning 50000 rows? Is this a data load process? If it's a report, human beings don't read that kind of data, ever.

    As to the queries & plans, the first thing I'm seeing is the estimated/actual disparity. Estimated 5k and actual 60k... that's a problem. It gets worse further in. That suggests your stats are out of date. You may also be benefitting from parallel execution. The estimated cost on the fast query is 19 while the estimated cost on the slow query is 3.

    But the biggest problem with the slow query is that it timed out when coming up with an execution plan. That means the plan you're working with is a guess. Only way around that is to reduce the complexity of the query in order to make it easier to come up with a viable execution plan. If you really need to perform the DISTINCT aggregation maybe building a materialized view there will help.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Im currently looking into the temp table suggestion, and from a quick test i looks like it will increase performance from 2,5 mins to merly seconds. So thank you very much for that one!

    As for why i am returning 50000 rows? Well, basicly this is a query that will extract a grouped set of data from a large fact table. This dataset is inputed in a temp table for further processing later in the procedure.

    It is not for a user report.

    I am really glad for all your help so far guys! I keep getting smarter with every visit.

  • Grant Fritchey (2/7/2012)


    The fact that you need a DISTINCT operation in there suggests you might have a structural or data problem you need to concentrate on first.

    The reason for the inner query with distinct operation is because i have one large fact-table "Postings"

    This table has multiple dimension-key columns, a handled column and an amount column.

    Handled is a bit value represtenting whetere this row has been "handled". And by handled i mean processed by the procedure im writing.

    So what i want is a grouped sum of amounts for the columns specified in my query. But i only want the groups that have one or more "unhandled" rows. But if i group has 1 unhandled row. I need all the rows belonging to this group summed up.

    You might have seen that the selection of distinct columns is the same as the group by columns in the outer query. I could have (maybe should have) used a group by in the inner query, but i did distinct this time. The rows will be the same anyway)

    So thats why i use a distinct on the inner query. Then i select only the distinct groups that have unhandled rows. and join it against the Postings table to sum up all the amounts of these groups.

    Hope that explains why. And if you have a better suggestion, then please enlighten me 🙂

  • GROUP BY and DISTINCT are both aggregate operations, so you might not have saved anything there.

    In general, check the SELECT properties. If you see a timeout as the reason for early termination, the plan is of limited utility from that point forward.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Sorry, I read through your explanation of why you have to use a DISTINCT several times and I'm not getting it. Probably more than usually thick this morning. Sorry.

    Could be you're using it entirely correctly. I just point it out because DISTINCT is very much a code smell indicating structural issues that are being compensated through the use of DISTINCT.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • English is not my first language, so i might not get my point across well.

    For simplicity, lets say i only grouped by two columns in the query. Account_FK and Company_FK

    In my inner select i simply ask:

    Please return to me all distinct account_FK and company_fk that have unhandled rows.

    This because account_fk 1 and company_fk 10 might have multiple rows that are unhandled. But i dont need them all. I just need to know that account 1 and company 10 has unhandled rows.

    Then i can in the main query use this list of "unhandled groups" to sum up all rows (both handled and unhandled) of the "unhandled groups"

    Did that make sense?

  • erikpoker (2/7/2012)


    English is not my first language, so i might not get my point across well.

    For simplicity, lets say i only grouped by two columns in the query. Account_FK and Company_FK

    In my inner select i simply ask:

    Please return to me all distinct account_FK and company_fk that have unhandled rows.

    This because account_fk 1 and company_fk 10 might have multiple rows that are unhandled. But i dont need them all. I just need to know that account 1 and company 10 has unhandled rows.

    Then i can in the main query use this list of "unhandled groups" to sum up all rows (both handled and unhandled) of the "unhandled groups"

    Did that make sense?

    I think so, but it doesn't explain why you need the DISTINCT. I mean, you do from the earlier post that showed how the data changed when you removed it. It's needed. But, more than anything, that comes back to suggesting there's something off about the design or the data. I just want to point it out because that bit of aggregation is clearly having an affect on your queries, so the possibility of removing it is a good candidate for helping to tune the queries.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • erikpoker (2/7/2012)


    English is not my first language, so i might not get my point across well.

    For simplicity, lets say i only grouped by two columns in the query. Account_FK and Company_FK

    In my inner select i simply ask:

    Please return to me all distinct account_FK and company_fk that have unhandled rows.

    This because account_fk 1 and company_fk 10 might have multiple rows that are unhandled. But i dont need them all. I just need to know that account 1 and company 10 has unhandled rows.

    Then i can in the main query use this list of "unhandled groups" to sum up all rows (both handled and unhandled) of the "unhandled groups"

    Did that make sense?

    Hmmm, from your description and without knowing the DDL of the tables or the business logic I'm wondering if this gives you the same result: -

    SELECT SourceSystem_FK, TransactionVersion_FK, TransactionType_FK, Company_FK, Department_FK,

    Account_FK, Currency_FK, Supplier_FK, Customer_FK, Vessel_FK, CounterpartCompany_FK, SFICode_FK,

    Project_FK, ProductGroupGeoTeam_FK, CurrencyOriginal_FK, DynamicsDocumentType_FK, StaffCategory_FK,

    Year, Month, SUM(f.Amount)

    FROM Fact2012Base.Posting f

    CROSS APPLY (SELECT TOP 1 SourceSystem_FK, innerP.TransactionVersion_FK, TransactionType_FK, Company_FK, Department_FK, Account_FK, Currency_FK,

    Supplier_FK, Customer_FK, Vessel_FK, CounterpartCompany_FK, SFICode_FK, Project_FK, ProductGroupGeoTeam_FK, CurrencyOriginal_FK,

    DynamicsDocumentType_FK, StaffCategory_FK, Year, Month

    FROM Fact2012Base.Posting innerP

    INNER JOIN DimensionBase.[Transaction] tr ON tr.TransactionVersion_FK = innerP.TransactionVersion_FK

    AND tr.TransactionType_FK = @TransactionType_FK

    WHERE innerP.Handled = 0

    AND SourceSystem_FK = f.SourceSystem_FK AND TransactionVersion_FK = f.TransactionVersion_FK

    AND Year = f.Year AND Month = f.Month AND Company_FK = f.Company_FK AND Department_FK = f.Department_FK

    AND Account_FK = f.Account_FK AND Currency_FK = f.Currency_FK

    AND (Supplier_FK = f.supplier_FK OR (Supplier_FK IS NULL AND f.Supplier_FK IS NULL))

    AND (Customer_FK = f.Customer_FK OR (Customer_FK IS NULL AND f.Customer_FK IS NULL))

    AND (Vessel_FK = f.Vessel_FK OR (Vessel_FK IS NULL AND f.Vessel_FK IS NULL))

    AND (CounterpartCompany_FK = f.CounterpartCompany_FK OR (CounterpartCompany_FK IS NULL AND f.CounterpartCompany_FK IS NULL))

    AND (SFICode_FK = f.SFICode_FK OR (SFICode_FK IS NULL AND f.SFICode_FK IS NULL))

    AND (Project_FK = f.Project_FK OR (Project_FK IS NULL AND f.Project_FK IS NULL))

    AND (ProductGroupGeoteam_FK = f.ProductGroupGeoteam_FK OR (ProductGroupGeoteam_FK IS NULL AND f.ProductGroupGeoteam_FK IS NULL))

    AND (CurrencyOriginal_FK = f.CurrencyOriginal_FK OR (CurrencyOriginal_FK IS NULL AND f.CurrencyOriginal_FK IS NULL))

    AND (DynamicsDocumentType_FK = f.DynamicsDocumentType_FK OR (DynamicsDocumentType_FK IS NULL AND f.DynamicsDocumentType_FK IS NULL))

    AND (StaffCategory_FK = f.StaffCategory_FK OR (StaffCategory_FK IS NULL AND f.StaffCategory_FK IS NULL))) b

    WHERE Year = @Year AND SourceSystem_FK = @SourceSystem_FK

    GROUP BY SourceSystem_FK, TransactionVersion_FK, TransactionType_FK, Company_FK, Department_FK,

    Account_FK, Currency_FK, Supplier_FK, Customer_FK, Vessel_FK, CounterpartCompany_FK, SFICode_FK,

    Project_FK, ProductGroupGeoTeam_FK, CurrencyOriginal_FK, DynamicsDocumentType_FK, StaffCategory_FK,

    Year, Month


    --edit--

    Howard is right I think, should be an exists really.


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • It's hard to get my head around the logic without really digging into it, but I'm pretty sure that you could get the effect you're after without the join to the distinct derived table by just joining to the base tables and using an EXISTS in your where clause to identify any of the groups that have rows that are unhandled.

  • Cadavres suggestion returns the same amount of rows as the original query, but running at a speed of aproximatly 15 seconds. I have not double checked the summed amounts so do not know if they really deliver the same result, but i looks good so far.

    Using exists could possible shave some more seconds off, but i am still leaning towards the temp-table suggestion. That one runs at 500-900ms so thats considerable faster.

    For my usage that is a perfectly acceptable speed, so im content to stop at this point.

    I am overwelmed by the amount of helpful people in this forum. Thanks a lot guys! And a special thanks to cadavre for his multiple replies and follow ups!

Viewing 10 posts - 16 through 24 (of 24 total)

You must be logged in to reply to this topic. Login to reply