Why does this query use up 70G in tempdb

  • One of our developer wrote a query yesterday that filled up the drive that tempdb lives on. I know this is a bad query but I don't know why. Here's the query, which is run on a SQL 2008 server. Any thoughts on why this is such an awful query would be greatly appreciated.

    SELECT a.CustomerID, CustomerName, CustomerGroup

    , a.BillingCode, a.RFAccount

    , (CASE WHEN CustomerType = 'D' THEN 'Dept' ELSE 'Agency' END) AS Type

    , COUNT(DISTINCT a.EquipmentID), ISNULL(MAX(c.UseDate), MAX(d.UseDate))

    FROM fmtEquipmentStatus a

    LEFT OUTER JOIN fmtCustomer b ON (a.CustomerID = b.CustomerID)

    LEFT OUTER JOIN fmtFleetEquipmentUse c ON (a.CustomerID = c.CustomerID AND a.BillingCode = c.BillingCode)

    LEFT OUTER JOIN fmtHeavyEquipmentUse d ON (a.CustomerID = d.CustomerID)

    WHERE StatusEndDate IS NULL

    AND b.CustomerActive = 'Y'

    GROUP BY a.CustomerID, CustomerName, CustomerGroup

    , a.BillingCode, a.RFAccount

    , (CASE WHEN CustomerType = 'D' THEN 'Dept' ELSE 'Agency' END)

    ORDER BY 1

    Thanks!

    Elizabeth

  • Please supply the execution plan for this exact query

    "I know this is a bad query but I don't know why"

    Press Ctrl-"M" before running the query, then when it completes you should have a detailed execution plan. If you go through it (even at a high level) you should see sections marked with percentages, this will at least let you know (and us) which portion of the query is taking up the most resources.

    It will be easier to help with this

    Thanks!

    ______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

  • I've uploaded the execution plan. There's a Table spool that looks like it's returning over 14 million rows, which certainly is a problem. I just don't know how to change the query to avoid that and the Hash match which is also very costly.

  • I forgot to mention that this is an estimated execution plan, not the actual execution plan. It's on a production server, so I'm reluctant to run a query again that I know is bad. I'm just wanting to get more knowledge about how to improve the query.

  • ok, i see starting from right to left, when it gets to the right merge join, it's dealing with 14 trillion rows of data... a couple of those operations after the data is aggregated are using the Tablespool to sort the data, so that's a temp bloater and performance hit.

    can you show the indexes on fmtEquipmentStatus fmtCustomer?

    If there is not an index on CustomerID, CustomerName, CustomerGroup, i think that would help enormously.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • That's 14 billion rows in two table spools. That's a lot of work for TempDB.

    I would try to pre aggregate the two UseDate columns on the customer ID (for heavy) and the customerID and BillingCode (for fleet). Hopefully this should get you joins to much smaller data sets.

    Of lesser note:

    Also, I would make sure that each column has an alias. I certainly can't identify the table that "StatusEndDate" is coming from. The same issue exists for CustomerName, CustomerGroup, and CustomerType.

    There is a left join to fmtCustomer but it's also referenced in the where clause. This is the same as an inner join. Would you really have customers that aren't in your customer table?

  • There are no indexes on the Customer information, which is probably a major issue.

    Indexes on fmtEquipmentStatus:

    index_nameindex_descriptionindex_keys

    fmtEquipmentStatus2clustered located on PRIMARYBillingCode

    fmtEquipmentStatus3nonclustered located on PRIMARYEquipmentID, StatusEndDate, StatusStartDate

    fmtEquipmentStatus4nonclustered located on PRIMARYEquipmentID, StatusEndDate, EquipmentRateCode, BillingCode

    fmtEquipmentStatus5nonclustered located on PRIMARYEquipmentID, EquipmentClass, BillingCode, CustomerID, StatusEndDate, ContactName

    IDX_EquipmentStatus1nonclustered located on PRIMARYEquipmentID, StatusStartDate, BillingCode, EquipmentRateCode, StatusEndDate

    IDX_EquipmentStatus2nonclustered located on PRIMARYEquipmentID, StatusEndDate, StatusStartDate, MeterReading

    IDX_EquipmentStatusEndDatenonclustered located on PRIMARYStatusEndDate

    IDX_EquipmentStatusStartDatenonclustered located on PRIMARYStatusStartDate

    IDX_fmtEquipmentStatus10nonclustered located on PRIMARYBillingCode, StatusStartDate, EquipmentID, StatusEndDate

    IDX_fmtEquipmentStatus6nonclustered located on PRIMARYBillingCode, EquipmentRateCode, EquipmentID

    IDX_fmtEquipmentStatus7nonclustered located on PRIMARYBillingCode, EquipmentID

    IDX_fmtEquipmentStatus8nonclustered located on PRIMARYStatusEndDate, EquipmentID

    IDX_fmtEquipmentStatus9nonclustered located on PRIMARYEquipmentID, StatusStartDate

    PK_CustomerEquipmentnonclustered, unique, primary key located on PRIMARYStatusStartDate, EquipmentID

  • Charles, what do you mean by pre aggregate?

    I would try to pre aggregate the two UseDate columns on the customer ID (for heavy) and the customerID and BillingCode (for fleet). Hopefully this should get you joins to much smaller data sets.

    Good point - I'll look at this

    [

    Also, I would make sure that each column has an alias. I certainly can't identify the table that "StatusEndDate" is coming from. The same issue exists for CustomerName, CustomerGroup, and CustomerType.

    I don't believe there would be customers that don't exist in the customer table.

    [

    There is a left join to fmtCustomer but it's also referenced in the where clause. This is the same as an inner join. Would you really have customers that aren't in your customer table?

    Thanks for the tips!

  • I am a bit late to the game on this one and my comment is a little off topic but...you should not use ordinal position in an order by. You should instead reference the column by name. If the column order changes in the query you have to adjust the order by or your results will be out so sequence.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Thanks, Sean, good point.

  • For Pre Aggregation try to aggregate separate tables in separate steps. For the query you have there are several options. You'll have to test each to see which one performs better.

    Option 1: Subquery

    ...

    FROM fmtEquipmentStatus a

    LEFT JOIN (SELECT CustomerID,

    BillingCode,

    MAX(c.UseDate)

    FROM fmtFleetEquipmentUse

    GROUP BY CustomerID,

    BillingCode) c ON (a.CustomerID = c.CustomerID AND a.BillingCode = c.BillingCode)

    ...

    Option 2: Temp Table (no indexes)

    SELECT CustomerID,

    BillingCode,

    MAX(c.UseDate)

    FROM fmtFleetEquipmentUse

    INTO #fmtFleetEquipmentUse

    GROUP BY CustomerID,

    BillingCode

    ...

    FROM fmtEquipmentStatus a

    LEFT JOIN #fmtFleetEquipmentUse c ON (a.CustomerID = c.CustomerID AND a.BillingCode = c.BillingCode)

    ...

    Option 3: TempTable with clustered index

    CREATE TABLE #fmtFleetEquipmentUse

    (CustomerID...,

    BillingCode...,

    UseDate DATETIME,

    PRIMARY KEY CLUSTERED (CustomerID,BillingCode))

    INSERT INTO #fmtFleetEquipmentUse

    SELECT CustomerID,

    BillingCode,

    MAX(c.UseDate)

    FROM fmtFleetEquipmentUse

    GROUP BY CustomerID,

    BillingCode

    ...

    FROM fmtEquipmentStatus a

    LEFT JOIN #fmtFleetEquipmentUse c ON (a.CustomerID = c.CustomerID AND a.BillingCode = c.BillingCode)

    ...

  • Wow, Charles, this is really helpful! I'll try it out and will post when I get the final answer.

    Thanks, all for the tips!

  • Another quick note on what Charles has offered:

    Depending on how many rows these "mini" queries will yield and if you have a beefy server with ample RAM, consider using parameter tables. If this isn't really an option, consider defining your temporary tables beforehand as you will have better control over your table's definition prior to inserting the data, like NOT NULL, constraints, etc.

    Just something to think about...

    ______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply