November 14, 2012 at 9:57 am
One of our developers wrote a query yesterday that filled up the drive that tempdb lives on. I know this is a bad query, but I don't know why. Here's the query, which runs on a SQL Server 2008 instance. Any thoughts on why this is such an awful query would be greatly appreciated.
SELECT a.CustomerID, CustomerName, CustomerGroup
, a.BillingCode, a.RFAccount
, (CASE WHEN CustomerType = 'D' THEN 'Dept' ELSE 'Agency' END) AS Type
, COUNT(DISTINCT a.EquipmentID), ISNULL(MAX(c.UseDate), MAX(d.UseDate))
FROM fmtEquipmentStatus a
LEFT OUTER JOIN fmtCustomer b ON (a.CustomerID = b.CustomerID)
LEFT OUTER JOIN fmtFleetEquipmentUse c ON (a.CustomerID = c.CustomerID AND a.BillingCode = c.BillingCode)
LEFT OUTER JOIN fmtHeavyEquipmentUse d ON (a.CustomerID = d.CustomerID)
WHERE StatusEndDate IS NULL
AND b.CustomerActive = 'Y'
GROUP BY a.CustomerID, CustomerName, CustomerGroup
, a.BillingCode, a.RFAccount
, (CASE WHEN CustomerType = 'D' THEN 'Dept' ELSE 'Agency' END)
ORDER BY 1
Thanks!
Elizabeth
November 14, 2012 at 11:20 am
Please supply the execution plan for this exact query
"I know this is a bad query but I don't know why"
Press Ctrl-"M" before running the query, then when it completes you should have a detailed execution plan. If you go through it (even at a high level) you should see sections marked with percentages, this will at least let you know (and us) which portion of the query is taking up the most resources.
It will be easier to help with this
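If you'd rather get the plans from T-SQL instead of the keyboard shortcut, something like this works as well - the SELECT here is just a placeholder for the problem query:
-- Estimated plan: the batch is compiled but NOT executed
SET SHOWPLAN_XML ON;
GO
SELECT 1 AS Placeholder; -- put the problem query here
GO
SET SHOWPLAN_XML OFF;
GO
-- Actual plan: the batch runs and the plan is returned alongside the results
SET STATISTICS XML ON;
GO
SELECT 1 AS Placeholder; -- put the problem query here
GO
SET STATISTICS XML OFF;
GO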
Thanks!
______________________________________________________________________________
Never argue with an idiot; they'll drag you down to their level and beat you with experience
November 14, 2012 at 11:28 am
I've uploaded the execution plan. There's a Table Spool that looks like it's returning over 14 million rows, which certainly is a problem. I just don't know how to change the query to avoid that, or the Hash Match, which is also very costly.
November 14, 2012 at 11:36 am
I forgot to mention that this is an estimated execution plan, not the actual execution plan. It's on a production server, so I'm reluctant to run a query again that I know is bad. I'm just wanting to get more knowledge about how to improve the query.
November 14, 2012 at 11:45 am
OK, I see. Reading from right to left, by the time it gets to the right-hand Merge Join it's dealing with 14 trillion rows of data... a couple of the operations after the data is aggregated are using a Table Spool to sort the data, so that's a tempdb bloater and a performance hit.
Can you show us the indexes on fmtEquipmentStatus and fmtCustomer?
If there is not already an index on CustomerID, CustomerName, CustomerGroup, I think adding one would help enormously.
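Something along these lines is what I have in mind - just a sketch, and it assumes CustomerName, CustomerGroup, CustomerType and CustomerActive all live on fmtCustomer (the index name is made up):
-- Hypothetical index; adjust the key/INCLUDE columns to match where the columns really live
CREATE NONCLUSTERED INDEX IX_fmtCustomer_CustomerID
ON fmtCustomer (CustomerID)
INCLUDE (CustomerName, CustomerGroup, CustomerType, CustomerActive);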
Lowell
November 14, 2012 at 11:45 am
That's 14 billion rows in two table spools. That's a lot of work for TempDB.
I would try to pre-aggregate the two UseDate columns on CustomerID (for heavy) and on CustomerID and BillingCode (for fleet). Hopefully this gets you joins to much smaller data sets.
Of lesser note:
Also, I would make sure that each column is qualified with its table alias. I certainly can't identify the table that "StatusEndDate" is coming from. The same issue exists for CustomerName, CustomerGroup, and CustomerType.
There is a LEFT JOIN to fmtCustomer, but it's also filtered in the WHERE clause, which makes it behave like an INNER JOIN. Would you really have customers that aren't in your customer table?
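To illustrate, here are minimal versions of the two choices - I'm guessing StatusEndDate belongs to fmtEquipmentStatus:
-- As written, the WHERE filter on b turns the LEFT JOIN into an INNER JOIN,
-- because unmatched rows come back with b.CustomerActive = NULL and are thrown away.
-- If every row should have an active customer, say so explicitly:
SELECT a.CustomerID
FROM fmtEquipmentStatus a
INNER JOIN fmtCustomer b ON a.CustomerID = b.CustomerID
WHERE a.StatusEndDate IS NULL
AND b.CustomerActive = 'Y';
-- If you really do want to keep rows with no active customer match,
-- the filter on b has to move into the ON clause:
SELECT a.CustomerID
FROM fmtEquipmentStatus a
LEFT JOIN fmtCustomer b ON a.CustomerID = b.CustomerID
AND b.CustomerActive = 'Y'
WHERE a.StatusEndDate IS NULL;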
November 14, 2012 at 11:52 am
There are no indexes on the Customer information, which is probably a major issue.
Indexes on fmtEquipmentStatus:
index_name | index_description | index_keys
fmtEquipmentStatus2 | clustered located on PRIMARY | BillingCode
fmtEquipmentStatus3 | nonclustered located on PRIMARY | EquipmentID, StatusEndDate, StatusStartDate
fmtEquipmentStatus4 | nonclustered located on PRIMARY | EquipmentID, StatusEndDate, EquipmentRateCode, BillingCode
fmtEquipmentStatus5 | nonclustered located on PRIMARY | EquipmentID, EquipmentClass, BillingCode, CustomerID, StatusEndDate, ContactName
IDX_EquipmentStatus1 | nonclustered located on PRIMARY | EquipmentID, StatusStartDate, BillingCode, EquipmentRateCode, StatusEndDate
IDX_EquipmentStatus2 | nonclustered located on PRIMARY | EquipmentID, StatusEndDate, StatusStartDate, MeterReading
IDX_EquipmentStatusEndDate | nonclustered located on PRIMARY | StatusEndDate
IDX_EquipmentStatusStartDate | nonclustered located on PRIMARY | StatusStartDate
IDX_fmtEquipmentStatus10 | nonclustered located on PRIMARY | BillingCode, StatusStartDate, EquipmentID, StatusEndDate
IDX_fmtEquipmentStatus6 | nonclustered located on PRIMARY | BillingCode, EquipmentRateCode, EquipmentID
IDX_fmtEquipmentStatus7 | nonclustered located on PRIMARY | BillingCode, EquipmentID
IDX_fmtEquipmentStatus8 | nonclustered located on PRIMARY | StatusEndDate, EquipmentID
IDX_fmtEquipmentStatus9 | nonclustered located on PRIMARY | EquipmentID, StatusStartDate
PK_CustomerEquipment | nonclustered, unique, primary key located on PRIMARY | StatusStartDate, EquipmentID
November 14, 2012 at 12:00 pm
Quote: "I would try to pre-aggregate the two UseDate columns on CustomerID (for heavy) and on CustomerID and BillingCode (for fleet). Hopefully this gets you joins to much smaller data sets."
Charles, what do you mean by pre-aggregate?
Quote: "Also, I would make sure that each column is qualified with its table alias. I certainly can't identify the table that "StatusEndDate" is coming from. The same issue exists for CustomerName, CustomerGroup, and CustomerType."
Good point - I'll look at this.
Quote: "There is a LEFT JOIN to fmtCustomer, but it's also filtered in the WHERE clause, which makes it behave like an INNER JOIN. Would you really have customers that aren't in your customer table?"
I don't believe there would be customers that don't exist in the customer table.
Thanks for the tips!
November 14, 2012 at 12:15 pm
I am a bit late to the game on this one and my comment is a little off topic, but... you should not use an ordinal position in an ORDER BY. You should instead reference the column by name. If the column order in the SELECT list changes, you have to adjust the ORDER BY or your results will be out of sequence.
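For the query posted above, that means:
-- Fragile: sorts by whatever column happens to be first in the SELECT list
ORDER BY 1
-- Safer: names the column, so reordering the SELECT list can't silently change the sort
ORDER BY a.CustomerID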
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Moden's splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
November 14, 2012 at 12:17 pm
Thanks, Sean, good point.
November 14, 2012 at 12:24 pm
For pre-aggregation, try to aggregate the separate tables in separate steps. For the query you have there are several options. You'll have to test each to see which one performs best.
Option 1: Subquery
...
FROM fmtEquipmentStatus a
LEFT JOIN (SELECT CustomerID,
BillingCode,
MAX(UseDate) AS UseDate
FROM fmtFleetEquipmentUse
GROUP BY CustomerID,
BillingCode) c ON (a.CustomerID = c.CustomerID AND a.BillingCode = c.BillingCode)
...
Option 2: Temp Table (no indexes)
SELECT CustomerID,
BillingCode,
MAX(UseDate) AS UseDate
INTO #fmtFleetEquipmentUse
FROM fmtFleetEquipmentUse
GROUP BY CustomerID,
BillingCode
...
FROM fmtEquipmentStatus a
LEFT JOIN #fmtFleetEquipmentUse c ON (a.CustomerID = c.CustomerID AND a.BillingCode = c.BillingCode)
...
Option 3: TempTable with clustered index
CREATE TABLE #fmtFleetEquipmentUse
(CustomerID...,
BillingCode...,
UseDate DATETIME,
PRIMARY KEY CLUSTERED (CustomerID,BillingCode))
INSERT INTO #fmtFleetEquipmentUse
SELECT CustomerID,
BillingCode,
MAX(UseDate)
FROM fmtFleetEquipmentUse
GROUP BY CustomerID,
BillingCode
...
FROM fmtEquipmentStatus a
LEFT JOIN #fmtFleetEquipmentUse c ON (a.CustomerID = c.CustomerID AND a.BillingCode = c.BillingCode)
...
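Putting option 1 together with the rest of your original query, it might end up looking roughly like this - untested, and it assumes CustomerName, CustomerGroup and CustomerType come from fmtCustomer and StatusEndDate from fmtEquipmentStatus:
SELECT a.CustomerID, b.CustomerName, b.CustomerGroup
, a.BillingCode, a.RFAccount
, (CASE WHEN b.CustomerType = 'D' THEN 'Dept' ELSE 'Agency' END) AS Type
, COUNT(DISTINCT a.EquipmentID) AS EquipmentCount
, ISNULL(MAX(c.UseDate), MAX(d.UseDate)) AS LastUseDate
FROM fmtEquipmentStatus a
INNER JOIN fmtCustomer b ON (a.CustomerID = b.CustomerID)
LEFT JOIN (SELECT CustomerID, BillingCode, MAX(UseDate) AS UseDate
FROM fmtFleetEquipmentUse
GROUP BY CustomerID, BillingCode) c ON (a.CustomerID = c.CustomerID AND a.BillingCode = c.BillingCode)
LEFT JOIN (SELECT CustomerID, MAX(UseDate) AS UseDate
FROM fmtHeavyEquipmentUse
GROUP BY CustomerID) d ON (a.CustomerID = d.CustomerID)
WHERE a.StatusEndDate IS NULL
AND b.CustomerActive = 'Y'
GROUP BY a.CustomerID, b.CustomerName, b.CustomerGroup
, a.BillingCode, a.RFAccount
, (CASE WHEN b.CustomerType = 'D' THEN 'Dept' ELSE 'Agency' END)
ORDER BY a.CustomerID
Because c and d are already collapsed to one row per customer (and billing code), the only thing multiplying rows ahead of the GROUP BY is fmtEquipmentStatus itself, which is the point of the pre-aggregation.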
November 14, 2012 at 12:46 pm
Wow, Charles, this is really helpful! I'll try it out and will post when I get the final answer.
Thanks, all for the tips!
November 14, 2012 at 1:55 pm
Another quick note on what Charles has offered:
Depending on how many rows these "mini" queries will yield and if you have a beefy server with ample RAM, consider using parameter tables. If this isn't really an option, consider defining your temporary tables beforehand as you will have better control over your table's definition prior to inserting the data, like NOT NULL, constraints, etc.
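For example, building on Charles's option 3, the temp table can be nailed down before any data goes in - the column types here are only placeholders, since I don't know the real definitions:
-- Hypothetical types; match them to fmtFleetEquipmentUse before using this
CREATE TABLE #fmtFleetEquipmentUse
(CustomerID INT NOT NULL,
BillingCode VARCHAR(20) NOT NULL,
UseDate DATETIME NULL,
PRIMARY KEY CLUSTERED (CustomerID, BillingCode));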
Just something to think about...
______________________________________________________________________________
Never argue with an idiot; they'll drag you down to their level and beat you with experience