November 14, 2012 at 9:57 am
One of our developers wrote a query yesterday that filled up the drive that tempdb lives on. I know this is a bad query, but I don't know why. Here's the query, which runs on a SQL Server 2008 instance. Any thoughts on why this is such an awful query would be greatly appreciated.
SELECT a.CustomerID, CustomerName, CustomerGroup
, a.BillingCode, a.RFAccount
, (CASE WHEN CustomerType = 'D' THEN 'Dept' ELSE 'Agency' END) AS Type
, COUNT(DISTINCT a.EquipmentID), ISNULL(MAX(c.UseDate), MAX(d.UseDate))
FROM fmtEquipmentStatus a
LEFT OUTER JOIN fmtCustomer b ON (a.CustomerID = b.CustomerID)
LEFT OUTER JOIN fmtFleetEquipmentUse c ON (a.CustomerID = c.CustomerID AND a.BillingCode = c.BillingCode)
LEFT OUTER JOIN fmtHeavyEquipmentUse d ON (a.CustomerID = d.CustomerID)
WHERE StatusEndDate IS NULL
AND b.CustomerActive = 'Y'
GROUP BY a.CustomerID, CustomerName, CustomerGroup
, a.BillingCode, a.RFAccount
, (CASE WHEN CustomerType = 'D' THEN 'Dept' ELSE 'Agency' END)
ORDER BY 1
Thanks!
Elizabeth
November 14, 2012 at 11:20 am
Please supply the execution plan for this exact query
"I know this is a bad query but I don't know why"
Press Ctrl-"M" before running the query, then when it completes you should have a detailed execution plan. If you go through it (even at a high level) you should see sections marked with percentages, this will at least let you know (and us) which portion of the query is taking up the most resources.
It will be easier to help with this
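If you'd rather get the plans from T-SQL instead of the keyboard shortcut, something like this works as well - the SELECT here is just a placeholder for the problem query:
-- Estimated plan: the batch is compiled but NOT executed
SET SHOWPLAN_XML ON;
GO
SELECT 1 AS Placeholder; -- put the problem query here
GO
SET SHOWPLAN_XML OFF;
GO
-- Actual plan: the batch runs and the plan is returned alongside the results
SET STATISTICS XML ON;
GO
SELECT 1 AS Placeholder; -- put the problem query here
GO
SET STATISTICS XML OFF;
GO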
Thanks!
______________________________________________________________________________
Never argue with an idiot; they'll drag you down to their level and beat you with experience
November 14, 2012 at 11:28 am
I've uploaded the execution plan. There's a Table Spool that looks like it's returning over 14 million rows, which certainly is a problem. I just don't know how to change the query to avoid that, or the Hash Match, which is also very costly.
November 14, 2012 at 11:36 am
I forgot to mention that this is an estimated execution plan, not the actual execution plan. It's on a production server, so I'm reluctant to run a query again that I know is bad. I'm just wanting to get more knowledge about how to improve the query.
November 14, 2012 at 11:45 am
OK, I see. Reading from right to left, by the time it gets to the right-hand Merge Join it's dealing with 14 trillion rows of data... a couple of the operations after the data is aggregated are using a Table Spool to sort the data, so that's a tempdb bloater and a performance hit.
Can you show us the indexes on fmtEquipmentStatus and fmtCustomer?
If there is not already an index on CustomerID, CustomerName, CustomerGroup, I think adding one would help enormously.
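Something along these lines is what I have in mind - just a sketch, and it assumes CustomerName, CustomerGroup, CustomerType and CustomerActive all live on fmtCustomer (the index name is made up):
-- Hypothetical index; adjust the key/INCLUDE columns to match where the columns really live
CREATE NONCLUSTERED INDEX IX_fmtCustomer_CustomerID
ON fmtCustomer (CustomerID)
INCLUDE (CustomerName, CustomerGroup, CustomerType, CustomerActive);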
Lowell
November 14, 2012 at 11:45 am
That's 14 billion rows in two table spools. That's a lot of work for TempDB.
I would try to pre-aggregate the two UseDate columns on CustomerID (for heavy) and on CustomerID and BillingCode (for fleet). Hopefully this gets you joins to much smaller data sets.
Of lesser note:
Also, I would make sure that each column is qualified with its table alias. I certainly can't identify the table that "StatusEndDate" is coming from. The same issue exists for CustomerName, CustomerGroup, and CustomerType.
There is a LEFT JOIN to fmtCustomer, but it's also filtered in the WHERE clause, which makes it behave like an INNER JOIN. Would you really have customers that aren't in your customer table?
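To illustrate, here are minimal versions of the two choices - I'm guessing StatusEndDate belongs to fmtEquipmentStatus:
-- As written, the WHERE filter on b turns the LEFT JOIN into an INNER JOIN,
-- because unmatched rows come back with b.CustomerActive = NULL and are thrown away.
-- If every row should have an active customer, say so explicitly:
SELECT a.CustomerID
FROM fmtEquipmentStatus a
INNER JOIN fmtCustomer b ON a.CustomerID = b.CustomerID
WHERE a.StatusEndDate IS NULL
AND b.CustomerActive = 'Y';
-- If you really do want to keep rows with no active customer match,
-- the filter on b has to move into the ON clause:
SELECT a.CustomerID
FROM fmtEquipmentStatus a
LEFT JOIN fmtCustomer b ON a.CustomerID = b.CustomerID
AND b.CustomerActive = 'Y'
WHERE a.StatusEndDate IS NULL;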
November 14, 2012 at 11:52 am
There are no indexes on the Customer information, which is probably a major issue.
Indexes on fmtEquipmentStatus:
index_name | index_description | index_keys
fmtEquipmentStatus2 | clustered located on PRIMARY | BillingCode
fmtEquipmentStatus3 | nonclustered located on PRIMARY | EquipmentID, StatusEndDate, StatusStartDate
fmtEquipmentStatus4 | nonclustered located on PRIMARY | EquipmentID, StatusEndDate, EquipmentRateCode, BillingCode
fmtEquipmentStatus5 | nonclustered located on PRIMARY | EquipmentID, EquipmentClass, BillingCode, CustomerID, StatusEndDate, ContactName
IDX_EquipmentStatus1 | nonclustered located on PRIMARY | EquipmentID, StatusStartDate, BillingCode, EquipmentRateCode, StatusEndDate
IDX_EquipmentStatus2 | nonclustered located on PRIMARY | EquipmentID, StatusEndDate, StatusStartDate, MeterReading
IDX_EquipmentStatusEndDate | nonclustered located on PRIMARY | StatusEndDate
IDX_EquipmentStatusStartDate | nonclustered located on PRIMARY | StatusStartDate
IDX_fmtEquipmentStatus10 | nonclustered located on PRIMARY | BillingCode, StatusStartDate, EquipmentID, StatusEndDate
IDX_fmtEquipmentStatus6 | nonclustered located on PRIMARY | BillingCode, EquipmentRateCode, EquipmentID
IDX_fmtEquipmentStatus7 | nonclustered located on PRIMARY | BillingCode, EquipmentID
IDX_fmtEquipmentStatus8 | nonclustered located on PRIMARY | StatusEndDate, EquipmentID
IDX_fmtEquipmentStatus9 | nonclustered located on PRIMARY | EquipmentID, StatusStartDate
PK_CustomerEquipment | nonclustered, unique, primary key located on PRIMARY | StatusStartDate, EquipmentID
November 14, 2012 at 12:00 pm
Quote: "I would try to pre-aggregate the two UseDate columns on CustomerID (for heavy) and on CustomerID and BillingCode (for fleet). Hopefully this gets you joins to much smaller data sets."
Charles, what do you mean by pre-aggregate?
Quote: "Also, I would make sure that each column is qualified with its table alias. I certainly can't identify the table that "StatusEndDate" is coming from. The same issue exists for CustomerName, CustomerGroup, and CustomerType."
Good point - I'll look at this.
Quote: "There is a LEFT JOIN to fmtCustomer, but it's also filtered in the WHERE clause, which makes it behave like an INNER JOIN. Would you really have customers that aren't in your customer table?"
I don't believe there would be customers that don't exist in the customer table.
Thanks for the tips!
November 14, 2012 at 12:15 pm
I am a bit late to the game on this one and my comment is a little off topic, but... you should not use an ordinal position in an ORDER BY. You should instead reference the column by name. If the column order in the SELECT list changes, you have to adjust the ORDER BY or your results will be out of sequence.
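For the query posted above, that means:
-- Fragile: sorts by whatever column happens to be first in the SELECT list
ORDER BY 1
-- Safer: names the column, so reordering the SELECT list can't silently change the sort
ORDER BY a.CustomerID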
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Moden's splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
November 14, 2012 at 12:17 pm
Thanks, Sean, good point.
November 14, 2012 at 12:24 pm
For pre-aggregation, try to aggregate the separate tables in separate steps. For the query you have there are several options. You'll have to test each to see which one performs best.
Option 1: Subquery
...
FROM fmtEquipmentStatus a
LEFT JOIN (SELECT CustomerID,
BillingCode,
MAX(UseDate) AS UseDate
FROM fmtFleetEquipmentUse
GROUP BY CustomerID,
BillingCode) c ON (a.CustomerID = c.CustomerID AND a.BillingCode = c.BillingCode)
...
Option 2: Temp Table (no indexes)
SELECT CustomerID,
BillingCode,
MAX(UseDate) AS UseDate
INTO #fmtFleetEquipmentUse
FROM fmtFleetEquipmentUse
GROUP BY CustomerID,
BillingCode
...
FROM fmtEquipmentStatus a
LEFT JOIN #fmtFleetEquipmentUse c ON (a.CustomerID = c.CustomerID AND a.BillingCode = c.BillingCode)
...
Option 3: TempTable with clustered index
CREATE TABLE #fmtFleetEquipmentUse
(CustomerID...,
BillingCode...,
UseDate DATETIME,
PRIMARY KEY CLUSTERED (CustomerID,BillingCode))
INSERT INTO #fmtFleetEquipmentUse
SELECT CustomerID,
BillingCode,
MAX(UseDate)
FROM fmtFleetEquipmentUse
GROUP BY CustomerID,
BillingCode
...
FROM fmtEquipmentStatus a
LEFT JOIN #fmtFleetEquipmentUse c ON (a.CustomerID = c.CustomerID AND a.BillingCode = c.BillingCode)
...
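Putting option 1 together with the rest of your original query, it might end up looking roughly like this - untested, and it assumes CustomerName, CustomerGroup and CustomerType come from fmtCustomer and StatusEndDate from fmtEquipmentStatus:
SELECT a.CustomerID, b.CustomerName, b.CustomerGroup
, a.BillingCode, a.RFAccount
, (CASE WHEN b.CustomerType = 'D' THEN 'Dept' ELSE 'Agency' END) AS Type
, COUNT(DISTINCT a.EquipmentID) AS EquipmentCount
, ISNULL(MAX(c.UseDate), MAX(d.UseDate)) AS LastUseDate
FROM fmtEquipmentStatus a
INNER JOIN fmtCustomer b ON (a.CustomerID = b.CustomerID)
LEFT JOIN (SELECT CustomerID, BillingCode, MAX(UseDate) AS UseDate
FROM fmtFleetEquipmentUse
GROUP BY CustomerID, BillingCode) c ON (a.CustomerID = c.CustomerID AND a.BillingCode = c.BillingCode)
LEFT JOIN (SELECT CustomerID, MAX(UseDate) AS UseDate
FROM fmtHeavyEquipmentUse
GROUP BY CustomerID) d ON (a.CustomerID = d.CustomerID)
WHERE a.StatusEndDate IS NULL
AND b.CustomerActive = 'Y'
GROUP BY a.CustomerID, b.CustomerName, b.CustomerGroup
, a.BillingCode, a.RFAccount
, (CASE WHEN b.CustomerType = 'D' THEN 'Dept' ELSE 'Agency' END)
ORDER BY a.CustomerID
Because c and d are already collapsed to one row per customer (and billing code), the only thing multiplying rows ahead of the GROUP BY is fmtEquipmentStatus itself, which is the point of the pre-aggregation.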
November 14, 2012 at 12:46 pm
Wow, Charles, this is really helpful! I'll try it out and will post when I get the final answer.
Thanks, all for the tips!
November 14, 2012 at 1:55 pm
Another quick note on what Charles has offered:
Depending on how many rows these "mini" queries will yield and if you have a beefy server with ample RAM, consider using parameter tables. If this isn't really an option, consider defining your temporary tables beforehand as you will have better control over your table's definition prior to inserting the data, like NOT NULL, constraints, etc.
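For example, building on Charles's option 3, the temp table can be nailed down before any data goes in - the column types here are only placeholders, since I don't know the real definitions:
-- Hypothetical types; match them to fmtFleetEquipmentUse before using this
CREATE TABLE #fmtFleetEquipmentUse
(CustomerID INT NOT NULL,
BillingCode VARCHAR(20) NOT NULL,
UseDate DATETIME NULL,
PRIMARY KEY CLUSTERED (CustomerID, BillingCode));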
Just something to think about...
______________________________________________________________________________
Never argue with an idiot; they'll drag you down to their level and beat you with experience