September 14, 2012 at 10:13 am
This query takes about 45 seconds to complete (down from 10+ minutes by adding clustered index to table). Looking at the query plan, the largest cost is a clustered index seek -- which, as I understand it, is what I'm supposed to be getting rather than table or index scans.
What other indexes may be missing to speed this up?
I know it's a big topic, where should I start reading for more information on how to read & understand a query plan and tune a query.
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;
GO
SET STATISTICS IO ON
SET STATISTICS TIME ON
SELECT DISTINCT vwea.FranchiseNumber,
vwea.CustomerID,
vwea.EmailBefore,
vwea.EmailAfter,
vwea.ChangeDate,
j.ScheduledEndDate AS MoveDate,
CASE
WHEN j.ScheduledEndDate < vwea.ChangeDate THEN 'After Move Change'
ELSE 'Before move change'
END AS Change
FROM dbo.tblJob j
INNER JOIN dbo.tblWorkOrder wo ON j.DatabaseInfoID = wo.DatabaseInfoID
AND j.WorkOrderID = wo.WorkOrderID
INNER JOIN dbo.EmailAudit vwea ON wo.CustomerID = vwea.CustomerID
INNER JOIN dbo.tblDatabaseInfo ON j.DatabaseInfoID = dbo.tblDatabaseInfo.DatabaseInfoID
INNER JOIN dbo.tblFranchise ON j.DatabaseInfoID = dbo.tblFranchise.DatabaseInfoID
AND vwea.FranchiseNumber = dbo.tblFranchise.FranchiseNumber
WHERE (vwea.FranchiseNumber = 253)
AND j.ScheduledEndDate < vwea.ChangeDate
ORDER BY vwea.ChangeDate DESC;
SET STATISTICS IO OFF;
GO
SET STATISTICS TIME OFF;
GO
Thanks,
Rob
(Query plan attached)
September 14, 2012 at 11:53 am
Looking at this, the first thing I'd suggest is you update your statistics. They're pretty badly out of wack. That might be why you're getting such an expensive plan. You're processing huge amounts of rows to only return 144. Also, I only see five tables the query, but, quick count, eight access points in the plan. Are you dealing with views or nested views here? That could also be problematic.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
September 14, 2012 at 1:03 pm
I'm updating statistics on those tables right now; dbo.EmailAudit is a view. I'll look into that.
Thanks,
Rob
September 14, 2012 at 1:34 pm
The simplification step in the optimization process can clean up issues with views, but not always and not always all the way. Depending on what's there, you might be better off accessing those tables more directly through this query.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
September 16, 2012 at 7:45 pm
I did an update of the statistics on all of the related tables. That didn't change the execution time.
The view ([font="Courier New"]EmailAudit[/font]) duplicates many of the tables in the original query. Just running the view takes about 25 seconds.
The view definition is:
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;
GO
SET STATISTICS IO ON
SET STATISTICS TIME ON
--CREATE VIEW [dbo].[EmailAudit]
--AS
SELECT COALESCE(f.FranchiseNumber, di.FranchiseNumber) AS FranchiseNumber,
a.[CustomerID],
a.[EmailAddress] AS [EmailBefore],
c.EmailAddress AS [EmailAfter],
a.[ChangeDate]
FROM [dbo].[tblEmailAudit] a
INNER JOIN dbo.tblCustomer c ON C.DatabaseInfoID = a.DatabaseInfoID
AND c.CustomerID = a.CustomerID
AND ISNULL(c.FranUniqueID, c.DatabaseInfoID) = ISNULL(a.FranUniqueID, a.DatabaseInfoID)
LEFT JOIN dbo.tblFranchise f ON a.DatabaseInfoID = f.DatabaseInfoID
AND ISNULL(f.FranUniqueID, f.DatabaseInfoID) = ISNULL(a.FranUniqueID, a.DatabaseInfoID)
LEFT JOIN dbo.tblDatabaseInfo di ON a.DatabaseInfoID = di.DatabaseInfoID
WHERE ISNULL(a.EmailAddress,'') <> ISNULL(c.EmailAddress,'');
SET STATISTICS IO OFF;
GO
SET STATISTICS TIME OFF;
GO
So I rewrote the original query removing the view and I'm still getting about the same time (40-45 seconds).
-- original query rewritten to remove EmailAudit view --
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;
GO
SET STATISTICS IO ON
SET STATISTICS TIME ON
SELECT DISTINCT COALESCE(f.FranchiseNumber, di.FranchiseNumber) AS FranchiseNumber,
a.CustomerID,
a.EmailAddress AS EmailBefore,
c.EmailAddress AS EmailAfter,
a.ChangeDate,
j.ScheduledEndDate AS MoveDate,
CASE
WHEN j.ScheduledEndDate < a.ChangeDate THEN 'After Move Change'
ELSE 'Before move change'
END AS Change
FROM dbo.tblEmailAudit a
INNER JOIN dbo.tblCustomer c ON c.CustomerID = a.CustomerID
AND c.DatabaseInfoID = a.DatabaseInfoID
INNER JOIN dbo.tblWorkOrder wo ON c.CustomerId = wo.CustomerId
AND wo.DatabaseInfoId = a.DatabaseInfoId
INNER JOIN dbo.tblJob j ON j.WorkOrderID = wo.WorkOrderID
AND j.DatabaseInfoID = a.DatabaseInfoID
INNER JOIN dbo.tblDatabaseInfo di ON di.DatabaseInfoID = a.DatabaseInfoID
LEFT JOIN dbo.tblFranchise f ON a.DatabaseInfoID = f.DatabaseInfoID
AND ISNULL(f.FranUniqueID, f.DatabaseInfoID) = ISNULL(a.FranUniqueID, a.DatabaseInfoID)
WHERE (COALESCE(f.FranchiseNumber, di.FranchiseNumber) = 253)
AND j.ScheduledEndDate < a.ChangeDate
AND ISNULL(a.EmailAddress,'') <> ISNULL(c.EmailAddress,'')
ORDER BY a.ChangeDate DESC;
SET STATISTICS IO OFF;
GO
SET STATISTICS TIME OFF;
GO
The two biggest costs in the query plan are both index seeks (one clustered and the other non-clustered). The third largest cost is an index scan on dbo.tblJob.
September 16, 2012 at 8:19 pm
rgtft (9/16/2012)
The two biggest costs in the query plan are both index seeks (one clustered and the other non-clustered). The third largest cost is an index scan on dbo.tblJob.
There's really only one way for the seeks to be taking longer than a scan... the seeks are being repeated. Look at the properties of the seeks and see that they are executed many times. A better index capable of doing an initial seek and a "range scan" following that would be a tremendous amount faster.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 16, 2012 at 9:17 pm
Silly question, how do I read the execution plan to know what columns to index?
I created two indexes (just eyeballed query and guessed). The first one seems to have improved the query, but the second one (though being used) still doesn't seem to improve anything.
-- Two indexes created --
CREATE NONCLUSTERED INDEX [IX_tblCustomer_CustomerId_EmailAddress] ON [dbo].[tblCustomer]
(
[CustomerID] ASC,
[EmailAddress] ASC
);
CREATE NONCLUSTERED INDEX [IX_tboWorkOrder_CustomerID_DatabaseInfoId] ON [dbo].[tblWorkOrder]
(
[CustomerID] ASC,
[DatabaseInfoId] ASC
);
Thanks,
Rob
September 16, 2012 at 10:42 pm
I see a couple of websites that explain some of this:
I also see that Grant has a book on execution plans:
Rob
September 17, 2012 at 5:53 am
ISNULL and COALESCE on columns in the where clause are going to hurt performance. I still think your stats are off. I'm seeing estimated 4.5 million, actual 15,000. That's a pretty big disparity and is usually indicative of out of date or badly sampled statistics.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
September 17, 2012 at 6:41 am
Grant Fritchey (9/17/2012)
ISNULL and COALESCE on columns in the where clause are going to hurt performance....
This should be close to a SARGable version of the query:
SELECT DISTINCT COALESCE(f.FranchiseNumber, di.FranchiseNumber) AS FranchiseNumber,
a.CustomerID,
a.EmailAddress AS EmailBefore,
c.EmailAddress AS EmailAfter,
a.ChangeDate,
j.ScheduledEndDate AS MoveDate,
CASE
WHEN j.ScheduledEndDate < a.ChangeDate THEN 'After Move Change'
ELSE 'Before move change'
END AS Change
FROM dbo.tblEmailAudit a
INNER JOIN dbo.tblCustomer c
ON c.CustomerID = a.CustomerID
AND c.DatabaseInfoID = a.DatabaseInfoID
and (
c.EmailAddress <> a.EmailAddress
OR (c.EmailAddress IS NULL AND a.EmailAddress IS NOT NULL)
OR (c.EmailAddress IS NOT NULL AND a.EmailAddress IS NULL)
)
INNER JOIN dbo.tblWorkOrder wo
ON c.CustomerId = wo.CustomerId
AND wo.DatabaseInfoId = a.DatabaseInfoId
INNER JOIN dbo.tblJob j
ON j.WorkOrderID = wo.WorkOrderID
AND j.DatabaseInfoID = a.DatabaseInfoID
LEFT JOIN dbo.tblDatabaseInfo di
ON di.DatabaseInfoID = a.DatabaseInfoID
AND di.FranchiseNumber = 253
LEFT JOIN dbo.tblFranchise f
ON (f.FranUniqueID = a.FranUniqueID OR f.DatabaseInfoID = a.DatabaseInfoID)
--a.DatabaseInfoID = f.DatabaseInfoID
AND f.FranchiseNumber = 253
--AND ISNULL(f.FranUniqueID, f.DatabaseInfoID) = ISNULL(a.FranUniqueID, a.DatabaseInfoID)
WHERE 1 = 1
--(COALESCE(f.FranchiseNumber, di.FranchiseNumber) = 253)
AND j.ScheduledEndDate < a.ChangeDate
--AND ISNULL(a.EmailAddress,'') <> ISNULL(c.EmailAddress,'')
ORDER BY a.ChangeDate DESC;
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
September 17, 2012 at 7:01 am
Grant Fritchey (9/17/2012)
ISNULL and COALESCE on columns in the where clause are going to hurt performance. I still think your stats are off. I'm seeing estimated 4.5 million, actual 15,000. That's a pretty big disparity and is usually indicative of out of date or badly sampled statistics.
I manually ran (below) on all of the associated tables:
UPDATE STATISTICS dbo.tblCustomers WITH FULLSCAN;
As I understand it, without specifying an index or statistic, it should update all of them on the table. Or am I off here?
Thanks,
Rob
September 17, 2012 at 10:49 am
ChrisM@Work (9/17/2012)
Grant Fritchey (9/17/2012)
ISNULL and COALESCE on columns in the where clause are going to hurt performance....This should be close to a SARGable version of the query:
Chris M:
Your code is SARGable and faster (aprox. 9 seconds); the only issue is it's bringing back a few thousand records rather than the 144 I expected. I'll look into your code as to why.
Thanks,
Rob
September 18, 2012 at 2:12 am
rgtft (9/17/2012)
ChrisM@Work (9/17/2012)
Grant Fritchey (9/17/2012)
ISNULL and COALESCE on columns in the where clause are going to hurt performance....This should be close to a SARGable version of the query:
Chris M:
Your code is SARGable and faster (aprox. 9 seconds); the only issue is it's bringing back a few thousand records rather than the 144 I expected. I'll look into your code as to why.
Thanks,
Rob
Hi Rob
The join on dbo.tblFranchise is complex - I've set it up as a left join for safety. The code that this was cloned from - your last code iteration - showed it as a left join, however it was referenced in the WHERE clause, which would usually flag it up as an inner join. Try it with an inner join - but I suspect it will be a little more complex than that. Returning all the values from dbo.tblFranchise in the result set will help you decide how to filter.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply