Clustered Index Seek is limiting factor in query

  • I'm having issues tuning a query and am hoping for some help. The below query takes between 35 and 40 seconds to run. The biggest cost (according to query plan) is an clustered index seek on the Account table. This seek is using 40% of total cost. If I remove the join to Account the query takes about 15 seconds (still not ideal, but a big improvement).

    Each of the three tables are fairly large - Prospect: 4.1 million rows, Contact: 4.3 million rows, Account: 2.1 million rows. This query returns 6330 total rows.

    SELECT Prospect.Email

    , Contact1.MemberId

    , Contact2.FirstName, Contact2.LastName, Contact2.Email, Contact2.MMS_Member_ID__c

    , Account.Email

    FROM Prospect

    INNER JOIN Contact Contact1 ON Contact1.Id = Prospect.ReferringContact -- Contact1.Id is PK

    INNER JOIN Contact Contact2 ON Contact2.AccountId = Prospect.MembershipId

    AND Contact2.Type = 'Primary' AND Contact2.Status = 'Active'

    INNER JOIN Account ON Account.Id = Contact2.AccountId -- Account.Id is PK

    WHERE Prospect.Status = 'Active' AND Contact2.Date >= dbo.floorDate(DATEADD(d,-30,GETDATE()))

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Contact'. Scan count 2, logical reads 25406, physical reads 0, read-ahead reads 84, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Account'. Scan count 0, logical reads 943994, physical reads 44, read-ahead reads 60011, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Prospect__c'. Scan count 1, logical reads 9154, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Rows Executes StmtText StmtId NodeId Parent PhysicalOp LogicalOp Argument DefinedValues EstimateRows EstimateIO EstimateCPU AvgRowSize TotalSubtreeCost OutputList Warnings Type Parallel EstimateExecutions

    -------------------- -------------------- ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ----------- ----------- ----------- ------------------------------ ------------------------------ ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ------------- ------------- ------------- ----------- ---------------- ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -------- ---------------------------------------------------------------- -------- ------------------

    6330 1 |--Hash Match(Inner Join, HASH: ([Account].[Id])=([Contact2].[AccountId]), RESIDUAL: ([Contact].[AccountId] as [Contact2].[AccountId]=[Account].[Id])) 1 2 1 Hash Match Inner Join HASH: ([Account].[Id])=([Contact2].[AccountId]), RESIDUAL: ([Contact].[AccountId] as [Contact2].[AccountId]=[Account].[Id]) NULL 148543.2 0 7.838935 385 240.6476 [Prospect].[Email__c], [Contact2].[LastName], [Contact2].[FirstName], [Contact2].[Email], [Contact2].[MMS_Member_ID__c], [Account].[Email__c] NULL PLAN_ROW 0 1

    160446 1 |--Nested Loops(Inner Join, OUTER REFERENCES: ([Prospect].[MembershipId], [Expr1011]) WITH UNORDERED PREFETCH) 1 3 2 Nested Loops Inner Join OUTER REFERENCES: ([Prospect].[MembershipId], [Expr1011]) WITH UNORDERED PREFETCH NULL 45689.16 0 0.7157868 191 222.3587 [Prospect].[Email__c], [Account].[Id], [Account].[Email__c] NULL PLAN_ROW 0 1

    160465 1 | |--Sort(ORDER BY: ([Prospect].[MembershipId] ASC)) 1 5 3 Sort Sort ORDER BY: ([Prospect].[MembershipId] ASC) NULL 171240.9 0.01126126 13.66516 109 125.8857 [Prospect].[Email__c], [Prospect].[MembershipId] NULL PLAN_ROW 0 1

    160465 1 | | |--Hash Match(Inner Join, HASH: ([Prospect].[ReferringContact])=([Contact1].[Id]), RESIDUAL: ([Prospect].[ReferringContact]=[Contact].[Id] as [Contact1].[Id])) 1 6 5 Hash Match Inner Join HASH: ([Prospect].[ReferringContact])=([Contact1].[Id]), RESIDUAL: ([Prospect].[ReferringContact]=[Contact].[Id] as [Contact1].[Id]) NULL 171240.9 0 82.02168 109 112.2093 [Prospect].[Email__c], [Prospect].[MembershipId] NULL PLAN_ROW 0 1

    572495 1 | | |--Index Seek(OBJECT: ([Prospect].[IX_Prospect_1]), SEEK: ([Prospect].[Status]=N'Member') ORDERED FORWARD) 1 7 6 Index Seek Index Seek OBJECT: ([Prospect].[IX_Prospect_1]), SEEK: ([Prospect].[Status]=N'Member') ORDERED FORWARD [Prospect].[Email__c], [Prospect].[ReferringContact], [Prospect].[MembershipId] 596779.3 6.492665 0.6566142 127 7.149279 [Prospect].[Email__c], [Prospect].[ReferringContact], [Prospect].[MembershipId] NULL PLAN_ROW 0 1

    4380014 1 | | |--Index Scan(OBJECT: ([Contact].[IX_Contact_2] AS [Contact1])) 1 8 6 Index Scan Index Scan OBJECT: ([Contact].[IX_Contact_2] AS [Contact1]) [Contact1].[Id] 4380005 18.22016 4.818162 25 23.03832 [Contact1].[Id] NULL PLAN_ROW 0 1

    160446 160465 | |--Clustered Index Seek(OBJECT: ([Account].[PK_Account]), SEEK: ([Account].[Id]=[Prospect].[MembershipId]) ORDERED FORWARD) 1 10 3 Clustered Index Seek Clustered Index Seek OBJECT: ([Account].[PK_Account]), SEEK: ([Account].[Id]=[Prospect].[MembershipId]) ORDERED FORWARD [Account].[Id], [Account].[Email__c] 1 0.003125 0.0001581 109 95.75723 [Account].[Id], [Account].[Email__c] NULL PLAN_ROW 0 171240.9

    20892 1 |--Index Seek(OBJECT: ([Contact].[IX_Contact_1] AS [Contact2]), SEEK: ([Contact2].[Type]=N'Primary' AND ([Contact2].[Date], [Contact2].[Member_Status__c]) >= ([egroup].[dbo].[floorDate](date 1 11 2 Index Seek Index Seek OBJECT: ([Contact].[IX_Contact_1] AS [Contact2]), SEEK: ([Contact2].[Type]=N'Primary' AND ([Contact2].[Date], [Contact2].[Member_Status__c]) >= ([egroup].[dbo].[floorDate](dateadd(day,(-30),getdate [Contact2].[AccountId], [Contact2].[LastName], [Contact2].[FirstName], [Contact2].[Email], [Contact2].[MMS_Member_ID__c], [Contact2].[Member_Status__c] 248342.8 9.434977 0.7066621 255 10.14164 [Contact2].[AccountId], [Contact2].[LastName], [Contact2].[FirstName], [Contact2].[Email], [Contact2].[MMS_Member_ID__c], [Contact2].[Member_Status__c] NULL PLAN_ROW 0 1

    Looking at the statistics profile, it's obvious that the 160k executes against Account is bad. I am unsure of why this is happening. Any ideas?

  • dingels35 (9/1/2010)


    Looking at the statistics profile, it's obvious that the 160k executes against Account is bad. I am unsure of why this is happening. Any ideas?

    I am much better on reading the sqlplan than showplan_text, but it seems to me that the clustered index seek is really a BOOKMARK LOOKUP or KEY LOOKUP in disguise. When I try top read the text plan, I get the feeling that the email column in Account table is missing in same index as used when picking the other columns in the query from Account table.

    I am sure GilaMonster have better explanation about this.


    N 56°04'39.16"
    E 12°55'05.25"

  • Also, don't use functions in WHERE clauses. The function is evaluated for every row!

    Replace "dbo.floorDate(DATEADD(d,-30,GETDATE()))"

    with

    DATEADD(DAY, DATEDIFF(DAY, 30, GETDATE()), 0)

    This expression is only evaluated once in the query.


    N 56°04'39.16"
    E 12°55'05.25"

  • Please post table definitions, index definitions and execution plan (a .sqlplan file) as per http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Definitions and queryplan are attached.

  • Three suggestions.

    1) Alter existing index

    CREATE NONCLUSTERED INDEX [IX_Contact_1] ON [sforce].[Contact]

    (

    [Join_Date__c] ASC,

    [Member_Type__c] ASC,

    [Member_Status__c] ASC,

    [AccountId] ASC

    ) INCLUDE ( [LastName], [FirstName], [Email], [MMS_Member_ID__c] )

    2) Create new index

    CREATE NONCLUSTERED INDEX [IX_Account_2] ON [sforce].[Account]

    (

    [Id] ASC

    ) include (Email__c)

    3) Edit query

    SELECTReferreeContact.FirstName,

    ReferreeContact.LastName,

    ReferreeContact.Email AS referree_email,

    Account.Email__c AS account_email,

    Prospect__c.Email__c AS prospect_email,

    ReferreeContact.MMS_Member_ID__c

    FROMLTF.sforce.Prospect__c

    -- No need for following JOIN if there is a FK constraint between the two tables

    --INNER JOINltf.sforce.Contact AS ReferrerContact ON ReferrerContact.Id = Prospect__c.Referring_Member__c

    INNER JOINltf.sforce.Contact AS ReferreeContact ON ReferreeContact.AccountId = Prospect__c.Membership_ID__c

    AND ReferreeContact.Member_Type__c = 'Primary'

    AND ReferreeContact.Member_Status__c = 'Active'

    AND ReferreeContact.Join_Date__c >= DATEADD(DAY, DATEDIFF(DAY, 30, GETDATE()), 0)

    INNER JOINltf.sforce.Account ON Account.Id = Prospect__c.Membership_ID__c

    WHEREProspect_Status__c = 'Member'


    N 56°04'39.16"
    E 12°55'05.25"

  • Your estimated & actuals are a bit off, especially at the end of the query. Are your statistics up to date?

    The amount of data you're moving seems to be leading to the optimizer choosing to use scans. Yeah, that seek is probably causing pain, but only because it's being executed 100 thousand times due to the loop join. That sort operation right before the loop isn't helping matters either. Based on the amount of data being moved, and the fact that it looks like the optimizer is choosing good indexes, I'd expect to see hash joins for all these operations. That loop is a problem.

    See if you can update stats with a full scan and then rerun the query. See if that changes the execution plan and the performance. That's where I'd start.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • CREATE NONCLUSTERED INDEX [IX_Account_2] ON [sforce].[Account]

    (

    [Id] ASC

    ) include (Email__c)

    Thanks for the suggestions. I have a question about #2. I am under the assumption that a clustered index should always be better than a non-clustered index, even if the non-clustered index covers all columns in the query. I don't think I've ever seen a primary key as the only member of a non-clustered index before.

    I definitely see the benefit of a covering index, but isn't doesn't a clustered index cover the entire table?

  • dingels35 (9/2/2010)


    I am under the assumption that a clustered index should always be better than a non-clustered index, even if the non-clustered index covers all columns in the query.

    It depends on how the clustered key is setup.

    For performance reasons, the clustered key should be setup over an "ever-increasing" key, be single or composite key, to reduce fragmentation.


    N 56°04'39.16"
    E 12°55'05.25"

  • Grant Fritchey (9/2/2010)


    Your estimated & actuals are a bit off, especially at the end of the query. Are your statistics up to date?

    ...I'd expect to see hash joins for all these operations. That loop is a problem.

    See if you can update stats with a full scan and then rerun the query. See if that changes the execution plan and the performance. That's where I'd start.

    Grant-

    Most of my statistics were very up to date , but I've updated them again anyway. I was very happy to see that execution time was cut down from ~45s to ~16s!

    I've attached the new query plan. Interestingly enough, there are actually two loop joins now in this faster query. Does this make sense to you?

    Table 'Account'. Scan count 0, logical reads 25056, physical reads 4, read-ahead reads 1, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Contact'. Scan count 1, logical reads 97607, physical reads 4, read-ahead reads 6848, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Prospect__c'. Scan count 1, logical reads 8378, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    -- are we up to date?

    SELECT name AS stats_name,

    STATS_DATE(object_id, stats_id) AS statistics_update_date

    FROM sys.stats

    WHERE object_id = OBJECT_ID('sforce.Account');

    -- run each of these one at a time

    UPDATE STATISTICS ltf.sforce.Account WITH FULLSCAN

    UPDATE STATISTICS ltf.sforce.Contact WITH FULLSCAN

    UPDATE STATISTICS ltf.sforce.Prospect__c WITH FULLSCAN

  • If it's running faster it makes sense. 😉

    But seriously, 16ms is hard to argue with, however it might still be possible to tweak it more. I'd try the suggestions offered by others just to see what happens there. Also, I usually don't suggest this, but you might want to try a JOIN hint to see if there are HASH joins it runs differently. No promises there, but the fact that it's doing what is basically an unnecessary sort operation suggests that there might be better methods.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • This line seems to be obselete, unless it's used for filtering? It is not used in the resultset.

    INNER JOIN ltf.sforce.Contact AS ReferrerContact ON ReferrerContact.Id = Prospect__c.Referring_Member__c

    In that case, just add a WHERE clause like this:

    SELECT ReferreeContact.FirstName,

    ReferreeContact.LastName,

    ReferreeContact.Email AS referree_email,

    Account.Email__c AS account_email,

    Prospect__c.Email__c AS prospect_email,

    ReferreeContact.MMS_Member_ID__c

    FROM LTF.sforce.Prospect__c

    INNER JOIN ltf.sforce.Contact AS ReferreeContact ON ReferreeContact.AccountId = Prospect__c.Membership_ID__c

    AND ReferreeContact.Member_Type__c = 'Primary'

    AND ReferreeContact.Member_Status__c = 'Active'

    AND ReferreeContact.Join_Date__c >= DATEADD(DAY, DATEDIFF(DAY, 30, GETDATE()), 0)

    INNER JOIN ltf.sforce.Account ON Account.Id = Prospect__c.Membership_ID__c

    WHERE Prospect__c.Prospect_Status__c = 'Member'

    AND Prospect__c.Referring_Member__c IS NOT NULL


    N 56°04'39.16"
    E 12°55'05.25"

  • SwePeso (9/2/2010)


    This line seems to be obselete, unless it's used for filtering? It is not used in the resultset.

    INNER JOIN ltf.sforce.Contact AS ReferrerContact ON ReferrerContact.Id = Prospect__c.Referring_Member__c

    SwePeso -

    That's a good catch, I wondered if somebody would bring that up. I made a mistake when first posting this query - the last column (MMS_Member_ID__c) should come from the ReferrerContact table, not the ReferreeContact table.

    In response to your earlier suggestions:

    1) Alter existing index - I did this, but with no real effect, so I changed the index back. Other queries use this index and having Join_Date first is important.

    3) Edit query - Did not do this; same reason as above

    2) Add index to Account - This actually increased performance some. I can't really figure out why - I wouldn't think that a non-clustered index scan would be faster than a clustered index seek. It's using a merge join instead of a loop join, so this must be where the benefit is.

  • You could consider moving the JOIN to Account to an outer query, since it's not needed in the filtering process. Something like below (may not be exact but you get the idea :-)) :

    SELECT FirstName, LastName, ...

    Account.Email__c AS account_email,

    ...

    FROM

    (

    SELECT ReferreeContact.FirstName,

    ReferreeContact.LastName,

    ReferreeContact.Email AS referree_email,

    Prospect__c.Email__c AS prospect_email,

    ReferreeContact.MMS_Member_ID__c

    FROM LTF.sforce.Prospect__c

    INNER JOIN ltf.sforce.Contact AS ReferreeContact ON ReferreeContact.AccountId = Prospect__c.Membership_ID__c

    AND ReferreeContact.Member_Type__c = 'Primary'

    AND ReferreeContact.Member_Status__c = 'Active'

    AND ReferreeContact.Join_Date__c >= DATEADD(DAY, DATEDIFF(DAY, 30, GETDATE()), 0)

    WHERE Prospect__c.Prospect_Status__c = 'Member'

    AND Prospect__c.Referring_Member__c IS NOT NULL

    ) AS derived

    INNER JOIN ltf.sforce.Account ON Account.Id = Membership_ID__c

    Scott Pletcher, SQL Server MVP 2008-2010

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply