Rewriting a slow subquery of a view

Question

Rewriting a slow subquery of a view

SvetNas

SSC-Addicted

Points: 476
More actions
January 18, 2017 at 12:41 pm

#320688

Hello,
I've been trying (not very successfully) to figure out a way to speed up this view (the select is below).
I'm looking for advise on fixing the ugly subquery which populates column [Status] (I've blocked it off with --- to make it more obvious) and results in an Eager Spool which takes up 65% of the query cost. Without the [Status] column the view is very fast.
-- the table and column names have been replaced with generic names so I can post the code
SELECT TblB.TblB_Col1, TblD.TblD_Col1, TblD.TblD_Col2, TblD.TblD_Col3, TblD.TblD_Col4,
TblB.TblB_Col2, TblB.TblB_Col3, TblA.TblA_Col1, TblA.TblA_Col2, TblE.TblE_Col1, TblE.TblE_Col2
-------
, (SELECT ID
FROM dbo.Table_G
WHERE [Status] =
CASE
WHEN (TblD.Date1 IS NOT NULL)THEN 'Status 1'
WHEN (TblD.Date2 < GETDATE())THEN 'Status 2'
WHEN (TblD.Date3 IS NOT NULL)THEN 'Status 3'
WHEN (TblD.Date4 IS NOT NULL)THEN 'Status 4'
WHEN NOT EXISTS (SELECT TOP 1 ActivityId
FROM dbo.Table_H TblH
WHERE TblH.ActivityId=TblA.Id
AND TblH.EventId=TblB.EventId AND TblH.[enabled]=1
)THEN 'Undefined'
ELSE 'Not Started'
END
) AS [Status]
-------
, TblF.TblF_Col1
FROM dbo.Table_A TblA
CROSS JOIN dbo.Table_B TblB
JOIN dbo.Table_C TblC ON TblC.id = TblB.StatusId
AND TblC.Col5 = 'Some Status Value' AND TblC.Classification != 'Some Class Value'
LEFT JOIN dbo.Table_D TblD ON TblD.PersonId=TblB.PersonId
AND TblD.EventId=TblB.EventId AND TblD.[type]=TblA.event_type
LEFT JOIN dbo.Table_E TblE ON TblE.Offer_id=TblD.id
LEFT JOIN dbo.Table_F TblF ON TblF.id = TblB.EventId;
-- total of about 25 mil rows.
Here are the details for the Eager Spool operator from the execution plan:
Seek Keys[1]: Prefix: [EventDB].[dbo].[Table_H].ActivityId
, [EventDB].[dbo].[Table_H].EventId
, [EventDB].[dbo].[Table_H].[enabled] = Scalar Operator([EventDB].[dbo].[Table_A].[id] as [TblA].[id])
, Scalar Operator([EventDB].[dbo].[Table_B].[EventId] as [D].[EventId]), Scalar Operator((1))
Is there a way to rewrite this join so the subquery is not so expensive and slow?
Your help is much appreciated!
SN

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply

SvetNas SSC-Addicted Points: 476 More actions · Answer 1

One more thing -
I rewrote the view replacing one of the subqueries (previously for NOT EXISTS) with a left join. This removed the eager spool from the query plan. When I run the new and old version in a sequence and check the query plan the rewritten one seems to have 1/3 of the cost of old.

It is still slow. Any ideas?

-- rewritten with 1 less subquery
SELECT TblB.TblB_Col1, TblD.TblD_Col1, TblD.TblD_Col2, TblD.TblD_Col3, TblD.TblD_Col4,
TblB.TblB_Col2, TblB.TblB_Col3, TblA.TblA_Col1, TblA.TblA_Col2, TblE.TblE_Col1, TblE.TblE_Col2
-------
, (SELECT ID
FROM dbo.Table_G
WHERE [Status] =
CASE
WHEN TblD.Date1 IS NOT NULLTHEN 'Status 1'
WHEN TblD.Date2 < GETDATE()THEN 'Status 2'
WHEN TblD.Date3 IS NOT NULLTHEN 'Status 3'
WHEN TblD.Date4 IS NOT NULLTHEN 'Status 4'
WHEN TblH.Id IS NULLTHEN 'Undefined'
ELSE 'Not Started'
END
) AS [Status]
-------
, TblF.TblF_Col1
FROM dbo.Table_A TblA
CROSS JOIN dbo.Table_B TblB
JOIN dbo.Table_C TblC ON TblC.id = TblB.StatusId
AND TblC.Col5 = 'Some Status Value' AND TblC.Classification != 'Some Class Value'
LEFT JOIN dbo.Table_D TblD ON TblD.PersonId=TblB.PersonId
AND TblD.EventId=TblB.EventId AND TblD.[type]=TblA.event_type
LEFT JOIN dbo.Table_E TblE ON TblE.Offer_id=TblD.id
LEFT JOIN dbo.Table_F TblF ON TblF.id = TblB.EventId;
LEFT JOIN dbo.Table_H TblH ON TblH.ActivityId=TblA.Id AND TblH.EventId=TblB.EventId AND TblH.[enabled]=1

ben.brugman SSChampion Points: 13369 More actions · Answer 2

Can you post the table sizes ?

I do not understand the construction around Table_G.
This looks 'nondeterministic' which could cause problems. The number of different results that this subquery can give, can be very limited (is my impression).
So what is the size of Table_G.

If there are multiple rows in Table_G for 'Status 1' or for 'Status 2' etc, I do see a problem and probably a solution.

Ben

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 3

Can you post the actual execution plan please? If necessary you can obfuscate it using SQL Sentry Plan Explorer.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

SvetNas SSC-Addicted Points: 476 More actions · Answer 4

ben.brugman - Friday, January 20, 2017 2:52 AM
Can you post the table sizes ?
I do not understand the construction around Table_G.
This looks 'nondeterministic' which could cause problems. The number of different results that this subquery can give, can be very limited (is my impression).
So what is the size of Table_G.
If there are multiple rows in Table_G for 'Status 1' or for 'Status 2' etc, I do see a problem and probably a solution.
Ben

Ben,
Table_G is a ref table with only 5 records for the possible statuses and the subquery that uses it assigns the ID of the appropriate status to each row in the view (I was a bit thrown off by that when I first saw it too).

Here are the sizes:

dbo.Table_A -- 5
dbo.Table_B -- ~ 5mil
dbo.Table_C -- 1.5K
dbo.Table_D -- 25K
dbo.Table_E -- 6K
dbo.Table_F -- 40K
dbo.Table_G -- 5

Thanks for taking a look!
SN