multiple LEFT OUTER self joins and a View

Question

Post reply

multiple LEFT OUTER self joins and a View

ishmael

Grasshopper

Points: 10
More actions
September 17, 2008 at 12:56 am

#190847

We have a procedure which generates some dynamic SQL to retrieve multiple points of 'survey data' to the application in a single row. The number of joins is driven by another 'criteria' table. These surveys drive off a view which links partyid's, questionids, responses, and other codes.
The SQL assembled dynamically looks like:
LEFT OUTER JOIN vw_surveydetails sd13 ON sd13.partyID = person.partyID AND sd13.questionid=13
LEFT OUTER JOIN vw_surveydetails sd15 ON sd15.partyID = person.partyID AND sd15.questionid=15
LEFT OUTER JOIN vw_surveydetails sd16 ON sd16.partyID = person.partyID AND sd16.questionid=16
LEFT OUTER JOIN vw_surveydetails sd18 ON sd18.partyID = person.partyID AND sd18.questionid=18
LEFT OUTER JOIN vw_surveydetails sd26 ON sd26.partyID = person.partyID AND sd26.questionid=26
LEFT OUTER JOIN vw_surveydetails sd27 ON sd27.partyID = person.partyID AND sd27.questionid=27
LEFT OUTER JOIN vw_surveydetails sd28 ON sd28.partyID = person.partyID AND sd28.questionid=28
LEFT OUTER JOIN vw_surveydetails sd29 ON sd29.partyID = person.partyID AND sd29.questionid=29
LEFT OUTER JOIN vw_surveydetails sd30 ON sd30.partyID = person.partyID AND sd30.questionid=30
LEFT OUTER JOIN vw_surveydetails sd31 ON sd31.partyID = person.partyID AND sd31.questionid=31
LEFT OUTER JOIN vw_surveydetails sd84 ON sd84.partyID = person.partyID AND sd84.questionid=84
LEFT OUTER JOIN vw_surveydetails sd86 ON sd86.partyID = person.partyID AND sd86.questionid=86
WHERE partyid = xxxxxx
This was fine with about 6 survey questions. However now each additional one seems to take a disproportionate extra amount of time, to the extent that with 12 of them (above) it takes about 12 seconds for an OLD partyID - new ones run quickly.
A SELECT to the view runs very quickly with any PartyID. One of the underlying tables is big (28M rows) but has partyID as primary key.
One of my colleagues found something saying LEFT OUTER JOINS to views are not recommended as they often get all rows from the view defining query, but we need a value for each field (even if null).
I can think of a few ways to change this but they all involve a lot of refactoring and/or denormalization of the DB (i.e. make the view unnecessary). However I'm hoping there might be some clever quicker way?
Many thanks for any suggestions.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

SwePeso SSC-Dedicated Points: 39757 More actions · Answer 1

LEFT JOIN(

SELECTPartyID,

MAX(CASE WHEN QuestionID = 13 THEN Col1 ELSE NULL END) AS sd13,

MAX(CASE WHEN QuestionID = 15 THEN Col1 ELSE NULL END) AS sd15,

MAX(CASE WHEN QuestionID = 16 THEN Col1 ELSE NULL END) AS sd16,

MAX(CASE WHEN QuestionID = 18 THEN Col1 ELSE NULL END) AS sd18,

MAX(CASE WHEN QuestionID = 26 THEN Col1 ELSE NULL END) AS sd26,

MAX(CASE WHEN QuestionID = 27 THEN Col1 ELSE NULL END) AS sd27,

MAX(CASE WHEN QuestionID = 28 THEN Col1 ELSE NULL END) AS sd28,

MAX(CASE WHEN QuestionID = 29 THEN Col1 ELSE NULL END) AS sd29,

MAX(CASE WHEN QuestionID = 30 THEN Col1 ELSE NULL END) AS sd30,

MAX(CASE WHEN QuestionID = 31 THEN Col1 ELSE NULL END) AS sd31,

MAX(CASE WHEN QuestionID = 84 THEN Col1 ELSE NULL END) AS sd84,

MAX(CASE WHEN QuestionID = 86 THEN Col1 ELSE NULL END) AS sd86

FROMvw_surveydetails

) AS sd ON sd.partyID = person.partyID

N 56°04'39.16"
E 12°55'05.25"

GSquared SSC Guru Points: 260824 More actions · Answer 2

The first thing I'd do is look at the execution plan for the query and see what's going on with that. Are there table scans? Index scans? Nested loop joins on large numbers of rows? Out-of-date/missing statistics?

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

ishmael Grasshopper Points: 10 More actions · Answer 3

Re: Peso - this is a clever and quick way of doing it, and works well (needs a GROUP BY PartyID at the end) - however will demand significant refactoring of the dynamic SQL creation SPs

Re: GSquared - the time in the execution plan (actual) is 65% 'lazy spooling'

We've found an approach that seems to work well: an index on the view.

Jeff Moden SSC Guru Points: 1004356 More actions · Answer 4

ishmael (9/17/2008)
Re: Peso - this is a clever and quick way of doing it, and works well (needs a GROUP BY PartyID at the end) - however will demand significant refactoring of the dynamic SQL creation SPs
Re: GSquared - the time in the execution plan (actual) is 65% 'lazy spooling'
We've found an approach that seems to work well: an index on the view.

If you don't mind, would you post the create statement for the index you created?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

GSquared SSC Guru Points: 260824 More actions · Answer 5

Yeah, an indexed (persisted) view would help with lazy spooling. That makes sense.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Scott D-810830 SSC-Addicted Points: 494 More actions · Answer 6

Given that you are trying to "retrieve multiple points of 'survey data' to the application in a single row", that sounds to me like you are pivoting the data. I have never looked into how to use the PIVOT statement, so I will show you how I would do this. I am assuming you are wanting to see a response and code for each question.

(This is similar to what Peso posted, but this is the way I tend to do it.)

Scott

SELECT p.partyid,

--Dynamic part

--The CASE statements breaks up your data into separate columns, but you still have all your rows.

--The MAX statements compress the rows, so that you get one row per partyid.

MAX(CASE sd.questionid WHEN 1 THEN sd.response ELSE '' END) Response1, --Assuming responses are text

MAX(CASE sd.questionid WHEN 1 THEN sd.code ELSE 0 END) Code1, --Assuming codes are numeric

MAX(CASE sd.questionid WHEN 2 THEN sd.response ELSE '' END) Response2,

MAX(CASE sd.questionid WHEN 2 THEN sd.code ELSE 0 END) Code2,

-- ... do as many of these are you have questions

--End Dynamic part

p.name --putting a column here makes the dynamic part a little easier since you don't need to exclude the last comma

FROM person p

INNER JOIN vw_surveydetails sd ON sd.partyID = p.partyID

WHERE p.partyid = xxxxxx

GROUP BY p.partyid, p.name