multiple LEFT OUTER self joins and a View

  • We have a procedure which generates some dynamic SQL to retrieve multiple points of 'survey data' to the application in a single row. The number of joins is driven by another 'criteria' table. These surveys drive off a view which links partyid's, questionids, responses, and other codes.

    The SQL assembled dynamically looks like:

    LEFT OUTER JOIN vw_surveydetails sd13 ON sd13.partyID = person.partyID AND sd13.questionid=13

    LEFT OUTER JOIN vw_surveydetails sd15 ON sd15.partyID = person.partyID AND sd15.questionid=15

    LEFT OUTER JOIN vw_surveydetails sd16 ON sd16.partyID = person.partyID AND sd16.questionid=16

    LEFT OUTER JOIN vw_surveydetails sd18 ON sd18.partyID = person.partyID AND sd18.questionid=18

    LEFT OUTER JOIN vw_surveydetails sd26 ON sd26.partyID = person.partyID AND sd26.questionid=26

    LEFT OUTER JOIN vw_surveydetails sd27 ON sd27.partyID = person.partyID AND sd27.questionid=27

    LEFT OUTER JOIN vw_surveydetails sd28 ON sd28.partyID = person.partyID AND sd28.questionid=28

    LEFT OUTER JOIN vw_surveydetails sd29 ON sd29.partyID = person.partyID AND sd29.questionid=29

    LEFT OUTER JOIN vw_surveydetails sd30 ON sd30.partyID = person.partyID AND sd30.questionid=30

    LEFT OUTER JOIN vw_surveydetails sd31 ON sd31.partyID = person.partyID AND sd31.questionid=31

    LEFT OUTER JOIN vw_surveydetails sd84 ON sd84.partyID = person.partyID AND sd84.questionid=84

    LEFT OUTER JOIN vw_surveydetails sd86 ON sd86.partyID = person.partyID AND sd86.questionid=86

    WHERE partyid = xxxxxx

    This was fine with about 6 survey questions. However now each additional one seems to take a disproportionate extra amount of time, to the extent that with 12 of them (above) it takes about 12 seconds for an OLD partyID - new ones run quickly.

    A SELECT to the view runs very quickly with any PartyID. One of the underlying tables is big (28M rows) but has partyID as primary key.

    One of my colleagues found something saying LEFT OUTER JOINS to views are not recommended as they often get all rows from the view defining query, but we need a value for each field (even if null).

    I can think of a few ways to change this but they all involve a lot of refactoring and/or denormalization of the DB (i.e. make the view unnecessary). However I'm hoping there might be some clever quicker way?

    Many thanks for any suggestions.

  • LEFT JOIN(

    SELECTPartyID,

    MAX(CASE WHEN QuestionID = 13 THEN Col1 ELSE NULL END) AS sd13,

    MAX(CASE WHEN QuestionID = 15 THEN Col1 ELSE NULL END) AS sd15,

    MAX(CASE WHEN QuestionID = 16 THEN Col1 ELSE NULL END) AS sd16,

    MAX(CASE WHEN QuestionID = 18 THEN Col1 ELSE NULL END) AS sd18,

    MAX(CASE WHEN QuestionID = 26 THEN Col1 ELSE NULL END) AS sd26,

    MAX(CASE WHEN QuestionID = 27 THEN Col1 ELSE NULL END) AS sd27,

    MAX(CASE WHEN QuestionID = 28 THEN Col1 ELSE NULL END) AS sd28,

    MAX(CASE WHEN QuestionID = 29 THEN Col1 ELSE NULL END) AS sd29,

    MAX(CASE WHEN QuestionID = 30 THEN Col1 ELSE NULL END) AS sd30,

    MAX(CASE WHEN QuestionID = 31 THEN Col1 ELSE NULL END) AS sd31,

    MAX(CASE WHEN QuestionID = 84 THEN Col1 ELSE NULL END) AS sd84,

    MAX(CASE WHEN QuestionID = 86 THEN Col1 ELSE NULL END) AS sd86

    FROMvw_surveydetails

    ) AS sd ON sd.partyID = person.partyID


    N 56°04'39.16"
    E 12°55'05.25"

  • The first thing I'd do is look at the execution plan for the query and see what's going on with that. Are there table scans? Index scans? Nested loop joins on large numbers of rows? Out-of-date/missing statistics?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Re: Peso - this is a clever and quick way of doing it, and works well (needs a GROUP BY PartyID at the end) - however will demand significant refactoring of the dynamic SQL creation SPs

    Re: GSquared - the time in the execution plan (actual) is 65% 'lazy spooling'

    We've found an approach that seems to work well: an index on the view.

  • ishmael (9/17/2008)


    Re: Peso - this is a clever and quick way of doing it, and works well (needs a GROUP BY PartyID at the end) - however will demand significant refactoring of the dynamic SQL creation SPs

    Re: GSquared - the time in the execution plan (actual) is 65% 'lazy spooling'

    We've found an approach that seems to work well: an index on the view.

    If you don't mind, would you post the create statement for the index you created?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Yeah, an indexed (persisted) view would help with lazy spooling. That makes sense.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Given that you are trying to "retrieve multiple points of 'survey data' to the application in a single row", that sounds to me like you are pivoting the data. I have never looked into how to use the PIVOT statement, so I will show you how I would do this. I am assuming you are wanting to see a response and code for each question.

    (This is similar to what Peso posted, but this is the way I tend to do it.)

    Scott

    SELECT p.partyid,

    --Dynamic part

    --The CASE statements breaks up your data into separate columns, but you still have all your rows.

    --The MAX statements compress the rows, so that you get one row per partyid.

    MAX(CASE sd.questionid WHEN 1 THEN sd.response ELSE '' END) Response1, --Assuming responses are text

    MAX(CASE sd.questionid WHEN 1 THEN sd.code ELSE 0 END) Code1, --Assuming codes are numeric

    MAX(CASE sd.questionid WHEN 2 THEN sd.response ELSE '' END) Response2,

    MAX(CASE sd.questionid WHEN 2 THEN sd.code ELSE 0 END) Code2,

    -- ... do as many of these are you have questions

    --End Dynamic part

    p.name --putting a column here makes the dynamic part a little easier since you don't need to exclude the last comma

    FROM person p

    INNER JOIN vw_surveydetails sd ON sd.partyID = p.partyID

    WHERE p.partyid = xxxxxx

    GROUP BY p.partyid, p.name

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply