Weird Requirement... Multiple Left Joins? Am I missing something?

  • Ok, so I think I should know the answer to this but I'm getting nothing but what looks like a cartesian product... yuk.

    I have 1 table that has a list of master document numbers... for the sake of argument, there is 1 doc num, 1 record.

    There are 3 tables that have values that relate to the DocNum in the Doc Tbl; however, they all don't totally relate to each other. Confused yet?? 😀

    The DocNums are related. However, the Seqence numbers in the other tables determine how they should appear in the final results table.

    Doc TblGR TblGE tblRef tbl

    DocNumsDocnumGRSeqDocNumGESeqDocNumRefSeq

    11A11A11A1

    1B21B21B2

    1C31C3

    1D4

    1E5

    If I ignore the master list of Doc Numbers in the Doc tbl and Left Join GR to GE on DocNum and Seq, I get:

    DocnumGRGR_SeqGEGE_Seq

    1A1A1

    1B2B2

    1NULLNULLC3

    It starts getting whonky if I add the third table because the Sequence number relates to the other two tables. There is no gaurantee that the quantity of sequence numbers is going to be the same in each table. GE could have 5 in the sequence, while GR has 1, and REF has 15. What's REALY chunking up the whole thing is if I try to refer it back to the master list of DocNums. When I try to do that the whole thing blows up and I get what looks to be a cartesian product...

    My end result should look like this:

    DocnumGRGR_SeqGEGE_SeqRefRef_Seq

    1A1A1A1

    1B2B2B2

    1NULLNULLC3C3

    1NULLNULLNULLNULLD4

    1NULLNULLNULLNULLE5

    My recordsets could have 1000-1000000 records on any given day...

    I feel like I should be able to do this; however, I feel like I've overlooked something basic.

    Any ideas anyone? Bueller?

    Thanks

    Crusty.

  • You've given us expected results, which is a good thing. Some people like me visualize solutions most easily when this is provided.

    However you have not provided DDL for your tables, nor INSERTs (setups) for your sample data. And the format that you posted them in would be a pain to convert.

    I think if you do this, someone is going to be able to make short work of your question.

    Remember we're all volunteers here. 🙂


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • I will provide some DDL for the tables in the morning... Thanks for the positive reply... 🙂

  • CptCrusty1 (8/27/2012)


    Ok, so I think I should know the answer to this but I'm getting nothing but what looks like a cartesian product... yuk.

    I have 1 table that has a list of master document numbers... for the sake of argument, there is 1 doc num, 1 record.

    There are 3 tables that have values that relate to the DocNum in the Doc Tbl; however, they all don't totally relate to each other. Confused yet?? 😀

    The DocNums are related. However, the Seqence numbers in the other tables determine how they should appear in the final results table.

    Doc TblGR TblGE tblRef tbl

    DocNumsDocnumGRSeqDocNumGESeqDocNumRefSeq

    11A11A11A1

    1B21B21B2

    1C31C3

    1D4

    1E5

    If I ignore the master list of Doc Numbers in the Doc tbl and Left Join GR to GE on DocNum and Seq, I get:

    DocnumGRGR_SeqGEGE_Seq

    1A1A1

    1B2B2

    1NULLNULLC3

    It starts getting whonky if I add the third table because the Sequence number relates to the other two tables. There is no gaurantee that the quantity of sequence numbers is going to be the same in each table. GE could have 5 in the sequence, while GR has 1, and REF has 15. What's REALY chunking up the whole thing is if I try to refer it back to the master list of DocNums. When I try to do that the whole thing blows up and I get what looks to be a cartesian product...

    My end result should look like this:

    DocnumGRGR_SeqGEGE_SeqRefRef_Seq

    1A1A1A1

    1B2B2B2

    1NULLNULLC3C3

    1NULLNULLNULLNULLD4

    1NULLNULLNULLNULLE5

    Hi you are joining "DocNums" as in first part (query) there is no repeat of the value of "DocNum" 1

    in second query there is repeat of doc no .

    if there is repeat of joining data returns multiple record.

    As it works row by row...

  • @ Sandeep,

    I'm sorry, but I'm not quite sure what it is you're telling me; however, in regards to the data, it's correct. Let me get into some detail.

    This is the end result of an ETL process using some pretty mixed up text files. Lets say I have one record in a file that is Pipe-Delimited with double quotes indicated text.

    Using the examples in previous post, when I receive the information I have a text file with the following columns:

    DocNum

    GR

    GE

    REF

    The raw data for the examples provided in the original post:

    1|"A,B"|"A,B,C"|"A,B,C,D,E,F"

    The providor of the data has essentially combined multiple rows of data into 1 row. For the GR (Second column), "A,B" is actually 2 rows of data broken by the comman. It should actually look like:

    DocnumGR

    1A

    1B

    The Sequence number is an attempt at recombining the data and is not apart of the original data. The process in place now uses multiple steps to achieve individual tables for the broken out data exactly as it's shown in the previous examples. The problem is that they combine it all back together with Cartesian joins creating a massive landing table.

    I've consolidated the cleaning of the data into a single procedure; however, I'm trying to recombine the data so that it's not a cartesian join. All the information in the single record pertains to DocNum 1; however, the end result doesn't necessarily have to be totally organized ABC, etc. Every value needs to be in a record with DocNum 1.

    I know this is a very confusing process. The final destination for the data is totally wierd, and will make a dba have a heart attack; however, there is a method to the madness and it works. I'm trying to streamline the process of getting their data to the final step. Also, the example I provided is significantly simplified... but it's the root of the problem.

    I'm trying to make it clearer.... .. sorry.

    Crusty. :ermm:

  • dwain.c (8/27/2012)


    However you have not provided DDL for your tables, nor INSERTs (setups) for your sample data.

    Remember we're all volunteers here. 🙂

    Ok Volunteers... here's your script as promised... I know, I'm slow...

    Create table Doc (

    DocNum nvarchar(15) NOT NULL

    );

    Insert into Doc (DocNum)

    Values ('1');

    ------------------------------

    Create Table GR (

    DocNum nvarchar(15) NOT NULL,

    GR nvarchar(15) NOT NULL,

    Seq intNOT NULL

    )

    Insert into GR (DocNum, GR, Seq) Values (1,'A',1)

    Insert into GR (DocNum, GR, Seq) Values (1,'B',2)

    --------------------------------

    Create Table GE (

    DocNum nvarchar(15) NOT NULL,

    GE nvarchar(15) NOT NULL,

    Seq intNOT NULL

    )

    Insert into GE (DocNum, GE, Seq) Values (1,'A',1)

    Insert into GE (DocNum, GE, Seq) Values (1,'B',2)

    Insert into GE (DocNum, GE, Seq) Values (1,'C',3)

    --------------------------------------------------

    Create Table Ref (

    DocNum nvarchar(15) NOT NULL,

    Ref nvarchar(15) NOT NULL,

    Seq intNOT NULL

    )

    Insert into Ref (DocNum, Ref, Seq) Values (1,'A',1)

    Insert into Ref (DocNum, Ref, Seq) Values (1,'B',2)

    Insert into Ref (DocNum, Ref, Seq) Values (1,'C',3)

    Insert into Ref (DocNum, Ref, Seq) Values (1,'D',4)

    Insert into Ref (DocNum, Ref, Seq) Values (1,'E',5)

    select * from Doc

    select * from GR

    Select * from GE

    select * from Ref

    OH GREAT WIZARDS OF UBER-CODE! HEAR MY SCHREAKING AND BID ME GOOD TIDINGS!!!

  • hiC rusty hop u will get it by reading below code

  • Create table Doc (

    DocNum nvarchar(15) NOT NULL

    );

    Insert into Doc (DocNum)

    Values ('1');

    ------------------------------

    Create Table GR (

    DocNum nvarchar(15) NOT NULL,

    GR nvarchar(15) NOT NULL,

    Seq intNOT NULL

    )

    Insert into GR (DocNum, GR, Seq) Values (1,'A',1)

    Insert into GR (DocNum, GR, Seq) Values (1,'B',2)

    --------------------------------

    Create Table GE (

    DocNum nvarchar(15) NOT NULL,

    GE nvarchar(15) NOT NULL,

    Seq intNOT NULL

    )

    Insert into GE (DocNum, GE, Seq) Values (1,'A',1)

    Insert into GE (DocNum, GE, Seq) Values (1,'B',2)

    Insert into GE (DocNum, GE, Seq) Values (1,'C',3)

    --------------------------------------------------

    Create Table Ref (

    DocNum nvarchar(15) NOT NULL,

    Ref nvarchar(15) NOT NULL,

    Seq intNOT NULL

    )

    Insert into Ref (DocNum, Ref, Seq) Values (1,'A',1)

    Insert into Ref (DocNum, Ref, Seq) Values (1,'B',2)

    Insert into Ref (DocNum, Ref, Seq) Values (1,'C',3)

    Insert into Ref (DocNum, Ref, Seq) Values (1,'D',4)

    Insert into Ref (DocNum, Ref, Seq) Values (1,'E',5)

    -------

    select * from Ref

    left join GE

    on ref.DocNum=GE.DocNum

  • See the output of bold part of query

    as Ref have 5 values "1" in DocNum

    and GE have 3 values "1" in DocNum

    so output will have 15 rows as join behave row by row.

  • WITH AllSeq AS (

    SELECT Seq FROM GR

    UNION

    SELECT Seq FROM GE

    UNION

    SELECT Seq FROM Ref)

    SELECT d.DocNum,

    gr.GR,

    gr.Seq AS GR_Seq,

    ge.GE,

    ge.Seq AS GE_Seq,

    rf.Ref,

    rf.Seq AS Ref_Seq

    FROM Doc d

    CROSS JOIN AllSeq sq

    LEFT OUTER JOIN GR gr ON gr.DocNum = d.DocNum

    AND gr.Seq = sq.Seq

    LEFT OUTER JOIN GE ge ON ge.DocNum = d.DocNum

    AND ge.Seq = sq.Seq

    LEFT OUTER JOIN Ref rf ON rf.DocNum = d.DocNum

    AND rf.Seq = sq.Seq

    ORDER BY sq.Seq;

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • select Doc.DocNum, GR.GR, GR_Seq = GR.Seq, GE.GE, GE_Seq = GE.Seq, Ref.Ref, Ref_Seq = Ref.Seq

    from Doc

    cross join ( values (1),(2),(3),(4),(5),(6),(7),(8),(9) ) N (n)

    left outer join GR on GR.DocNum = Doc.DocNum and N.n = GR.Seq

    left outer join GE on GE.DocNum = DOC.DocNum and N.n = GE.Seq

    left outer join Ref on Ref.DocNum = DOC.DocNum and N.n = Ref.Seq

    where Ref.Ref is not null

    order by Doc.DocNum, N.n

  • What you need to do is Full Outer Joins instead of Left Outer Joins. Join on Doc ID and Row Number.

    SELECT COALESCE(dbo.GR.DocNum, dbo.GE.DocNum, dbo.Ref.DocNum) AS DocNum,

    GR,

    dbo.GR.Seq AS GR_Seq,

    GE,

    dbo.GE.Seq AS GE_Seq,

    Ref,

    dbo.Ref.Seq AS Ref_Seq

    FROM dbo.GR

    FULL OUTER JOIN dbo.GE

    ON dbo.GR.DocNum = dbo.GE.DocNum

    AND dbo.GR.Seq = dbo.GE.Seq

    FULL OUTER JOIN dbo.Ref

    ON dbo.GE.DocNum = dbo.Ref.DocNum

    AND dbo.GE.Seq = dbo.Ref.Seq

    OR dbo.GR.DocNum = dbo.Ref.DocNum

    AND dbo.GR.Seq = dbo.Ref.Seq;

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Laurie,

    Thanks for your reply. The "Values" reserved word didn't fly in 2005. Is that a 2008+ term?

    Thanks

    Crusty

  • CptCrusty1 (8/28/2012)


    Laurie,

    Thanks for your reply. The "Values" reserved word didn't fly in 2005. Is that a 2008+ term?

    Thanks

    Crusty

    That's what's called a "Table Values Function", and it's 2008+. It will also only work with data that's limited to no more than 9 values per set.

    The Full Outer Join version I posted will work in any version of SQL Server, at least from 7.5-on (I haven't played with anything prior to that, so can't be sure it will work there), and will work with any number of sub-values.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • @Laurie, Mark, GSquared. I am humbled.

    I was starting to lean towards Cross Join; however, I'd never had an opportunity to use it before. Seems like it behaves a bit like a pivot table??

    GSQuared, dito, never used Coalesce before. Guess I need to hit the books and learn these two techniques. The results are exactly what I needed.

    I will try all three results for performance against a test version with about 100k records....

    Thanks all.. .I really appreciate your help.

    Sincerely.

    Crusty

  • GS, that's what I thought. I've seen that in a 2008 shop; however, I'm currently in a 2005 shop that is in mid-migration, thus my afore-mentioned quandry.

    Again... My Thanks.

  • CptCrusty1 (8/28/2012)


    Laurie,

    Thanks for your reply. The "Values" reserved word didn't fly in 2005. Is that a 2008+ term?

    Thanks

    Crusty

    Hi - Sorry about that, but you've posted in the SQL 2008 area - one to watch out for next time!

    Yes - it is 2008+

  • Viewing 15 posts - 1 through 15 (of 30 total)

    You must be logged in to reply to this topic. Login to reply