August 27, 2012 at 2:45 pm
Ok, so I think I should know the answer to this but I'm getting nothing but what looks like a cartesian product... yuk.
I have 1 table that has a list of master document numbers... for the sake of argument, there is 1 doc num, 1 record.
There are 3 tables that have values that relate to the DocNum in the Doc Tbl; however, they all don't totally relate to each other. Confused yet?? 😀
The DocNums are related. However, the Seqence numbers in the other tables determine how they should appear in the final results table.
Doc TblGR TblGE tblRef tbl
DocNumsDocnumGRSeqDocNumGESeqDocNumRefSeq
11A11A11A1
1B21B21B2
1C31C3
1D4
1E5
If I ignore the master list of Doc Numbers in the Doc tbl and Left Join GR to GE on DocNum and Seq, I get:
DocnumGRGR_SeqGEGE_Seq
1A1A1
1B2B2
1NULLNULLC3
It starts getting whonky if I add the third table because the Sequence number relates to the other two tables. There is no gaurantee that the quantity of sequence numbers is going to be the same in each table. GE could have 5 in the sequence, while GR has 1, and REF has 15. What's REALY chunking up the whole thing is if I try to refer it back to the master list of DocNums. When I try to do that the whole thing blows up and I get what looks to be a cartesian product...
My end result should look like this:
DocnumGRGR_SeqGEGE_SeqRefRef_Seq
1A1A1A1
1B2B2B2
1NULLNULLC3C3
1NULLNULLNULLNULLD4
1NULLNULLNULLNULLE5
My recordsets could have 1000-1000000 records on any given day...
I feel like I should be able to do this; however, I feel like I've overlooked something basic.
Any ideas anyone? Bueller?
Thanks
Crusty.
August 27, 2012 at 6:37 pm
You've given us expected results, which is a good thing. Some people like me visualize solutions most easily when this is provided.
However you have not provided DDL for your tables, nor INSERTs (setups) for your sample data. And the format that you posted them in would be a pain to convert.
I think if you do this, someone is going to be able to make short work of your question.
Remember we're all volunteers here. 🙂
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
August 27, 2012 at 8:36 pm
I will provide some DDL for the tables in the morning... Thanks for the positive reply... 🙂
August 27, 2012 at 9:42 pm
CptCrusty1 (8/27/2012)
Ok, so I think I should know the answer to this but I'm getting nothing but what looks like a cartesian product... yuk.I have 1 table that has a list of master document numbers... for the sake of argument, there is 1 doc num, 1 record.
There are 3 tables that have values that relate to the DocNum in the Doc Tbl; however, they all don't totally relate to each other. Confused yet?? 😀
The DocNums are related. However, the Seqence numbers in the other tables determine how they should appear in the final results table.
Doc TblGR TblGE tblRef tbl
DocNumsDocnumGRSeqDocNumGESeqDocNumRefSeq
11A11A11A1
1B21B21B2
1C31C3
1D4
1E5
If I ignore the master list of Doc Numbers in the Doc tbl and Left Join GR to GE on DocNum and Seq, I get:
DocnumGRGR_SeqGEGE_Seq
1A1A1
1B2B2
1NULLNULLC3
It starts getting whonky if I add the third table because the Sequence number relates to the other two tables. There is no gaurantee that the quantity of sequence numbers is going to be the same in each table. GE could have 5 in the sequence, while GR has 1, and REF has 15. What's REALY chunking up the whole thing is if I try to refer it back to the master list of DocNums. When I try to do that the whole thing blows up and I get what looks to be a cartesian product...
My end result should look like this:
DocnumGRGR_SeqGEGE_SeqRefRef_Seq
1A1A1A1
1B2B2B2
1NULLNULLC3C3
1NULLNULLNULLNULLD4
1NULLNULLNULLNULLE5
Hi you are joining "DocNums" as in first part (query) there is no repeat of the value of "DocNum" 1
in second query there is repeat of doc no .
if there is repeat of joining data returns multiple record.
As it works row by row...
August 28, 2012 at 7:57 am
@ Sandeep,
I'm sorry, but I'm not quite sure what it is you're telling me; however, in regards to the data, it's correct. Let me get into some detail.
This is the end result of an ETL process using some pretty mixed up text files. Lets say I have one record in a file that is Pipe-Delimited with double quotes indicated text.
Using the examples in previous post, when I receive the information I have a text file with the following columns:
DocNum
GR
GE
REF
The raw data for the examples provided in the original post:
1|"A,B"|"A,B,C"|"A,B,C,D,E,F"
The providor of the data has essentially combined multiple rows of data into 1 row. For the GR (Second column), "A,B" is actually 2 rows of data broken by the comman. It should actually look like:
DocnumGR
1A
1B
The Sequence number is an attempt at recombining the data and is not apart of the original data. The process in place now uses multiple steps to achieve individual tables for the broken out data exactly as it's shown in the previous examples. The problem is that they combine it all back together with Cartesian joins creating a massive landing table.
I've consolidated the cleaning of the data into a single procedure; however, I'm trying to recombine the data so that it's not a cartesian join. All the information in the single record pertains to DocNum 1; however, the end result doesn't necessarily have to be totally organized ABC, etc. Every value needs to be in a record with DocNum 1.
I know this is a very confusing process. The final destination for the data is totally wierd, and will make a dba have a heart attack; however, there is a method to the madness and it works. I'm trying to streamline the process of getting their data to the final step. Also, the example I provided is significantly simplified... but it's the root of the problem.
I'm trying to make it clearer.... .. sorry.
Crusty. :ermm:
August 28, 2012 at 9:24 am
dwain.c (8/27/2012)
However you have not provided DDL for your tables, nor INSERTs (setups) for your sample data.Remember we're all volunteers here. 🙂
Ok Volunteers... here's your script as promised... I know, I'm slow...
Create table Doc (
DocNum nvarchar(15) NOT NULL
);
Insert into Doc (DocNum)
Values ('1');
------------------------------
Create Table GR (
DocNum nvarchar(15) NOT NULL,
GR nvarchar(15) NOT NULL,
Seq intNOT NULL
)
Insert into GR (DocNum, GR, Seq) Values (1,'A',1)
Insert into GR (DocNum, GR, Seq) Values (1,'B',2)
--------------------------------
Create Table GE (
DocNum nvarchar(15) NOT NULL,
GE nvarchar(15) NOT NULL,
Seq intNOT NULL
)
Insert into GE (DocNum, GE, Seq) Values (1,'A',1)
Insert into GE (DocNum, GE, Seq) Values (1,'B',2)
Insert into GE (DocNum, GE, Seq) Values (1,'C',3)
--------------------------------------------------
Create Table Ref (
DocNum nvarchar(15) NOT NULL,
Ref nvarchar(15) NOT NULL,
Seq intNOT NULL
)
Insert into Ref (DocNum, Ref, Seq) Values (1,'A',1)
Insert into Ref (DocNum, Ref, Seq) Values (1,'B',2)
Insert into Ref (DocNum, Ref, Seq) Values (1,'C',3)
Insert into Ref (DocNum, Ref, Seq) Values (1,'D',4)
Insert into Ref (DocNum, Ref, Seq) Values (1,'E',5)
select * from Doc
select * from GR
Select * from GE
select * from Ref
OH GREAT WIZARDS OF UBER-CODE! HEAR MY SCHREAKING AND BID ME GOOD TIDINGS!!!
August 28, 2012 at 9:49 am
hiC rusty hop u will get it by reading below code
Create table Doc (
DocNum nvarchar(15) NOT NULL
);
Insert into Doc (DocNum)
Values ('1');
------------------------------
Create Table GR (
DocNum nvarchar(15) NOT NULL,
GR nvarchar(15) NOT NULL,
Seq intNOT NULL
)
Insert into GR (DocNum, GR, Seq) Values (1,'A',1)
Insert into GR (DocNum, GR, Seq) Values (1,'B',2)
--------------------------------
Create Table GE (
DocNum nvarchar(15) NOT NULL,
GE nvarchar(15) NOT NULL,
Seq intNOT NULL
)
Insert into GE (DocNum, GE, Seq) Values (1,'A',1)
Insert into GE (DocNum, GE, Seq) Values (1,'B',2)
Insert into GE (DocNum, GE, Seq) Values (1,'C',3)
--------------------------------------------------
Create Table Ref (
DocNum nvarchar(15) NOT NULL,
Ref nvarchar(15) NOT NULL,
Seq intNOT NULL
)
Insert into Ref (DocNum, Ref, Seq) Values (1,'A',1)
Insert into Ref (DocNum, Ref, Seq) Values (1,'B',2)
Insert into Ref (DocNum, Ref, Seq) Values (1,'C',3)
Insert into Ref (DocNum, Ref, Seq) Values (1,'D',4)
Insert into Ref (DocNum, Ref, Seq) Values (1,'E',5)
-------
select * from Ref
left join GE
on ref.DocNum=GE.DocNum
See the output of bold part of query
as Ref have 5 values "1" in DocNum
and GE have 3 values "1" in DocNum
so output will have 15 rows as join behave row by row.
August 28, 2012 at 10:04 am
WITH AllSeq AS (
SELECT Seq FROM GR
UNION
SELECT Seq FROM GE
UNION
SELECT Seq FROM Ref)
SELECT d.DocNum,
gr.GR,
gr.Seq AS GR_Seq,
ge.GE,
ge.Seq AS GE_Seq,
rf.Ref,
rf.Seq AS Ref_Seq
FROM Doc d
CROSS JOIN AllSeq sq
LEFT OUTER JOIN GR gr ON gr.DocNum = d.DocNum
AND gr.Seq = sq.Seq
LEFT OUTER JOIN GE ge ON ge.DocNum = d.DocNum
AND ge.Seq = sq.Seq
LEFT OUTER JOIN Ref rf ON rf.DocNum = d.DocNum
AND rf.Seq = sq.Seq
ORDER BY sq.Seq;
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537August 28, 2012 at 10:05 am
select Doc.DocNum, GR.GR, GR_Seq = GR.Seq, GE.GE, GE_Seq = GE.Seq, Ref.Ref, Ref_Seq = Ref.Seq
from Doc
cross join ( values (1),(2),(3),(4),(5),(6),(7),(8),(9) ) N (n)
left outer join GR on GR.DocNum = Doc.DocNum and N.n = GR.Seq
left outer join GE on GE.DocNum = DOC.DocNum and N.n = GE.Seq
left outer join Ref on Ref.DocNum = DOC.DocNum and N.n = Ref.Seq
where Ref.Ref is not null
order by Doc.DocNum, N.n
August 28, 2012 at 10:09 am
What you need to do is Full Outer Joins instead of Left Outer Joins. Join on Doc ID and Row Number.
SELECT COALESCE(dbo.GR.DocNum, dbo.GE.DocNum, dbo.Ref.DocNum) AS DocNum,
GR,
dbo.GR.Seq AS GR_Seq,
GE,
dbo.GE.Seq AS GE_Seq,
Ref,
dbo.Ref.Seq AS Ref_Seq
FROM dbo.GR
FULL OUTER JOIN dbo.GE
ON dbo.GR.DocNum = dbo.GE.DocNum
AND dbo.GR.Seq = dbo.GE.Seq
FULL OUTER JOIN dbo.Ref
ON dbo.GE.DocNum = dbo.Ref.DocNum
AND dbo.GE.Seq = dbo.Ref.Seq
OR dbo.GR.DocNum = dbo.Ref.DocNum
AND dbo.GR.Seq = dbo.Ref.Seq;
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
August 28, 2012 at 1:13 pm
Laurie,
Thanks for your reply. The "Values" reserved word didn't fly in 2005. Is that a 2008+ term?
Thanks
Crusty
August 28, 2012 at 1:16 pm
CptCrusty1 (8/28/2012)
Laurie,Thanks for your reply. The "Values" reserved word didn't fly in 2005. Is that a 2008+ term?
Thanks
Crusty
That's what's called a "Table Values Function", and it's 2008+. It will also only work with data that's limited to no more than 9 values per set.
The Full Outer Join version I posted will work in any version of SQL Server, at least from 7.5-on (I haven't played with anything prior to that, so can't be sure it will work there), and will work with any number of sub-values.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
August 28, 2012 at 1:17 pm
@Laurie, Mark, GSquared. I am humbled.
I was starting to lean towards Cross Join; however, I'd never had an opportunity to use it before. Seems like it behaves a bit like a pivot table??
GSQuared, dito, never used Coalesce before. Guess I need to hit the books and learn these two techniques. The results are exactly what I needed.
I will try all three results for performance against a test version with about 100k records....
Thanks all.. .I really appreciate your help.
Sincerely.
Crusty
August 28, 2012 at 1:19 pm
GS, that's what I thought. I've seen that in a 2008 shop; however, I'm currently in a 2005 shop that is in mid-migration, thus my afore-mentioned quandry.
Again... My Thanks.
August 28, 2012 at 1:20 pm
CptCrusty1 (8/28/2012)
Laurie,Thanks for your reply. The "Values" reserved word didn't fly in 2005. Is that a 2008+ term?
Thanks
Crusty
Hi - Sorry about that, but you've posted in the SQL 2008 area - one to watch out for next time!
Yes - it is 2008+
Viewing 15 posts - 1 through 15 (of 30 total)
You must be logged in to reply to this topic. Login to reply