Weird Requirement... Multiple Left Joins? Am I missing something?

Question

Weird Requirement... Multiple Left Joins? Am I missing something?

Viewing 15 posts - 16 through 30 (of 30 total)

You must be logged in to reply to this topic. Login to reply

laurie-789651 SSCertifiable Points: 7680 More actions · Answer 1

GSquared (8/28/2012)
CptCrusty1 (8/28/2012)
Laurie,
Thanks for your reply. The "Values" reserved word didn't fly in 2005. Is that a 2008+ term?
Thanks
Crusty
That's what's called a "Table Values Function", and it's 2008+. It will also only work with data that's limited to no more than 9 values per set.

Just for the record, this example is limited to 9, but you can code the numbers table up to any number.

CptCrusty1 Hall of Fame Points: 3386 More actions · Answer 2

I'm going to remember you guys (and gals) and ChristmaHoniQuanzika..... :crying::-D

GSquared SSC Guru Points: 260824 More actions · Answer 3

laurie-789651 (8/28/2012)
GSquared (8/28/2012)
CptCrusty1 (8/28/2012)
Laurie,
Thanks for your reply. The "Values" reserved word didn't fly in 2005. Is that a 2008+ term?
Thanks
Crusty
That's what's called a "Table Values Function", and it's 2008+. It will also only work with data that's limited to no more than 9 values per set.
Just for the record, this example is limited to 9, but you can code the numbers table up to any number.

Yes, but it will still have a finite limit, and is an unnecessary limit on the query. This kind of situation is exactly what Full Outer Join is meant for, so working around it with hard-coded things like that is unnecessary.

The numbers table version you came up with isn't a bad solution, it's just unnecessarily complex for the desired end result. That's all. No big deal. And a hard-coded numbers table with a few thousand rows would be backwards compatible, and work for any conceivable data complexity needed by this code. So it's limits aren't that big a deal. Just an Occam fan here.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

GSquared SSC Guru Points: 260824 More actions · Answer 4

CptCrusty1 (8/28/2012)
@Laurie, Mark, GSquared. I am humbled.
I was starting to lean towards Cross Join; however, I'd never had an opportunity to use it before. Seems like it behaves a bit like a pivot table??
GSQuared, dito, never used Coalesce before. Guess I need to hit the books and learn these two techniques. The results are exactly what I needed.
I will try all three results for performance against a test version with about 100k records....
Thanks all.. .I really appreciate your help.
Sincerely.
Crusty

Cross Join just joins every row in one table (or dataset) to every row in another table (or dataset). Produces what's called a "Cartesian Product".

Coalesce is just IsNull's big brother. All it does is pick the first non-null value in a list. If they're all null, it returns null, otherwise, it gets the first one. Since, in an outer join (Full, Left, or Right), one or more of the columns may be null, I used that to make sure it would get something in that column.

The key to my version is Full Outer Join. What that does is get all rows from both sides of the join, whether they have a matching row in the other side or not.

So, a Full Outer from GR to GE gets all rows in GR, even if GE doesn't have a matching row. And it gets all rows from GE, even if GR doesn't have a matching row. So if GR had 5 and GE 2, it would still get all rows in both. Then it does the same thing with Ref, getting all rows, even if they don't have a match in the other two. If anything does match, it puts them in the same row, but if it doesn't match, it still pulls it.

This kind of problem is exactly what Full Outer Join is designed to handle. It's one of those features of SQL that seems to be less well-known. But, issues with how Joins work is one of the things that gives new database devs major headaches, so it's not too surprising.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

CptCrusty1 Hall of Fame Points: 3386 More actions · Answer 5

GS, I'm going to move this out of the Forum... I could stand to chat with you... LOL... I'll PM you.

Dwain Camps SSC Guru Points: 86978 More actions · Answer 6

LEFT JOINs, CROSS JOINs, FULL OUTER JOINs? What's up with that?

Why not something a bit simpler like this?

;WITH CTE AS (

SELECT n=1, DocNum, Ref, Seq

FROM Ref UNION ALL

SELECT 2, DocNum, GE, Seq

FROM GE UNION ALL

SELECT 3, DocNum, GR, Seq

FROM GR)

SELECT a.DocNum

,GR=MAX(CASE n WHEN 3 THEN Ref END)

,GR_Seq=MAX(CASE n WHEN 3 THEN Seq END)

,GE=MAX(CASE n WHEN 2 THEN Ref END)

,GE_Seq=MAX(CASE n WHEN 2 THEN Seq END)

,Ref=MAX(CASE n WHEN 1 THEN Ref END)

,Ref_Seq=MAX(CASE n WHEN 1 THEN Seq END )

FROM CTE a

INNER JOIN Doc b ON a.DocNum = b.DocNum

GROUP BY a.DocNum, Ref

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

GSquared SSC Guru Points: 260824 More actions · Answer 7

dwain.c (8/28/2012)
LEFT JOINs, CROSS JOINs, FULL OUTER JOINs? What's up with that?
Why not something a bit simpler like this?
;WITH CTE AS (
SELECT n=1, DocNum, Ref, Seq
FROM Ref UNION ALL
SELECT 2, DocNum, GE, Seq
FROM GE UNION ALL
SELECT 3, DocNum, GR, Seq
FROM GR)
SELECT a.DocNum
,GR=MAX(CASE n WHEN 3 THEN Ref END)
,GR_Seq=MAX(CASE n WHEN 3 THEN Seq END)
,GE=MAX(CASE n WHEN 2 THEN Ref END)
,GE_Seq=MAX(CASE n WHEN 2 THEN Seq END)
,Ref=MAX(CASE n WHEN 1 THEN Ref END)
,Ref_Seq=MAX(CASE n WHEN 1 THEN Seq END )
FROM CTE a
INNER JOIN Doc b ON a.DocNum = b.DocNum
GROUP BY a.DocNum, Ref

Because that solution will cause an I/O blow-up directly proportional to the number of entries in the Doc table.

I added docs 2-5, each with 1 entry in the GE table, to the provided sample data.

With 1 entry, the I/O on your solution looks like this:

Table 'Doc'. Scan count 1, logical reads 10, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'GR'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'GE'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Ref'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Very nice.

With 5 entries, it looks like this:

Table 'GR'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'GE'. Scan count 5, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Ref'. Scan count 5, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Doc'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

See how the scans increase directly as you increase the rows in Doc?

With 1 entry or with 5 entries, the Full Outer Join solution looks like this:

Table 'Worktable'. Scan count 2, logical reads 26, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Ref'. Scan count 2, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'GR'. Scan count 2, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'GE'. Scan count 2, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

2 scans per table, regardless of number of rows. Logical reads can get out of hand, but both do that, and scan count is usually a bigger deal in terms of overall performance.

Try it with a few thousand or a million rows. The I/O stats will be a serious bottleneck on yours, and won't be a big deal at all on the Full Outer version.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

CptCrusty1 Hall of Fame Points: 3386 More actions · Answer 8

Amen, GS.... I have to scale this up to about 5-6 tables with an est. million + rows each.... The results are going to be pumped to a website and I don't think a paying customer wants to wait that long....

Thanks for the try Dwain... everyone is being really cool about this...

Crusty.

Mark Cowne One Orange Chip Points: 26954 More actions · Answer 9

Here's a way to generate some simple timing tests.

TRUNCATE TABLE GR;

TRUNCATE TABLE GE;

TRUNCATE TABLE Ref;

Insert into GR(DocNum, GR, Seq)

SELECT 1,'GR_'+CAST(number AS VARCHAR(10)),number

FROM master.dbo.spt_values

WHERE number BETWEEN 1 AND 500 AND type='P'

Insert into GE(DocNum, GE, Seq)

SELECT 1,'GE_'+CAST(number AS VARCHAR(10)),number

FROM master.dbo.spt_values

WHERE number BETWEEN 1 AND 750 AND type='P'

Insert into Ref (DocNum, Ref, Seq)

SELECT 1,'Ref_'+CAST(number AS VARCHAR(10)),number

FROM master.dbo.spt_values

WHERE number BETWEEN 1 AND 1000 AND type='P';

GO

PRINT '========== Mark ==========================================='

DECLARE @DocNum nvarchar(15)

DECLARE @GR nvarchar(15)

DECLARE @GE nvarchar(15)

DECLARE @Ref nvarchar(15)

DECLARE @Seq1 int

DECLARE @Seq2 int

DECLARE @Seq3 int

SET STATISTICS TIME ON

;WITH AllSeq AS (

SELECT Seq FROM GR

UNION

SELECT Seq FROM GE

UNION

SELECT Seq FROM Ref)

SELECT @DocNum = d.DocNum,

@GR = gr.GR,

@Seq1 = gr.Seq,

@GE = ge.GE,

@Seq2 = ge.Seq,

@Ref = rf.Ref,

@Seq3 = rf.Seq

FROM Doc d

CROSS JOIN AllSeq sq

LEFT OUTER JOIN GR gr ON gr.DocNum = d.DocNum

AND gr.Seq = sq.Seq

LEFT OUTER JOIN GE ge ON ge.DocNum = d.DocNum

AND ge.Seq = sq.Seq

LEFT OUTER JOIN Ref rf ON rf.DocNum = d.DocNum

AND rf.Seq = sq.Seq;

SET STATISTICS TIME OFF

GO

PRINT '========== GSquared ==========================================='

DECLARE @DocNum nvarchar(15)

DECLARE @GR nvarchar(15)

DECLARE @GE nvarchar(15)

DECLARE @Ref nvarchar(15)

DECLARE @Seq1 int

DECLARE @Seq2 int

DECLARE @Seq3 int

SET STATISTICS TIME ON

SELECT @DocNum = COALESCE(dbo.GR.DocNum, dbo.GE.DocNum, dbo.Ref.DocNum),

@GR = GR,

@Seq1 = dbo.GR.Seq,

@GE = GE,

@Seq2 = dbo.GE.Seq,

@Ref = Ref,

@Seq3 = dbo.Ref.Seq

FROM dbo.GR

FULL OUTER JOIN dbo.GE

ON dbo.GR.DocNum = dbo.GE.DocNum

AND dbo.GR.Seq = dbo.GE.Seq

FULL OUTER JOIN dbo.Ref

ON dbo.GE.DocNum = dbo.Ref.DocNum

AND dbo.GE.Seq = dbo.Ref.Seq

OR dbo.GR.DocNum = dbo.Ref.DocNum

AND dbo.GR.Seq = dbo.Ref.Seq;

SET STATISTICS TIME OFF

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537

Dwain Camps SSC Guru Points: 86978 More actions · Answer 10

Gus,

That's an interesting point on the IOs but in the below test harness (5 docs) it still runs pretty fast, although clearly not as fast as Mark's.

Create table #Doc (

DocNum nvarchar(15) NOT NULL

);

Insert into #Doc (DocNum)

Values ('1'),('2'),('3'),('4'),('5');

------------------------------

Create Table #GR (

DocNum nvarchar(15) NOT NULL,

GR nvarchar(15) NOT NULL,

Seq intNOT NULL

)

--------------------------------

Create Table #GE (

DocNum nvarchar(15) NOT NULL,

GE nvarchar(15) NOT NULL,

Seq intNOT NULL

)

--------------------------------------------------

Create Table #Ref (

DocNum nvarchar(15) NOT NULL,

Ref nvarchar(15) NOT NULL,

Seq intNOT NULL

)

;WITH Tally (number) AS (

SELECT TOP 2000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

FROM sys.all_columns a CROSS JOIN sys.all_columns b)

Insert into #GR(DocNum, GR, Seq)

SELECT b.number,'GR_'+CAST(a.number AS VARCHAR(10)),a.number

FROM Tally a

CROSS APPLY (SELECT number FROM Tally WHERE number BETWEEN 1 AND 5) b

--FROM master.dbo.spt_values

--WHERE number BETWEEN 1 AND 500 AND type='P'

;WITH Tally (number) AS (

SELECT TOP 3000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

FROM sys.all_columns a CROSS JOIN sys.all_columns b)

Insert into #GE(DocNum, GE, Seq)

SELECT b.number,'GE_'+CAST(a.number AS VARCHAR(10)),a.number

FROM Tally a

CROSS APPLY (SELECT number FROM Tally WHERE number BETWEEN 1 AND 5) b

--FROM master.dbo.spt_values

--WHERE number BETWEEN 1 AND 750 AND type='P'

;WITH Tally (number) AS (

SELECT TOP 4000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

FROM sys.all_columns a CROSS JOIN sys.all_columns b)

Insert into #Ref (DocNum, Ref, Seq)

SELECT b.number,'Ref_'+CAST(a.number AS VARCHAR(10)),a.number

FROM Tally a

CROSS APPLY (SELECT number FROM Tally WHERE number BETWEEN 1 AND 5) b

--FROM master.dbo.spt_values

--WHERE number BETWEEN 1 AND 1000 AND type='P';

GO

PRINT '========== Mark ==========================================='

DECLARE @DocNum nvarchar(15)

DECLARE @GR nvarchar(15)

DECLARE @GE nvarchar(15)

DECLARE @Ref nvarchar(15)

DECLARE @Seq1 int

DECLARE @Seq2 int

DECLARE @Seq3 int

SET STATISTICS TIME ON

;WITH AllSeq AS (

SELECT Seq FROM #GR

UNION

SELECT Seq FROM #GE

UNION

SELECT Seq FROM #Ref)

SELECT @DocNum = d.DocNum,

@GR = gr.GR,

@Seq1 = gr.Seq,

@GE = ge.GE,

@Seq2 = ge.Seq,

@Ref = rf.Ref,

@Seq3 = rf.Seq

FROM #Doc d

CROSS JOIN AllSeq sq

LEFT OUTER JOIN #GR gr ON gr.DocNum = d.DocNum

AND gr.Seq = sq.Seq

LEFT OUTER JOIN #GE ge ON ge.DocNum = d.DocNum

AND ge.Seq = sq.Seq

LEFT OUTER JOIN #Ref rf ON rf.DocNum = d.DocNum

AND rf.Seq = sq.Seq;

SET STATISTICS TIME OFF

GO

PRINT '========== GSquared ==========================================='

DECLARE @DocNum nvarchar(15)

DECLARE @GR nvarchar(15)

DECLARE @GE nvarchar(15)

DECLARE @Ref nvarchar(15)

DECLARE @Seq1 int

DECLARE @Seq2 int

DECLARE @Seq3 int

SET STATISTICS TIME ON

SELECT @DocNum = COALESCE(#GR.DocNum, #GE.DocNum, #Ref.DocNum),

@GR = GR,

@Seq1 = #GR.Seq,

@GE = GE,

@Seq2 = #GE.Seq,

@Ref = Ref,

@Seq3 = #Ref.Seq

FROM #GR

FULL OUTER JOIN #GE

ON #GR.DocNum = #GE.DocNum

AND #GR.Seq = #GE.Seq

FULL OUTER JOIN #Ref

ON #GE.DocNum = #Ref.DocNum

AND #GE.Seq = #Ref.Seq

OR #GR.DocNum = #Ref.DocNum

AND #GR.Seq = #Ref.Seq;

SET STATISTICS TIME OFF

GO

PRINT '========== Dwain ==========================================='

DECLARE @DocNum nvarchar(15)

DECLARE @GR nvarchar(15)

DECLARE @GE nvarchar(15)

DECLARE @Ref nvarchar(15)

DECLARE @Seq1 int

DECLARE @Seq2 int

DECLARE @Seq3 int

SET STATISTICS TIME ON

;WITH CTE AS (

SELECT n=1, DocNum, Ref, Seq

FROM #Ref UNION ALL

SELECT 2, DocNum, GE, Seq

FROM #GE UNION ALL

SELECT 3, DocNum, GR, Seq

FROM #GR)

SELECT @DocNum=a.DocNum

,@GR=MAX(CASE n WHEN 3 THEN Ref END)

,@Seq1=MAX(CASE n WHEN 3 THEN Seq END)

,@GE=MAX(CASE n WHEN 2 THEN Ref END)

,@Seq2=MAX(CASE n WHEN 2 THEN Seq END)

,@Ref=MAX(CASE n WHEN 1 THEN Ref END)

,@Seq3=MAX(CASE n WHEN 1 THEN Seq END )

FROM CTE a

INNER JOIN #Doc b ON a.DocNum = b.DocNum

GROUP BY a.DocNum, Ref

SET STATISTICS TIME OFF

DROP TABLE #Doc, #Ref, #GR, #GE

Results:

(10000 row(s) affected)

(15000 row(s) affected)

(20000 row(s) affected)

========== Mark ===========================================

SQL Server Execution Times:

CPU time = 93 ms, elapsed time = 112 ms.

========== GSquared ===========================================

SQL Server Execution Times:

CPU time = 177918 ms, elapsed time = 180257 ms.

========== Dwain ===========================================

SQL Server Execution Times:

CPU time = 156 ms, elapsed time = 313 ms.

Let me know if I did something wrong (always a possibility before I've had my morning coffee :-)).

Interestingly, when I bumped the doc count to 40, Mark's SQL seemed to begin to benefit from parallizing the query (note elapsed ms < CPU ms). So I retried with OPTION (MAXDOP 1) and got these results.

========== Mark ===========================================

SQL Server Execution Times:

CPU time = 2090 ms, elapsed time = 588 ms.

========== Mark w-MAXDOP 1===========================

SQL Server Execution Times:

CPU time = 936 ms, elapsed time = 925 ms.

========== Dwain ===========================================

SQL Server Execution Times:

CPU time = 2605 ms, elapsed time = 1182 ms.

I've always found it extremely interesting how, when SQL parallelizes a query the CPU time goes up quite significantly in relation to the non-parallelized version, without a proportional drop in elapsed time.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Mark Cowne One Orange Chip Points: 26954 More actions · Answer 11

dwain.c (8/29/2012)
Gus,
That's an interesting point on the IOs but in the below test harness (5 docs) it still runs pretty fast, although clearly not as fast as Mark's.
Create table #Doc (
DocNum nvarchar(15) NOT NULL
);
Insert into #Doc (DocNum)
Values ('1'),('2'),('3'),('4'),('5');
------------------------------
Create Table #GR (
DocNum nvarchar(15) NOT NULL,
GR nvarchar(15) NOT NULL,
Seq intNOT NULL
)
--------------------------------
Create Table #GE (
DocNum nvarchar(15) NOT NULL,
GE nvarchar(15) NOT NULL,
Seq intNOT NULL
)
--------------------------------------------------
Create Table #Ref (
DocNum nvarchar(15) NOT NULL,
Ref nvarchar(15) NOT NULL,
Seq intNOT NULL
)
;WITH Tally (number) AS (
SELECT TOP 2000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns a CROSS JOIN sys.all_columns b)
Insert into #GR(DocNum, GR, Seq)
SELECT b.number,'GR_'+CAST(a.number AS VARCHAR(10)),a.number
FROM Tally a
CROSS APPLY (SELECT number FROM Tally WHERE number BETWEEN 1 AND 5) b
--FROM master.dbo.spt_values
--WHERE number BETWEEN 1 AND 500 AND type='P'
;WITH Tally (number) AS (
SELECT TOP 3000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns a CROSS JOIN sys.all_columns b)
Insert into #GE(DocNum, GE, Seq)
SELECT b.number,'GE_'+CAST(a.number AS VARCHAR(10)),a.number
FROM Tally a
CROSS APPLY (SELECT number FROM Tally WHERE number BETWEEN 1 AND 5) b
--FROM master.dbo.spt_values
--WHERE number BETWEEN 1 AND 750 AND type='P'
;WITH Tally (number) AS (
SELECT TOP 4000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns a CROSS JOIN sys.all_columns b)
Insert into #Ref (DocNum, Ref, Seq)
SELECT b.number,'Ref_'+CAST(a.number AS VARCHAR(10)),a.number
FROM Tally a
CROSS APPLY (SELECT number FROM Tally WHERE number BETWEEN 1 AND 5) b
--FROM master.dbo.spt_values
--WHERE number BETWEEN 1 AND 1000 AND type='P';
GO
PRINT '========== Mark ==========================================='
DECLARE @DocNum nvarchar(15)
DECLARE @GR nvarchar(15)
DECLARE @GE nvarchar(15)
DECLARE @Ref nvarchar(15)
DECLARE @Seq1 int
DECLARE @Seq2 int
DECLARE @Seq3 int
SET STATISTICS TIME ON
;WITH AllSeq AS (
SELECT Seq FROM #GR
UNION
SELECT Seq FROM #GE
UNION
SELECT Seq FROM #Ref)
SELECT @DocNum = d.DocNum,
@GR = gr.GR,
@Seq1 = gr.Seq,
@GE = ge.GE,
@Seq2 = ge.Seq,
@Ref = rf.Ref,
@Seq3 = rf.Seq
FROM #Doc d
CROSS JOIN AllSeq sq
LEFT OUTER JOIN #GR gr ON gr.DocNum = d.DocNum
AND gr.Seq = sq.Seq
LEFT OUTER JOIN #GE ge ON ge.DocNum = d.DocNum
AND ge.Seq = sq.Seq
LEFT OUTER JOIN #Ref rf ON rf.DocNum = d.DocNum
AND rf.Seq = sq.Seq;
SET STATISTICS TIME OFF
GO
PRINT '========== GSquared ==========================================='
DECLARE @DocNum nvarchar(15)
DECLARE @GR nvarchar(15)
DECLARE @GE nvarchar(15)
DECLARE @Ref nvarchar(15)
DECLARE @Seq1 int
DECLARE @Seq2 int
DECLARE @Seq3 int
SET STATISTICS TIME ON
SELECT @DocNum = COALESCE(#GR.DocNum, #GE.DocNum, #Ref.DocNum),
@GR = GR,
@Seq1 = #GR.Seq,
@GE = GE,
@Seq2 = #GE.Seq,
@Ref = Ref,
@Seq3 = #Ref.Seq
FROM #GR
FULL OUTER JOIN #GE
ON #GR.DocNum = #GE.DocNum
AND #GR.Seq = #GE.Seq
FULL OUTER JOIN #Ref
ON #GE.DocNum = #Ref.DocNum
AND #GE.Seq = #Ref.Seq
OR #GR.DocNum = #Ref.DocNum
AND #GR.Seq = #Ref.Seq;
SET STATISTICS TIME OFF
GO
PRINT '========== Dwain ==========================================='
DECLARE @DocNum nvarchar(15)
DECLARE @GR nvarchar(15)
DECLARE @GE nvarchar(15)
DECLARE @Ref nvarchar(15)
DECLARE @Seq1 int
DECLARE @Seq2 int
DECLARE @Seq3 int
SET STATISTICS TIME ON
;WITH CTE AS (
SELECT n=1, DocNum, Ref, Seq
FROM #Ref UNION ALL
SELECT 2, DocNum, GE, Seq
FROM #GE UNION ALL
SELECT 3, DocNum, GR, Seq
FROM #GR)
SELECT @DocNum=a.DocNum
,@GR=MAX(CASE n WHEN 3 THEN Ref END)
,@Seq1=MAX(CASE n WHEN 3 THEN Seq END)
,@GE=MAX(CASE n WHEN 2 THEN Ref END)
,@Seq2=MAX(CASE n WHEN 2 THEN Seq END)
,@Ref=MAX(CASE n WHEN 1 THEN Ref END)
,@Seq3=MAX(CASE n WHEN 1 THEN Seq END )
FROM CTE a
INNER JOIN #Doc b ON a.DocNum = b.DocNum
GROUP BY a.DocNum, Ref
SET STATISTICS TIME OFF
DROP TABLE #Doc, #Ref, #GR, #GE
Results:
(10000 row(s) affected)
(15000 row(s) affected)
(20000 row(s) affected)
========== Mark ===========================================
SQL Server Execution Times:
CPU time = 93 ms, elapsed time = 112 ms.
========== GSquared ===========================================
SQL Server Execution Times:
CPU time = 177918 ms, elapsed time = 180257 ms.
========== Dwain ===========================================
SQL Server Execution Times:
CPU time = 156 ms, elapsed time = 313 ms.
Let me know if I did something wrong (always a possibility before I've had my morning coffee :-)).
Interestingly, when I bumped the doc count to 40, Mark's SQL seemed to begin to benefit from parallizing the query (note elapsed ms < CPU ms). So I retried with OPTION (MAXDOP 1) and got these results.
========== Mark ===========================================
SQL Server Execution Times:
CPU time = 2090 ms, elapsed time = 588 ms.
========== Mark w-MAXDOP 1===========================
SQL Server Execution Times:
CPU time = 936 ms, elapsed time = 925 ms.
========== Dwain ===========================================
SQL Server Execution Times:
CPU time = 2605 ms, elapsed time = 1182 ms.
I've always found it extremely interesting how, when SQL parallelizes a query the CPU time goes up quite significantly in relation to the non-parallelized version, without a proportional drop in elapsed time.

Oops, turns out my query doesn't quite work properly, here's a fixed version. Fortunately doesn't affect performance much.

WITH AllSeq AS (

SELECT DocNum,Seq FROM GR

UNION

SELECT DocNum,Seq FROM GE

UNION

SELECT DocNum,Seq FROM Ref)

SELECT sq.DocNum,

gr.GR,

gr.Seq AS GR_Seq,

ge.GE,

ge.Seq AS GE_Seq,

rf.Ref,

rf.Seq AS Ref_Seq

FROM AllSeq sq

LEFT OUTER JOIN GR gr ON gr.DocNum = sq.DocNum

AND gr.Seq = sq.Seq

LEFT OUTER JOIN GE ge ON ge.DocNum = sq.DocNum

AND ge.Seq = sq.Seq

LEFT OUTER JOIN Ref rf ON rf.DocNum = sq.DocNum

AND rf.Seq = sq.Seq;

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537

GSquared SSC Guru Points: 260824 More actions · Answer 12

Sorry for the delay in replying. Been recovering from surgery.

Try this:

Expand the number of documents beyond 5 up to some reasonable number, like 10,000 or 1-million. Add random GE, GR, and Ref entries for them. I'm not sure how the original requestor would want Docs entries without anything in any of the sub-tables, so leave that aspect alone for now.

Cluster all the tables reasonably. I used DocNum and Seq on the sub-tables, and added a PK (clustered) on DocNum in Docs.

Modify my Full Outer query as follows:

SELECT COALESCE(GR.DocNum, GE.DocNum, Ref.DocNum),

GR,

GR.Seq,

GE,

GE.Seq,

COALESCE(Ref.Ref, R2.Ref),

COALESCE(Ref.Seq, R2.Seq)

FROM dbo.GR

FULL OUTER JOIN dbo.GE

ON GR.DocNum = GE.DocNum

AND GR.Seq = GE.Seq

FULL OUTER JOIN dbo.Ref

ON GE.DocNum = Ref.DocNum

AND GE.Seq = Ref.Seq

FULL OUTER JOIN dbo.Ref AS R2

ON GR.DocNum = R2.DocNum

AND GR.Seq = R2.Seq

Test again. Include IO characteristics in your testing. Always. On loaded systems, CPU time is far less likely to be your main bottleneck, while I/O is far more likely to be key to the performance of the whole server. (Network latency and bandwidth can sometimes (often) trump I/O as a real bottleneck on overall performance, but that's not usually something you can do much about except reduce the size of datasets being shipped around.)

Also, assigning variable values isn't a valid test. Many times, SQL Server will bypass running the intended performance-testing query, and just grab one row and use that for the variable value. It knows it only needs one value, so it only grabs one. I've seen this happen in SQL 2000 and 2005. Haven't tested in 2008 or 2012, but I think it's a fair assumption they haven't made the engine dumber about that kind of thing. Per discussions on this site, others have proven the same thing.

If you want to test without returning a result-set, insert into temp tables. Best if you build the temp tables explicitly outside of the final query, instead of Select Into, so that DDL time doesn't add to the query execution artificially. Avoiding returning a dataset avoids skewing for network issues if you're querying remotely, and avoids rendering-time in the UI as a possible factor as well.

I recommend getting the time and I/O stats out of a server-side trace, rather than Set Stats On, because Set Stats On has been proven to have potential impacts on actual query performance. Doesn't appear to affect this particular set of tests, but it's a good practice nonetheless.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 13

GSquared (9/7/2012)
Sorry for the delay in replying. Been recovering from surgery.
...

Get well soon, GGeezer.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Dwain Camps SSC Guru Points: 86978 More actions · Answer 14

Indeed Gus, the new version of your query seems to be fastest.

========== Mark ===========================================

(20000 row(s) affected)

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#Ref___000000000052'. Scan count 2, logical reads 184, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#GE___000000000051'. Scan count 2, logical reads 130, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#GR___000000000050'. Scan count 2, logical reads 86, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 171 ms, elapsed time = 438 ms.

========== GSquared ===========================================

(30000 row(s) affected)

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#GE___000000000051'. Scan count 1, logical reads 65, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#GR___000000000050'. Scan count 1, logical reads 43, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#Ref___000000000052'. Scan count 2, logical reads 184, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 93 ms, elapsed time = 457 ms.

========== Dwain ===========================================

Warning: Null value is eliminated by an aggregate or other SET operation.

(45000 row(s) affected)

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#GR___000000000050'. Scan count 1, logical reads 43, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#GE___000000000051'. Scan count 1, logical reads 65, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#Ref___000000000052'. Scan count 1, logical reads 92, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#Doc___00000000004F'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 156 ms, elapsed time = 668 ms.

I am not seeing a large difference in IOs unless I am missing something. It is interesting that each of the 3 query is returning a different number of rows. Wonder which is right.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Mark Cowne One Orange Chip Points: 26954 More actions · Answer 15

GSquared (9/7/2012)
Sorry for the delay in replying. Been recovering from surgery.
Try this:
Expand the number of documents beyond 5 up to some reasonable number, like 10,000 or 1-million. Add random GE, GR, and Ref entries for them. I'm not sure how the original requestor would want Docs entries without anything in any of the sub-tables, so leave that aspect alone for now.
Cluster all the tables reasonably. I used DocNum and Seq on the sub-tables, and added a PK (clustered) on DocNum in Docs.
Modify my Full Outer query as follows:
SELECT COALESCE(GR.DocNum, GE.DocNum, Ref.DocNum),
GR,
GR.Seq,
GE,
GE.Seq,
COALESCE(Ref.Ref, R2.Ref),
COALESCE(Ref.Seq, R2.Seq)
FROM dbo.GR
FULL OUTER JOIN dbo.GE
ON GR.DocNum = GE.DocNum
AND GR.Seq = GE.Seq
FULL OUTER JOIN dbo.Ref
ON GE.DocNum = Ref.DocNum
AND GE.Seq = Ref.Seq
FULL OUTER JOIN dbo.Ref AS R2
ON GR.DocNum = R2.DocNum
AND GR.Seq = R2.Seq
Test again. Include IO characteristics in your testing. Always. On loaded systems, CPU time is far less likely to be your main bottleneck, while I/O is far more likely to be key to the performance of the whole server. (Network latency and bandwidth can sometimes (often) trump I/O as a real bottleneck on overall performance, but that's not usually something you can do much about except reduce the size of datasets being shipped around.)
Also, assigning variable values isn't a valid test. Many times, SQL Server will bypass running the intended performance-testing query, and just grab one row and use that for the variable value. It knows it only needs one value, so it only grabs one. I've seen this happen in SQL 2000 and 2005. Haven't tested in 2008 or 2012, but I think it's a fair assumption they haven't made the engine dumber about that kind of thing. Per discussions on this site, others have proven the same thing.
If you want to test without returning a result-set, insert into temp tables. Best if you build the temp tables explicitly outside of the final query, instead of Select Into, so that DDL time doesn't add to the query execution artificially. Avoiding returning a dataset avoids skewing for network issues if you're querying remotely, and avoids rendering-time in the UI as a possible factor as well.
I recommend getting the time and I/O stats out of a server-side trace, rather than Set Stats On, because Set Stats On has been proven to have potential impacts on actual query performance. Doesn't appear to affect this particular set of tests, but it's a good practice nonetheless.

Your query doesn't give correct results against the OPs original data - it returns 8 rows instead of 5.

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537