Weird Requirement... Multiple Left Joins? Am I missing something?

  • GSquared (8/28/2012)


    CptCrusty1 (8/28/2012)


    Laurie,

    Thanks for your reply. The "Values" reserved word didn't fly in 2005. Is that a 2008+ term?

    Thanks

    Crusty

    That's what's called a "Table Values Function", and it's 2008+. It will also only work with data that's limited to no more than 9 values per set.

    Just for the record, this example is limited to 9, but you can code the numbers table up to any number.

  • I'm going to remember you guys (and gals) and ChristmaHoniQuanzika..... :crying::-D

  • laurie-789651 (8/28/2012)


    GSquared (8/28/2012)


    CptCrusty1 (8/28/2012)


    Laurie,

    Thanks for your reply. The "Values" reserved word didn't fly in 2005. Is that a 2008+ term?

    Thanks

    Crusty

    That's what's called a "Table Values Function", and it's 2008+. It will also only work with data that's limited to no more than 9 values per set.

    Just for the record, this example is limited to 9, but you can code the numbers table up to any number.

    Yes, but it will still have a finite limit, and is an unnecessary limit on the query. This kind of situation is exactly what Full Outer Join is meant for, so working around it with hard-coded things like that is unnecessary.

    The numbers table version you came up with isn't a bad solution, it's just unnecessarily complex for the desired end result. That's all. No big deal. And a hard-coded numbers table with a few thousand rows would be backwards compatible, and work for any conceivable data complexity needed by this code. So it's limits aren't that big a deal. Just an Occam fan here.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • CptCrusty1 (8/28/2012)


    @Laurie, Mark, GSquared. I am humbled.

    I was starting to lean towards Cross Join; however, I'd never had an opportunity to use it before. Seems like it behaves a bit like a pivot table??

    GSQuared, dito, never used Coalesce before. Guess I need to hit the books and learn these two techniques. The results are exactly what I needed.

    I will try all three results for performance against a test version with about 100k records....

    Thanks all.. .I really appreciate your help.

    Sincerely.

    Crusty

    Cross Join just joins every row in one table (or dataset) to every row in another table (or dataset). Produces what's called a "Cartesian Product".

    Coalesce is just IsNull's big brother. All it does is pick the first non-null value in a list. If they're all null, it returns null, otherwise, it gets the first one. Since, in an outer join (Full, Left, or Right), one or more of the columns may be null, I used that to make sure it would get something in that column.

    The key to my version is Full Outer Join. What that does is get all rows from both sides of the join, whether they have a matching row in the other side or not.

    So, a Full Outer from GR to GE gets all rows in GR, even if GE doesn't have a matching row. And it gets all rows from GE, even if GR doesn't have a matching row. So if GR had 5 and GE 2, it would still get all rows in both. Then it does the same thing with Ref, getting all rows, even if they don't have a match in the other two. If anything does match, it puts them in the same row, but if it doesn't match, it still pulls it.

    This kind of problem is exactly what Full Outer Join is designed to handle. It's one of those features of SQL that seems to be less well-known. But, issues with how Joins work is one of the things that gives new database devs major headaches, so it's not too surprising.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GS, I'm going to move this out of the Forum... I could stand to chat with you... LOL... I'll PM you.

  • LEFT JOINs, CROSS JOINs, FULL OUTER JOINs? What's up with that?

    Why not something a bit simpler like this?

    ;WITH CTE AS (

    SELECT n=1, DocNum, Ref, Seq

    FROM Ref UNION ALL

    SELECT 2, DocNum, GE, Seq

    FROM GE UNION ALL

    SELECT 3, DocNum, GR, Seq

    FROM GR)

    SELECT a.DocNum

    ,GR=MAX(CASE n WHEN 3 THEN Ref END)

    ,GR_Seq=MAX(CASE n WHEN 3 THEN Seq END)

    ,GE=MAX(CASE n WHEN 2 THEN Ref END)

    ,GE_Seq=MAX(CASE n WHEN 2 THEN Seq END)

    ,Ref=MAX(CASE n WHEN 1 THEN Ref END)

    ,Ref_Seq=MAX(CASE n WHEN 1 THEN Seq END )

    FROM CTE a

    INNER JOIN Doc b ON a.DocNum = b.DocNum

    GROUP BY a.DocNum, Ref


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • dwain.c (8/28/2012)


    LEFT JOINs, CROSS JOINs, FULL OUTER JOINs? What's up with that?

    Why not something a bit simpler like this?

    ;WITH CTE AS (

    SELECT n=1, DocNum, Ref, Seq

    FROM Ref UNION ALL

    SELECT 2, DocNum, GE, Seq

    FROM GE UNION ALL

    SELECT 3, DocNum, GR, Seq

    FROM GR)

    SELECT a.DocNum

    ,GR=MAX(CASE n WHEN 3 THEN Ref END)

    ,GR_Seq=MAX(CASE n WHEN 3 THEN Seq END)

    ,GE=MAX(CASE n WHEN 2 THEN Ref END)

    ,GE_Seq=MAX(CASE n WHEN 2 THEN Seq END)

    ,Ref=MAX(CASE n WHEN 1 THEN Ref END)

    ,Ref_Seq=MAX(CASE n WHEN 1 THEN Seq END )

    FROM CTE a

    INNER JOIN Doc b ON a.DocNum = b.DocNum

    GROUP BY a.DocNum, Ref

    Because that solution will cause an I/O blow-up directly proportional to the number of entries in the Doc table.

    I added docs 2-5, each with 1 entry in the GE table, to the provided sample data.

    With 1 entry, the I/O on your solution looks like this:

    Table 'Doc'. Scan count 1, logical reads 10, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'GR'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'GE'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Ref'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Very nice.

    With 5 entries, it looks like this:

    Table 'GR'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'GE'. Scan count 5, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Ref'. Scan count 5, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Doc'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    See how the scans increase directly as you increase the rows in Doc?

    With 1 entry or with 5 entries, the Full Outer Join solution looks like this:

    Table 'Worktable'. Scan count 2, logical reads 26, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Ref'. Scan count 2, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'GR'. Scan count 2, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'GE'. Scan count 2, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    2 scans per table, regardless of number of rows. Logical reads can get out of hand, but both do that, and scan count is usually a bigger deal in terms of overall performance.

    Try it with a few thousand or a million rows. The I/O stats will be a serious bottleneck on yours, and won't be a big deal at all on the Full Outer version.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Amen, GS.... I have to scale this up to about 5-6 tables with an est. million + rows each.... The results are going to be pumped to a website and I don't think a paying customer wants to wait that long....

    Thanks for the try Dwain... everyone is being really cool about this...

    Crusty.

  • Here's a way to generate some simple timing tests.

    TRUNCATE TABLE GR;

    TRUNCATE TABLE GE;

    TRUNCATE TABLE Ref;

    Insert into GR(DocNum, GR, Seq)

    SELECT 1,'GR_'+CAST(number AS VARCHAR(10)),number

    FROM master.dbo.spt_values

    WHERE number BETWEEN 1 AND 500 AND type='P'

    Insert into GE(DocNum, GE, Seq)

    SELECT 1,'GE_'+CAST(number AS VARCHAR(10)),number

    FROM master.dbo.spt_values

    WHERE number BETWEEN 1 AND 750 AND type='P'

    Insert into Ref (DocNum, Ref, Seq)

    SELECT 1,'Ref_'+CAST(number AS VARCHAR(10)),number

    FROM master.dbo.spt_values

    WHERE number BETWEEN 1 AND 1000 AND type='P';

    GO

    PRINT '========== Mark ==========================================='

    DECLARE @DocNum nvarchar(15)

    DECLARE @GR nvarchar(15)

    DECLARE @GE nvarchar(15)

    DECLARE @Ref nvarchar(15)

    DECLARE @Seq1 int

    DECLARE @Seq2 int

    DECLARE @Seq3 int

    SET STATISTICS TIME ON

    ;WITH AllSeq AS (

    SELECT Seq FROM GR

    UNION

    SELECT Seq FROM GE

    UNION

    SELECT Seq FROM Ref)

    SELECT @DocNum = d.DocNum,

    @GR = gr.GR,

    @Seq1 = gr.Seq,

    @GE = ge.GE,

    @Seq2 = ge.Seq,

    @Ref = rf.Ref,

    @Seq3 = rf.Seq

    FROM Doc d

    CROSS JOIN AllSeq sq

    LEFT OUTER JOIN GR gr ON gr.DocNum = d.DocNum

    AND gr.Seq = sq.Seq

    LEFT OUTER JOIN GE ge ON ge.DocNum = d.DocNum

    AND ge.Seq = sq.Seq

    LEFT OUTER JOIN Ref rf ON rf.DocNum = d.DocNum

    AND rf.Seq = sq.Seq;

    SET STATISTICS TIME OFF

    GO

    PRINT '========== GSquared ==========================================='

    DECLARE @DocNum nvarchar(15)

    DECLARE @GR nvarchar(15)

    DECLARE @GE nvarchar(15)

    DECLARE @Ref nvarchar(15)

    DECLARE @Seq1 int

    DECLARE @Seq2 int

    DECLARE @Seq3 int

    SET STATISTICS TIME ON

    SELECT @DocNum = COALESCE(dbo.GR.DocNum, dbo.GE.DocNum, dbo.Ref.DocNum),

    @GR = GR,

    @Seq1 = dbo.GR.Seq,

    @GE = GE,

    @Seq2 = dbo.GE.Seq,

    @Ref = Ref,

    @Seq3 = dbo.Ref.Seq

    FROM dbo.GR

    FULL OUTER JOIN dbo.GE

    ON dbo.GR.DocNum = dbo.GE.DocNum

    AND dbo.GR.Seq = dbo.GE.Seq

    FULL OUTER JOIN dbo.Ref

    ON dbo.GE.DocNum = dbo.Ref.DocNum

    AND dbo.GE.Seq = dbo.Ref.Seq

    OR dbo.GR.DocNum = dbo.Ref.DocNum

    AND dbo.GR.Seq = dbo.Ref.Seq;

    SET STATISTICS TIME OFF

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Gus,

    That's an interesting point on the IOs but in the below test harness (5 docs) it still runs pretty fast, although clearly not as fast as Mark's.

    Create table #Doc (

    DocNum nvarchar(15) NOT NULL

    );

    Insert into #Doc (DocNum)

    Values ('1'),('2'),('3'),('4'),('5');

    ------------------------------

    Create Table #GR (

    DocNum nvarchar(15) NOT NULL,

    GR nvarchar(15) NOT NULL,

    Seq intNOT NULL

    )

    --------------------------------

    Create Table #GE (

    DocNum nvarchar(15) NOT NULL,

    GE nvarchar(15) NOT NULL,

    Seq intNOT NULL

    )

    --------------------------------------------------

    Create Table #Ref (

    DocNum nvarchar(15) NOT NULL,

    Ref nvarchar(15) NOT NULL,

    Seq intNOT NULL

    )

    ;WITH Tally (number) AS (

    SELECT TOP 2000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM sys.all_columns a CROSS JOIN sys.all_columns b)

    Insert into #GR(DocNum, GR, Seq)

    SELECT b.number,'GR_'+CAST(a.number AS VARCHAR(10)),a.number

    FROM Tally a

    CROSS APPLY (SELECT number FROM Tally WHERE number BETWEEN 1 AND 5) b

    --FROM master.dbo.spt_values

    --WHERE number BETWEEN 1 AND 500 AND type='P'

    ;WITH Tally (number) AS (

    SELECT TOP 3000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM sys.all_columns a CROSS JOIN sys.all_columns b)

    Insert into #GE(DocNum, GE, Seq)

    SELECT b.number,'GE_'+CAST(a.number AS VARCHAR(10)),a.number

    FROM Tally a

    CROSS APPLY (SELECT number FROM Tally WHERE number BETWEEN 1 AND 5) b

    --FROM master.dbo.spt_values

    --WHERE number BETWEEN 1 AND 750 AND type='P'

    ;WITH Tally (number) AS (

    SELECT TOP 4000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM sys.all_columns a CROSS JOIN sys.all_columns b)

    Insert into #Ref (DocNum, Ref, Seq)

    SELECT b.number,'Ref_'+CAST(a.number AS VARCHAR(10)),a.number

    FROM Tally a

    CROSS APPLY (SELECT number FROM Tally WHERE number BETWEEN 1 AND 5) b

    --FROM master.dbo.spt_values

    --WHERE number BETWEEN 1 AND 1000 AND type='P';

    GO

    PRINT '========== Mark ==========================================='

    DECLARE @DocNum nvarchar(15)

    DECLARE @GR nvarchar(15)

    DECLARE @GE nvarchar(15)

    DECLARE @Ref nvarchar(15)

    DECLARE @Seq1 int

    DECLARE @Seq2 int

    DECLARE @Seq3 int

    SET STATISTICS TIME ON

    ;WITH AllSeq AS (

    SELECT Seq FROM #GR

    UNION

    SELECT Seq FROM #GE

    UNION

    SELECT Seq FROM #Ref)

    SELECT @DocNum = d.DocNum,

    @GR = gr.GR,

    @Seq1 = gr.Seq,

    @GE = ge.GE,

    @Seq2 = ge.Seq,

    @Ref = rf.Ref,

    @Seq3 = rf.Seq

    FROM #Doc d

    CROSS JOIN AllSeq sq

    LEFT OUTER JOIN #GR gr ON gr.DocNum = d.DocNum

    AND gr.Seq = sq.Seq

    LEFT OUTER JOIN #GE ge ON ge.DocNum = d.DocNum

    AND ge.Seq = sq.Seq

    LEFT OUTER JOIN #Ref rf ON rf.DocNum = d.DocNum

    AND rf.Seq = sq.Seq;

    SET STATISTICS TIME OFF

    GO

    PRINT '========== GSquared ==========================================='

    DECLARE @DocNum nvarchar(15)

    DECLARE @GR nvarchar(15)

    DECLARE @GE nvarchar(15)

    DECLARE @Ref nvarchar(15)

    DECLARE @Seq1 int

    DECLARE @Seq2 int

    DECLARE @Seq3 int

    SET STATISTICS TIME ON

    SELECT @DocNum = COALESCE(#GR.DocNum, #GE.DocNum, #Ref.DocNum),

    @GR = GR,

    @Seq1 = #GR.Seq,

    @GE = GE,

    @Seq2 = #GE.Seq,

    @Ref = Ref,

    @Seq3 = #Ref.Seq

    FROM #GR

    FULL OUTER JOIN #GE

    ON #GR.DocNum = #GE.DocNum

    AND #GR.Seq = #GE.Seq

    FULL OUTER JOIN #Ref

    ON #GE.DocNum = #Ref.DocNum

    AND #GE.Seq = #Ref.Seq

    OR #GR.DocNum = #Ref.DocNum

    AND #GR.Seq = #Ref.Seq;

    SET STATISTICS TIME OFF

    GO

    PRINT '========== Dwain ==========================================='

    DECLARE @DocNum nvarchar(15)

    DECLARE @GR nvarchar(15)

    DECLARE @GE nvarchar(15)

    DECLARE @Ref nvarchar(15)

    DECLARE @Seq1 int

    DECLARE @Seq2 int

    DECLARE @Seq3 int

    SET STATISTICS TIME ON

    ;WITH CTE AS (

    SELECT n=1, DocNum, Ref, Seq

    FROM #Ref UNION ALL

    SELECT 2, DocNum, GE, Seq

    FROM #GE UNION ALL

    SELECT 3, DocNum, GR, Seq

    FROM #GR)

    SELECT @DocNum=a.DocNum

    ,@GR=MAX(CASE n WHEN 3 THEN Ref END)

    ,@Seq1=MAX(CASE n WHEN 3 THEN Seq END)

    ,@GE=MAX(CASE n WHEN 2 THEN Ref END)

    ,@Seq2=MAX(CASE n WHEN 2 THEN Seq END)

    ,@Ref=MAX(CASE n WHEN 1 THEN Ref END)

    ,@Seq3=MAX(CASE n WHEN 1 THEN Seq END )

    FROM CTE a

    INNER JOIN #Doc b ON a.DocNum = b.DocNum

    GROUP BY a.DocNum, Ref

    SET STATISTICS TIME OFF

    DROP TABLE #Doc, #Ref, #GR, #GE

    Results:

    (10000 row(s) affected)

    (15000 row(s) affected)

    (20000 row(s) affected)

    ========== Mark ===========================================

    SQL Server Execution Times:

    CPU time = 93 ms, elapsed time = 112 ms.

    ========== GSquared ===========================================

    SQL Server Execution Times:

    CPU time = 177918 ms, elapsed time = 180257 ms.

    ========== Dwain ===========================================

    SQL Server Execution Times:

    CPU time = 156 ms, elapsed time = 313 ms.

    Let me know if I did something wrong (always a possibility before I've had my morning coffee :-)).

    Interestingly, when I bumped the doc count to 40, Mark's SQL seemed to begin to benefit from parallizing the query (note elapsed ms < CPU ms). So I retried with OPTION (MAXDOP 1) and got these results.

    ========== Mark ===========================================

    SQL Server Execution Times:

    CPU time = 2090 ms, elapsed time = 588 ms.

    ========== Mark w-MAXDOP 1===========================

    SQL Server Execution Times:

    CPU time = 936 ms, elapsed time = 925 ms.

    ========== Dwain ===========================================

    SQL Server Execution Times:

    CPU time = 2605 ms, elapsed time = 1182 ms.

    I've always found it extremely interesting how, when SQL parallelizes a query the CPU time goes up quite significantly in relation to the non-parallelized version, without a proportional drop in elapsed time.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • dwain.c (8/29/2012)


    Gus,

    That's an interesting point on the IOs but in the below test harness (5 docs) it still runs pretty fast, although clearly not as fast as Mark's.

    Create table #Doc (

    DocNum nvarchar(15) NOT NULL

    );

    Insert into #Doc (DocNum)

    Values ('1'),('2'),('3'),('4'),('5');

    ------------------------------

    Create Table #GR (

    DocNum nvarchar(15) NOT NULL,

    GR nvarchar(15) NOT NULL,

    Seq intNOT NULL

    )

    --------------------------------

    Create Table #GE (

    DocNum nvarchar(15) NOT NULL,

    GE nvarchar(15) NOT NULL,

    Seq intNOT NULL

    )

    --------------------------------------------------

    Create Table #Ref (

    DocNum nvarchar(15) NOT NULL,

    Ref nvarchar(15) NOT NULL,

    Seq intNOT NULL

    )

    ;WITH Tally (number) AS (

    SELECT TOP 2000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM sys.all_columns a CROSS JOIN sys.all_columns b)

    Insert into #GR(DocNum, GR, Seq)

    SELECT b.number,'GR_'+CAST(a.number AS VARCHAR(10)),a.number

    FROM Tally a

    CROSS APPLY (SELECT number FROM Tally WHERE number BETWEEN 1 AND 5) b

    --FROM master.dbo.spt_values

    --WHERE number BETWEEN 1 AND 500 AND type='P'

    ;WITH Tally (number) AS (

    SELECT TOP 3000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM sys.all_columns a CROSS JOIN sys.all_columns b)

    Insert into #GE(DocNum, GE, Seq)

    SELECT b.number,'GE_'+CAST(a.number AS VARCHAR(10)),a.number

    FROM Tally a

    CROSS APPLY (SELECT number FROM Tally WHERE number BETWEEN 1 AND 5) b

    --FROM master.dbo.spt_values

    --WHERE number BETWEEN 1 AND 750 AND type='P'

    ;WITH Tally (number) AS (

    SELECT TOP 4000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM sys.all_columns a CROSS JOIN sys.all_columns b)

    Insert into #Ref (DocNum, Ref, Seq)

    SELECT b.number,'Ref_'+CAST(a.number AS VARCHAR(10)),a.number

    FROM Tally a

    CROSS APPLY (SELECT number FROM Tally WHERE number BETWEEN 1 AND 5) b

    --FROM master.dbo.spt_values

    --WHERE number BETWEEN 1 AND 1000 AND type='P';

    GO

    PRINT '========== Mark ==========================================='

    DECLARE @DocNum nvarchar(15)

    DECLARE @GR nvarchar(15)

    DECLARE @GE nvarchar(15)

    DECLARE @Ref nvarchar(15)

    DECLARE @Seq1 int

    DECLARE @Seq2 int

    DECLARE @Seq3 int

    SET STATISTICS TIME ON

    ;WITH AllSeq AS (

    SELECT Seq FROM #GR

    UNION

    SELECT Seq FROM #GE

    UNION

    SELECT Seq FROM #Ref)

    SELECT @DocNum = d.DocNum,

    @GR = gr.GR,

    @Seq1 = gr.Seq,

    @GE = ge.GE,

    @Seq2 = ge.Seq,

    @Ref = rf.Ref,

    @Seq3 = rf.Seq

    FROM #Doc d

    CROSS JOIN AllSeq sq

    LEFT OUTER JOIN #GR gr ON gr.DocNum = d.DocNum

    AND gr.Seq = sq.Seq

    LEFT OUTER JOIN #GE ge ON ge.DocNum = d.DocNum

    AND ge.Seq = sq.Seq

    LEFT OUTER JOIN #Ref rf ON rf.DocNum = d.DocNum

    AND rf.Seq = sq.Seq;

    SET STATISTICS TIME OFF

    GO

    PRINT '========== GSquared ==========================================='

    DECLARE @DocNum nvarchar(15)

    DECLARE @GR nvarchar(15)

    DECLARE @GE nvarchar(15)

    DECLARE @Ref nvarchar(15)

    DECLARE @Seq1 int

    DECLARE @Seq2 int

    DECLARE @Seq3 int

    SET STATISTICS TIME ON

    SELECT @DocNum = COALESCE(#GR.DocNum, #GE.DocNum, #Ref.DocNum),

    @GR = GR,

    @Seq1 = #GR.Seq,

    @GE = GE,

    @Seq2 = #GE.Seq,

    @Ref = Ref,

    @Seq3 = #Ref.Seq

    FROM #GR

    FULL OUTER JOIN #GE

    ON #GR.DocNum = #GE.DocNum

    AND #GR.Seq = #GE.Seq

    FULL OUTER JOIN #Ref

    ON #GE.DocNum = #Ref.DocNum

    AND #GE.Seq = #Ref.Seq

    OR #GR.DocNum = #Ref.DocNum

    AND #GR.Seq = #Ref.Seq;

    SET STATISTICS TIME OFF

    GO

    PRINT '========== Dwain ==========================================='

    DECLARE @DocNum nvarchar(15)

    DECLARE @GR nvarchar(15)

    DECLARE @GE nvarchar(15)

    DECLARE @Ref nvarchar(15)

    DECLARE @Seq1 int

    DECLARE @Seq2 int

    DECLARE @Seq3 int

    SET STATISTICS TIME ON

    ;WITH CTE AS (

    SELECT n=1, DocNum, Ref, Seq

    FROM #Ref UNION ALL

    SELECT 2, DocNum, GE, Seq

    FROM #GE UNION ALL

    SELECT 3, DocNum, GR, Seq

    FROM #GR)

    SELECT @DocNum=a.DocNum

    ,@GR=MAX(CASE n WHEN 3 THEN Ref END)

    ,@Seq1=MAX(CASE n WHEN 3 THEN Seq END)

    ,@GE=MAX(CASE n WHEN 2 THEN Ref END)

    ,@Seq2=MAX(CASE n WHEN 2 THEN Seq END)

    ,@Ref=MAX(CASE n WHEN 1 THEN Ref END)

    ,@Seq3=MAX(CASE n WHEN 1 THEN Seq END )

    FROM CTE a

    INNER JOIN #Doc b ON a.DocNum = b.DocNum

    GROUP BY a.DocNum, Ref

    SET STATISTICS TIME OFF

    DROP TABLE #Doc, #Ref, #GR, #GE

    Results:

    (10000 row(s) affected)

    (15000 row(s) affected)

    (20000 row(s) affected)

    ========== Mark ===========================================

    SQL Server Execution Times:

    CPU time = 93 ms, elapsed time = 112 ms.

    ========== GSquared ===========================================

    SQL Server Execution Times:

    CPU time = 177918 ms, elapsed time = 180257 ms.

    ========== Dwain ===========================================

    SQL Server Execution Times:

    CPU time = 156 ms, elapsed time = 313 ms.

    Let me know if I did something wrong (always a possibility before I've had my morning coffee :-)).

    Interestingly, when I bumped the doc count to 40, Mark's SQL seemed to begin to benefit from parallizing the query (note elapsed ms < CPU ms). So I retried with OPTION (MAXDOP 1) and got these results.

    ========== Mark ===========================================

    SQL Server Execution Times:

    CPU time = 2090 ms, elapsed time = 588 ms.

    ========== Mark w-MAXDOP 1===========================

    SQL Server Execution Times:

    CPU time = 936 ms, elapsed time = 925 ms.

    ========== Dwain ===========================================

    SQL Server Execution Times:

    CPU time = 2605 ms, elapsed time = 1182 ms.

    I've always found it extremely interesting how, when SQL parallelizes a query the CPU time goes up quite significantly in relation to the non-parallelized version, without a proportional drop in elapsed time.

    Oops, turns out my query doesn't quite work properly, here's a fixed version. Fortunately doesn't affect performance much.

    WITH AllSeq AS (

    SELECT DocNum,Seq FROM GR

    UNION

    SELECT DocNum,Seq FROM GE

    UNION

    SELECT DocNum,Seq FROM Ref)

    SELECT sq.DocNum,

    gr.GR,

    gr.Seq AS GR_Seq,

    ge.GE,

    ge.Seq AS GE_Seq,

    rf.Ref,

    rf.Seq AS Ref_Seq

    FROM AllSeq sq

    LEFT OUTER JOIN GR gr ON gr.DocNum = sq.DocNum

    AND gr.Seq = sq.Seq

    LEFT OUTER JOIN GE ge ON ge.DocNum = sq.DocNum

    AND ge.Seq = sq.Seq

    LEFT OUTER JOIN Ref rf ON rf.DocNum = sq.DocNum

    AND rf.Seq = sq.Seq;

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Sorry for the delay in replying. Been recovering from surgery.

    Try this:

    Expand the number of documents beyond 5 up to some reasonable number, like 10,000 or 1-million. Add random GE, GR, and Ref entries for them. I'm not sure how the original requestor would want Docs entries without anything in any of the sub-tables, so leave that aspect alone for now.

    Cluster all the tables reasonably. I used DocNum and Seq on the sub-tables, and added a PK (clustered) on DocNum in Docs.

    Modify my Full Outer query as follows:

    SELECT COALESCE(GR.DocNum, GE.DocNum, Ref.DocNum),

    GR,

    GR.Seq,

    GE,

    GE.Seq,

    COALESCE(Ref.Ref, R2.Ref),

    COALESCE(Ref.Seq, R2.Seq)

    FROM dbo.GR

    FULL OUTER JOIN dbo.GE

    ON GR.DocNum = GE.DocNum

    AND GR.Seq = GE.Seq

    FULL OUTER JOIN dbo.Ref

    ON GE.DocNum = Ref.DocNum

    AND GE.Seq = Ref.Seq

    FULL OUTER JOIN dbo.Ref AS R2

    ON GR.DocNum = R2.DocNum

    AND GR.Seq = R2.Seq

    Test again. Include IO characteristics in your testing. Always. On loaded systems, CPU time is far less likely to be your main bottleneck, while I/O is far more likely to be key to the performance of the whole server. (Network latency and bandwidth can sometimes (often) trump I/O as a real bottleneck on overall performance, but that's not usually something you can do much about except reduce the size of datasets being shipped around.)

    Also, assigning variable values isn't a valid test. Many times, SQL Server will bypass running the intended performance-testing query, and just grab one row and use that for the variable value. It knows it only needs one value, so it only grabs one. I've seen this happen in SQL 2000 and 2005. Haven't tested in 2008 or 2012, but I think it's a fair assumption they haven't made the engine dumber about that kind of thing. Per discussions on this site, others have proven the same thing.

    If you want to test without returning a result-set, insert into temp tables. Best if you build the temp tables explicitly outside of the final query, instead of Select Into, so that DDL time doesn't add to the query execution artificially. Avoiding returning a dataset avoids skewing for network issues if you're querying remotely, and avoids rendering-time in the UI as a possible factor as well.

    I recommend getting the time and I/O stats out of a server-side trace, rather than Set Stats On, because Set Stats On has been proven to have potential impacts on actual query performance. Doesn't appear to affect this particular set of tests, but it's a good practice nonetheless.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared (9/7/2012)


    Sorry for the delay in replying. Been recovering from surgery.

    ...

    Get well soon, GGeezer.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Indeed Gus, the new version of your query seems to be fastest.

    ========== Mark ===========================================

    (20000 row(s) affected)

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#Ref___000000000052'. Scan count 2, logical reads 184, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#GE___000000000051'. Scan count 2, logical reads 130, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#GR___000000000050'. Scan count 2, logical reads 86, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 171 ms, elapsed time = 438 ms.

    ========== GSquared ===========================================

    (30000 row(s) affected)

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#GE___000000000051'. Scan count 1, logical reads 65, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#GR___000000000050'. Scan count 1, logical reads 43, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#Ref___000000000052'. Scan count 2, logical reads 184, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 93 ms, elapsed time = 457 ms.

    ========== Dwain ===========================================

    Warning: Null value is eliminated by an aggregate or other SET operation.

    (45000 row(s) affected)

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#GR___000000000050'. Scan count 1, logical reads 43, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#GE___000000000051'. Scan count 1, logical reads 65, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#Ref___000000000052'. Scan count 1, logical reads 92, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#Doc___00000000004F'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 156 ms, elapsed time = 668 ms.

    I am not seeing a large difference in IOs unless I am missing something. It is interesting that each of the 3 query is returning a different number of rows. Wonder which is right.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • GSquared (9/7/2012)


    Sorry for the delay in replying. Been recovering from surgery.

    Try this:

    Expand the number of documents beyond 5 up to some reasonable number, like 10,000 or 1-million. Add random GE, GR, and Ref entries for them. I'm not sure how the original requestor would want Docs entries without anything in any of the sub-tables, so leave that aspect alone for now.

    Cluster all the tables reasonably. I used DocNum and Seq on the sub-tables, and added a PK (clustered) on DocNum in Docs.

    Modify my Full Outer query as follows:

    SELECT COALESCE(GR.DocNum, GE.DocNum, Ref.DocNum),

    GR,

    GR.Seq,

    GE,

    GE.Seq,

    COALESCE(Ref.Ref, R2.Ref),

    COALESCE(Ref.Seq, R2.Seq)

    FROM dbo.GR

    FULL OUTER JOIN dbo.GE

    ON GR.DocNum = GE.DocNum

    AND GR.Seq = GE.Seq

    FULL OUTER JOIN dbo.Ref

    ON GE.DocNum = Ref.DocNum

    AND GE.Seq = Ref.Seq

    FULL OUTER JOIN dbo.Ref AS R2

    ON GR.DocNum = R2.DocNum

    AND GR.Seq = R2.Seq

    Test again. Include IO characteristics in your testing. Always. On loaded systems, CPU time is far less likely to be your main bottleneck, while I/O is far more likely to be key to the performance of the whole server. (Network latency and bandwidth can sometimes (often) trump I/O as a real bottleneck on overall performance, but that's not usually something you can do much about except reduce the size of datasets being shipped around.)

    Also, assigning variable values isn't a valid test. Many times, SQL Server will bypass running the intended performance-testing query, and just grab one row and use that for the variable value. It knows it only needs one value, so it only grabs one. I've seen this happen in SQL 2000 and 2005. Haven't tested in 2008 or 2012, but I think it's a fair assumption they haven't made the engine dumber about that kind of thing. Per discussions on this site, others have proven the same thing.

    If you want to test without returning a result-set, insert into temp tables. Best if you build the temp tables explicitly outside of the final query, instead of Select Into, so that DDL time doesn't add to the query execution artificially. Avoiding returning a dataset avoids skewing for network issues if you're querying remotely, and avoids rendering-time in the UI as a possible factor as well.

    I recommend getting the time and I/O stats out of a server-side trace, rather than Set Stats On, because Set Stats On has been proven to have potential impacts on actual query performance. Doesn't appear to affect this particular set of tests, but it's a good practice nonetheless.

    Your query doesn't give correct results against the OPs original data - it returns 8 rows instead of 5.

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537

Viewing 15 posts - 16 through 30 (of 30 total)

You must be logged in to reply to this topic. Login to reply