Concatenating Rows

  • Jeff. Once again, greet work on the performance testing.

    I did a wee compare last night but do not have the results due to a bit of a power cut.

    Once character replacement was added in to non typed xml then it did run about the same, maybe very slightly slower and I am by no means certain I included all the possible replacements.

    I'll certainly be going with typed xml for my concatenations.

    Using a CLR is absolutely great for readability of script and not having to rememeber the STUF((())()()())XML()())() ness of everything but it lacks the ability to use a custom separator unless you then wrap it in an additional replace anyway.

    Besides, I have auto replacements so I just type |st and it drops in a nice stuffed for xml templated.

    [font="Arial"]_______________________________________________________
    Change is inevitable... Except from a vending machine.[/font]

  • Jon, when I looked at the initial article is said no cursors...

    What you don't know won't hurt you but what you know will make you plan to know better
  • We can also concatenete rows using coalesce function, I have given example for that,

    Declare @Name as varchar(Max);

    with fruit as (

    select 'Apple' as name, 101 as id union all

    select 'Banana' as name, 102 as id union all

    select 'Orange' as name, 103 as id union all

    select 'Melon' as name, 104 as id union all

    select 'Grape' as name, 105 as id

    )

    select @Name = coalesce(@Name + ',','') + name From Fruit

    select @Name

    The Out put Will be..

    ---------------------

    Apple,Banana,Orange,Melon,Grape

  • Sigh, when will people learn to read before posting.

    You are NOT using coalesce to concatenate.

    in the example above, coalesce is used to remove NULL entries.

    The below works just the same using that example:

    DECLARE @Name AS varchar(Max);

    SELECT @Name = ''

    ;WITH fruit AS (

    SELECT 'Apple' AS name, 101 AS id UNION all

    SELECT 'Banana' AS name, 102 AS id UNION all

    SELECT 'Orange' AS name, 103 AS id UNION all

    SELECT 'Melon' AS name, 104 AS id UNION all

    SELECT 'Grape' AS name, 105 AS id

    )

    SELECT @Name = @Name + ',' + name FROM Fruit

    SELECT @Name

    Also, as mentioned many times consider this:

    DECLARE @Name AS varchar(Max);

    SELECT @Name = ''

    ;WITH fruit AS (

    SELECT 'Apple' AS name, 101 AS id UNION all

    SELECT 'Banana' AS name, 101 AS id UNION all

    SELECT 'Orange' AS name, 102 AS id UNION all

    SELECT 'Melon' AS name, 1042 AS id UNION all

    SELECT 'Grape' AS name, 103 AS id

    )

    How will you get the results:

    ID Names

    101 Apple,Banana

    102 Orange,melon

    103 Grape

    [font="Arial"]_______________________________________________________
    Change is inevitable... Except from a vending machine.[/font]

  • Andy DBA (10/14/2009)


    Jeff Moden


    Also, I noticed the varchar(5) type you used in your " XML TYPE solution" truncated the results. But, when I tested with varchar(max) or varchar(2000) I did not notice any change in performance.

    Heh... yeah... I hit the ball out of the park and broke the mayor's window. 😛 I noticed the problem last night and changed it to a VARCHAR(8000). Thanks for the kudo and the catch, Andy.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Rob Fisk (10/15/2009)


    You are NOT using coalesce to concatenate.

    in the example above, coalesce is used to remove NULL entries.

    The below works just the same using that example:

    DECLARE @Name AS varchar(Max);

    SELECT @Name = ''

    ;WITH fruit AS (

    SELECT 'Apple' AS name, 101 AS id UNION all

    SELECT 'Banana' AS name, 102 AS id UNION all

    SELECT 'Orange' AS name, 103 AS id UNION all

    SELECT 'Melon' AS name, 104 AS id UNION all

    SELECT 'Grape' AS name, 105 AS id

    )

    SELECT @Name = @Name + ',' + name FROM Fruit

    SELECT @Name

    Oh, be careful...the code above will leave a trailing comma. The COALESCE (or ISNULL) not only removes null entries, its position in the concatenation formula prevents having a trailing comma.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Rob Fisk (10/15/2009)


    Jeff. Once again, greet work on the performance testing.

    I did a wee compare last night but do not have the results due to a bit of a power cut.

    Once character replacement was added in to non typed xml then it did run about the same, maybe very slightly slower and I am by no means certain I included all the possible replacements.

    I'll certainly be going with typed xml for my concatenations.

    Using a CLR is absolutely great for readability of script and not having to rememeber the STUF((())()()())XML()())() ness of everything but it lacks the ability to use a custom separator unless you then wrap it in an additional replace anyway.

    Besides, I have auto replacements so I just type |st and it drops in a nice stuffed for xml templated.

    Thanks for the feedback. I did make a small error that caused some truncation in the XML method that uses TYPE and have repaired it in the code I previously posted.

    As for CLR's... yeap... great for readability... I do have to say again that the SQL Ninjas on this forum can normally beat or come very close to meeting the performance of a CLR. Since most of them (and most SQL Developers in the real world) have a library of T-SQL code for tricks like concatenation, remembering things like STUFF isn't usually a problem.

    I'm still looking for someone to do a CLR performance test on concatenation like this against the code I posted so we can have a complete set of tests on a single machine. It can't be me that does the tests because of three things... 1) it would be like the mouse guarding the cheese because I'm mostly against the use of CLRs except for some very special cases and 2) I can't write a lick of C and 3) I just can't bring myself to turn on CLR's. 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • For those interested in the performance impact of each concatenation method, I blogged about this some time ago. The XML method consistently does better for reads and IO. I did not use the TYPE directive in my XML test, so I would anticipate a bit more overhead, for the XML subquery solution, but it still should remain the best performing.

    Link:

    http://jahaines.blogspot.com/2009/07/concatenating-column-values-part-2.html

    Note: I do not have a CLR concatenation solution. I am like Jeff and cannot bring myself to code a concatenation solution via CLR.

  • Oh, be careful...the code above will leave a trailing comma. The COALESCE (or ISNULL) not only removes null entries, its position in the concatenation formula prevents having a trailing comma.

    Oops, but it does indicate that it is not coalesce doing the concatenation.

    I just went to try performance testing your script Jeff and I ticked the "discard results after execution" option to eliminate the time taken to write the results since there was no, write to temp table element in the script.

    I got nothing in messages.

    [font="Arial"]_______________________________________________________
    Change is inevitable... Except from a vending machine.[/font]

  • When you discard query results you dont get any information back to the client (SSMS).

  • Below is simple sql that does that

    declare @res varchar(Max)

    select @res = coalesce(@res + ',', '') + [name]

    from

    (

    select 'Apple' as name, 101 as id union all

    select 'Banana' as name, 102 as id union all

    select 'Orange' as name, 103 as id union all

    select 'Melon' as name, 104 as id union all

    select 'Grape' as name, 105 as id

    ) a

    select @res

  • OK, so the code is about the same but I also inserted the data to a table which was cleared between statistics time toggling:

    --===== Do the testing in a nice "safe" place

    USE TempDB

    GO

    --===== Setup the display environment for the "Messages" Tab

    SET NOCOUNT OFF --Make sure we can see the rowcounts

    SET STATISTICS TIME OFF --Turned ON and OFF later

    GO

    --===========================================================================================================

    --===== Create and populate a 1,000,000 row test table. This is my standard "million row test table".

    --===========================================================================================================

    -- Column "RowNum" has a range of 1 to 100,000 unique numbers

    -- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers

    -- Column "SomeLetters2" has a range of "AA" to "ZZ" non-unique 2 character strings

    -- Column "SomeMoney has a range of 0.0000 to 99.9999 non-unique numbers

    -- Column "SomeDate" has a range of >=01/01/2000 and <01/01/2010 non-unique date/times

    -- Column "SomeCSV" contains 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10'

    -- for all rows.

    -- Column "SomeHex12" contains 12 random hex characters (ie, 0-9,A-F)

    -- Jeff Moden

    PRINT '========== Building the test table and indexes =========='

    GO

    SELECT TOP 100000

    SomeID = IDENTITY(INT,1,1),

    SomeInt = ABS(CHECKSUM(NEWID()))%50000+1,

    SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)

    + CHAR(ABS(CHECKSUM(NEWID()))%26+65),

    SomeCSV = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),

    SomeMoney = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),

    SomeDate = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),

    SomeHex12 = RIGHT(NEWID(),12)

    INTO dbo.JBMTest

    FROM Master.dbo.SysColumns t1

    CROSS JOIN

    Master.dbo.SysColumns t2

    GO

    --===== Add a Clustered PK

    ALTER TABLE dbo.JBMTest

    ADD PRIMARY KEY CLUSTERED (SomeID)

    GO

    --===== Add an index to support the concatenation

    CREATE INDEX IX_JBMTest_SomeInt ON dbo.JBMTest (SomeInt,SomeLetters2)

    GO

    --===========================================================================================================

    --===== Create the functions to be tested. Again, make sure you're in TempDB

    --===========================================================================================================

    PRINT '========== Building the functions to test =========='

    --===== Ensure we're still using TempDB

    USE TempDB

    GO

    ----------------------------------------------------------------------------------

    --===== Create a function that uses VARCHAR(8000)

    CREATE FUNCTION dbo.Concat8KTest

    (@SomeInt INT)

    RETURNS VARCHAR(8000)

    AS

    BEGIN

    DECLARE @ReturnString VARCHAR(8000)

    SELECT @ReturnString = ISNULL(@ReturnString+',' ,'') + SomeLetters2

    FROM dbo.JBMTest

    WHERE SomeInt = @SomeInt

    --ORDER BY SomeLetters2 --Uncomment to guarantee ordered output if required

    RETURN @ReturnString

    END

    GO

    ----------------------------------------------------------------------------------

    --===== Create a function that uses VARCHAR(8000) with Coalesce

    CREATE FUNCTION dbo.Concat8KTestCoalesce

    (@SomeInt INT)

    RETURNS VARCHAR(8000)

    AS

    BEGIN

    DECLARE @ReturnString VARCHAR(8000)

    SELECT @ReturnString = COALESCE(@ReturnString+',' ,'') + SomeLetters2

    FROM dbo.JBMTest

    WHERE SomeInt = @SomeInt

    --ORDER BY SomeLetters2 --Uncomment to guarantee ordered output if required

    RETURN @ReturnString

    END

    GO

    ----------------------------------------------------------------------------------

    --===== Create a function that uses VARCHAR(MAX)

    CREATE FUNCTION dbo.ConcatMAXTest

    (@SomeInt INT)

    RETURNS VARCHAR(MAX)

    AS

    BEGIN

    DECLARE @ReturnString VARCHAR(MAX)

    SELECT @ReturnString = ISNULL(@ReturnString+',' ,'') + SomeLetters2

    FROM dbo.JBMTest

    WHERE SomeInt = @SomeInt

    --ORDER BY SomeLetters2 --Uncomment to guarantee ordered output if required

    RETURN @ReturnString

    END

    GO

    ----------------------------------------------------------------------------------

    --===== Create a function that uses VARCHAR(MAX) with Coalesce

    CREATE FUNCTION dbo.ConcatMAXTestCoalesce

    (@SomeInt INT)

    RETURNS VARCHAR(MAX)

    AS

    BEGIN

    DECLARE @ReturnString VARCHAR(MAX)

    SELECT @ReturnString = COALESCE(@ReturnString+',' ,'') + SomeLetters2

    FROM dbo.JBMTest

    WHERE SomeInt = @SomeInt

    --ORDER BY SomeLetters2 --Uncomment to guarantee ordered output if required

    RETURN @ReturnString

    END

    GO

    --===== Create a table to drop the results in to eliminate resuls set lag

    CREATE TABLE wibble(id INT, string NVARCHAR(MAX))

    --===========================================================================================================

    --===== Do the tests which includes each function we created and some "inline" methods using XML.

    --===== Check the "Messages" table for CPU, Duration, and Row Counts.

    --===========================================================================================================

    SET NOCOUNT OFF

    SET STATISTICS TIME OFF

    PRINT '========== 8k ISNULL solution (Concat8KTest) =========='

    DELETE wibble

    SET STATISTICS TIME ON

    INSERT wibble

    SELECT SomeInt,dbo.Concat8KTest(SomeInt)

    FROM dbo.JBMTest

    GROUP BY SomeInt

    ORDER BY SomeInt

    SET STATISTICS TIME OFF

    PRINT '========== MAX ISNULL solution (ConcatMAXTest) =========='

    DELETE wibble

    SET STATISTICS TIME ON

    INSERT wibble

    SELECT SomeInt,dbo.ConcatMAXTest(SomeInt)

    FROM dbo.JBMTest

    GROUP BY SomeInt

    ORDER BY SomeInt

    SET STATISTICS TIME OFF

    PRINT '========== 8k Coalesce solution (Concat8KTestCoalesce) =========='

    DELETE wibble

    SET STATISTICS TIME ON

    INSERT wibble

    SELECT SomeInt,dbo.Concat8KTestCoalesce(SomeInt)

    FROM dbo.JBMTest

    GROUP BY SomeInt

    ORDER BY SomeInt

    SET STATISTICS TIME OFF

    PRINT '========== MAX Coalesce solution (ConcatMAXTestCoalesce) =========='

    DELETE wibble

    SET STATISTICS TIME ON

    INSERT wibble

    SELECT SomeInt,dbo.ConcatMAXTestCoalesce(SomeInt)

    FROM dbo.JBMTest

    GROUP BY SomeInt

    ORDER BY SomeInt

    SET STATISTICS TIME OFF

    PRINT '========== Inline XML solution =========='

    DELETE wibble

    SET STATISTICS TIME ON

    INSERT wibble

    SELECT t1.SomeInt,STUFF((SELECT ',' + t2.SomeLetters2

    FROM dbo.JBMTest t2

    WHERE t2.SomeInt = t1.SomeInt --Correlation here

    --ORDER BY t2.SomeLetters2 --Uncomment to guarantee ordered output if required

    FOR XML PATH(''))

    ,1,1,'') AS SomeLetters2

    FROM dbo.JBMTest t1

    GROUP BY t1.SomeInt --- without GROUP BY multiple rows are returned

    ORDER BY t1.SomeInt

    SET STATISTICS TIME OFF

    PRINT '========== Inline XML solution with REPLACE =========='

    DELETE wibble

    SET STATISTICS TIME ON

    INSERT wibble

    SELECT t1.SomeInt,REPLACE(REPLACE(REPLACE(STUFF((SELECT ',' + t2.SomeLetters2

    FROM dbo.JBMTest t2

    WHERE t2.SomeInt = t1.SomeInt --Correlation here

    --ORDER BY t2.SomeLetters2 --Uncomment to guarantee ordered output if required

    FOR XML PATH(''))

    ,1,1,''),'&#gt','>'),'&#lt','<'),'&#0d','') AS SomeLetters2

    FROM dbo.JBMTest t1

    GROUP BY t1.SomeInt --- without GROUP BY multiple rows are returned

    ORDER BY t1.SomeInt

    SET STATISTICS TIME OFF

    PRINT '========== Inline XML solution with TYPE=========='

    DELETE wibble

    SET STATISTICS TIME ON

    INSERT wibble

    SELECT t1.SomeInt,STUFF((SELECT ',' + t2.SomeLetters2

    FROM dbo.JBMTest t2

    WHERE t2.SomeInt = t1.SomeInt --Correlation here

    --ORDER BY t2.SomeLetters2 --Uncomment to guarantee ordered output if required

    FOR XML PATH(''),TYPE).value('.','VARCHAR(8000)')

    ,1,1,'') AS SomeLetters2

    FROM dbo.JBMTest t1

    GROUP BY t1.SomeInt --- without GROUP BY multiple rows are returned

    ORDER BY t1.SomeInt

    SET STATISTICS TIME OFF

    PRINT '========== CLR Solution, left JOIN=========='

    DELETE wibble

    SET STATISTICS TIME ON

    INSERT wibble

    SELECTt1.SomeInt,dbo.Concatenate(t2.SomeLetters2) AS SomeLetters2

    FROMdbo.JBMTest t1

    LEFT JOIN dbo.JBMTest t2

    ONt2.SomeInt = t1.SomeInt

    GROUP BY t1.SomeInt --- without GROUP BY multiple rows are returned

    ORDER BY t1.SomeInt

    SET STATISTICS TIME OFF

    GO

    --===========================================================================================================

    --===== Housekeeping code for when you are done testing.

    --===========================================================================================================

    USE TempDB; DROP TABLE dbo.JBMTest

    go

    USE TempDB; DROP FUNCTION dbo.Concat8KTest,dbo.Concat8KTestCoalesce,dbo.ConcatMAXTest,dbo.ConcatMAXTestCoalesce

    go

    USE TempDB; DROP TABLE wibble

    In addition to the CLR I added Inline XML solution with REPLACE

    The machine is a core2 DUO 2.2GH with 2GB RAM running SQL 2005 express locally and the results are as follows:

    ========== Building the test table and indexes ==========

    (100000 row(s) affected)

    ========== Building the functions to test ==========

    ========== 8k ISNULL solution (Concat8KTest) ==========

    (0 row(s) affected)

    SQL Server Execution Times:

    CPU time = 3438 ms, elapsed time = 5714 ms.

    (43326 row(s) affected)

    ========== MAX ISNULL solution (ConcatMAXTest) ==========

    (43326 row(s) affected)

    SQL Server Execution Times:

    CPU time = 4375 ms, elapsed time = 6887 ms.

    (43326 row(s) affected)

    ========== 8k Coalesce solution (Concat8KTestCoalesce) ==========

    (43326 row(s) affected)

    SQL Server Execution Times:

    CPU time = 3718 ms, elapsed time = 5752 ms.

    (43326 row(s) affected)

    ========== MAX Coalesce solution (ConcatMAXTestCoalesce) ==========

    (43326 row(s) affected)

    SQL Server Execution Times:

    CPU time = 4563 ms, elapsed time = 6953 ms.

    (43326 row(s) affected)

    ========== Inline XML solution ==========

    (43326 row(s) affected)

    SQL Server Execution Times:

    CPU time = 781 ms, elapsed time = 772 ms.

    (43326 row(s) affected)

    ========== Inline XML solution with REPLACE ==========

    (43326 row(s) affected)

    SQL Server Execution Times:

    CPU time = 1172 ms, elapsed time = 1172 ms.

    (43326 row(s) affected)

    ========== Inline XML solution with TYPE==========

    (43326 row(s) affected)

    SQL Server Execution Times:

    CPU time = 1984 ms, elapsed time = 1982 ms.

    (43326 row(s) affected)

    ========== CLR Solution, left JOIN==========

    (43326 row(s) affected)

    SQL Server Execution Times:

    CPU time = 3266 ms, elapsed time = 3288 ms.

    (43326 row(s) affected)

    [font="Arial"]_______________________________________________________
    Change is inevitable... Except from a vending machine.[/font]

  • Outstanding, Rob! I love it when a plan comes together. According to your timings, the XML beat the tar out of the CLR and the simple 8k function almost caught the CLR.

    As a bit of a side bar, the REPLACE method is fine but only if you catch all the possible characters. For example, you missed "&" and a couple of others which may make it worthwhile to use Adam's "TYPE" directive even though it slows things down a bit. With or without the TYPE, the XML still beats up on both the CLR and the 8k function.

    Anyway, thank you again for taking the time to setup and test the CLR concatenation. It gives folks like me something to reference without having to actually instantiate CLR's.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • msam77-503326 (10/15/2009)


    Below is simple sql that does that

    declare @res varchar(Max)

    select @res = coalesce(@res + ',', '') + [name]

    from

    (

    select 'Apple' as name, 101 as id union all

    select 'Banana' as name, 102 as id union all

    select 'Orange' as name, 103 as id union all

    select 'Melon' as name, 104 as id union all

    select 'Grape' as name, 105 as id

    ) a

    select @res

    Yep... agreed... lot's of folks have posted that same thing this thread.

    I'd also recommend that, unless you really need it, that you use VARCHAR(8000) instead of VARCHAR(MAX) because it's usage does slow things down a bit.

    I also see that's your first post... "Welcome aboard!"

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • As a bit of a side bar, the REPLACE method is fine but only if you catch all the possible characters. For example, you missed "&" and a couple of others which may make it worthwhile to use Adam's "TYPE" directive even though it slows things down a bit. With or without the TYPE, the XML still beats up on both the CLR and the 8k function.

    Yeah, it was nearing 5 and I didn't have time to hunt down all the characters that should be escaped. It was more of an indication towards the performance bloat the more manual replaces you have to do, not to mention making readability a bitch.

    In hindsight missing some and having you pick up on it highlights the possible unreliability of that method too.

    [font="Arial"]_______________________________________________________
    Change is inevitable... Except from a vending machine.[/font]

Viewing 15 posts - 76 through 90 (of 159 total)

You must be logged in to reply to this topic. Login to reply