Creating a Comma-Separated List (SQL Spackle)

  • Paul,

    Hmm, I don't recall seeing that before. It definitely is good stuff.

    I'll have to run some tests to see if there is a length restriction - could turn out to be important!

    Thanks!

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Hi,

    first of all grate article.

    I just used the (./text())[1] modification and got results with length over 70K chars!

    Kupy

  • I have always used: .value('.[1]','varchar(max)'),1,1,'')

    Which appears with the small set to perform about the same as .value('(./text())[1]','varchar(max)'),1,1,''). (They seem to trade off on which is faster even though the execution plan shows the one with text having a much lower cost.

  • WayneS (1/12/2011)


    WayneS (1/12/2011)


    Mark-101232 (1/12/2011)


    Changing

    value('.','varchar(max)'),1,1,'')

    to

    value('(./text())[1]','varchar(max)'),1,1,'')

    appears to give a better query plan

    Thanks for the information... I'll have to test this out.

    Edit: WOW! What a difference! Identical queries, with just that one change. The way in the article gets:

    (1000 row(s) affected)

    Table '#TestData'. Scan count 1001, logical reads 2411, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (1 row(s) affected)

    SQL Server Execution Times:

    CPU time = 234 ms, elapsed time = 3175 ms.

    Your suggested change gets:

    (1000 row(s) affected)

    Table '#TestData'. Scan count 1001, logical reads 2411, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (1 row(s) affected)

    SQL Server Execution Times:

    CPU time = 172 ms, elapsed time = 561 ms.

    Looking at the execution plans (both queries run together), the method in the article uses 99% and your solution 1%.

    Thanks for the tip!

    FYI: I followed up the testing I did at work with testing at home. The difference at home is not so profound, but Mark's method is still routinely faster, sometimes by 33%. There is a difference in the test code between work and home: the home version is dumping the results into a temp table, to keep everything strictly within SQL and eliminate display (sending output to the display is "the great equalizer"!).

    The test code:

    DBCC FREEPROCCACHE WITH NO_INFOMSGS;

    DBCC FREESYSTEMCACHE ('ALL') WITH NO_INFOMSGS;

    PRINT '--XML TYPE with value(''.'')';

    SET STATISTICS IO,TIME ON;

    GO

    IF OBJECT_ID('tempdb..#test') IS NOT NULL DROP TABLE #test;

    WITH CTE AS

    (

    SELECT DISTINCT

    AccountNumber

    FROM #TestData

    )

    SELECT AccountNumber,

    CommaList = STUFF((SELECT ',' + Value

    FROM #TestData

    WHERE AccountNumber = CTE.AccountNumber

    ORDER BY Value

    FOR XML PATH(''),

    TYPE).value('.','varchar(max)'),1,1,'')

    INTO #test

    FROM CTE

    ORDER BY AccountNumber;

    GO

    SET STATISTICS IO,TIME OFF;

    PRINT '';

    PRINT '';

    PRINT '';

    PRINT '--XML TYPE with value(''(./text())[1]'')';

    SET STATISTICS IO,TIME ON;

    GO

    IF OBJECT_ID('tempdb..#test') IS NOT NULL DROP TABLE #test;

    WITH CTE AS

    (

    SELECT DISTINCT

    AccountNumber

    FROM #TestData

    )

    SELECT AccountNumber,

    CommaList = STUFF((SELECT ',' + Value

    FROM #TestData

    WHERE AccountNumber = CTE.AccountNumber

    ORDER BY Value

    FOR XML PATH(''),

    TYPE).value('(./text())[1]','varchar(max)'),1,1,'')

    INTO #test

    FROM CTE

    ORDER BY AccountNumber;

    SET STATISTICS IO,TIME OFF;

    The results:

    --XML TYPE with value('.')

    SQL Server parse and compile time:

    CPU time = 0 ms, elapsed time = 13 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 2 ms.

    Table '#TestData'. Scan count 1001, logical reads 2411, ...

    SQL Server Execution Times:

    CPU time = 140 ms, elapsed time = 213 ms.

    --XML TYPE with value('(./text())[1]')

    SQL Server parse and compile time:

    CPU time = 10 ms, elapsed time = 10 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 1 ms.

    Table '#TestData'. Scan count 1001, logical reads 2411, ...

    SQL Server Execution Times:

    CPU time = 94 ms, elapsed time = 137 ms.

    As I said, not so profound a difference, but yet always faster and it still has a better execution plan!

    The test and, therefor, the test results are flawed. You don't clear cache before the second test. When I do on my machine using a million rows with 10000 accounts the two methods takes turns beating each other by 200 milliseconds. I have another test to make and then I'll be back with code.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ok... here's the test data generator that I borrowed from Wayne's article. I set it up for 10 times more rows and 10 times more accounts for a million rows spread over 10000 accounts.

    -- Conditionally drop the test table to make reruns easier.

    IF OBJECT_ID('tempdb..#TestData','U') IS NOT NULL DROP TABLE #TestData;

    CREATE TABLE #TestData (AccountNumber INT,

    Value CHAR(3));

    -- Build 1000 account numbers with random 3 character data.

    ;WITH

    TENS (N) AS (SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL

    SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL

    SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0),

    THOUSANDS (N) AS (SELECT 1 FROM TENS t1 CROSS JOIN TENS t2 CROSS JOIN TENS t3),

    MILLIONS (N) AS (SELECT 1 FROM THOUSANDS t1 CROSS JOIN THOUSANDS t2),

    TALLY (N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) FROM MILLIONS)

    INSERT INTO #TestData

    SELECT TOP (1000000)

    AccountNumber = CASE WHEN (N%10000) = 0 THEN 1000 ELSE N%1000 END,

    Value = CHAR(CONVERT(INT,RAND(CHECKSUM(NEWID()))*10)+64) +

    CHAR(CONVERT(INT,RAND(CHECKSUM(NEWID()))*10)+64) +

    CHAR(CONVERT(INT,RAND(CHECKSUM(NEWID()))*10)+64)

    FROM TALLY;

    -- Add a clustered index to the table

    CREATE CLUSTERED INDEX IX_#TestData_Cover ON #TestData (AccountNumber, Value);

    Here's the test code I ran. Do notice that I cleared the cache before every run and I also moved the conditional table drop out of the timings. Quick! "What's behind door #3?" 😛

    ---------------------------------------------------------------------------------------------

    DBCC FREEPROCCACHE WITH NO_INFOMSGS;

    DBCC FREESYSTEMCACHE ('ALL') WITH NO_INFOMSGS;

    PRINT '--========== XML TYPE with value(''.'') ==============================';

    IF OBJECT_ID('tempdb..#test') IS NOT NULL DROP TABLE #test;

    SET STATISTICS IO,TIME ON;

    GO

    WITH CTE AS

    (

    SELECT DISTINCT

    AccountNumber

    FROM #TestData

    )

    SELECT AccountNumber,

    CommaList = STUFF((SELECT ',' + Value

    FROM #TestData

    WHERE AccountNumber = CTE.AccountNumber

    ORDER BY Value

    FOR XML PATH(''),

    TYPE).value('.','varchar(max)'),1,1,'')

    INTO #test

    FROM CTE

    ORDER BY AccountNumber;

    GO-------------------------------------------------------------------------------------------

    SET STATISTICS IO,TIME OFF;

    DBCC FREEPROCCACHE WITH NO_INFOMSGS;

    DBCC FREESYSTEMCACHE ('ALL') WITH NO_INFOMSGS;

    PRINT '--========== XML TYPE with value(''(./text())[1]'') ==============================';

    IF OBJECT_ID('tempdb..#test') IS NOT NULL DROP TABLE #test;

    SET STATISTICS IO,TIME ON;

    GO

    WITH CTE AS

    (

    SELECT DISTINCT

    AccountNumber

    FROM #TestData

    )

    SELECT AccountNumber,

    CommaList = STUFF((SELECT ',' + Value

    FROM #TestData

    WHERE AccountNumber = CTE.AccountNumber

    ORDER BY Value

    FOR XML PATH(''),

    TYPE).value('(./text())[1]','varchar(max)'),1,1,'')

    INTO #test

    FROM CTE

    ORDER BY AccountNumber;

    SET STATISTICS IO,TIME OFF;

    GO-------------------------------------------------------------------------------------------

    SET STATISTICS IO,TIME OFF;

    DBCC FREEPROCCACHE WITH NO_INFOMSGS;

    DBCC FREESYSTEMCACHE ('ALL') WITH NO_INFOMSGS;

    PRINT '--========== The "old fashioned" way ==============================';

    IF OBJECT_ID('tempdb..#test') IS NOT NULL DROP TABLE #test;

    SET STATISTICS IO,TIME ON;

    GO

    SELECT AccountNumber,

    CommaList = STUFF(CAST(

    (

    SELECT ',' + Value

    FROM #TestData td2

    WHERE AccountNumber = td1.AccountNumber

    ORDER BY Value

    FOR XML PATH(''),TYPE

    )

    AS VARCHAR(MAX)),1,1,'')

    INTO #test

    FROM #TestData td1

    GROUP BY AccountNumber

    ORDER BY AccountNumber;

    SET STATISTICS IO,TIME OFF;

    And, here are the results from ten runs on my ol' single CPU desktop box. Like I said, "What's behind door #3?"

    Legend: CPU/Duration

    Method# (1) (2) (3)

    Run# ('.') (./text()][1] OFW Winner RunnerUp

    ---- --------- ------------- --------- ------ --------

    1 2953/3129 2609/2676 2500/2584 3/3 2/2

    2 2828/3147 2781/2938 2734/2730 3/3 2/2

    3 2656/2709 2796/2884 2641/2725 3/1 1/3

    4 2813/2944 2500/2569 2594/2706 2/2 3/3

    5 2610/2731 2844/3097 2765/2877 1/1 3/3

    6 2640/2886 2718/2853 2500/2621 3/3 1/1

    7 2781/2874 2844/3077 2516/2635 3/3 1/1

    8 2594/2889 2516/2610 2625/2747 2/2 1/3

    9 2546/2696 2656/2793 2671/2873 1/1 2/2

    10 2766/2913 2547/2718 2531/2605 3/3 2/2

    Please, someone else with a machine that's newer than a 9 year old, single 1.8GHZ CPU runng 2k5 Dev Ed sp3 running on XP sp3 run the code and post your results. Obviously, a single run may not do it. Please be sure to post the box info you ran the code on like I did.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • With your test data, on my 2yr old dual core Intel P-8600 @ 2.40GHz, 8GB RAM, Vista Ultimate 64-bit SP2, SQL 2008 64-bit SP2 (phew!):

    (oh yeah... you really need to update that 9-yr old computer you have!!!)

    Legend: CPU/Duration

    Method# (1) (2) (3)

    Run# ('.') (./text()][1] OFW Winner RunnerUp

    ---- --------- ------------- --------- ------ --------

    1 890/624 872/546 982/664 2/2 1/1

    2 859/684 825/537 905/706 2/2 1/1

    3 904/740 873/584 936/720 2/2 1/3

    4 1000/746 904/626 998/623 2/3 3/3

    5 967/637 874/641 936/646 2/1 3/1

    6 920/687 889/809 1046/679 2/3 1/1

    7 952/639 1028/693 985/675 1/1 3/3

    8 889/640 842/501 920/588 2/2 1/3

    9 903/718 937/649 905/588 1/3 3/2

    10 952/792 872/497 967/639 2/2 1/3

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • WayneS (1/21/2011)


    With your test data, on my 2yr old dual core Intel P-8600 @ 2.40GHz, 8GB RAM, Vista Ultimate 64-bit SP2, SQL 2008 64-bit SP2 (phew!):

    (oh yeah... you really need to update that 9-yr old computer you have!!!)

    Legend: CPU/Duration

    Method# (1) (2) (3)

    Run# ('.') (./text()][1] OFW Winner RunnerUp

    ---- --------- ------------- --------- ------ --------

    1 890/624 872/546 982/664 2/2 1/1

    2 859/684 825/537 905/706 2/2 1/1

    3 904/740 873/584 936/720 2/2 1/3

    4 1000/746 904/626 998/623 2/3 3/3

    5 967/637 874/641 936/646 2/1 3/1

    6 920/687 889/809 1046/679 2/3 1/1

    7 952/639 1028/693 985/675 1/1 3/3

    8 889/640 842/501 920/588 2/2 1/3

    9 903/718 937/649 905/588 1/3 3/2

    10 952/792 872/497 967/639 2/2 1/3

    All of those runs are very close heats especially considering there's a million rows in the test. Method 3 was the winner on my machine with 1 and 2 splitting the left overs. 2 is the winner on Wayne's machine and once we did the testing correctly, it's not the monster win that everyone thought it was initially. Who knows which will be the winner on other machines?

    As with so many things in SQL Server, I'll say "It Depends". 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • WayneS (1/21/2011)


    (oh yeah... you really need to update that 9-yr old computer you have!!!)

    Heh... nah... two reasons. 1) If I can make something run fast on my machine, it flys on everyone elses. 2) If my computer runs too fast, there's not enough time to take a sip of scotch. 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (1/22/2011)


    WayneS (1/21/2011)


    (oh yeah... you really need to update that 9-yr old computer you have!!!)

    Heh... nah... two reasons. 1) If I can make something run fast on my machine, it flys on everyone elses. 2) If my computer runs too fast, there's not enough time to take a sip of scotch. 😀

    I agree with both of those reasons! It's why I used to build underpowered systems myself.

    But you know I'm going to keep harassing you about that old computer! :-D:w00t:

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Here are the results from my development server (HP ProLiant ML350G5, dual quad core E5430 @ 2.66GHZ, 18GB) SQL Server 2008 R2 CU5 Enterprise Edition x64 on Windows Server 2008 R2 Standard

    Run # (1) (2) (3) (4) Winner Runner Up

    1 811/161 984/151 999/196 982/148 1/4 4/2

    2 890/175 1026/161 937/223 1000/163 1/2 3/4

    3 1027/163 1046/159 860/194 856/149 4/4 3/2

    4 984/158 1026/153 1031/194 920/153 4/(2+4) 1/1

    5 923/174 996/152 967/198 906/155 4/2 1/4

    6 980/151 1030/150 1063/202 952/149 4/4 1/2

    7 1026/164 1030/162 1031/226 983/166 4/2 1/4

    8 1044/164 997/150 1031/214 889/148 4/4 2/2

    9 1013/156 1032/150 1045/206 950/146 4/4 1/2

    10 938/151 1032/154 1043/204 967/149 1/4 4/1

    BTW: Is there a quick trick for formatting the numbers?

    I added a fourth test using my method: .value('.[1]','varchar(max)'),1,1,''):

    SET STATISTICS IO,TIME OFF;

    DBCC FREEPROCCACHE WITH NO_INFOMSGS;

    DBCC FREESYSTEMCACHE ('ALL') WITH NO_INFOMSGS;

    PRINT '--========== XML TYPE with value(''.[1]'') ==============================';

    IF OBJECT_ID('tempdb..#test') IS NOT NULL DROP TABLE #test;

    SET STATISTICS IO,TIME ON;

    GO

    WITH CTE AS

    (

    SELECT DISTINCT

    AccountNumber

    FROM #TestData

    )

    SELECT AccountNumber,

    CommaList = STUFF((SELECT ',' + Value

    FROM #TestData

    WHERE AccountNumber = CTE.AccountNumber

    ORDER BY Value

    FOR XML PATH(''),

    TYPE).value('.[1]','varchar(max)'),1,1,'')

    INTO #test

    FROM CTE

    ORDER BY AccountNumber;

    SET STATISTICS IO,TIME OFF;

  • Jeff Moden (1/21/2011)


    Ok... here's the test data generator that I borrowed from Wayne's article. I set it up for 10 times more rows and 10 times more accounts for a million rows spread over 10000 accounts.

    -- Conditionally drop the test table to make reruns easier.

    IF OBJECT_ID('tempdb..#TestData','U') IS NOT NULL DROP TABLE #TestData;

    CREATE TABLE #TestData (AccountNumber INT,

    Value CHAR(3));

    -- Build 1000 account numbers with random 3 character data.

    You said 10,000 accounts, but your code only creates 1,000. I just wanted to make sure 1,000 is what you ran with so my results would be comparable.

  • UMG Developer (1/24/2011)


    You said 10,000 accounts, but your code only creates 1,000. I just wanted to make sure 1,000 is what you ran with so my results would be comparable.

    Agh! Good catch (more coffee please :blush:). The code and the comments in the code match. There are only 1000 being created by the code.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I have implemented this script but am getting the error "Invalid column name 'Value'" when running it. Is there something special that I need to do in order for Mgmt Studio to recognize the "unnamed expression" part of that? I've implemented both versions, this one:

    For XML Path(''), Type).Value('(./text())[1]','varchar(max)')

    And this one:

    For XML Path(''), TYPE).Value('.','varchar(max)')

    But I don't think those have any bearing on the 'Value' syntax.

  • gregory.anderson (2/28/2011)


    I have implemented this script but am getting the error "Invalid column name 'Value'" when running it. Is there something special that I need to do in order for Mgmt Studio to recognize the "unnamed expression" part of that? I've implemented both versions, this one:

    For XML Path(''), Type).Value('(./text())[1]','varchar(max)')

    And this one:

    For XML Path(''), TYPE).Value('.','varchar(max)')

    But I don't think those have any bearing on the 'Value' syntax.

    No specific tricks, but maybe some piece of the code got lost in translation during the copy and paste.

    The .value function is a function of xquery and is supported in SSMS 2005 and up.

    http://msdn.microsoft.com/en-us/library/ms178030.aspx for a bit more info on it.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • gregory.anderson (2/28/2011)


    I have implemented this script but am getting the error "Invalid column name 'Value'" when running it. Is there something special that I need to do in order for Mgmt Studio to recognize the "unnamed expression" part of that? I've implemented both versions, this one:

    For XML Path(''), Type).Value('(./text())[1]','varchar(max)')

    And this one:

    For XML Path(''), TYPE).Value('.','varchar(max)')

    But I don't think those have any bearing on the 'Value' syntax.

    I think your problem is that the XML commands are case sensative, and your .Value needs to be .value.

Viewing 15 posts - 31 through 45 (of 84 total)

You must be logged in to reply to this topic. Login to reply