sql pivot problem with unknown number of records to columns.

  • Hi Please help me out in this scenario.

    Table

    IDNAME

    1a

    1b

    1c

    1d

    2e

    2f

    3g

    3h

    3i

    4j

    5K

    5L

    5m

    5N

    5O

    5P

    required output

    id name1name 2name3name4name 5name6

    1abc

    2ef

    3ghi

    4j

    5klmnop

    i need a query which gives the out put

  • Using this sample data: -

    IF object_id('tempdb..#testEnvironment') IS NOT NULL

    BEGIN

    DROP TABLE #testEnvironment;

    END;

    SELECT ID, NAME

    INTO #testEnvironment

    FROM (VALUES(1,'a'),(1,'b'),(1,'c'),(1,'d'),(2,'e'),(2,'f'),

    (3,'g'),(3,'h'),(3,'i'),(4,'j'),(5,'K'),(5,'L'),

    (5,'m'),(5,'N'),(5,'O'),(5,'P')

    )a(ID, NAME);

    Then something like this: -

    DECLARE @sql NVARCHAR(MAX);

    SELECT @sql='SELECT ID,'+CHAR(13)+CHAR(10)+

    STUFF((SELECT ','+CHAR(13)+CHAR(10)+'NULLIF(MAX(CASE WHEN POS = '+CAST(POS AS VARCHAR(3))+' THEN NAME ELSE '+CHAR(39)+CHAR(39)+' END),'+CHAR(39)+CHAR(39)+

    ') AS '+QUOTENAME('NAME'+CAST(POS AS VARCHAR(3)))

    FROM (SELECT DISTINCT ROW_NUMBER() OVER(PARTITION BY ID ORDER BY NAME)

    FROM #testEnvironment

    )a(POS)

    FOR XML PATH(''), TYPE).value('.','VARCHAR(MAX)'),1,3,'')+CHAR(13)+CHAR(10)+'FROM (SELECT ID, NAME, ROW_NUMBER() OVER(PARTITION BY ID ORDER BY NAME)'+CHAR(13)+CHAR(10)+

    SPACE(6)+'FROM #testEnvironment'+CHAR(13)+CHAR(10)+SPACE(6)+')a(ID,NAME,POS)'+CHAR(13)+CHAR(10)+'GROUP BY ID;';

    EXECUTE sp_executesql @sql;

    Which returns: -

    ID NAME1 NAME2 NAME3 NAME4 NAME5 NAME6

    ----------- ----- ----- ----- ----- ----- -----

    1 a b c d NULL NULL

    2 e f NULL NULL NULL NULL

    3 g h i NULL NULL NULL

    4 j NULL NULL NULL NULL NULL

    5 K L m N O P


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Using the above sample from Cadavre to load the data here is an alternative. Admittedly you will need to prejudge the possible number of options and pad out the sql but this code can be used in views as it isnt dynamic:

    Select ID

    ,[Name1] = [1]

    ,[Name2] = [2]

    ,[Name3] = [3]

    ,[Name4] = [4]

    ,[Name5] = [5]

    ,[Name6] = [6]

    ,[Name7] = [7]

    ,[Name8] = [8]

    from (

    select distinct

    ID

    ,Name

    ,RID = Dense_Rank() OVER(PARTITION BY ID ORDER BY NAME)

    from #testEnvironment

    ) as XX

    pivot ( min(Name) for RID in ( [1],[2],[3],[4],[5],[6],[7],[8] )

    ) as pvt

    order by ID

  • Steve JP (2/28/2013)


    Using the above sample from Cadavre to load the data here is an alternative. Admittedly you will need to prejudge the possible number of options and pad out the sql but this code can be used in views as it isnt dynamic:

    Select ID

    ,[Name1] = [1]

    ,[Name2] = [2]

    ,[Name3] = [3]

    ,[Name4] = [4]

    ,[Name5] = [5]

    ,[Name6] = [6]

    ,[Name7] = [7]

    ,[Name8] = [8]

    from (

    select distinct

    ID

    ,Name

    ,RID = Dense_Rank() OVER(PARTITION BY ID ORDER BY NAME)

    from #testEnvironment

    ) as XX

    pivot ( min(Name) for RID in ( [1],[2],[3],[4],[5],[6],[7],[8] )

    ) as pvt

    order by ID

    The OP said for an "unknown" number, which means that realistically, dynamic is the only way to go.

    If it was for a known number, then I find that the PIVOT operator is generally slower than using CASE statements then aggregating, e.g.

    SELECT ID,

    NULLIF(MAX(CASE WHEN POS = 1 THEN NAME ELSE '' END),'') AS [NAME1],

    NULLIF(MAX(CASE WHEN POS = 2 THEN NAME ELSE '' END),'') AS [NAME2],

    NULLIF(MAX(CASE WHEN POS = 3 THEN NAME ELSE '' END),'') AS [NAME3],

    NULLIF(MAX(CASE WHEN POS = 4 THEN NAME ELSE '' END),'') AS [NAME4],

    NULLIF(MAX(CASE WHEN POS = 5 THEN NAME ELSE '' END),'') AS [NAME5],

    NULLIF(MAX(CASE WHEN POS = 6 THEN NAME ELSE '' END),'') AS [NAME6]

    FROM (SELECT ID, NAME, ROW_NUMBER() OVER(PARTITION BY ID ORDER BY NAME)

    FROM #testEnvironment

    )a(ID,NAME,POS)

    GROUP BY ID;


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Throwing the 2 versions of the code through a test rig shows that the methods are reasonable similar on speed. Its how the methods get used is probable more important than the milliseconds difference in its perfromance.

    For each sceniaro there was 50 interations which had a random number of values for the each row and the time taken to run each statement was recorded. For my code I padded the code out to hold a potential of 15 columns [Name1].....[Name15] just to make the testing easier & trouble freee 🙂 I havent tested the width of the values that populate the columns and at a guess this will have an effect.

    The first run had a random number of columns upto 5 values for each row.

    RowsCadavre Steve_JP

    10125

    10032

    500137

    1000159

    25002018

    50003029

    75004144

    100004549

    50000210242

    100000370442

    The second run had a random number of columns upto 10 values for each row.

    RowsCadavre Steve_JP

    10103

    10051

    500157

    10002313

    25003825

    50007050

    750010075

    1000012697

    50000649487

    1000001170864

    Hopefully this helpful. At least both methods avoid RBAR

  • Another version with use of dynamic SQL produces required output:

    SET NOCOUNT ON;

    -- setup sample table

    SELECT ID, NAME

    INTO #testEnvironment

    FROM (VALUES(1,'a'),(1,'b'),(1,'c'),(1,'d'),(2,'e'),(2,'f'),

    (3,'g'),(3,'h'),(3,'i'),(4,'j'),(5,'K'),(5,'L'),

    (5,'m'),(5,'N'),(5,'O'),(5,'P')

    )a(ID, NAME);

    -- Get required output

    DECLARE @sql NVARCHAR(4000) = 'SELECT ID';

    SELECT @sql = @sql + '

    ,MAX(CASE N WHEN ' + N + ' THEN NAME ELSE '''' END) as Name' + N

    FROM (SELECT DISTINCT CAST(ROW_NUMBER() OVER (PARTITION BY ID ORDER BY NAME) AS VARCHAR) N

    FROM #testEnvironment) Q;

    SELECT @sql = @sql + '

    FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY NAME) N

    FROM #testEnvironment) Q

    GROUP BY ID';

    EXEC (@SQL);

    DROP TABLE #testEnvironment;

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Steve JP (2/28/2013)


    Throwing the 2 versions of the code through a test rig shows that the methods are reasonable similar on speed. Its how the methods get used is probable more important than the milliseconds difference in its perfromance.

    Interesting, can you provide the code you used in your test rig? I find that the CASE versions are generally much quicker than PIVOT, so I'd like to see why we have a different result here.


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply