Convert Rows into Columns using ForEachLoop Container

  • I need to convert Rows into Columns.

    I have 8 columns. But in reality these are

    4 pairs. I need to convert them into 2 columns.

    After failing to do it with UNPIVOT task I decided to

    try it using ForEachLoop Container.

    Here is what I do:

    1. Load Source data into Recordset (Recordset Destination component)

    Use Object variable User::vRSsource

    2. Then I connect to ForEachLoop Container where I use ForEach ADo Enumerator and choose ADO Object Source variable: User::vRSsource

    Viariable Mappings are:

    Index 0 -> User::vCol1

    Index 1 -> User::vCol2

    Index 2 -> User::vCol1

    Index 3 -> User::vCol2

    etc.

    Once I got first row with User::vCol1 and User::vCol2

    I'd like to write to Recordset Destination vRSrotated

    My plan was to fill out vRSrotated and then

    just flush the contents of it into OLE DB Destination table.

    But then if I just add another Data Flow step

    and try to access that Recordset vRSrotated

    I don't know how to do it.

    It's no Recordset Source.

    There is only Data Reader Source

    which expects Connection Manager.

    But my source in in memory. It's in the object variable User::vRSrotated

    Which component I should use?

    Is my approach correct or I'm moving in the wrong direction?

    It is actually very easy to achive the goal just using this T-SQL:

    select COL001,COL002 from bap_prod.dbo.SICDIM_Norm

    group by

    COL001,COL002

    union all

    select COL003,COL004 from bap_prod.dbo.SICDIM_Norm

    group by

    COL003,COL004

    union all

    select COL005,COL006 from bap_prod.dbo.SICDIM_Norm

    group by

    COL005,COL006

    union all

    select COL007,COL008 from bap_prod.dbo.SICDIM_Norm

    group by

    COL007,COL008

    order by

    COL001,COL002

    But I'm just curious and try to achieve it without "Execute SQL" task.

    Plus my boss doesn't want to see any "Execute SQL" tasks in any package.

  • I think you could\should do it using a script component. After you query your data using an ole db source your next step can be a script component. On the output of the script component you change the SynchronousInputID property to None. Now your script component will output a completely new dataset that you can define.

    Here is a blog post by Jamie Thompson that has an example of an asynchronous Script Component, http://blogs.conchango.com/jamiethomson/archive/2005/07/25/SSIS-Nugget_3A00_-The-difference-between-synchronous-and-asynchronous-script-components.aspx

  • I thought I would show how you could have used the UNPIVOT & PIVOT functions, given that you were unable to do so. This is just another method, as your original group by scenario with union all certainly works, and without testing for comparison, it might be difficult to know which would perform better, although for large volumes of data, I suspect your original SQL might be best.

    Here's the code:

    DECLARE @SICDIM_Norm TABLE (

    COL001 int,

    COL002 int,

    COL003 int,

    COL004 int,

    COL005 int,

    COL006 int,

    COL007 int,

    COL008 int

    )

    INSERT INTO @SICDIM_Norm VALUES(1,2,3,4,5,6,7,8)

    INSERT INTO @SICDIM_Norm VALUES(2,3,4,5,6,7,8,9)

    DECLARE @TEMP_TBL TABLE(

    RN int,

    COLNAME varchar(6),

    COLVALUE int

    )

    INSERT INTO @TEMP_TBL (COLNAME, COLVALUE)

    SELECT COLNAME, COLVALUE

    FROM @SICDIM_Norm

    UNPIVOT(COLVALUE FOR COLNAME IN (COL001,COL002,COL003,COL004,COL005,COL006,COL007,COL008)) UPVT

    UPDATE @TEMP_TBL

    SET COLNAME =

    CASE

    WHEN COLNAME IN ('COL003','COL005','COL007') THEN 'COL001'

    WHEN COLNAME IN ('COL004','COL006','COL008') THEN 'COL002'

    END

    WHERE COLNAME NOT IN ('COL001','COL002')

    ;WITH ROWS AS (

    SELECT ROW_NUMBER() OVER(PARTITION BY COLNAME ORDER BY COLVALUE) AS RN, COLNAME, COLVALUE

    FROM @TEMP_TBL

    )

    UPDATE @TEMP_TBL

    SET RN = X.RN

    FROM @TEMP_TBL AS T INNER JOIN ROWS AS X

    ON T.COLNAME = X.COLNAME AND

    T.COLVALUE = X.COLVALUE

    SELECT COL001, COL002

    FROM @TEMP_TBL

    PIVOT(MIN(COLVALUE) FOR COLNAME IN (COL001,COL002)) AS PVT

    The code shown creates a table variable to hold test data, and has two rows inserted just to ensure that multiple values can be tested. That table also has an additional column to group the records by, which is populated using the ROW_NUMBER() function. This is essential to the eventual PIVOT. The UNPIVOT decomposes ALL columns first, and then the UPDATE translates things back to two colum name values, and then the PIVOT works it's magic to create them. One might think that you could eliminate the RN field from the table variable, as you don't see it get used in the PIVOT, but that's just not the case. It does get used as a group by mechanism, and is essential to producing a correct result.

    Without significantly greater test data, it would be difficult to test this for performance vs the original T-SQL. Any takers?

    Steve

    (aka smunson)

    :):):)

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Guys,

    Thank you so much for your valuable input.

    I definately will study the code Smunson posted.

    It looks very interetsed.

    But probably in my case it is a bit overkill.

    I managed to get the results just by using

    Multicast, four Derived Columns components and one Union All.

    But I'll definately will explore all other options for UNPIVOTing rows.

    I think using Script Loops would be the slowest option.

    But I read an article with a great interest because

    before I had no clue how to process Pipeline data row by row in a Script component. I'm sure I will use this technique soon.

    Thank you so much guys!

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply