October 10, 2008 at 12:12 pm
I need to convert Rows into Columns.
I have 8 columns. But in reality these are
4 pairs. I need to convert them into 2 columns.
After failing to do it with UNPIVOT task I decided to
try it using ForEachLoop Container.
Here is what I do:
1. Load Source data into Recordset (Recordset Destination component)
Use Object variable User::vRSsource
2. Then I connect to ForEachLoop Container where I use ForEach ADo Enumerator and choose ADO Object Source variable: User::vRSsource
Viariable Mappings are:
Index 0 -> User::vCol1
Index 1 -> User::vCol2
Index 2 -> User::vCol1
Index 3 -> User::vCol2
etc.
Once I got first row with User::vCol1 and User::vCol2
I'd like to write to Recordset Destination vRSrotated
My plan was to fill out vRSrotated and then
just flush the contents of it into OLE DB Destination table.
But then if I just add another Data Flow step
and try to access that Recordset vRSrotated
I don't know how to do it.
It's no Recordset Source.
There is only Data Reader Source
which expects Connection Manager.
But my source in in memory. It's in the object variable User::vRSrotated
Which component I should use?
Is my approach correct or I'm moving in the wrong direction?
It is actually very easy to achive the goal just using this T-SQL:
select COL001,COL002 from bap_prod.dbo.SICDIM_Norm
group by
COL001,COL002
union all
select COL003,COL004 from bap_prod.dbo.SICDIM_Norm
group by
COL003,COL004
union all
select COL005,COL006 from bap_prod.dbo.SICDIM_Norm
group by
COL005,COL006
union all
select COL007,COL008 from bap_prod.dbo.SICDIM_Norm
group by
COL007,COL008
order by
COL001,COL002
But I'm just curious and try to achieve it without "Execute SQL" task.
Plus my boss doesn't want to see any "Execute SQL" tasks in any package.
October 14, 2008 at 8:49 am
I think you could\should do it using a script component. After you query your data using an ole db source your next step can be a script component. On the output of the script component you change the SynchronousInputID property to None. Now your script component will output a completely new dataset that you can define.
Here is a blog post by Jamie Thompson that has an example of an asynchronous Script Component, http://blogs.conchango.com/jamiethomson/archive/2005/07/25/SSIS-Nugget_3A00_-The-difference-between-synchronous-and-asynchronous-script-components.aspx
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
October 15, 2008 at 7:40 am
I thought I would show how you could have used the UNPIVOT & PIVOT functions, given that you were unable to do so. This is just another method, as your original group by scenario with union all certainly works, and without testing for comparison, it might be difficult to know which would perform better, although for large volumes of data, I suspect your original SQL might be best.
Here's the code:
DECLARE @SICDIM_Norm TABLE (
COL001 int,
COL002 int,
COL003 int,
COL004 int,
COL005 int,
COL006 int,
COL007 int,
COL008 int
)
INSERT INTO @SICDIM_Norm VALUES(1,2,3,4,5,6,7,8)
INSERT INTO @SICDIM_Norm VALUES(2,3,4,5,6,7,8,9)
DECLARE @TEMP_TBL TABLE(
RN int,
COLNAME varchar(6),
COLVALUE int
)
INSERT INTO @TEMP_TBL (COLNAME, COLVALUE)
SELECT COLNAME, COLVALUE
FROM @SICDIM_Norm
UNPIVOT(COLVALUE FOR COLNAME IN (COL001,COL002,COL003,COL004,COL005,COL006,COL007,COL008)) UPVT
UPDATE @TEMP_TBL
SET COLNAME =
CASE
WHEN COLNAME IN ('COL003','COL005','COL007') THEN 'COL001'
WHEN COLNAME IN ('COL004','COL006','COL008') THEN 'COL002'
END
WHERE COLNAME NOT IN ('COL001','COL002')
;WITH ROWS AS (
SELECT ROW_NUMBER() OVER(PARTITION BY COLNAME ORDER BY COLVALUE) AS RN, COLNAME, COLVALUE
FROM @TEMP_TBL
)
UPDATE @TEMP_TBL
SET RN = X.RN
FROM @TEMP_TBL AS T INNER JOIN ROWS AS X
ON T.COLNAME = X.COLNAME AND
T.COLVALUE = X.COLVALUE
SELECT COL001, COL002
FROM @TEMP_TBL
PIVOT(MIN(COLVALUE) FOR COLNAME IN (COL001,COL002)) AS PVT
The code shown creates a table variable to hold test data, and has two rows inserted just to ensure that multiple values can be tested. That table also has an additional column to group the records by, which is populated using the ROW_NUMBER() function. This is essential to the eventual PIVOT. The UNPIVOT decomposes ALL columns first, and then the UPDATE translates things back to two colum name values, and then the PIVOT works it's magic to create them. One might think that you could eliminate the RN field from the table variable, as you don't see it get used in the PIVOT, but that's just not the case. It does get used as a group by mechanism, and is essential to producing a correct result.
Without significantly greater test data, it would be difficult to test this for performance vs the original T-SQL. Any takers?
Steve
(aka smunson)
:):):)
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
October 16, 2008 at 8:01 am
Guys,
Thank you so much for your valuable input.
I definately will study the code Smunson posted.
It looks very interetsed.
But probably in my case it is a bit overkill.
I managed to get the results just by using
Multicast, four Derived Columns components and one Union All.
But I'll definately will explore all other options for UNPIVOTing rows.
I think using Script Loops would be the slowest option.
But I read an article with a great interest because
before I had no clue how to process Pipeline data row by row in a Script component. I'm sure I will use this technique soon.
Thank you so much guys!
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply