April 20, 2016 at 6:56 am
Hello All,
I have recently started using SSIS, and have been exploring various transformations trying to make the best use of them. I currently have a requirement where in, the source data is in the form of an excel workbook. The workbook is refreshed on a daily basis. It contains multiple tabs, with different columns in each tab (only a couple of identical columns)
For Example:
Tab 1:
=======
Date|| CustomerID || CustomerVisits || Referral
Tab 2:
=======
Date|| CustomerID || Entries
Tab 3:
=======
Date || Referral || Unique Count
.... and so on...
I need to load them into the same SQL Table, which has the following structure:
SQL_TABLE1:
===========
Date
Customer ID
CustomerVisits
Referral
Entries
UniqueCount
...etc...
So the SQL table contains all the required destination columns. However, I'd like to know how to add another column to the table which will allow me to identify which sheet the data came from. For example, I want to have a 'SourceName' column which will have values like: Tab 1, Tab 2, Tab 3, etc.. which are essentially the excel sheet names.
I came across a few options to do this, like ForEachLoop containers (does not work for my requirement), Script task at control flow level, Script source at data flow level, Derived column at data flow level.. I'm not well experienced with any ETL software so not quite sure what would be the best way forward. I would appreciate any ideas or directions
Thanks & Regards
Dhivya
April 20, 2016 at 7:35 am
Dhivya Elan (4/20/2016)
Hello All,I have recently started using SSIS, and have been exploring various transformations trying to make the best use of them. I currently have a requirement where in, the source data is in the form of an excel workbook. The workbook is refreshed on a daily basis. It contains multiple tabs, with different columns in each tab (only a couple of identical columns)
For Example:
Tab 1:
=======
Date|| CustomerID || CustomerVisits || Referral
Tab 2:
=======
Date|| CustomerID || Entries
Tab 3:
=======
Date || Referral || Unique Count
.... and so on...
I need to load them into the same SQL Table, which has the following structure:
SQL_TABLE1:
===========
Date
Customer ID
CustomerVisits
Referral
Entries
UniqueCount
...etc...
So the SQL table contains all the required destination columns. However, I'd like to know how to add another column to the table which will allow me to identify which sheet the data came from. For example, I want to have a 'SourceName' column which will have values like: Tab 1, Tab 2, Tab 3, etc.. which are essentially the excel sheet names.
I came across a few options to do this, like ForEachLoop containers (does not work for my requirement), Script task at control flow level, Script source at data flow level, Derived column at data flow level.. I'm not well experienced with any ETL software so not quite sure what would be the best way forward. I would appreciate any ideas or directions
Thanks & Regards
Dhivya
I would say that the Foreach loop should work. Please explain why you think it will not.
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
April 20, 2016 at 8:00 am
Dhivya Elan (4/20/2016)
Hello All,I have recently started using SSIS, and have been exploring various transformations trying to make the best use of them. I currently have a requirement where in, the source data is in the form of an excel workbook. The workbook is refreshed on a daily basis. It contains multiple tabs, with different columns in each tab (only a couple of identical columns)
For Example:
Tab 1:
=======
Date|| CustomerID || CustomerVisits || Referral
Tab 2:
=======
Date|| CustomerID || Entries
Tab 3:
=======
Date || Referral || Unique Count
.... and so on...
I need to load them into the same SQL Table, which has the following structure:
SQL_TABLE1:
===========
Date
Customer ID
CustomerVisits
Referral
Entries
UniqueCount
...etc...
So the SQL table contains all the required destination columns. However, I'd like to know how to add another column to the table which will allow me to identify which sheet the data came from. For example, I want to have a 'SourceName' column which will have values like: Tab 1, Tab 2, Tab 3, etc.. which are essentially the excel sheet names.
I came across a few options to do this, like ForEachLoop containers (does not work for my requirement), Script task at control flow level, Script source at data flow level, Derived column at data flow level.. I'm not well experienced with any ETL software so not quite sure what would be the best way forward. I would appreciate any ideas or directions
Thanks & Regards
Dhivya
Use a Derived Column in your data flow. That's all you need..
April 20, 2016 at 8:02 am
Hi Phil,
I tried using the ForEachLoop container. Here's what I did:
1) Made all my Excel Source tabs identical, by adding missing column headers (although they might not contain any data). This means that, if there are a few columns in my 3rd and 4th tabs, which dont exist in my first tab, then I added just the headers in my First Tab. I left the data empty. By doing this, all my tabs had the same number of columns and same column header names.
2) I then implemented the ForEach Loop container to import the data in a loop from each of the tabs
3) Problem I faced: On my first source tab, there are few column headers with no data. So excel samples the first few rows and assigns string datatype to it. But the same column in the 4th or 5th tab is actually Numeric as it has some number values in it.
So the routine fails when it is unable to match the data types.
I am unable to change the datatype in the source because the source has an add-in which actually deletes the content of the excel and reloads it on a daily basis.
Regards
D
April 20, 2016 at 8:07 am
Hi John
I've been trying to use the derived column transformation, unable to find a guide on how-to, that matches my requirement. I would really appreciate any helpful links.
I understand that I would have to create a variable that would import my source sheet names
and then in the derived column expression, I need to assign my sheet name. Please correct me if I'm wrong.
I am not sure how to create a variable to get the names of my sheet, and if I need to store it in some temp table or not.
Regards
D
April 20, 2016 at 8:10 am
Dhivya Elan (4/20/2016)
Hi JohnI've been trying to use the derived column transformation, unable to find a guide on how-to, that matches my requirement. I would really appreciate any helpful links.
I understand that I would have to create a variable that would import my source sheet names
and then in the derived column expression, I need to assign my sheet name. Please correct me if I'm wrong.
I am not sure how to create a variable to get the names of my sheet, and if I need to store it in some temp table or not.
Regards
D
Create the variable.
Assign the value of the sheet name being processed to the variable from within the config for the Foreach container.
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
April 20, 2016 at 8:12 am
Dhivya Elan (4/20/2016)
Hi Phil,I tried using the ForEachLoop container. Here's what I did:
1) Made all my Excel Source tabs identical, by adding missing column headers (although they might not contain any data). This means that, if there are a few columns in my 3rd and 4th tabs, which dont exist in my first tab, then I added just the headers in my First Tab. I left the data empty. By doing this, all my tabs had the same number of columns and same column header names.
2) I then implemented the ForEach Loop container to import the data in a loop from each of the tabs
3) Problem I faced: On my first source tab, there are few column headers with no data. So excel samples the first few rows and assigns string datatype to it. But the same column in the 4th or 5th tab is actually Numeric as it has some number values in it.
So the routine fails when it is unable to match the data types.
I am unable to change the datatype in the source because the source has an add-in which actually deletes the content of the excel and reloads it on a daily basis.
Regards
D
Within your foreach loop, you need multiple data flows (probably one for each sheet).
Add logic to direct the flow to the relevant data flow, depending on the 'current' sheet.
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
April 20, 2016 at 8:31 am
I'm confused as to why you need a for each loop to read in one file. You said that you have a new file each day. Where the multiples come in is that each file has multiple tabs. This should be represented by multiple data flows, not by looping w/ a For Each container. I recommend the following:
1. Create a data flow for each tab.
2. <In Data Flow>Source Component to get columns for the tab you are working
3. <In Data Flow>Derived Column Transformation to add in a column to signify the tab you are working
4. <In Data Flow> Destination column for table.
I think you said you'd need to add in the missing columns for each tab? Is this just to make the tab have the same columns as the detination? Would these all be NULL, or where would you get values? Either way, you can add the columns in via the Derived Column transform in step 3.
Am I missing something?
April 20, 2016 at 8:38 am
HI John,
I only added the missing columns to each tab, to test the ForEachLoop container. But then I encountered some problems. So I dropped that idea.
Currently, the solution that I have is very similar to what you have suggested:
1. Create a data flow for each tab.
2. <In Data Flow>Source Component to get columns for the tab you are working
3. <In Data Flow>Derived Column Transformation to add in a column to signify the tab you are working
4. <In Data Flow> Destination column for table.
This does not have any NULL columns.
Could you please help me with Point number 3? How should I create a variable which will get me the names of the excel tab please?
Regards
D
April 20, 2016 at 8:59 am
If you have a separate data flow for each tab, you don't need to store them in a variable because they don't change. Just add the value (DT_STR,50,1252)"Your Tab Name Here" as a new column in the Derived Column Transform. Adjust the data type and length to match your tab name column.
April 21, 2016 at 2:18 am
Hi John,
This works, thanks for your help.
Regards
D
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply