November 15, 2011 at 10:10 pm
HI, i have 7000 rows in my excel file and some of them are repeated rows. i want to transform the data from excel source to oledb destination. to get the distinct rows into my destintion which transformation is better or is there any other way to get the data without repeating rows?
November 16, 2011 at 3:23 am
priya.pk20 (11/15/2011)
HI, i have 7000 rows in my excel file and some of them are repeated rows. i want to transform the data from excel source to oledb destination. to get the distinct rows into my destintion which transformation is better or is there any other way to get the data without repeating rows?
Two options:
* use a SORT component. It has a checkbox at the bottom that you can select to filter out duplicate values. This can have bad performance, but for 7000 rows you should be fine.
* stage the data into a staging table in SQL Server and use TSQL to delete the duplicate rows. This has the best performance, but for 7000 rows this might be overkill.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
November 17, 2011 at 9:45 am
Thank you koen:-),
from your answer, i have another question, if the data is larger than 7000 rows , what would be the best way..:satisfied:
thanks
priya
November 17, 2011 at 10:13 am
Use the suggested staging table (may be temporary) and SELECT DISTINCT.
November 17, 2011 at 10:21 am
is there any other ways to remove redundant data in loading using ssis?
November 17, 2011 at 11:49 am
Staging data into a table is the best and cleanest way as suggested above.
But if for some reason you dont want to use this approach, you could try reading distinct rows in the
source editor itself. In the excel source editor choose SQL Command as the data access mode.
Write a query similar to the following:
select distinct col1,col2,col3
from [sheet1$]
Sam Vanga
http://SamuelVanga.com
November 17, 2011 at 12:11 pm
priya.pk20 (11/17/2011)
is there any other ways to remove redundant data in loading using ssis?
Not with the generic SSIS Excel Source. You would need a C# program that would use Interop to read the Excel sheet.
November 17, 2011 at 11:19 pm
samvanga (11/17/2011)
Staging data into a table is the best and cleanest way as suggested above.But if for some reason you dont want to use this approach, you could try reading distinct rows in the
source editor itself. In the excel source editor choose SQL Command as the data access mode.
Write a query similar to the following:
select distinct col1,col2,col3
from [sheet1$]
Does the JET provider (or ACE for 2007 and up) recognise the DISTINCT keyword? If it does, that would be great.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
November 18, 2011 at 8:02 am
Koen Verbeeck (11/17/2011)[hrDoes the JET provider (or ACE for 2007 and up) recognise the DISTINCT keyword? If it does, that would be great.
Turns out it doesn't. I just learned this
Earlier i was using 97-2003 version of excel. And it worked with a JET 4.0 provider. Below is the complete connection sting i was using.
Provider=Microsoft.Jet.OLEDB.4.0;
Data Source=D:\Documents and Settings\New Microsoft Excel Worksheet.xls
;Extended Properties="Excel 8.0;HDR=YES";
Sam Vanga
http://SamuelVanga.com
November 18, 2011 at 12:54 pm
Here is an article on how to do this:
http://merlecarr.wordpress.com/2011/03/29/ssis-select-distinct-from-excel-data-source/
Thanks,
Fitsum
November 18, 2011 at 2:56 pm
fitsumkh (11/18/2011)
Here is an article on how to do this:http://merlecarr.wordpress.com/2011/03/29/ssis-select-distinct-from-excel-data-source/
Thanks,
Fitsum
Wow! Very smart!
Thanks -- I learned something...
November 30, 2011 at 9:06 am
thanks for all, for ur sharings..
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply