Redirect duplicate records using ssis

Question

Redirect duplicate records using ssis

nhimabindhu

Ten Centuries

Points: 1242
More actions
December 4, 2014 at 2:52 am

#293134

Hi,
We have a requirement like the source is excel and there are duplicates in the file we need to redirect the duplicates into one table and the unique records will be in another table. There may be 20+ columns in the excel sheet.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

Koen Verbeeck SSC Guru Points: 259155 More actions · Answer 1

Easiest option: dump records in a staging table. Then select again and use the ROW_NUMBER function to indicate duplicates. Everything with row number 1 goes into the unique table, all other row number go to the duplicate table.

Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP

nhimabindhu Ten Centuries Points: 1242 More actions · Answer 2

We should not use a staging table while loading from source we should pull unique records in single table and the duplicate in next table without using row_number,rank & dense rank.

Koen Verbeeck SSC Guru Points: 259155 More actions · Answer 3

nhimabindhu (12/4/2014)
We should not use a staging table while loading from source we should pull unique records in single table and the duplicate in next table without using row_number,rank & dense rank.

Why not?

Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP

nhimabindhu Ten Centuries Points: 1242 More actions · Answer 4

Is there any other approach other than this which we achieve the result as per requirement.:-)

John Mitchell-245523 SSC Guru Points: 148809 More actions · Answer 5

If we don't understand your requirement, it's difficult to advise you. Such arbitrary stipulations make this sound a bit like a homework assignment. So, what makes you so hostile to staging tables and windowing functions?

John

Phil Parkin SSC Guru Points: 247037 More actions · Answer 6

nhimabindhu (12/4/2014)
Is there any other approach other than this which we achieve the result as per requirement.:-)

If your requirement is such that terrible performance is acceptable, there is. Are you happy to accept performance which is quite likely 10x worse than Koen's suggestion?

tripleAxe SSCertifiable Points: 5706 More actions · Answer 7

Well, I don't know how big your spreadsheets are, so don't know what the performance will be like for you with this, but how about you have two Excel Source data flow source components in the package. One for the duplicates and one for the unique records. Set the data access mode to each to be SQL command and then have the SQL command text like the two examples below (obviously you would need to add the extra columns and maybe change the sheet name).

Duplicate rows

SELECT F1,F2,F3,F4,F5

FROM `Sheet1$`

GROUP BY F1,F2,F3,F4,F5

HAVING COUNT(*) > 1

Unique rows

SELECT F1,F2,F3,F4,F5

FROM `Sheet1$`

GROUP BY F1,F2,F3,F4,F5

HAVING COUNT(*) = 1

Personally, I'd still go with the staging table suggestions made earlier.

stevefromOZ SSC-Forever Points: 43646 More actions · Answer 8

And if it *is* a homework problem, a quick google search produces some approaches that use SSIS transforms to handle this problem.

Steve.