May 8, 2015 at 11:31 am
Hi Everyone,
What is the best way to remove duplicates from a flat file before loading it into a database table?
Currently I am using the Aggregate transformation. Is there any other transformation for removing duplicates?
Kindly suggest.
Thanks,
Kannan
May 8, 2015 at 11:36 am
One option is to import the file into a staging table, then append only the unique values to the destination table...
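For illustration only, here is a minimal C# sketch of that staging-table pattern, assuming a simple comma-delimited Name,Email file and hypothetical dbo.StagingContacts / dbo.Contacts tables (in a real SSIS package the bulk load would normally be a Flat File Source into the staging table rather than hand-rolled code):

using System;
using System.Data;
using System.Data.SqlClient;
using System.IO;

class StagingLoad
{
    static void Main()
    {
        // Shape of the flat file rows; column names are assumptions for the example.
        var rows = new DataTable();
        rows.Columns.Add("Name", typeof(string));
        rows.Columns.Add("Email", typeof(string));

        // Read every line of the flat file, duplicates and all.
        foreach (var line in File.ReadLines(@"C:\data\contacts.csv"))
        {
            var parts = line.Split(',');
            rows.Rows.Add(parts[0], parts[1]);
        }

        using (var conn = new SqlConnection("Server=.;Database=MyDb;Integrated Security=true"))
        {
            conn.Open();

            // Bulk-load the raw rows into the staging table.
            using (var bulk = new SqlBulkCopy(conn) { DestinationTableName = "dbo.StagingContacts" })
            {
                bulk.WriteToServer(rows);
            }

            // Append only the distinct rows to the destination table.
            new SqlCommand(
                "INSERT INTO dbo.Contacts (Name, Email) " +
                "SELECT DISTINCT Name, Email FROM dbo.StagingContacts;",
                conn).ExecuteNonQuery();
        }
    }
}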
May 8, 2015 at 1:31 pm
That is fine, but how do I remove the duplicates in the flat file itself?
May 8, 2015 at 2:20 pm
Removing duplicates will mean rewriting the file, so whether you put it in SQL Server and sort + group, or do it in memory via aggregates or script tasks, it still needs to be written to disk again.
I'd probably lean towards a script task to pull it into a DataTable, where you can use LINQ to sort and group by, and output the file again.
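Just to illustrate, a minimal sketch of that idea (paths here are hypothetical; in an actual script task they would come from package variables, and two lines only count as duplicates if they match exactly):

using System.IO;
using System.Linq;

class DedupeFlatFile
{
    static void Main()
    {
        string inputPath  = @"C:\data\contacts.csv";
        string outputPath = @"C:\data\contacts_dedup.csv";

        // Keep the first occurrence of each distinct line, then rewrite the
        // file; as noted above, the result still has to hit disk again.
        var uniqueLines = File.ReadLines(inputPath)
                              .Distinct()
                              .ToList();

        File.WriteAllLines(outputPath, uniqueLines);
    }
}

If duplicates are defined by a key column rather than the whole line, a GroupBy on that column with .Select(g => g.First()) would do the job instead of Distinct().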
Lowell
May 8, 2015 at 6:49 pm
Kannan Vignesh (5/8/2015)
That is fine, but how do I remove the duplicates in the flat file itself?
The absolute best way is to have the people who create the flat file do their jobs better and prevent the duplicates from being written to the flat file in the first place.
Heh... well, you did ask for the BEST way. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.