Identifying and processing duplicates in flat file.

Mukti

SSCrazy

Points: 2826
September 12, 2008 at 11:03 am

#123561

Hi All,
I have a flat file like:
HKey|GroupKey|Item
1|11|Item1
2|22|Item1
2|22|Item2
2|22|Item3
3|33|Item1
3|33|Item2
4|44|Item1
This is just sample data; in the real file the 'Item' column can contain up to 250 characters, and the file itself can contain 70,000 to 80,000 records.
Now I want to process this file and concatenate the Item column where the HKeys match. The result should look like:
1|11|Item1
2|22|Item1~Item2~Item3
3|33|Item1~Item2
4|44|Item1
What is the best way to identify duplicates and generate this file?

Lynn Pettis SSC Guru Points: 442467 · Answer 1

September 12, 2008 at 11:16 am

#871192

Have to ask, why would you want to do that?

😎

Michael Earl-395764 SSC Guru Points: 53873 · Answer 2

You could do this in a script component within your data flow, but Lynn makes a great point: why are you taking nice normalized data and messing it up?
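
For what it's worth, the grouping logic such a script needs is small. Below is a rough Python sketch of it, not SSIS script component code (which would be written in VB.NET or C#). It assumes a pipe-delimited file with a header row, groups on HKey alone, and keeps the first GroupKey seen for each HKey. The names `concat_items_by_hkey` and `SAMPLE` are made up for illustration.

```python
SAMPLE = """\
HKey|GroupKey|Item
1|11|Item1
2|22|Item1
2|22|Item2
2|22|Item3
3|33|Item1
3|33|Item2
4|44|Item1
"""


def concat_items_by_hkey(lines, delimiter="|", joiner="~"):
    """Join the Item values of all rows that share an HKey.

    Assumes the first line is a header and every data row has three fields.
    Rows need not be sorted; output keeps first-seen HKey order.
    """
    rows = iter(lines)
    next(rows, None)  # skip the header row (HKey|GroupKey|Item)

    groups = {}  # HKey -> (first GroupKey seen, list of Items)
    for line in rows:
        line = line.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        # maxsplit=2 keeps any further delimiters inside the Item text
        hkey, group_key, item = line.split(delimiter, 2)
        if hkey not in groups:
            groups[hkey] = (group_key, [])
        groups[hkey][1].append(item)

    return [
        delimiter.join((hkey, group_key, joiner.join(items)))
        for hkey, (group_key, items) in groups.items()
    ]


if __name__ == "__main__":
    for out_line in concat_items_by_hkey(SAMPLE.splitlines()):
        print(out_line)
```

For a real file you would pass the open file object in place of the sample lines. Holding everything in a dictionary should be fine at 70,000 to 80,000 rows; if the file is guaranteed to be sorted by HKey, a streaming version that only remembers the current key would work too.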

Mukti SSCrazy Points: 2826 · Answer 3

This is a feed for an application that needs the data in that format. I think I am going to go for the script option.

Thanks.

Lynn Pettis SSC Guru Points: 442467 · Answer 4

Is this nice data in a table already, or do you only have it in a text file?

😎

Mukti SSCrazy Points: 2826 · Answer 5

September 12, 2008 at 2:43 pm

#871337

It's in a text file.