Identifying and processing duplicates in a flat file.

  • Hi All,

    I have a flat file like:

    HKey|GroupKey|Item
    1|11|Item1
    2|22|Item1
    2|22|Item2
    2|22|Item3
    3|33|Item1
    3|33|Item2
    4|44|Item1

    This is just sample data. In the real file, the 'Item' column can contain up to 250 characters, and the file itself can contain 70,000 to 80,000 records.

    Now I want to process this file and concatenate the Item column where the HKey values match. So the result should look like:

    1|11|Item1
    2|22|Item1~Item2~Item3
    3|33|Item1~Item2
    4|44|Item1

    What is the best way to identify duplicates and generate this file?

  • Have to ask, why would you want to do that?

    😎

  • You could do this in a script component within your data flow, but Lynn makes a great point: why are you taking nice normalized data and messing it up?

  • This is a feed for an application that needs the data in that format. I think I am going to go for the script option.

    Thanks.

  • Is this nice data in a table already, or do you only have it in a text file?

    😎

  • It's in a text file.
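
For reference, the grouping asked for in the thread can be sketched outside the data flow as well. The Python below is only a minimal illustration, not the script component the poster ended up writing. It assumes the first line is the header, that rows are pipe-delimited with exactly three fields, that rows with the same HKey share a GroupKey (the first one seen is kept), and that the whole file fits in memory. The name `merge_items` is hypothetical.

```python
def merge_items(lines, sep="|", joiner="~"):
    """Join the Item values of rows that share an HKey.

    `lines` is any iterable of text lines; the first one is treated as
    the header (an assumption based on the sample in the thread).
    """
    it = iter(lines)
    header = next(it, None)
    if header is None:
        return []  # empty input

    # Plain dicts keep insertion order (Python 3.7+), so groups come out
    # in the order each HKey first appears in the file.
    groups = {}
    for raw in it:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        # maxsplit=2 keeps any further "|" characters inside Item intact.
        hkey, group_key, item = line.split(sep, 2)
        if hkey not in groups:
            groups[hkey] = (group_key, [])
        groups[hkey][1].append(item)

    result = [header.rstrip("\r\n")]
    for hkey, (group_key, items) in groups.items():
        result.append(sep.join([hkey, group_key, joiner.join(items)]))
    return result
```

At 80,000 rows of up to 250 characters each, the input is roughly 20 MB of text, so holding it in a dictionary should be comfortable on most machines. A row with fewer than three fields raises a `ValueError` here rather than being silently dropped. If an Item value could itself contain `~`, a different joiner would be needed.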
