Data Flow filtering puzzle

Question

Post reply

Data Flow filtering puzzle

Holly Kilpatrick

Ten Centuries

Points: 1195
More actions
March 13, 2008 at 9:20 am

#120462

Is there a way, in the SSIS data flow, to retain in the data flow only the records that have the max value in a certain field, grouped by other fields?
I have a contract number and version nb, of which there may be multiple records, and for each distinct contract nb and version nb, I only want the record with max(sequence nb) for that contract nb and version nb.
So if coming into my data flow I have
contract nb, version nb, sequence nb
10000, 001,0
10000,001,1
10000,001,2
10001,001,0
I want to end up with 2 records:
10000,001,2
10001,001,0
In Transact-SQL I would express it
Select cntrc_nb, ver_nb, max(seq_nb)
from table
group by cntrc_nb, ver_nb
Is there a way to do that in the data flow? I looked at conditional split, but haven't figured out how to do it.
Thanks,
Holly

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply

Frank Bazan SSCertifiable Points: 5378 More actions · Answer 1

You should be able to use an aggregate component. Let me know if you need help configuring this.

Kindest Regards,

Frank Bazan

Chinna SSCertifiable Points: 5408 More actions · Answer 2

hey frank can i know how can we do that using aggragate function ...bcoz we need to mention the distinct nid and the max of the other two columns so how can we do that suing aggregatio plz let me know

Thanks,
Chinna

Its the Journey which gives you Happiness not the Destination-- Dan Millman

Holly Kilpatrick Ten Centuries Points: 1195 More actions · Answer 3

Thank you, Frank. I see how to configure the Aggregate component, I think, and that allows me to select the correct sequence nb. But I don't see how to get the rest of the fields from the source component back into the data flow. In addition to the fields I mentioned, there are a number of text fields, which I can't bring into the aggregate component, because they aren't relevant to the grouping.

I wanted to use a merge join, but the Source component only has one output, so I don't see how to get the remaining fields back into the flow.

Holly

Frank Bazan SSCertifiable Points: 5378 More actions · Answer 4

From what I've understood in the original question you wanted the equivalent of T-SQL's:-

SELECT DISTINCT contract nb, version nb, MAX(sequence nb).

You drop an aggregate component on to the dataflow. Connect the pipeline to it. Then check all 3 columns. In the column list on the bottom of the component editor choose the required operation for each column. In this case you want to GROUP BY for contract nb, version nb and MAXIMUM for sequence nb.

Note that you will only see the MAX, MIN and other mathematical functions with an number datatype.

Hope this helps.

Kindest Regards,

Frank Bazan

Frank Bazan SSCertifiable Points: 5378 More actions · Answer 5

Sorry I posted my last before I saw your next post...

In which case what you have to do is a multicast. You aggregate one stream and then merge it back to the non aggregated pipeline.

In order to do this you just need to make sure that the joining columns are present in both streams. In your case you should be able to join back using your 2 text fields (one of them was contract nb, can't remember the other). You may need to add a sort or two in there as well before you can merge.

Kindest Regards,

Frank Bazan

Holly Kilpatrick Ten Centuries Points: 1195 More actions · Answer 6

Perfect!!! I had to put a Sort in both the flows and then used a Merge Join to get them back together.

I really do appreciate your help. I might have figured that out after much time, after all they say monkeys could write Shakespeare given enough time (I don't believe that for a minute!). But thanks to your help I have it working in a time frame my employer likes.

Holly

XPSCodes-667952 SSChasing Mays Points: 603 More actions · Answer 7

If joining columns are not present in both streams, you can choose to ignore the columns. It will be considered as nulls instead.

--------------------------------------------------------------------------------------
Save our world, its all we have! A must watch video Pale Blue Dot[/url]