March 13, 2008 at 9:20 am
Is there a way, in the SSIS data flow, to retain in the data flow only the records that have the max value in a certain field, grouped by other fields?
I have a contract number and version nb, of which there may be multiple records, and for each distinct contract nb and version nb, I only want the record with max(sequence nb) for that contract nb and version nb.
So if coming into my data flow I have
contract nb, version nb, sequence nb
10000, 001,0
10000,001,1
10000,001,2
10001,001,0
I want to end up with 2 records:
10000,001,2
10001,001,0
In Transact-SQL I would express it
Select cntrc_nb, ver_nb, max(seq_nb)
from table
group by cntrc_nb, ver_nb
Is there a way to do that in the data flow? I looked at conditional split, but haven't figured out how to do it.
Thanks,
Holly
March 13, 2008 at 10:15 am
You should be able to use an aggregate component. Let me know if you need help configuring this.
Kindest Regards,
Frank Bazan
March 13, 2008 at 11:20 am
hey frank can i know how can we do that using aggragate function ...bcoz we need to mention the distinct nid and the max of the other two columns so how can we do that suing aggregatio plz let me know
Thanks,
Chinna
Its the Journey which gives you Happiness not the Destination-- Dan Millman
March 13, 2008 at 11:32 am
Thank you, Frank. I see how to configure the Aggregate component, I think, and that allows me to select the correct sequence nb. But I don't see how to get the rest of the fields from the source component back into the data flow. In addition to the fields I mentioned, there are a number of text fields, which I can't bring into the aggregate component, because they aren't relevant to the grouping.
I wanted to use a merge join, but the Source component only has one output, so I don't see how to get the remaining fields back into the flow.
Holly
March 13, 2008 at 11:38 am
From what I've understood in the original question you wanted the equivalent of T-SQL's:-
SELECT DISTINCT contract nb, version nb, MAX(sequence nb).
You drop an aggregate component on to the dataflow. Connect the pipeline to it. Then check all 3 columns. In the column list on the bottom of the component editor choose the required operation for each column. In this case you want to GROUP BY for contract nb, version nb and MAXIMUM for sequence nb.
Note that you will only see the MAX, MIN and other mathematical functions with an number datatype.
Hope this helps.
Kindest Regards,
Frank Bazan
March 13, 2008 at 11:43 am
Sorry I posted my last before I saw your next post...
In which case what you have to do is a multicast. You aggregate one stream and then merge it back to the non aggregated pipeline.
In order to do this you just need to make sure that the joining columns are present in both streams. In your case you should be able to join back using your 2 text fields (one of them was contract nb, can't remember the other). You may need to add a sort or two in there as well before you can merge.
Kindest Regards,
Frank Bazan
March 13, 2008 at 11:57 am
Perfect!!! I had to put a Sort in both the flows and then used a Merge Join to get them back together.
I really do appreciate your help. I might have figured that out after much time, after all they say monkeys could write Shakespeare given enough time (I don't believe that for a minute!). But thanks to your help I have it working in a time frame my employer likes.
Holly
March 13, 2008 at 1:40 pm
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply