January 5, 2009 at 5:37 am
Hi,
I need to join two data (say from falt files) and pick only the first matching record in the right table for each record in the left table.
For eg: Say the left table has the list of managers and the right table has the list of managers and the name of the employees under him (say sorted by experience), I need to pick the top 1 employee for each manager. In case i do an INNER JOIN i am going to get all the employees under each manager which i dont need. I just need the first guy.
January 5, 2009 at 5:48 am
How do you define the first guy??
SELECT * FROM Source INNER JOIN (
SELECT Manager_id, MIN(emp_id) FROM Source GROUP BY Manager_ID
) dtFirst ON Source.Manager_id = dtFirst.Manager_id and Source.emp_id = dtFirst.emp_id
inner join Mangers....
January 5, 2009 at 6:16 am
Thanks for the post. But the source tables are not physical tables and the data read are from 2 different flat files. And so i wont be able to use such queries for the purpose.
in other words, the data from the two files are the metadata flowing through the components in the data flow.
January 5, 2009 at 7:06 am
Is it possible to fetch only the first matching record in the right table using the MERG JOIN component in the data flow?
January 5, 2009 at 7:24 am
Well step 1 in any etl is EXTRACT.
So I'd put all that data into temp table then I'd start the transform tasks. Then my query would easily work.
January 5, 2009 at 7:54 am
Ninja's_RGR'us (1/5/2009)
Well step 1 in any etl is EXTRACT.So I'd put all that data into temp table then I'd start the transform tasks. Then my query would easily work.
Right on - except I'd use a permanent staging table if this is something that you will be running repeatedly.
Phil
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
January 5, 2009 at 8:18 am
I'm new to SSIS.
Could you pls enlighten on how to use a temp table in SSIS?
January 5, 2009 at 8:56 am
Create a table (like any other table). Make sure the name is significant like Staging_Managers.
The insert all the data in that table. (Make sure you truncate before the load... or after).
Repeat for all files and you're good to go.
January 5, 2009 at 9:04 am
Thanks for the info
January 5, 2009 at 6:37 pm
You could also use Sort transform on the employees path, sort by manager_id and remove duplicates I think.
Regards
Piotr
...and your only reply is slàinte mhath
January 6, 2009 at 5:24 am
I'm a bit curious here (never used SSIS). So you are saying that this sort transform basically only keeps 1 row PER manager. Can you decide which row the keep based on a criteria? Or does it all work by saving the top (n) rows order by (list of columns here)?
January 6, 2009 at 5:34 am
The sort transform is a mechanism whereby data in a pipeline can be sorted prior to sending it on to the next task.
There is an option 'Remove Rows with Duplicate Sort Values' which allows you to filter the data which is passed on down the pipeline. I have not used it, so I am not sure how it decides which rows to discard ...
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
January 6, 2009 at 5:56 am
I have not tried out this sort option. Should.
Anyways... i solved the issue by adding a script task which would add a RowNum (by partition) and then i pick the records whose rownum = 1
January 6, 2009 at 11:34 am
The sort transform returns the first row if duplicates are found in columns specified by sort criteria. At least it worked this way for us :), but if I were to implement this type of functionality, the easiest way would be to return either first or last row.
Regards
Piotr
...and your only reply is slàinte mhath
January 6, 2009 at 6:11 pm
Piotr Rodak (1/6/2009)
The sort transform returns the first row if duplicates are found in columns specified by sort criteria. At least it worked this way for us :), but if I were to implement this type of functionality, the easiest way would be to return either first or last row.Regards
Piotr
But what does 'first' mean, in this context? If you have duplicates based (only) on the sort criteria, surely there is no concept of order within those duplicates?
Phil
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
Viewing 15 posts - 1 through 15 (of 15 total)
You must be logged in to reply to this topic. Login to reply