How to pick only the first match in a inner join in SSIS

Question

Post reply

How to pick only the first match in a inner join in SSIS

Soorya

Ten Centuries

Points: 1314
More actions
January 5, 2009 at 5:37 am

#127773

Hi,
I need to join two data (say from falt files) and pick only the first matching record in the right table for each record in the left table.
For eg: Say the left table has the list of managers and the right table has the list of managers and the name of the employees under him (say sorted by experience), I need to pick the top 1 employee for each manager. In case i do an INNER JOIN i am going to get all the employees under each manager which i dont need. I just need the first guy.

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 1

How do you define the first guy??

SELECT * FROM Source INNER JOIN (

SELECT Manager_id, MIN(emp_id) FROM Source GROUP BY Manager_ID

) dtFirst ON Source.Manager_id = dtFirst.Manager_id and Source.emp_id = dtFirst.emp_id

inner join Mangers....

Soorya Ten Centuries Points: 1314 More actions · Answer 2

Thanks for the post. But the source tables are not physical tables and the data read are from 2 different flat files. And so i wont be able to use such queries for the purpose.

in other words, the data from the two files are the metadata flowing through the components in the data flow.

Soorya Ten Centuries Points: 1314 More actions · Answer 3

Is it possible to fetch only the first matching record in the right table using the MERG JOIN component in the data flow?

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 4

Well step 1 in any etl is EXTRACT.

So I'd put all that data into temp table then I'd start the transform tasks. Then my query would easily work.

Phil Parkin SSC Guru Points: 246960 More actions · Answer 5

Ninja's_RGR'us (1/5/2009)
Well step 1 in any etl is EXTRACT.
So I'd put all that data into temp table then I'd start the transform tasks. Then my query would easily work.

Right on - except I'd use a permanent staging table if this is something that you will be running repeatedly.

Phil

The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin

Soorya Ten Centuries Points: 1314 More actions · Answer 6

I'm new to SSIS.

Could you pls enlighten on how to use a temp table in SSIS?

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 7

Create a table (like any other table). Make sure the name is significant like Staging_Managers.

The insert all the data in that table. (Make sure you truncate before the load... or after).

Repeat for all files and you're good to go.

Soorya Ten Centuries Points: 1314 More actions · Answer 8

Soorya

Ten Centuries

Points: 1314

January 5, 2009 at 9:04 am

#921216

Thanks for the info

Piotr.Rodak SSCrazy Eights Points: 9376 More actions · Answer 9

You could also use Sort transform on the employees path, sort by manager_id and remove duplicates I think.

Regards

Piotr

...and your only reply is slàinte mhath

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 10

I'm a bit curious here (never used SSIS). So you are saying that this sort transform basically only keeps 1 row PER manager. Can you decide which row the keep based on a criteria? Or does it all work by saving the top (n) rows order by (list of columns here)?

Phil Parkin SSC Guru Points: 246960 More actions · Answer 11

The sort transform is a mechanism whereby data in a pipeline can be sorted prior to sending it on to the next task.

There is an option 'Remove Rows with Duplicate Sort Values' which allows you to filter the data which is passed on down the pipeline. I have not used it, so I am not sure how it decides which rows to discard ...

The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin

Soorya Ten Centuries Points: 1314 More actions · Answer 12

I have not tried out this sort option. Should.

Anyways... i solved the issue by adding a script task which would add a RowNum (by partition) and then i pick the records whose rownum = 1

Piotr.Rodak SSCrazy Eights Points: 9376 More actions · Answer 13

The sort transform returns the first row if duplicates are found in columns specified by sort criteria. At least it worked this way for us :), but if I were to implement this type of functionality, the easiest way would be to return either first or last row.

Regards

Piotr

...and your only reply is slàinte mhath

Phil Parkin SSC Guru Points: 246960 More actions · Answer 14

Piotr Rodak (1/6/2009)
The sort transform returns the first row if duplicates are found in columns specified by sort criteria. At least it worked this way for us :), but if I were to implement this type of functionality, the easiest way would be to return either first or last row.
Regards
Piotr

But what does 'first' mean, in this context? If you have duplicates based (only) on the sort criteria, surely there is no concept of order within those duplicates?

Phil

The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin