How to pick only the first match in a inner join in SSIS

  • Hi,

    I need to join two data (say from falt files) and pick only the first matching record in the right table for each record in the left table.

    For eg: Say the left table has the list of managers and the right table has the list of managers and the name of the employees under him (say sorted by experience), I need to pick the top 1 employee for each manager. In case i do an INNER JOIN i am going to get all the employees under each manager which i dont need. I just need the first guy.

  • How do you define the first guy??

    SELECT * FROM Source INNER JOIN (

    SELECT Manager_id, MIN(emp_id) FROM Source GROUP BY Manager_ID

    ) dtFirst ON Source.Manager_id = dtFirst.Manager_id and Source.emp_id = dtFirst.emp_id

    inner join Mangers....

  • Thanks for the post. But the source tables are not physical tables and the data read are from 2 different flat files. And so i wont be able to use such queries for the purpose.

    in other words, the data from the two files are the metadata flowing through the components in the data flow.

  • Is it possible to fetch only the first matching record in the right table using the MERG JOIN component in the data flow?

  • Well step 1 in any etl is EXTRACT.

    So I'd put all that data into temp table then I'd start the transform tasks. Then my query would easily work.

  • Ninja's_RGR'us (1/5/2009)


    Well step 1 in any etl is EXTRACT.

    So I'd put all that data into temp table then I'd start the transform tasks. Then my query would easily work.

    Right on - except I'd use a permanent staging table if this is something that you will be running repeatedly.

    Phil

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • I'm new to SSIS.

    Could you pls enlighten on how to use a temp table in SSIS?

  • Create a table (like any other table). Make sure the name is significant like Staging_Managers.

    The insert all the data in that table. (Make sure you truncate before the load... or after).

    Repeat for all files and you're good to go.

  • Thanks for the info

  • You could also use Sort transform on the employees path, sort by manager_id and remove duplicates I think.

    Regards

    Piotr

    ...and your only reply is slàinte mhath

  • I'm a bit curious here (never used SSIS). So you are saying that this sort transform basically only keeps 1 row PER manager. Can you decide which row the keep based on a criteria? Or does it all work by saving the top (n) rows order by (list of columns here)?

  • The sort transform is a mechanism whereby data in a pipeline can be sorted prior to sending it on to the next task.

    There is an option 'Remove Rows with Duplicate Sort Values' which allows you to filter the data which is passed on down the pipeline. I have not used it, so I am not sure how it decides which rows to discard ...

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • I have not tried out this sort option. Should.

    Anyways... i solved the issue by adding a script task which would add a RowNum (by partition) and then i pick the records whose rownum = 1

  • The sort transform returns the first row if duplicates are found in columns specified by sort criteria. At least it worked this way for us :), but if I were to implement this type of functionality, the easiest way would be to return either first or last row.

    Regards

    Piotr

    ...and your only reply is slàinte mhath

  • Piotr Rodak (1/6/2009)


    The sort transform returns the first row if duplicates are found in columns specified by sort criteria. At least it worked this way for us :), but if I were to implement this type of functionality, the easiest way would be to return either first or last row.

    Regards

    Piotr

    But what does 'first' mean, in this context? If you have duplicates based (only) on the sort criteria, surely there is no concept of order within those duplicates?

    Phil

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply