SS2K5 Newbie has question regarding SSIS Package Task...

  • As I noted in the subject line, I'm a newbie...especially with SS2K5. I'm creating a package to import Point of Care (POC) lab results into a SS table...easy enough and I've successfully done that, but without editing the table or associated indexes, I'm looking to prevent "duplicate" entries into the Patient table. We are receiving data from many practice groups and to date, we have none that are able to electronically provide us with either all of the data that we have requested or the same as another practice or group. Therefore, I have to keep things as generic as possible at this point.

    Does anyone have suggestions on how I can query the table in SS from SSIS and then only import those records that do not currently exist. Thanks in advance for your help

    Rick Mills, MT(ASCP)
    Analyst / Programmer
    Indiana Health Information Exchange
    rmills@ihie.com

  • I'm confused - are you dealing with one table or several? And if several, are you trying to not duplicate PK or the whole record? Just need more info - there's several ways to skin this cat. If dataset 1 and dataset 2 are both in SQL, then you can use some TSQL to exclude things you don't want. If some of your data is incoming from SSIS and your comparison set is in SQL already, then you can do the duplicate checking in SSIS, or load all the data, then execute some TSQL to clean stuff up. Can you provide a little more detail, table DDL, or package info?

  • Sorry for being so vague. I have files that I receive weekly that are text files. I want to compare these text files to the table in SQL, then only import the unique records from the text file or append them to the SQL table.

    Thanks for your help

    Rick Mills, MT(ASCP)
    Analyst / Programmer
    Indiana Health Information Exchange
    rmills@ihie.com

  • I agree with SCB, but first you need to import text file to the table(such as SQLStageTable) and then use some TSQL like this (Table and ID is the example)

    Insert Into SQLProductionTable

    Select * from SQLStageTable where ID not in (Select ID from SQLProductionTable)

    In my past experience, I handle more complicated requirements(such as conditioning modify, Add, conditioning delete) so I use OLE DB command and it come with cost.

    Note: 1) So many ways to do this task, this is just only some idea.

    2) By using Stage table, you can verify that all data come in right format better than upload to production directly.

  • Gotcha. The easiest SSIS route to follow here (if you don't want to pump your data into sql and then use TSQL) is to have a data flow task, with a flat file source, flowing into a lookup transform. the flat file source of course connects to your flat file. The lookup transform should be configured to join the key columns from the text file with the key columns in your database table. Then the outflow of the lookup transform (success) runs to a trash flat file destination (because rows that are already there, you don't want to reinsert), and the error outflow of the lookup transform runs to your database (because this row wasn't there, now you want to put it in). When you connect the failure constraint (the red arrow) to your oledb or sql server destination, you should get a popup asking if you want to ignore failure, redirect the row, or fail the component. You want to choose "redirect row".

  • As already stated, SSIS has many methods to achieve what you're looking for. Alot of the specific recommendations depends on the data you're receiving and what you consider a duplicate record. (I.e. primary key match, name match, etc.) As well, SSIS has some fuzzy lookups you should investigate.

    Also, consider looking at the slowly changing dimension task. It may be too rigid for your purposes, but it's quite a handy transformation when tring to do general updates or merge processes. (Caution against customizing it once you've run it through the wizard: subsequent updates via the wizard will wipe out any custom updates. Test before proceeding.)

  • As an extra option when using large datasets: you could check out the TableDiff component from sqlbi.eu.

    We have used it in a scenario where several million rows where to be checked (building history each night).

    The TableDiff component will then outperform the SCD (breakpoint somewhere between 100k and 1 mio records, I believe), and as a benefit: I found it easier to maintain.

    Peter Rijs
    BI Consultant, The Netherlands

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply