SSIS question - Determining if an inv num exists in a table and doing something with the data afterwards...

  • Just wanted to start by saying I am a newbie to SSIS...

    What I have is a file that I am importing to a stage table. There are two tables after stage (current & history), the current table is a current picture of the data, and history has all of the data.

    Here is my logic:

    1 - If the invoice number doesn't exist in the current or history table, insert into both from stage.

    2 - If the invoice number exists in the current table,

    a - insert from stage to history and

    b - update from stage to current

    Could you include an example of whichever option you think would be best? Thanks in advance...I have started the insert into both sql code but not the insert/update....The code below works great for inserting data only once, but I wasn't sure of the logic to test on a bigger if...then statement...(i.e. Grab invoice number, then If invoice number doesn't exist, do code below, if exists, do the insert/update)

    INSERT INTO current

    (Invoice,

    Orig_FSC,

    Current_FSC)

    SELECTst.Invoice,

    st.Orig_FSC,

    st.Current_FSC

    FROM Stage st

    left join current cur on st.invoice = cur.invoice

    Where

    Not Exists(Select 1 from current RT where st.invoice = RT.invoice)

    INSERT INTO history

    (Invoice,

    Orig_FSC,

    Current_FSC)

    SELECTst.Invoice,

    st.Orig_FSC,

    st.Current_FSC

    FROM Stage st

    left join history hist on st.invoice = hist.invoice

    Where

    Not Exists(Select 1 from history RT where st.invoice = RT.invoice)

  • To start, I wouldn't think of doing the above with plain T-SQL. SSIS moves data using the Data Flow Task. Use one of those.

    Use an OLE DB Source to retrieve the "new" rows from your source. Use a Lookup component to find out if the row is in your "current" table, sending rows that aren't to the "no match" output. (I'm assuming SSIS 2008 - let me know if you're on 2005.) Send the "no match" rows to an OLE DB Destination to insert into your "current" table. Send the "match" rows to an OLE DB Command destination that issues an "INSERT INTO history (...) SELECT ... FROM current WHERE key = ?" type query. Then add another OLE DB Command that issues a "DELETE FROM current WHERE key = ?" query. Then add an OLE DB Destination component to insert those new rows into the "current" table.

    Clear as mud?

    Todd McDermid - SQL Server MVP, MCTS (SQL 08 BI), MCSD.Net
    My Blog - Dimension Merge SCD Component for SSIS - SSIS Community Tasks and Components

  • mikedu (2/21/2011)


    What I have is a file that I am importing to a stage table. There are two tables after stage (current & history), the current table is a current picture of the data, and history has all of the data.

    Here is my logic:

    1 - If the invoice number doesn't exist in the current or history table, insert into both from stage.

    2 - If the invoice number exists in the current table,

    a - insert from stage to history and

    b - update from stage to current

    Could you include an example of whichever option you think would be best? Thanks in advance...I have started the insert into both sql code but not the insert/update....The code below works great for inserting data only once, but I wasn't sure of the logic to test on a bigger if...then statement...(i.e. Grab invoice number, then If invoice number doesn't exist, do code below, if exists, do the insert/update)

    Well, as mentioned, once you're in the staging table you're pretty much done with SSIS. There's ways to try to deal with what you're doing in SSIS, but one thing at a time.

    In this case, I would probably do the following, after insert to the staging table:

    1- Insert all data to history table. You insert here no matter what you do, so just dump the data. In this case, you also might think of doing a multi-cast at the SSIS level and just insert to both targets anyway, history and staging, unless you're transactional or have some other external rules.

    2- Update all matched records in current

    3- Delete all matched records from staging (it's usually faster because of seeking then the anti-semi join scan for the LEFT JOIN / IS NULL routine. As always, you'll need to test your data volume.)

    4- Insert anything left in staging into current.

    Now, what else you would do in SSIS would depend on data volume, really. If your history contains 5 million records and you're only bringing in 5-10 changes at a time, I'd stay with staging. I try to use lookups only when the price of the cache of the lookup data is reasonable compared to the data I'm bringing in. It's a rough guess on my part with no serious benchmarking against it, though, so I'm not mentioning actual numbers. It's a test type of situation, but with more details we can probably help you with more optimization.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply