Help with my UPSERT ETL project

  • Was hoping someone could help me here, I have a project to create an SSIS package (SQL Server 2005) that will read from a CSV source (flat-file) and import to a SQL table.

    The flat file has a unique SEQ row that acts as a unique key/identifier, based on that the UPSERT will either update/insert.

    I followed the article here to create a test project:

    http://sqlserverpedia.com/blog/sql-server-bloggers/ssis-%E2%80%93-performing-an-upsert/

    I have a trigger defined as:

    CREATE TRIGGER [dbo].[trg_SSIS_Update]

    ON [dbo].[vw_SSIS_Import]

    INSTEAD OF INSERT

    AS

    BEGIN

    UPDATE import

    SET

    DATE = i.Date,

    NFIRST = i.NFIRST,

    NLAST = i.NLAST,

    BIRTHDAY = i.BIRTHDAY,

    COUNTRY = i.COUNTRY

    FROM dbo.csv_data import

    INNER JOIN inserted i

    ON i.SEQ = import.SEQ

    END

    Which is used by the UPSERT > OLEDB Destination so that it can update in bulk and not row by row for performance.

    Now what I would like to do, is implement some kind of "tracking" to be able to troubleshoot if there are errors, how would one create a mechanism that would show me all new/updated rows that the ETL does?

    For the sake of having to "troubleshoot" the data.

    Any help is greatly appreciated.

  • What sort of errors are you expecting? Any serious errors may not, of course, even make it into the db, so your envisaged method might not be a comprehensive solution.

    But you could always add 'created' date/user and 'last modified' date/user fields to your table too - they will help.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • That is a nice article you are using. But personally, I would create a temp table in the SSIS package and write the update rows to that table. Then perform the set based update using that temp table.

    This way, the BI developers don't have to touch the database and create views/triggers for every table that needs updating. Triggers are a bit tricky regarding to maintenance, as they can easily be forgotten when you deploy your project (although having the right settings in SSMS helps a lot).

    But back to your question:

    as Phil suggested, add two columns: creation_datetime with a default of getdate() and update_datetime, also with a default of getdate(). Then, when a record is updated, the update_datetime is set to the current getdate().

    If you need exact logging of how many records are inserted or updated, you can use the @@rowcount function.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply