SQL to do a upsert or update/insert

  • I am using SQL 2005.

    I have a 2 tables. A source and a destination table.

    The source table is refreshed every 30 seconds with data that contains (updates to records and new records) from a flat file using SSIS.

    I want to update/insert to the destination table from the source table. The source table will have approx 10,000 records.

    What is the best way to accomplish this? Or should I say most efficient way?

    Thanks

    Teekay

  • Probably would be faster to truncate the destination table and bulk insert from the source to the destination.

    I really do not see a reason to spend time trying to figure out what to update and what to insert, if you need all the data from the source.

    This really does not even need to be a package. You could do something like

    TRUNCATE TABLE MyDatabaseName.dbo.DesinationTable;

    INSERT INTO MyDatabaseName.dbo.DestinationTable

    SELECT *

    FROM MyDatabaseName.dbo.SourceTable WITH (TABLOCK);

  • I forgot to mention that you can set the tsql I posted into a job to run as often as you want.

    I do not know what recovery model you have selected for the destination database,but using bulk logged, will give you minimally logged transactions, as you are not replicating. So, you should consider using the bulk logged recovery model for the insert and then switch back to full.

    If you are using the simple recovery model, you needn't worry about minimally logged transactions.

  • I'd typically add a column to the source that loads with a "loaded" value. That way you know what's been sent in. Then you could update xx records with a "processing" or other token. this is in case your process breaks or runs too slow.

    For Upsert (Pre-2008), I usually match the source with the destination and do updates:

    update dest

    set destcol = sourcecol

    where sourcepk = destpk

    and token = 'processing'

    Once that's done, I delete that data (or set a new token, depending on needs)

    delete source

    from dest

    where sourcepk = destpk

    and token = 'processing'

    then everything that's left is an insert.

    Note that SSIS can do most of this with better package programming, removing the need for a source table.

  • To give a more correct answer, I think we need more information about the tables and the data. We have just about enough info to give a swap, but that's about it.

    If you could post the DDL of the tables, some sample data in the form of unioned inserts, and what the expected results would look like, we could give you a much better answer.

    😎

  • Just rethinking about the problem here. But another solution would be to use DML triggers. This would fit your requirement and give a near real time replica of the source table. This would elimate your need to run the process every 30 seconds. The triggers should be set to use set based inserts/updates/deletes.

    Like Lynn suggested, post your DDL and some test data. We do not know the tables or the data involved. We do not even know if these tables are in the same data base, or even on the same instance.

  • How many rows in the desitination table?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • He said there are aprox. 10,000, now, though records can be inserted and updated.

  • Thanks Adam, but he said

    "The source table will have approx 10,000 records."

    I want to know how many rows are in the destination table. That will answer a lot of questions on how this should be done.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thank for eveyones input... Sorry for the delayed response... I have been down and out with the flu for that past few days.

    The data table are fairly straight forward (Text (MAX 60), Integers, Money and Dates) The tables have primary keys. I will be receiving about 10,000 records in the csv file.

    I am trying to replicate a stage table to a production table (the two tables are identical). I am doing all the conversion/error checking from the .csv files to the stage table using SSIS. Once the data is successfully loaded into the stage table, I want to quickly move the data into production.

    SSIS seem to be a lot of work for what i am trying to accomplish. Basically update all the records in production that are different in stage. Any new records in stage that do not exist in production should be inserted.

    Deletions are handled sperately.

    The front end of the database will be a web application. So people will be reading the data sets when I am writing to the database.

    So I want to make updating/inserting/deleting to/from the production database least evasive as possible. (I may have to do some data caching on the application side to avoid incomplete data sets.... but this is just a thought)

    Thanks again,

    Teekay

  • I am trying to replicate a stage table to a production table (the two tables are identical).

    You're still not telling the whole story here... you already said the staging table has about 10,000 rows... how many does the destination table have? It's very important for this...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Let me ask it a slightly different way... could the content of the staging table be used to replace 100% of the final destination table in production?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I agree with Jeff. It depends on how you are moving the data as to the best advice.

  • Did I just see him mention that there were at most 60 TEXT fields? As in TEXT, not varchar?

    Wow...that's going to hurt.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Thank you all again for your responses....

    Jeff -The production table will have an estimated 30,000 rows to start and will grow over time as records are added. We will be archiving data yearly, so I do not foresee the table to go beyond 500,000 records (and this is a VERY high estimate)

    Here is how this works... hopefully I can explain this process easily.

    A legacy process is sending .csv file every 30 seconds or so. the .csv file contains the last 7 days of orders. It sends the last 7 days of orders incase the system goes offline or there is problem. This way, the system can easliy recover itself. We may change the number of days to be less, but we are in the beginning stages of this project, so we are just testing how this will work.

    The .csv file is FTPed and FileWatcher waits for the transfer to complete. Once the transfer is complete, SSIS deletes all the data in the stage table and imports the data from the .csv file. The SSIS process handles all the data conversion and error checking. Once the data is in the stage table, I what to update the records that have change in stage to production and insert new records in stage to production. Deletion of records will be handled separate.

    Matt - The comment about "TEXT" is incorrect. I am using "VARCHAR(60)" in SQL 2005. However, I am not sure why SSIS does not offer "VARCHAR" as a data type of OLE. This is why a made this typo. Sorry for the confusion.

    Hopefully this helps.

    Teekay

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply