Data Transfer from staging table into main table

  • Hi,

What is the best way to transfer data from a staging table into the main table?

    Example:

Staging Table Name: TableA_stage (# of rows - millions)

    Main Table Name: TableA_main (# of rows - billions)

    Note: Staging table may have some data same as the main table.

    Currently I am doing:

    - Load data into staging table (TableA_stage)

    - Remove any duplication of rows from the staging table (TableA_stage)

    - Disable all indexes on main table (TableA_main)

    - Insert into main table (TableA_main) from staging table (TableA_stage)

    - Remove any duplication of rows from the main table using CTE (TableA_main)

    - Rebuild indexes on main_table (TableA_main)

The problem with the above method is that it takes a lot of time and the log file grows very large.
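The insert step above can also be broken into small batches so each transaction stays short and the log can truncate between batches. This is only a sketch: the key and column names (KeyCol, Col1, Col2) and the batch size are placeholders, not from the original post, and it assumes the database runs under the SIMPLE recovery model.

```sql
-- Sketch only: KeyCol/Col1/Col2 are placeholder names.
-- Batching keeps each transaction small so the log can reuse space
-- between iterations under SIMPLE recovery.
DECLARE @rows INT = 1;

WHILE @rows > 0
BEGIN
    INSERT INTO dbo.TableA_main (KeyCol, Col1, Col2)
    SELECT TOP (500000) s.KeyCol, s.Col1, s.Col2
    FROM dbo.TableA_stage AS s
    WHERE NOT EXISTS (SELECT 1
                      FROM dbo.TableA_main AS m
                      WHERE m.KeyCol = s.KeyCol);

    SET @rows = @@ROWCOUNT;

    CHECKPOINT; -- allows log truncation between batches in SIMPLE recovery
END;
```

The NOT EXISTS filter also removes the separate "dedupe the main table afterwards" step, since rows already present are never inserted.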

    Thanks in advance!!

  • With SSIS you can use the Lookup component to easily verify if a row from the staging table is an insert (new row) or an update (already existing row).

    Write only the inserts to an empty table and then use partition switching to add the rows to the main table (I hope you have enterprise edition).

    Make sure the recovery model is set to simple.
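A minimal sketch of those two suggestions, assuming TableA_main is already partitioned and a hypothetical staging table dbo.TableA_new_rows has an identical schema, indexes, and constraints on the same filegroup (the database name and partition number are placeholders):

```sql
-- Sketch only: MyDatabase, TableA_new_rows and the partition number
-- are placeholder names, not from the original post.
ALTER DATABASE MyDatabase SET RECOVERY SIMPLE;

-- Partition switching is a metadata-only operation, so it is nearly
-- instant regardless of row count (Enterprise Edition required on
-- SQL Server versions of this era).
ALTER TABLE dbo.TableA_new_rows
    SWITCH TO dbo.TableA_main PARTITION 5;
```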

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • It's very slow... Transferring into the cache

Do you have any other suggestion? Also, I don't have Enterprise Edition.

    Thank you!!

  • monilps (1/15/2014)


    It's very slow... Transferring into the cache

    How did you configure it? I hope you wrote a SQL statement and didn't use the dropdown menu to select the table.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

I used the dropdown menu, but will try with a SQL statement.

    Thanks.

  • monilps (1/15/2014)


I used the dropdown menu, but will try with a SQL statement.

    Thanks.

If you use the dropdown menu, you're going to load the entire table with billions of rows into memory, so no wonder it is slow.

    In the SQL SELECT statement, only select the business key. That's all you need to find out if a row is an insert or an update.
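For example, the Lookup component's query could be as narrow as this (BusinessKey is a placeholder name; the original post doesn't say what the key column is):

```sql
-- Sketch only: select just the business key for the Lookup cache,
-- not the whole billion-row table.
SELECT BusinessKey
FROM dbo.TableA_main;
```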

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • It's a heap table..

  • monilps (1/16/2014)


    It's a heap table..

    I don't see why that is relevant?

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

I have to pull all the columns to compare them.

  • No keys, full column compare, millions of rows?

    Yeah, that's gonna take a while, no way to avoid it.
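If a full-row compare is truly unavoidable, EXCEPT in T-SQL is one way to express it set-based rather than row by row. This is only a sketch with placeholder column names; note that EXCEPT treats NULLs as equal, which a column-by-column `=` comparison does not.

```sql
-- Sketch only: Col1/Col2/Col3 stand in for the full column list.
-- Returns staged rows that do not already exist verbatim in the
-- main table, comparing every listed column (NULL-safe).
SELECT Col1, Col2, Col3 FROM dbo.TableA_stage
EXCEPT
SELECT Col1, Col2, Col3 FROM dbo.TableA_main;
```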


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • monilps (1/16/2014)


    I have to pull all the columns to compare it.

That does not make it a heap. A heap is a table without a clustered index. That doesn't mean one or more columns can't form a natural key.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • If you have to compare all columns, just calculate a hash over all those columns and store it alongside in the table. Then you can do your lookup using only the hash.
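One way to sketch that idea in T-SQL, with placeholder column names (the post doesn't name the columns): persist a hash computed over the compared columns, then look up on the hash alone. HASHBYTES with SHA2_256 requires SQL Server 2012 or later, and the concatenation needs a delimiter so 'ab' + 'c' doesn't collide with 'a' + 'bc'; CONCAT also turns NULLs into empty strings so the expression stays deterministic.

```sql
-- Sketch only: Col1/Col2/Col3 are placeholders for the compared columns.
-- The persisted computed column stores a SHA-256 hash of the row content,
-- so change detection can compare one column instead of all of them.
ALTER TABLE dbo.TableA_main ADD RowHash AS
    HASHBYTES('SHA2_256',
        CONCAT(Col1, '|', Col2, '|', Col3)) PERSISTED;
```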

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Koen Verbeeck (1/16/2014)


    If you have to compare all columns, just calculate a hash over all those columns and store it alongside in the table. Then you can do your lookup using only the hash.

    That only works if you have a key to compare with for deltas. You can get the same hash result (at least you can with checksum) from different rows depending on the content, and the more rows you have the higher the chance for a hash duplication.

    I only rely on hash/checksum for checking deltas based on keys. I wouldn't use it for full row compares without a secondary association.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • Evil Kraig F (1/16/2014)


    Koen Verbeeck (1/16/2014)


    If you have to compare all columns, just calculate a hash over all those columns and store it alongside in the table. Then you can do your lookup using only the hash.

    That only works if you have a key to compare with for deltas. You can get the same hash result (at least you can with checksum) from different rows depending on the content, and the more rows you have the higher the chance for a hash duplication.

    I only rely on hash/checksum for checking deltas based on keys. I wouldn't use it for full row compares without a secondary association.

It's checksum that can more easily give problems, but with a decent hash function (SHA-256 or stronger) this should work.

    In theory 😀

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

Hi Friend,

Do you have experience using the MERGE command? It's great for your case!

Sincerely,

Eliseu Fermino - Brazil
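A minimal MERGE sketch for this upsert, assuming a business key column exists (KeyCol and the other column names are placeholders). Note that a single MERGE over billions of rows runs as one transaction, so the log-growth concern from the original post still applies.

```sql
-- Sketch only: KeyCol/Col1/Col2 are placeholder names.
-- Updates rows that already exist in the main table and inserts
-- the ones that don't, in a single statement.
MERGE dbo.TableA_main AS m
USING dbo.TableA_stage AS s
    ON m.KeyCol = s.KeyCol
WHEN MATCHED THEN
    UPDATE SET m.Col1 = s.Col1, m.Col2 = s.Col2
WHEN NOT MATCHED BY TARGET THEN
    INSERT (KeyCol, Col1, Col2)
    VALUES (s.KeyCol, s.Col1, s.Col2);
```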
