January 14, 2014 at 2:50 pm
Hi,
What is the best way to transfer data from a staging table into the main table?
Example:
Staging Table Name: TableA_stage (# of rows - millions)
Main Table Name: TableA_main (# of rows - billions)
Note: The staging table may contain some of the same data as the main table.
Currently I am doing:
- Load data into staging table (TableA_stage)
- Remove any duplicate rows from the staging table (TableA_stage)
- Disable all indexes on main table (TableA_main)
- Insert into main table (TableA_main) from staging table (TableA_stage)
- Remove any duplicate rows from the main table using a CTE (TableA_main)
- Rebuild indexes on the main table (TableA_main)
The problem with the above method is that it takes a lot of time and the log file grows very large.
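For reference, the dedup step looks roughly like this. This is only a sketch: Col1 through Col3 stand in for whatever columns define a duplicate in my real table.

    ;WITH dupes AS
    (
        SELECT ROW_NUMBER() OVER
               (PARTITION BY Col1, Col2, Col3 ORDER BY (SELECT NULL)) AS rn
        FROM dbo.TableA_main
    )
    DELETE FROM dupes
    WHERE rn > 1;  -- keeps one row per duplicate group, deletes the rest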
Thanks in advance!!
January 14, 2014 at 3:09 pm
With SSIS you can use the Lookup component to easily verify if a row from the staging table is an insert (new row) or an update (already existing row).
Write only the inserts to an empty table and then use partition switching to add the rows to the main table (I hope you have Enterprise Edition).
Make sure the recovery model is set to simple.
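A minimal sketch of the switch step, assuming TableA_main is partitioned, the inserts were written to a table with an identical structure on the same filegroup, and partition 5 is the empty target partition (all of these names and numbers are placeholders):

    -- metadata-only operation, so it is nearly instant regardless of row count
    ALTER TABLE dbo.TableA_inserts
        SWITCH TO dbo.TableA_main PARTITION 5;

The source table also needs a check constraint proving all its rows fall within the target partition's boundary values, otherwise the switch will fail.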
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
January 15, 2014 at 1:49 pm
It's very slow... transferring into the cache.
Do you have any other suggestions? Also, I don't have an Enterprise environment.
Thank you!!
January 15, 2014 at 2:10 pm
monilps (1/15/2014)
It's very slow... transferring into the cache.
How did you configure it? I hope you wrote a SQL statement and didn't use the dropdown menu to select the table.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
January 15, 2014 at 2:27 pm
I used the dropdown menu, but will try with a SQL statement.
Thanks.
January 15, 2014 at 2:48 pm
monilps (1/15/2014)
I used the dropdown menu, but will try with a SQL statement. Thanks.
If you use the dropdown menu, you're going to load the entire table with billions of rows into memory, so no wonder it is slow.
In the SQL SELECT statement, only select the business key. That's all you need to find out if a row is an insert or an update.
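Something like this, assuming the business key column is called BusinessKey (a placeholder name):

    SELECT BusinessKey
    FROM dbo.TableA_main;

The Lookup cache then holds one narrow column instead of every column of every row.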
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
January 16, 2014 at 8:07 am
It's a heap table...
January 16, 2014 at 1:47 pm
monilps (1/16/2014)
It's a heap table...
I don't see why that is relevant.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
January 16, 2014 at 1:50 pm
I have to pull all the columns to compare them.
January 16, 2014 at 1:56 pm
No keys, full column compare, millions of rows?
Yeah, that's gonna take a while, no way to avoid it.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. | Tally Tables
Twitter: @AnyWayDBA
January 16, 2014 at 1:58 pm
monilps (1/16/2014)
I have to pull all the columns to compare them.
That does not make it a heap. A heap is a table without a clustered index. Doesn't mean one or more columns can't form a natural key.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
January 16, 2014 at 1:59 pm
If you have to compare all columns, just calculate a hash over all those columns and store it alongside in the table. Then you can do your lookup using only the hash.
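For example (just a sketch: Col1 through Col3 stand in for the real columns, and note that HASHBYTES input is capped at 8000 bytes before SQL Server 2016):

    -- persisted computed column holding a hash of the compared columns
    ALTER TABLE dbo.TableA_main
        ADD RowHash AS HASHBYTES('SHA2_256',
            CONCAT(Col1, '|', Col2, '|', Col3)) PERSISTED;

The '|' separator guards against two different column combinations concatenating to the same string.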
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
January 16, 2014 at 2:00 pm
Koen Verbeeck (1/16/2014)
If you have to compare all columns, just calculate a hash over all those columns and store it alongside in the table. Then you can do your lookup using only the hash.
That only works if you have a key to compare on for deltas. You can get the same hash result (at least with CHECKSUM) from different rows depending on the content, and the more rows you have, the higher the chance of a collision.
I only rely on hash/checksum for checking deltas based on keys. I wouldn't use it for full row compares without a secondary association.
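As a quick illustration, CHECKSUM follows the column's collation, so under the default case-insensitive collation two different strings already collide:

    SELECT CHECKSUM('SQLServer') AS HashA,
           CHECKSUM('sqlserver') AS HashB;
    -- both return the same value under a case-insensitive collation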
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. | Tally Tables
Twitter: @AnyWayDBA
January 16, 2014 at 2:02 pm
Evil Kraig F (1/16/2014)
Koen Verbeeck (1/16/2014)
If you have to compare all columns, just calculate a hash over all those columns and store it alongside in the table. Then you can do your lookup using only the hash.
That only works if you have a key to compare on for deltas. You can get the same hash result (at least with CHECKSUM) from different rows depending on the content, and the more rows you have, the higher the chance of a collision.
I only rely on hash/checksum for checking deltas based on keys. I wouldn't use it for full row compares without a secondary association.
It's CHECKSUM that causes problems more easily, but with a decent hash function (SHA-256 or higher) this should work.
In theory 😀
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
January 17, 2014 at 6:23 am
Hi Friend,
Do you have experience using the MERGE command? It's great for your case!
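A minimal sketch, assuming a single BusinessKey column and two data columns (all hypothetical names):

    MERGE dbo.TableA_main AS tgt
    USING dbo.TableA_stage AS src
        ON tgt.BusinessKey = src.BusinessKey
    WHEN MATCHED THEN
        UPDATE SET tgt.Col1 = src.Col1,
                   tgt.Col2 = src.Col2
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (BusinessKey, Col1, Col2)
        VALUES (src.BusinessKey, src.Col1, src.Col2);

Be aware that a MERGE against billions of rows is still fully logged, so batching by key range may help with the log growth.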
Sincerely
Eliseu Fermino - Brazil