January 14, 2014 at 2:50 pm
Hi,
What is the best way to transfer data from a staging table into the main table?
Example:
Staging Table Name: TableA_stage (# of rows - millions)
Main Table Name: TableA_main (# of rows - billions)
Note: The staging table may contain some of the same data as the main table.
Currently I am doing:
- Load data into staging table (TableA_stage)
- Remove any duplicate rows from the staging table (TableA_stage)
- Disable all indexes on main table (TableA_main)
- Insert into main table (TableA_main) from staging table (TableA_stage)
- Remove any duplicate rows from the main table using a CTE (TableA_main)
- Rebuild indexes on the main table (TableA_main)
The problem with the above method is that it takes a lot of time and the log file grows very large.
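For reference, the dedup step looks roughly like this. This is only a sketch: Col1 through Col3 stand in for whatever columns define a duplicate in my real table.

    ;WITH dupes AS
    (
        SELECT ROW_NUMBER() OVER
               (PARTITION BY Col1, Col2, Col3 ORDER BY (SELECT NULL)) AS rn
        FROM dbo.TableA_main
    )
    DELETE FROM dupes
    WHERE rn > 1;  -- keeps one row per duplicate group, deletes the rest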
Thanks in advance!!
January 14, 2014 at 3:09 pm
With SSIS you can use the Lookup component to easily verify if a row from the staging table is an insert (new row) or an update (already existing row).
Write only the inserts to an empty table and then use partition switching to add the rows to the main table (I hope you have Enterprise Edition).
Make sure the recovery model is set to simple.
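A minimal sketch of the switch step, assuming TableA_main is partitioned, the inserts were written to a table with an identical structure on the same filegroup, and partition 5 is the empty target partition (all of these names and numbers are placeholders):

    -- metadata-only operation, so it is nearly instant regardless of row count
    ALTER TABLE dbo.TableA_inserts
        SWITCH TO dbo.TableA_main PARTITION 5;

The source table also needs a check constraint proving all its rows fall within the target partition's boundary values, otherwise the switch will fail.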
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
January 15, 2014 at 1:49 pm
It's very slow... transferring into the cache.
Do you have any other suggestions? Also, I don't have an Enterprise environment.
Thank you!!
January 15, 2014 at 2:10 pm
monilps (1/15/2014)
It's very slow... transferring into the cache.
How did you configure it? I hope you wrote a SQL statement and didn't use the dropdown menu to select the table.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
January 15, 2014 at 2:27 pm
I used the dropdown menu, but will try with a SQL statement.
Thanks.
January 15, 2014 at 2:48 pm
monilps (1/15/2014)
I used the dropdown menu, but will try with a SQL statement. Thanks.
If you use the dropdown menu, you're going to load the entire table with billions of rows into memory, so no wonder it is slow.
In the SQL SELECT statement, only select the business key. That's all you need to find out if a row is an insert or an update.
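Something like this, assuming the business key column is called BusinessKey (a placeholder name):

    SELECT BusinessKey
    FROM dbo.TableA_main;

The Lookup cache then holds one narrow column instead of every column of every row.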
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
January 16, 2014 at 8:07 am
It's a heap table...
January 16, 2014 at 1:47 pm
monilps (1/16/2014)
It's a heap table...
I don't see why that is relevant.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
January 16, 2014 at 1:50 pm
I have to pull all the columns to compare them.
January 16, 2014 at 1:56 pm
No keys, full column compare, millions of rows?
Yeah, that's gonna take a while, no way to avoid it.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. | Tally Tables
Twitter: @AnyWayDBA
January 16, 2014 at 1:58 pm
monilps (1/16/2014)
I have to pull all the columns to compare them.
That does not make it a heap. A heap is a table without a clustered index. Doesn't mean one or more columns can't form a natural key.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
January 16, 2014 at 1:59 pm
If you have to compare all columns, just calculate a hash over all those columns and store it alongside in the table. Then you can do your lookup using only the hash.
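For example (just a sketch: Col1 through Col3 stand in for the real columns, and note that HASHBYTES input is capped at 8000 bytes before SQL Server 2016):

    -- persisted computed column holding a hash of the compared columns
    ALTER TABLE dbo.TableA_main
        ADD RowHash AS HASHBYTES('SHA2_256',
            CONCAT(Col1, '|', Col2, '|', Col3)) PERSISTED;

The '|' separator guards against two different column combinations concatenating to the same string.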
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
January 16, 2014 at 2:00 pm
Koen Verbeeck (1/16/2014)
If you have to compare all columns, just calculate a hash over all those columns and store it alongside in the table. Then you can do your lookup using only the hash.
That only works if you have a key to compare on for deltas. You can get the same hash result (at least with CHECKSUM) from different rows depending on the content, and the more rows you have, the higher the chance of a collision.
I only rely on hash/checksum for checking deltas based on keys. I wouldn't use it for full row compares without a secondary association.
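As a quick illustration, CHECKSUM follows the column's collation, so under the default case-insensitive collation two different strings already collide:

    SELECT CHECKSUM('SQLServer') AS HashA,
           CHECKSUM('sqlserver') AS HashB;
    -- both return the same value under a case-insensitive collation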
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. | Tally Tables
Twitter: @AnyWayDBA
January 16, 2014 at 2:02 pm
Evil Kraig F (1/16/2014)
Koen Verbeeck (1/16/2014)
If you have to compare all columns, just calculate a hash over all those columns and store it alongside in the table. Then you can do your lookup using only the hash.
That only works if you have a key to compare on for deltas. You can get the same hash result (at least with CHECKSUM) from different rows depending on the content, and the more rows you have, the higher the chance of a collision.
I only rely on hash/checksum for checking deltas based on keys. I wouldn't use it for full row compares without a secondary association.
It's CHECKSUM that causes problems more easily, but with a decent hash function (SHA-256 or higher) this should work.
In theory 😀
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
January 17, 2014 at 6:23 am
Hi Friend,
Do you have experience using the MERGE command? It's great for your case!
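A minimal sketch, assuming a single BusinessKey column and two data columns (all hypothetical names):

    MERGE dbo.TableA_main AS tgt
    USING dbo.TableA_stage AS src
        ON tgt.BusinessKey = src.BusinessKey
    WHEN MATCHED THEN
        UPDATE SET tgt.Col1 = src.Col1,
                   tgt.Col2 = src.Col2
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (BusinessKey, Col1, Col2)
        VALUES (src.BusinessKey, src.Col1, src.Col2);

Be aware that a MERGE against billions of rows is still fully logged, so batching by key range may help with the log growth.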
Sincerely
Eliseu Fermino - Brazil