April 21, 2009 at 3:16 am
Everyone,
Please guide me which one is good for big databases.
Scenario:
I have to compare two of my tables: one is archived and the other is production. I want to update my archive table based on the greater modified date, so I have, let's say, 10000 records with a greater modified date, all of which are in my #temp table.
Now I have two ways:
- I can UPDATE those records from the temp table into my archive table,
and
- I can DELETE those records from the archive and INSERT them again from the temp table.
The record count may also grow beyond 10000.
What should I do to keep performance in mind?
Thanks,
Noman
April 21, 2009 at 7:53 am
I would update the rows that exist and insert the ones that don't. Like this:
UPDATE A
SET COLUMN_list = T.COLUMN_list
FROM
Archive A JOIN
#temp T ON
A.primary_key = T.primary_key

INSERT INTO Archive
(
COLUMN_list
)
SELECT
T.COLUMN_list
FROM
#temp T LEFT JOIN
Archive A ON
T.primary_key = A.primary_key
WHERE
A.primary_key IS NULL
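As a side note: if you happen to be on SQL Server 2008, the update-then-insert pair can also be collapsed into a single MERGE statement. A sketch, using the same placeholder names (primary_key, COLUMN_list) as the code above:

```sql
-- Upsert #temp into Archive in one statement (SQL Server 2008+)
MERGE Archive AS A
USING #temp AS T
    ON A.primary_key = T.primary_key
WHEN MATCHED THEN
    -- Row already archived: refresh it from the temp copy
    UPDATE SET COLUMN_list = T.COLUMN_list
WHEN NOT MATCHED BY TARGET THEN
    -- Row not yet archived: add it
    INSERT (primary_key, COLUMN_list)
    VALUES (T.primary_key, T.COLUMN_list);
```

MERGE touches the target table once instead of twice, which can matter as the row count grows, though it's worth testing both against your real data.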
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
April 21, 2009 at 9:42 am
I like Jack's solution, though I might update, delete the matches, insert the rest. Not sure which is better.
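For comparison, a sketch of the delete-the-matches-then-insert variant, using the same placeholder names (primary_key, COLUMN_list) as Jack's code:

```sql
-- Remove archive rows that have a newer version waiting in #temp
DELETE A
FROM Archive A
JOIN #temp T
    ON A.primary_key = T.primary_key;

-- Re-insert everything from #temp (both replaced rows and brand-new ones)
INSERT INTO Archive (COLUMN_list)
SELECT T.COLUMN_list
FROM #temp T;
```

Which approach wins will depend on the ratio of updates to inserts and on the indexes involved, so timing both against representative data is the safest call.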
April 21, 2009 at 4:48 pm
Noman - I have run into something very similar in some of the processes we run. After some exhaustive searching, I found a combination of things that helped and that were recommended by a few experts: transactions and truncating. Both appear to speed up a process like yours. I created some sample data (9 columns by 10000 rows) and then created an archive table with half as many records. I used Jack's answer as the comparison; the stats are below:
Truncate and Transactions: 62ms average total execution time over three trials same data
Update and then Insert: 168ms average total execution time over three trials same data
Either way you can't blink your eyes faster than this gets done with 10000 records at 9 wide.
Begin Tran
Truncate Table Archive
INSERT INTO Archive
(
COLUMN_list
)
SELECT T.COLUMN_list
FROM #temp T
Commit
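Worth noting: TRUNCATE TABLE is transactional in SQL Server, so running both statements inside one transaction with error handling keeps the archive from being left empty if the INSERT fails partway. A sketch (TRY/CATCH is available from SQL Server 2005 on):

```sql
BEGIN TRY
    BEGIN TRAN;

    -- Empty and reload the archive atomically: either both succeed or neither
    TRUNCATE TABLE Archive;

    INSERT INTO Archive (COLUMN_list)
    SELECT T.COLUMN_list
    FROM #temp T;

    COMMIT;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK;  -- the TRUNCATE is undone along with the INSERT
    RAISERROR('Archive refresh failed; transaction rolled back.', 16, 1);
END CATCH;
```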
April 21, 2009 at 7:43 pm
Nice one Wesley. Certainly truncate will be fast. Of course any indexes will affect the solutions as well.
Jack Corbett
Consultant - Straight Path Solutions
April 22, 2009 at 8:16 am
Jack, how so? I am not familiar enough with indexes to understand your point. Will they have to be rebuilt after a truncate?
April 22, 2009 at 8:26 am
Indexes either speed up or slow down the Updates (how's that for being specific) and will slow down the insert, especially for a clustered index.
For example, in your test data, if you add a clustered index on audit_id, the update/insert path (which here does only an update) averaged around 140ms on my laptop, and the truncate/insert was around 100ms. I'd have to run a few more tests to really get a good feel, but that's the general idea.
Jack Corbett
Consultant - Straight Path Solutions
April 22, 2009 at 8:28 am
So, if I understand correctly, a good index will likely speed up the update process, but have no bearing on the insert?
April 22, 2009 at 8:43 am
Here's the way I understand it, and I could be wrong, so don't take this as the definitive answer. Check out Gail Shaw's and Kimberly Tripp's blogs; they have some excellent posts on indexes.
If the columns in the index are being updated, then the index will likely slow down the update because SQL Server has to update the index pages as well. In the case of an index on a primary key that will NOT be updated, it should "find" the rows to update faster, and thus the update will be faster.
Inserts will probably be a bit slower because the index pages have to be maintained as well, particularly if the row lands in the middle of a clustered index and forces a page split. This is why, especially in OLTP systems, it is recommended that your clustered index be on a monotonically increasing value like an identity column or a created date. Think about it like this: if I have a clustered index on last name and I insert 'Aaron, Henry', which sorts to the front of the index, SQL Server may have to split pages to make room, so that insert will be slower than with an identity column as the clustered index, where every new row simply goes at the end.
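To make the contrast concrete, here is a sketch of the two clustered-index choices being discussed. The audit_id name is borrowed from Wesley's test table; last_name and the table names are hypothetical:

```sql
-- Monotonically increasing clustered key: new rows always append at the end
CREATE TABLE ArchiveByIdentity (
    audit_id  INT IDENTITY(1,1) NOT NULL,
    last_name VARCHAR(50) NOT NULL,
    CONSTRAINT PK_ArchiveByIdentity PRIMARY KEY CLUSTERED (audit_id)
);

-- Clustered key on a name: mid-index inserts can force page splits
CREATE TABLE ArchiveByName (
    audit_id  INT IDENTITY(1,1) NOT NULL,
    last_name VARCHAR(50) NOT NULL
);
CREATE CLUSTERED INDEX CX_ArchiveByName_LastName
    ON ArchiveByName (last_name);
```

Inserting 'Aaron, Henry' into ArchiveByName may split a page near the front of the index, while the same row in ArchiveByIdentity simply lands on the last page.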
Jack Corbett
Consultant - Straight Path Solutions
April 22, 2009 at 8:56 am
Thanks, Jack. Your response, and the links, proved very informative.
April 23, 2009 at 7:18 pm
I sense extreme danger... someone tell me why you'd truncate a perfectly good archive table.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 23, 2009 at 7:40 pm
If you truncate the table, then you will lose all the data, including anything that may not be in your current production file. Is that what you really want to do? Also, what if there is a record in your archive table that happens to have a modified date within the range you are pulling down from production, but the record is no longer in production? If you simply delete everything from your archive table within that date range, you are going to lose that record. Again, I am not sure that is what you want. I'd say update matching records first, then insert the new ones. Just my take, but I am not sure what your exact business needs are.
April 24, 2009 at 8:35 am
I had thoughts about that too. The conclusion I came to was that it's not truly an archive table, as I know it. An archive table would be filled with data that was correct on a certain date. In the processes I have created recently there needed to be a way to check a table for changes, or multiple tables for changes across one person/place/thing. Triggers are not desired, so what I was left with was comparing an image (archive) of the table to its current state. So that's one reason. The other is what the original poster stated:
Now I have two ways:
- I can UPDATE those records from the temp table into my archive table,
and
- I can DELETE those records from the archive and INSERT them again from the temp table.
...What should I do to keep performance in mind?
Jeff, I loved the "I sense extreme danger", almost had coffee in my sinuses.