February 9, 2009 at 10:37 pm
I have a database which summarises data extracted from a separate system. This process runs regularly (every 30 minutes) to grab recently changed rows. For the sake of performance, these changed rows are imported into a temporary table rather than directly into a live table. Once received, however, an update process is required to apply the recently received data to the main summary table. Note that the summary table sees a very high number of transactions because it is heavily used.
Two approaches I have explored with this are:
1. Regenerate the summary table using TSQL under a different name and use delete, update and insert statements to modify the data in the live table on a row-by-row basis. (Currently the implemented option.)
2. Regenerate the summary table using TSQL under a different name, create new indexes on the new table, drop the old table and rename the new table back to the old name.
Is either of these superior to the other? Are there other alternatives that could be explored?
Cheers
February 11, 2009 at 5:38 am
It's hard to answer without a lot more information. In general, doing anything row by row is SLOW compared to using a set-based operation. It's better to do all the like operations in one statement, or, if there are huge amounts, you can split the operation into smaller batches (e.g. 1000 rows at a time).
Depending on how you know whether something has been deleted, do all the deletes in one statement.
Depending on how you know whether something has been inserted, do all the inserts in one statement.
Do the same for all the updates in one statement.
I'll work up some sample code and post later.
Toni
February 11, 2009 at 6:28 am
Cheers Toni
Some more information. The live table has 4,500,000 rows in it. Its data is refreshed every 60 minutes. There are usually 5 or so fairly simple transactions running per second on the table. A full update of this table from the replica table (which was regenerated separately from the recently received data) is currently done with a table join, which causes significant table/row lock issues, affecting both the update process and queries running on the table.
I have now implemented method 2, which causes no table locks and seems to generate much less general load. It also takes slightly less time. I am not sure, however, if this is good practice, since for a very brief moment the summary table will be dropped before the new table is renamed - I am not sure if it is possible for a user to get a failed transaction in between. I haven't seen this happen, so I am not sure how likely it would be.
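If that brief gap is the worry, one way to shrink it is to do both renames inside a single transaction and only drop the old table afterwards. A minimal sketch, assuming the rebuilt table is called Summary_New and the live one dbo.Summary (names made up for illustration, not the actual tables in this thread):
-- Swap the freshly built table in under the live name.
-- The schema locks taken by sp_rename are held until COMMIT,
-- so concurrent queries should wait briefly rather than fail on a missing table.
BEGIN TRANSACTION
    EXEC sp_rename 'dbo.Summary', 'Summary_Old'
    EXEC sp_rename 'dbo.Summary_New', 'Summary'
COMMIT TRANSACTION
-- Drop the previous copy only after the swap has committed.
DROP TABLE dbo.Summary_Old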
Perhaps using MS replication is another option?
February 11, 2009 at 8:27 am
Let me see if I have this right, then. The Summary Table is only updated based on the changes reflected in your recently received (extract) data. So you would create a copy of the current Summary Table, apply the changes from the extract, then swap the tables. Is that it?
In your second scenario, all changes are applied to a copy of the current table and then you swap the tables. Dropping and switching to a new table while running live seems, in my view, to be asking for a problem to happen at some point.
You could do transactional replication from the copy of the current table to your summary table as you apply each set of changes, but only if all changes come from the copy/extract table. This would keep the two tables in sync but adds complexity in setting up and managing these processes.
The other alternative is, once you have the copy of the Summary table updated, to synchronize it with the current Summary table without joins, along the lines of the following pseudo-code.
-- For all records in the copy table but not in the Summary table insert them
INSERT INTO Summary
SELECT data -- your datafields
FROM copy WHERE PK_Copy NOT IN (SELECT PK_Summary FROM Summary WITH (NOLOCK))
-- For all records in the Summary table but not in the copy table, delete them
DELETE FROM Summary
WHERE PK_Summary NOT IN (SELECT PK_Copy FROM copy)
-- This is trickier but add a column to your copy table to know if a row was updated
-- (or have it in your original table def)
ALTER TABLE copy
ADD updated CHAR(1) NULL
-- Then update this flag for rows that show in the copy after applying the changes and are also found
-- in the Summary table, BEFORE any inserts or deletes are applied to the Summary table.
-- If possible, it would be better to set this flag as you apply the changes to the copy of your current
-- summary table based on the extract table data.
UPDATE copy SET updated='Y' -- if you did not do it when applying your changes to the copy
WHERE PK_Copy IN (SELECT primary_key_Field FROM extract_table)
AND PK_Copy IN (SELECT PK_Summary FROM Summary WITH (NOLOCK))
-- Then do the updates based on this flag
UPDATE Summary SET data = c.data
FROM copy c
WHERE Summary.PK_Summary = c.PK_Copy AND c.updated = 'Y'
These should give you the desired results without joins and with minimal locking. Again, if there are a lot of changes occurring at once, then you might want to split the statements into batches (perhaps use an Identity column and do a group, say of 1000 at a time, where Identity_Val < 1000, then one with it between 1001 and 2000, etc.).
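As a concrete example of the batching idea, here is a minimal sketch using a DELETE TOP loop (an alternative to bucketing on an Identity column, with the same hypothetical table and key names as above):
-- Delete obsolete rows 1000 at a time so each transaction stays short
-- and locks are released between batches.
WHILE 1 = 1
BEGIN
    DELETE TOP (1000) FROM Summary
    WHERE PK_Summary NOT IN (SELECT PK_Copy FROM copy)

    IF @@ROWCOUNT = 0 BREAK
END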
Hope this helps.
Toni
February 11, 2009 at 8:55 am
I did not test this part out, but it should work too for the updates... It still has to be run before the Inserts or Deletes.
-- do the updates without needing a flag
UPDATE Summary SET data = c.data
FROM copy c
WHERE Summary.PK_Summary = c.PK_Copy
AND c.PK_Copy IN (SELECT primary_key_Field FROM extract_table)
AND c.PK_Copy IN (SELECT PK_Summary FROM Summary WITH (NOLOCK))
Of course, you should test this, or anything else, before applying it to production.
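For what it's worth, on SQL Server 2008 and later the same copy-to-Summary synchronization could also be written as a single MERGE statement instead of separate insert, update and delete steps. A minimal sketch, using the same hypothetical table, key and data column names as above:
-- Insert, update and delete in one pass by comparing copy against Summary
MERGE Summary AS s
USING copy AS c
    ON s.PK_Summary = c.PK_Copy
WHEN MATCHED THEN
    UPDATE SET s.data = c.data
WHEN NOT MATCHED BY TARGET THEN
    INSERT (PK_Summary, data) VALUES (c.PK_Copy, c.data)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;  -- MERGE must be terminated with a semicolon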
Toni