April 20, 2018 at 8:50 am
We have monthly loads of a THREE month of data: say, in April it's Jan, Feb, and Mar, in May - Feb, Mar, and Apr. This is because at the day of loads not all data for the latest month are complete, some records are missing, other may be updated, and so on (these are medical data on patient visits, procedures, lab, etc.). We need to add these data to the master table. Actually there are more than 20 table to be loaded
The strategy is as follows:
1. we find the minimum date in the 'load' file
2. delete all records from the 'master' table after this date - it may be from200,000 records up to 1,000,000.
3. insert the entire load table into the 'master' table
HOWEVER deleting will add a lot to the log.
Another approach:
1. we find the minimum date in the 'load' file
2. insert all records before this date from the 'master' to #temp
3. truncate the 'master' table
4. insert into master table everything from both '#temp' and 'load' tables
While this plan looks better for the log file we do not feel easy about truncating the 'master' table.
Can you suggest the best way?
Thanks
.
April 20, 2018 at 9:28 am
valeryk2000 - Friday, April 20, 2018 8:50 AMWe have monthly loads of a THREE month of data: say, in April it's Jan, Feb, and Mar, in May - Feb, Mar, and Apr. This is because at the day of loads not all data for the latest month are complete, some records are missing, other may be updated, and so on (these are medical data on patient visits, procedures, lab, etc.). We need to add these data to the master table. Actually there are more than 20 table to be loaded
The strategy is as follows:
1. we find the minimum date in the 'load' file
2. delete all records from the 'master' table after this date - it may be from200,000 records up to 1,000,000.
3. insert the entire load table into the 'master' table
HOWEVER deleting will add a lot to the log.
Another approach:
1. we find the minimum date in the 'load' file
2. insert all records before this date from the 'master' to #temp
3. truncate the 'master' table
4. insert into master table everything from both '#temp' and 'load' tablesWhile this plan looks better for the log file we do not feel easy about truncating the 'master' table.
Can you suggest the best way?
Thanks.
Partition the table by month. Use SWITCH to nearly instantly drop the old monthly partitions and add the new ones.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 20, 2018 at 9:48 am
We had, until recently, a weekly load of around 50 million rows into a table of 12,000,000,000 rows. It was taking about 8 hours, much of which was rebuilding the two ordinary indexes. Changing this to partition switching brought the time down to about 20 minutes, including rebuilding the indexes on the partition.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
December 18, 2018 at 12:00 pm
ChrisM@Work - Friday, April 20, 2018 9:48 AMWe had, until recently, a weekly load of around 50 million rows into a table of 12,000,000,000 rows. It was taking about 8 hours, much of which was rebuilding the two ordinary indexes. Changing this to partition switching brought the time down to about 20 minutes, including rebuilding the indexes on the partition.
If you were to also make the change from partitioned tables to partitioned views, you could have different indexes based on how old the data was. You can also more easily transfer just a couple of partitions to a test system than you can with partitioned tables (which is impossible to do in a straight forward manner).
Either way, I agree that partitioning is the way to go for something like this and it doesn't just help the load/delete processes. If the partitions become "static" in nature after a month or two, you can set such partitions to READ_ONLY, take one final backup, and never have to back them up again. I've got a single table 1.1TB database with such data in it and it contains monthly partitions since May of 2010. It takes 0-6 minutes to back up depending on where we're at for the current month because I only have to backup the current month and the next month and the next month is always empty. ๐
This is the table I wish I had used a partitioned view on instead of partitioning it as a table.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply