July 16, 2018 at 9:30 am
I have an SSIS package that loads three tables, where the first step performs a date-based delete; right now it deletes the current and previous month, while each table holds 2.5 years of data. These tables currently do not have any keys, constraints, or indexes.
The delete seems to take forever. I've googled around and there seems to be a wide range of ways to improve performance; I'm looking for the easiest, low-hanging-fruit approach to give these transactions a boost.
Thanks
July 16, 2018 at 9:41 am
You don't provide rowcounts, but a large delete may be faster if done in chunks.
WHILE 1 = 1
BEGIN
    DELETE TOP (10000) FROM table WHERE condition;
    IF @@ROWCOUNT = 0 BREAK;
END
You also don't provide the recovery model. Breaking a large DELETE into pieces is especially helpful in SIMPLE recovery databases, as it reduces the strain on the transaction log. It also helps if "condition" is something that can be satisfied with a clustered index, such as a date range.
If you're deleting a large percentage of the table, say 90+ percent, it may be faster to copy the rows you want to keep to a temp table, truncate the main table, and then reload from the temp table.
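A minimal sketch of that copy-and-reload variant, where dbo.FactTable and LoadDate are placeholder names, not from the thread:

-- Copy the rows to keep into a temp table.
SELECT *
INTO   #keep
FROM   dbo.FactTable
WHERE  LoadDate < '20180601';

-- TRUNCATE is minimally logged, unlike a large DELETE.
TRUNCATE TABLE dbo.FactTable;

-- Reload the kept rows.
INSERT INTO dbo.FactTable
SELECT * FROM #keep;

DROP TABLE #keep;

Since this thread is deleting only 2 of roughly 30 months, the chunked DELETE above is the better fit; this variant only pays off when most of the table is going away.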
July 16, 2018 at 9:57 am
Scott Coleman - Monday, July 16, 2018 9:41 AM (quoted in full above)
36.5M rows for each of the 3 tables.
July 16, 2018 at 10:08 am
quinn.jay - Monday, July 16, 2018 9:57 AM (quoted in full above)
In addition, that's two months of data; the table holds 2.5 years' worth at this time.
July 16, 2018 at 12:44 pm
You have no indexes, so for EVERY select/update/delete, SQL has to do a full scan of the entire table in order to find the records to act on.
In order to speed up the delete, create a clustered index on the date field.
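For example (dbo.FactTable and LoadDate are placeholder names):

-- One-time cost: building this on a 36.5M-row heap takes time and log space.
CREATE CLUSTERED INDEX CIX_FactTable_LoadDate
    ON dbo.FactTable (LoadDate);

-- The date-ranged delete can then seek to the affected rows
-- instead of scanning the whole table.
DELETE FROM dbo.FactTable
WHERE  LoadDate >= '20180601';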
July 16, 2018 at 2:28 pm
quinn.jay - Monday, July 16, 2018 10:08 AM (quoted in full above)
With that many records, partitioning might be a good option. I believe that's a little more involved than a low-hanging-fruit solution, though.
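For anyone curious what that looks like, here is a minimal monthly-partitioning sketch; all object names are hypothetical, and the TRUNCATE ... WITH PARTITIONS syntax requires SQL Server 2016 or later:

-- One boundary per month; new months are added with
-- ALTER PARTITION FUNCTION ... SPLIT RANGE.
CREATE PARTITION FUNCTION pfMonthly (date)
    AS RANGE RIGHT FOR VALUES ('20180601', '20180701', '20180801');

CREATE PARTITION SCHEME psMonthly
    AS PARTITION pfMonthly ALL TO ([PRIMARY]);

-- Creating the clustered index on the scheme partitions the table.
CREATE CLUSTERED INDEX CIX_FactTable_LoadDate
    ON dbo.FactTable (LoadDate)
    ON psMonthly (LoadDate);

-- Emptying a month is then a metadata operation, not a row-by-row DELETE.
TRUNCATE TABLE dbo.FactTable
    WITH (PARTITIONS (2));  -- partition number holding the month to reload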
July 17, 2018 at 5:27 am
I don't quite understand your process.
You say the table has 2 1/2 years of data, so for simplicity let's say all of 2016, 2017 and the first half of 2018. Maybe that's off by a month or two in either direction, but for the sake of argument it doesn't matter.
Then you say your process is to delete the current and past month, for simplicity June and July 2018. Then I assume you reload this data as part of your process from some source.
Doesn't that mean all data from 2016 through May 2018 never gets touched? Is that the intention? Why not archive the non-current data into another table, where it can reside untouched by your process? As you roll over from month to month, you'd move the oldest data from "current" to "archive". Then your delete-and-reload process for the current data simply becomes a truncate and load, as sketched below.
You'd need to revise any queries that access the data to get it from both "archive" and "current" tables. That might be a considerable task. It depends on how many processes act upon all those rows of "archive" data.
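A rough sketch of that monthly roll, assuming hypothetical dbo.Current and dbo.Archive tables with a LoadDate column:

-- At month rollover, move the month that is aging out into the archive.
INSERT INTO dbo.Archive
SELECT *
FROM   dbo.Current
WHERE  LoadDate < DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE()) - 1, 0);

-- The delete-and-reload step then becomes minimally logged:
TRUNCATE TABLE dbo.Current;
-- ...SSIS load of the current and previous month goes here...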
July 17, 2018 at 9:23 am
quinn.jay - Monday, July 16, 2018 9:29 AM (quoted in full above)
You have 36.5M rows in each of the 3 tables and these tables have no indexes? I'm thinking you've already identified the fix you need to apply to make this process faster.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 17, 2018 at 11:40 am
right now deletes the current and previous month, where the table holds 2.5 yrs of data. These tables currently do not have any keys, constraints o[r] indexes.
Cluster the tables on that date. That will drastically speed up the DELETEs. You don't really need to partition the table.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
July 17, 2018 at 12:32 pm
Better yet... stop doing the deletes and learn how to do a proper "UPSERT" on a properly clustered/indexed table! Do you really think so many rows from the previous month are going to change that it's more effective to delete rather than to selectively update and insert?
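A minimal upsert sketch as a separate UPDATE then INSERT (MERGE would also work but has well-known quirks); the staging table and the Id/Amount columns are assumptions, not from the thread:

-- Update existing rows that actually changed.
UPDATE f
SET    f.Amount = s.Amount
FROM   dbo.FactTable AS f
JOIN   dbo.StageTable AS s
       ON s.Id = f.Id
WHERE  f.Amount <> s.Amount;

-- Insert rows that don't exist yet.
INSERT INTO dbo.FactTable (Id, LoadDate, Amount)
SELECT s.Id, s.LoadDate, s.Amount
FROM   dbo.StageTable AS s
WHERE  NOT EXISTS (SELECT 1 FROM dbo.FactTable AS f WHERE f.Id = s.Id);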
--Jeff Moden
Change is inevitable... Change for the better is not.
July 17, 2018 at 12:43 pm
I ended up going with a tweak on the DELETE statement coupled with clustered and nonclustered indexes on the tables, which drastically slashed the delete time. For this cube and size, that's livable. For the next cube, which I'll copy this from and modify, there is a far greater volume of data, so I'm going to incorporate much of what's been recommended beyond this easy solution: partitioning the table, clustered/nonclustered indexes, and also upserting to avoid the delete. Thank you all for the help and advice.
July 17, 2018 at 12:50 pm
quinn.jay - Tuesday, July 17, 2018 12:43 PM (quoted in full above)
Cool. Thanks for the feedback.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 17, 2018 at 12:54 pm
quinn.jay - Tuesday, July 17, 2018 12:43 PM (quoted in full above)
Are you sure you need a nonclustered index, let alone multiple ones? If your query specifies a reasonable date range, you won't need a nonclustered index to support it.
Are you on an edition of SQL that allows you to use data compression? If so, that is a huge help with large+ tables.
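For reference, a sketch against the same hypothetical table; note that before SQL Server 2016 SP1, data compression is an Enterprise Edition feature:

-- Estimate the savings before committing to a rebuild.
EXEC sp_estimate_data_compression_savings
     @schema_name      = 'dbo',
     @object_name      = 'FactTable',
     @index_id         = NULL,
     @partition_number = NULL,
     @data_compression = 'PAGE';

-- Rebuild the table with page compression.
ALTER TABLE dbo.FactTable
    REBUILD WITH (DATA_COMPRESSION = PAGE);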
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".