October 2, 2013 at 6:31 am
Hi,
The software we use often creates a lot of temporary tables that need to be deleted from time to time.
Depending on the usage, this can range from hundreds per month to thousands per week.
These temporary tables are all created with names that meet certain conventions; as such, they're quite easy to identify in the database.
Normally, when I've been dropping these tables, I've done it within a cursor.
The cursor identifies the names of the tables in the database, and then the loop process drops each table one at a time.
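In code, the cursor version looks roughly like this (just a sketch; the cursor and variable names are illustrative):
DECLARE @TableName sysname;

DECLARE TableCursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT TABLE_NAME
    FROM INFORMATION_SCHEMA.TABLES
    WHERE TABLE_NAME LIKE 'TEMP[_]TABLE[_]%';

OPEN TableCursor;
FETCH NEXT FROM TableCursor INTO @TableName;

WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC ('DROP TABLE [' + @TableName + ']'); -- one DROP per loop iteration
    FETCH NEXT FROM TableCursor INTO @TableName;
END;

CLOSE TableCursor;
DEALLOCATE TableCursor;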
I'm often conscious of the performance overhead of using a cursor, so I have been looking for an alternative process for this.
As such, I've tried a new approach, which comma-separates the table names and passes them all to a single DROP TABLE command.
DECLARE @DropObjectList varchar(max);

-- build a comma-separated list of every matching table name
SELECT @DropObjectList = COALESCE(@DropObjectList + ',', '') + TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_NAME LIKE 'TEMP[_]TABLE[_]%';

EXEC ('DROP TABLE ' + @DropObjectList);
I've tried doing some basic performance testing on test data, and the results are inconclusive.
It appears to me that the combined DROP TABLE command carries enough overhead of its own to offset the cost of looping through the cursor.
Therefore the performance isn't particularly different whether I run through a cursor or not.
Does anyone have any ideas about the fastest way to drop multiple tables whose names meet a certain naming convention, and whether there should be a performance difference whether a cursor is used or not?
Thank you
Steve
October 2, 2013 at 12:28 pm
Steve Smith-163358 (10/2/2013)
Hi, The software we use often creates a lot of temporary tables that need to be deleted from time to time.
Depending on the usage, this can range from hundreds per month to thousands per week.
If possible it would be helpful to include a DROP TABLE #<temp_table_name> statement in any stored procs that create them. This may be obvious to you but I see code with temp tables not cleaned up all the time.
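For example (just a sketch; the proc and table names here are made up):
CREATE PROCEDURE dbo.usp_DoWork
AS
BEGIN
    CREATE TABLE #work (id int);

    -- ... populate and use #work here ...

    DROP TABLE #work; -- explicit cleanup instead of relying on the proc going out of scope
END;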
...Normally, when I've been dropping these tables, I've done it within a cursor.
The cursor identifies the names of the tables in the database, and then the loop process drops each table one at a time.
I'm often conscious of the performance overhead of using a cursor, so have been looking an alternative process for this.
Cursors are bad for many reasons and should be avoided whenever possible (which is almost always). This article will help you learn to live without cursors, loops, rCTEs and other iterative, non-set-based methods for solving this kind of problem.
Does anyone have any ideas about the fastest way to drop multiple tables whose names meet a certain naming convention, and whether there should be a performance difference whether a cursor is used or not?
You could do something like this:
DECLARE @DropObjectList varchar(8000) = ''; -- if you can get away with varchar(8000) you will experience better performance

-- FOR XML PATH('') concatenates one DROP statement per table into a single string
SELECT @DropObjectList = ISNULL(
   (SELECT 'DROP TABLE [' + TABLE_NAME + '];' -- add brackets for tables with names like "xxx-yyy"
    FROM INFORMATION_SCHEMA.TABLES
    WHERE TABLE_NAME LIKE 'TEMP[_]TABLE[_]%'
    FOR XML PATH('')), '');

--PRINT @DropObjectList
EXEC (@DropObjectList);
-- Itzik Ben-Gan 2001
October 2, 2013 at 1:04 pm
Hi Alan,
Thank you for the reply.
I'll try that code and see how it performs against my other methods.
With regards to the temporary tables within the application, I accidentally used the wrong terminology when posting.
The tables are temporary within the application in as much as they need not be retained.
However, they aren't actually SQL Server temporary tables (i.e. not # or ## tables).
Again, thank you for the advice.
Steve
October 3, 2013 at 2:49 am
Steve Smith-163358 (10/2/2013)
However, they aren't actually SQL Server temporary tables (i.e. not # or ## tables)
Ahhhh... that is actually what I wanted to hear.
Here are my 2 cents.
Yes, at my last company we had the same kind of requirement, so what we did was create another database to store these temp tables (actual physical tables) and use them from there. We also kept that database on a separate disk. That way you don't put extra overhead on your SQL Server (on the user database or tempdb), and cleaning/dropping the tables won't impact performance either.
So you'd need a separate database.
-------Bhuvnesh----------
I work only to learn Sql Server...though my company pays me for getting their stuff done;-)
October 3, 2013 at 7:40 am
Steve Smith-163358 (10/2/2013)
It appears to me that the combined DROP TABLE command carries enough overhead of its own to offset the cost of looping through the cursor. Therefore the performance isn't particularly different whether I run through a cursor or not.
That's absolutely correct. In fact, this is what cursors were originally designed for... process control. Cursors got a really bad name simply because people took to using them for the wrong things.
Still, I avoid even "process control" cursors like this, especially since the advent of the MAX datatypes in 2005. I've gotten so used to using methods similar to the code that Alan posted that I'd have to look up how to write a cursor and use it. 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
October 3, 2013 at 8:50 am
Thank you all very much for your replies, and thank you Jeff for the excellent article that Alan linked to earlier.
I had often read about the merits of set-based code vs. cursors before, but never really grasped the concepts.
More to the point, I didn't realise that half of the code I wrote was set based and half wasn't 🙂
Alan, thank you for the code sample.
I've tried using it, but unfortunately I start hitting limits with the size of the data type.
In my test example that I'm using, I have 10,000 tables that I want to delete.
It appears that, due to the number of tables, I can't fit them all into the parameter, even using varchar(max).
With this many tables, I may be stuck with cursors or similar procedures.
October 3, 2013 at 9:03 am
Steve Smith-163358 (10/3/2013)
With this many tables, I may be stuck with cursors or similar procedures.
You can do this during off-peak hours too; maybe schedule a job which runs nightly.
-------Bhuvnesh----------
I work only to learn Sql Server...though my company pays me for getting their stuff done;-)
October 3, 2013 at 9:06 am
If what you are doing occurs every so often, say, once a week, then consider creating a job that runs on the SQL Server Agent off hours. Additionally, if it were up to me, I'd create a procedure that is robust enough to handle the 10k tables that need to be deleted. So if that requires the use of a cursor, so be it. By the way, you can get creative with a WHILE loop if you don't want to resort to a cursor, and that same loop can do the drops in batches over the span of a few evenings or over the span of several hours; see the sketch below.
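Something along these lines, for example (a rough sketch only; the batch size of 500 and the TEMP_TABLE_% naming pattern are assumptions):
DECLARE @sql varchar(max);

WHILE EXISTS (SELECT 1 FROM INFORMATION_SCHEMA.TABLES
              WHERE TABLE_NAME LIKE 'TEMP[_]TABLE[_]%')
BEGIN
    SET @sql = '';

    -- build DROP statements for the next batch of 500 tables
    SELECT TOP (500) @sql = @sql + 'DROP TABLE [' + TABLE_NAME + '];'
    FROM INFORMATION_SCHEMA.TABLES
    WHERE TABLE_NAME LIKE 'TEMP[_]TABLE[_]%';

    EXEC (@sql);
    -- WAITFOR DELAY '00:05:00'; -- optionally pause between batches to spread out the work
END;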
The main thing here is finding a solution that is going to do the job. I really don't think this thread warrants a debate about the pros/cons of cursor logic.
Just saying....
Kurt
Kurt W. Zimmerman
SR DBA
Lefrak Organization
New York, NY
http://www.linkedin.com/in/kurtwzimmerman
October 3, 2013 at 9:30 am
Hi,
thanks again for the replies.
At the moment, the process we have for dropping these tables does indeed run out of hours.
However, I've been tasked with rewriting it for a number of reasons and although it will still run out of hours, I kinda got hung up on trying to make it as fast / efficient as possible.
While the actual performance impact won't matter because it's out of hours, it'd make me feel good if I could improve the performance.
After reading loads of posts about the performance impact of cursors etc., I wanted to try to drop multiple tables as efficiently as possible without a cursor.
Thank you all for your help
October 3, 2013 at 9:43 am
Steve Smith-163358 (10/3/2013)
Hi, thanks again for the replies.
At the moment, the process we have for dropping these tables does indeed run out of hours.
However, I've been tasked with rewriting it for a number of reasons and although it will still run out of hours, I kinda got hung up on trying to make it as fast / efficient as possible.
While the actual performance impact won't matter because it's out of hours, it'd make me feel good if I could improve the performance.
After reading loads of posts about the performance impact of cursors etc., I wanted to try to drop multiple tables as efficiently as possible without a cursor.
Thank you all for your help
OK, understood. So what are your efficiency goals? How long is the process running now?
Also, a few years back I was tasked to write a high-transaction-volume load process. The company I was working for received an update file with several hundred million rows. It was my job to process this file within a 2 hour time window. If I processed the file directly, the estimated time was on the order of 23+ hours... not a solution. So what I did was write a process that spawned multiple SQL Server Agent jobs (I think there were 10 separate jobs) to process the update file. By running the 10 jobs concurrently I was able to get the entire process to run in just over 1 hour. Each of the 10 jobs processed a chunk of the incoming file. I was amazed that I didn't run into any kind of deadlocking issues... SQL Server handled the processing just fine.
With that in mind, instead of processing as a single thread, consider "going wide": have multiple procedures that each handle a chunk of these tables at the same time. A rough illustration follows.
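As a sketch of the idea (hypothetical; assuming 10 concurrent agent jobs), you could bucket the table list so each job takes one slice:
SELECT TABLE_NAME,
       NTILE(10) OVER (ORDER BY TABLE_NAME) AS job_bucket -- job N drops the tables in bucket N
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_NAME LIKE 'TEMP[_]TABLE[_]%';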
Bottom line is you need to define your efficiency goal and sometimes think outside the box for a solution.
Hope this helps.
Kurt
Kurt W. Zimmerman
SR DBA
Lefrak Organization
New York, NY
http://www.linkedin.com/in/kurtwzimmerman
October 3, 2013 at 10:25 am
Hi Kurt,
Thank you very much for the input and ideas.
There are no hard and fast goals that I have to meet regarding the performance.
This process will still run out of hours and as such is not going to have any impact on live processing.
As such, the performance / efficiency improvement is solely to make me happy.
At the moment, my test case that I'm using takes around 3 minutes to drop all of the tables.
If I found a way to drop all 10,000 tables in under 2 minutes (for example) I would feel like I'd achieved something.
I suppose that the bigger issue here, rather than my specific situation, was for me to understand whether it was at all possible to efficiently drop multiple tables.
If I could, there is potential for me to use a similar process with other tasks.
Given that the overall priority of this process is functionality, rather than performance, I'm not planning to spend considerable time attempting to make it more efficient.
I have just realised that in a previous post I said I had reached the size limit of the parameter.
After reviewing it, I don't believe I have... the parameter is easily big enough to handle my table names.
I don't know why I thought I'd be hitting a varchar(max) limit.
Thanks for the help
Steve
October 3, 2013 at 11:39 am
I don't know if I should be starting a new post for this or not.
When I've been testing the performance of my drop table statements, I've been restarting the SQL Instance before running.
My feeling here was that I didn't want any cached data to be affecting my calculations.
Interestingly, when dropping all 10,000 tables after an instance restart, the whole process takes about 3 minutes.
If I don't restart the instance, and just run the process immediately after creating the tables, they delete in about 15 seconds.
Is this discrepancy in time to be expected?
Is the process faster if the instance isn't restarted due to cached data?
Or is the process slow after the instance has been restarted due to SQL also running other startup processes?
Thanks again for your help
Steve
October 3, 2013 at 1:28 pm
Steve Smith-163358 (10/3/2013)
Thank you all very much for your replies, and thank you Jeff for the excellent article that Alan linked to earlier. I had often read about the merits of set-based code vs. cursors before, but never really grasped the concepts.
More to the point, I didn't realise that half of the code I wrote was set based and half wasn't 🙂
Alan, thank you for the code sample.
I've tried using it, but unfortunately I start hitting limits with the size of the data type.
In my test example that I'm using, I have 10,000 tables that I want to delete.
It appears that, due to the number of tables, I can't fit them all into the parameter, even using varchar(max).
With this many tables, I may be stuck with cursors or similar procedures.
I don't believe that you're hitting the limits of the datatype (VARCHAR(MAX)). What you're actually hitting is the limit on how the contents are displayed... normally 8K bytes (or less, depending on your column width settings), even if the variable contains much more.
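A quick way to confirm it, run in the same batch as the earlier code (PRINT shows at most 8000 characters of a varchar, but DATALENGTH reports what the variable actually holds):
SELECT DATALENGTH(@DropObjectList) AS bytes_held, -- real size of the data in the variable
       LEN(@DropObjectList)        AS characters_held;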
--Jeff Moden
Change is inevitable... Change for the better is not.
October 3, 2013 at 1:39 pm
Jeff Moden (10/3/2013)
Steve Smith-163358 (10/3/2013)
Thank you all very much for your replies, and thank you Jeff for the excellent article that Alan linked to earlier. I had often read about the merits of set-based code vs. cursors before, but never really grasped the concepts.
More to the point, I didn't realise that half of the code I wrote was set based and half wasn't 🙂
Alan, thank you for the code sample.
I've tried using it, but unfortunately I start hitting limits with the size of the data type.
In my test example that I'm using, I have 10,000 tables that I want to delete.
It appears that, due to the number of tables, I can't fit them all into the parameter, even using varchar(max).
With this many tables, I may be stuck with cursors or similar procedures.
I don't believe that you're hitting the limits of the datatype (VARCHAR(MAX)). What you're actually hitting is the limit on how the contents are displayed... normally 8K bytes (or less, depending on your column width settings), even if the variable contains much more.
Good point Jeff....
Steve... can you post the original error that you were receiving when you thought you hit the limit of the varchar(max)? Thanks.
Kurt
Kurt W. Zimmerman
SR DBA
Lefrak Organization
New York, NY
http://www.linkedin.com/in/kurtwzimmerman
October 4, 2013 at 1:51 am
Steve Smith-163358 (10/3/2013)
Is this discrepancy in time to be expected?
Not necessarily... it depends.
Steve Smith-163358 (10/3/2013)
Is the process faster if the instance isn't restarted due to cached data?
Yes... this is the main reason, and the probable answer to all three of these questions (see the quick check below).
Steve Smith-163358 (10/3/2013)
Or is the process slow after the instance has been restarted due to SQL also running other startup processes?
No
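If you want a rough look at the caching effect yourself, something like this shows how many of the current database's pages are sitting in the buffer pool (compare the count before and after a restart):
SELECT COUNT(*) AS cached_pages -- pages of this database currently in the buffer cache
FROM sys.dm_os_buffer_descriptors
WHERE database_id = DB_ID();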
-------Bhuvnesh----------
I work only to learn Sql Server...though my company pays me for getting their stuff done;-)