Table Variables As An Alternative to Cursors

  • I wasn't sure how to respond to a script so I started a new thread.

    You can find the original script at http://www.sqlservercentral.com/scripts/contributions/736.asp

    Cursors, rightly or wrongly, seem to have a bad press and I thought I would share my experiences with you.

    In my application I get a number of different update files from another system. These are CSV files with variable columns, each column enclosed in double quotes. These files contain new and changed records, and I need to insert or update my tables with these records. The files can range from a few hundred rows to 3 million rows, and performance is important.

    I use cursors extensively in processing these update files because that is what I know, and also because I have to do a fair amount of manipulation to get the actual data I need, such as removing extra commas and single quotes, date validation, etc.

    The steps are (a rough sketch of the cursor loop appears after the list):

    1. Load file into table in tempdb

    2. Declare cursor

    3. Fetch row

    4. Perform data manipulation/validation on numerous columns

    5. Insert/update row in table based on whether the row exists

    6. Fetch next etc.

    7. Drop table in tempdb
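
    Roughly, the cursor loop (steps 2 to 6) looks like the sketch below. The table and column names (Staging, KeyCol, RawName, Target) are made up for illustration, and the real routine does far more manipulation:

    DECLARE @Key int, @RawName varchar(200), @CleanName varchar(200)

    DECLARE cur CURSOR FAST_FORWARD FOR
        SELECT KeyCol, RawName FROM tempdb..Staging

    OPEN cur
    FETCH NEXT FROM cur INTO @Key, @RawName

    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- step 4: manipulation/validation, e.g. strip stray double quotes
        SET @CleanName = REPLACE(@RawName, '"', '')

        -- step 5: insert or update depending on whether the row already exists
        IF EXISTS (SELECT 1 FROM dbo.Target WHERE KeyCol = @Key)
            UPDATE dbo.Target SET Name = @CleanName WHERE KeyCol = @Key
        ELSE
            INSERT INTO dbo.Target (KeyCol, Name) VALUES (@Key, @CleanName)

        -- step 6: fetch the next row
        FETCH NEXT FROM cur INTO @Key, @RawName
    END

    CLOSE cur
    DEALLOCATE cur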

    Following a discussion on another thread, I took one of my update routines and created a duplicate, but changed the cursor bits into select statements as suggested in the script. In all other respects the update routine was identical - the only difference being how I retrieved the next row from the temporary table. As part of the update process I produce a simple report showing how many rows were processed, how many were inserted, how many were updated, and how many rows were processed per second.
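
    The non-cursor version replaced the fetch with something along these lines - a rough sketch only, assuming an identity column RowID was added to the temporary table (names again illustrative):

    DECLARE @LastID int, @RowID int, @RawName varchar(200)
    SET @LastID = 0

    WHILE 1 = 1
    BEGIN
        SET @RowID = NULL

        -- "fetch" the next row by seeking past the last identity value read
        SELECT TOP 1 @RowID = RowID, @RawName = RawName
        FROM tempdb..Staging
        WHERE RowID > @LastID
        ORDER BY RowID

        IF @RowID IS NULL BREAK

        -- ...same manipulation/validation and insert/update as the cursor version...

        SET @LastID = @RowID
    END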

    I ran this test against an update file with 245,000 rows and I used DBCC DROPCLEANBUFFERS to clear out the cache to make the tests as consistent as possible.

    The average rows/second processed by the cursor was 2084 compared to 2041 rows/second with the non-cursor method. Both methods gave very similar results, with a slight advantage to the cursor.

    I did some investigation into why the alternative method was slower and discovered that loading the file into the table took longer, as I needed an identity field to select the next row, and it also had to build a (non-clustered) index on that column.

    I also ran some tests for the non-cursor method where I started from the point at which the index was built. The non-cursor method then gave a processing rate of 2091 rows/second which is only fractionally faster than the cursor method.

    To me, this shows that cursors do not deserve the bad reputation that they have got. It might be true that in older versions of SQL Server there were performance overheads but in SQL2K they don't seem to be any worse than the alternative suggested in the script.

    I do acknowledge that in SQL Server, and probably all databases, set based processing is usually better than row based processing, but if there are occasions when you need to do row based processing, you can use cursors without severe performance penalties.

    I tried using a set based update query for my 245,000 rows but it took more than twice as long as using the cursor. It also took an exclusive lock on the table, which prevents other users from reading the table - I try to run 24x7 availability, so running a set based update with 3 million rows and locking the table for several hours is a non-starter.
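
    For reference, the single-statement set-based alternative I compared against was roughly of this shape (illustrative names again); on a table this size the UPDATE can hold an exclusive lock for its whole duration:

    UPDATE t
    SET    t.Name = REPLACE(s.RawName, '"', '')
    FROM   dbo.Target t
    JOIN   tempdb..Staging s ON s.KeyCol = t.KeyCol

    INSERT INTO dbo.Target (KeyCol, Name)
    SELECT s.KeyCol, REPLACE(s.RawName, '"', '')
    FROM   tempdb..Staging s
    WHERE  NOT EXISTS (SELECT 1 FROM dbo.Target t WHERE t.KeyCol = s.KeyCol)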

    All comments welcome.

    Jeremy

  • Good test, Jeremy! My experience has been similar. The cursors of version 6.5 were very poor performers...so bad that I didn't even attempt to use them in version 7.0. Accepting a challenge from another programmer, we ran similar tests in SQL2K and found that cursors are very acceptable when the processing must be done a row at a time.

    For mass updates, we have also used SET ROWCOUNT ###. When the updates are based on indexed values, this provides a cursor-like approach but still updates sets of data at once and keeps the transactions small. We find this especially useful in purging from our huge tables.

    -- delete in batches of 1000 rows; loop until a pass deletes nothing
    SET ROWCOUNT 1000

    DECLARE @rows int
    SET @rows = 1

    WHILE @rows > 0
    BEGIN
        BEGIN TRAN
        DELETE FROM <TABLE> WHERE <Condition>
        SET @rows = @@ROWCOUNT   -- capture the count before COMMIT, in case it resets @@ROWCOUNT
        COMMIT TRAN
    END

    SET ROWCOUNT 0   -- turn the batch limit back off

    Guarddata-

  • I am thoroughly impressed, Jeremy. I am a web developer with limited SQL programming experience, so I do not know all the shortcuts yet. I do have a question for you, which might lay some arguments to rest. We noticed performance degradation when table variables are used and the tables are large. We also increased the memory on the server and that increased performance. Have you done any testing with your server memory maxed out?

  • I haven't done any memory related stress testing.

    I have done lots of concurrent runs of this type - each month we get around 20 update files, many of which we run concurrently. We find that when we run 3 or more concurrently we do see a reduction in throughput, but I guess this is an I/O limitation rather than memory - we are running RAID 5 on a 4-processor Dell server with a maximum of 1.5 GB RAM for SQL Server - the server is used for other processing as well.

    Jeremy

  • Hi Jeremy,

    I'm currently working on SQL Server 7 using a cursor sp that is killing my DTS. I'm looking to rewrite this sp, maybe using rowcounts, but before I do that: do cursors kill performance on SQL Server 7? I'm not 100% sure on this.

    Thanks

    JMC

  • I have had a different type of experience.

    I have 4 possibilities:

    1. Cursor

    2. Temp table created with a SELECT ... INTO statement, no index

    3. Temp table created beforehand, with indexes

    4. Table variable (the table datatype)

    I am dealing with a data warehouse of 75 GB.

    After experiments, for me the best was the table variable, after that the pre-created temp table with indexes, and at the end the other two.
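
    A rough sketch of options 2 to 4, with made-up names, for anyone who has not used the table datatype:

    -- Option 2: temp table created on the fly with SELECT ... INTO, no index
    SELECT KeyCol, Amount
    INTO   #work_noindex
    FROM   dbo.FactTable
    WHERE  LoadDate = '20030101'

    -- Option 3: temp table created beforehand, with an index
    CREATE TABLE #work_indexed (KeyCol int NOT NULL, Amount money)
    CREATE INDEX ix_work ON #work_indexed (KeyCol)
    INSERT INTO #work_indexed (KeyCol, Amount)
    SELECT KeyCol, Amount FROM dbo.FactTable WHERE LoadDate = '20030101'

    -- Option 4: table variable (the table datatype)
    DECLARE @work TABLE (KeyCol int NOT NULL PRIMARY KEY, Amount money)
    INSERT INTO @work (KeyCol, Amount)
    SELECT KeyCol, Amount FROM dbo.FactTable WHERE LoadDate = '20030101'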

    LMT
