For those who are familiar with other RDBMS's, it may sound weird to hear from T-SQL developers that cursors should be avoided. Other languages even encourage their use.
Cursors treat data in a one-record-at-a-time fashion. Data will be kept in memory, so if you are careful enough when defining the cursor, data retrieval could be faster. But in case you are not, it might be disastrous to your server.
Personally I see cursors as a useful resource, but we do need to watch them closely. Let's take a look into two situations where cursors are used. And let's discuss if they could be replaced using other T-SQL features.
CASE 1 - When concatenating a string.
It may sound funny, but I bump into this kind of request (concatenating a string) over and over again. Consider when your application needs to show a person's phone numbers in one single field, as home phone number, business, mobile, fax, etc. To illustrate, let's consider the following table "dbo.tblPhone"
CREATE TABLE dbo.tblPhone (
codUser INT,
codPhoneType TINYINT,
PhoneNumber VARCHAR(20),
ListThisNumber TINYINT ,
PRIMARY KEY (codUser, codPhoneType)
)
GO INSERT INTO dbo.tblPhone VALUES ( 1, 1, '1 281 444 5555' , 1 )
GO
INSERT INTO dbo.tblPhone VALUES ( 1, 2, '55 11 4582 2752', 1)
GO
INSERT INTO dbo.tblPhone VALUES ( 1, 3, '1 471 333 1234', 0)
GO
INSERT INTO dbo.tblPhone VALUES ( 2, 1, '1 XXX XXX XXXXX', 1)
GO CREATE TABLE dbo.tblPhoneType (
codPhoneType TINYINT,
PhoneType VARCHAR(50),
PRIMARY KEY (codPhoneType)
)
GO INSERT INTO dbo.tblPhoneType VALUES ( 1, 'Residential')
GO
INSERT INTO dbo.tblPhoneType VALUES ( 2, 'Business')
GO
INSERT INTO dbo.tblPhoneType VALUES ( 3, 'Mobile')
GO
INSERT INTO dbo.tblPhoneType VALUES ( 4, 'Fax')
GO
To make things a bit more interesting, let's also consider our string has to deal with some business logic that demands conditional testing. In our example, let's say some users may not want to show their numbers in directory listings. The field that informs this preference is "dbo.tblPhone.ListThisNumber": when its value is 1, it means TRUE, list the number. In case it is 0, it means the number should not be listed.
For our purposes here, in the case where the number should not be listed, we would show "***********" instead of the actual number. To improve cursor performance, we will declare the cursor using the "FAST_FORWARD" option. Putting this all together, the cursor we are talking about would look like this when created using T-SQL:
DECLARE @AllPhones VARCHAR(1000)
DECLARE @PhoneNumber VARCHAR(20)
DECLARE @ListThisNumber TINYINT DECLARE curPersonalPhones CURSOR FAST_FORWARD FOR
SELECT PhoneNumber , ListThisNumber
FROM dbo.tblPhone
WHERE codUser = 1
OPEN curPersonalPhones
FETCH NEXT FROM curPersonalPhones
INTO @PhoneNumber, @ListThisNumber
SET @AllPhones = '' WHILE @@FETCH_STATUS = 0
BEGIN
IF @ListThisNumber = 0
SET @PhoneNumber = '***********' SET @AllPhones = @AllPhones + @PhoneNumber + ' & '
FETCH NEXT FROM curPersonalPhones
INTO @PhoneNumber, @ListThisNumber
END
CLOSE curPersonalPhones
DEALLOCATE curPersonalPhones SELECT @AllPhones
GO
To make the code simpler, I defined the cursor for one fixed User (codUser = 1) and also disregarded the fact the string will always end with an ampersand ("&").
But now we will see an easier way to do it in T-SQL. When concatenating variables, instead of using cursors, you could use a SELECT statement over the same table, storing the content of field (PhoneNumber) into a variable and concatenating this with the same variable.
This example is well described in many, many articles throughout the web. But how can this deal with the business logic? As the logic in this code is pretty simple, we can use a CASE function within the SELECT statement to have the same result. So, the new code will look like this:
DECLARE @AllPhones VARCHAR(1000) SET @AllPhones = '' SELECT @AllPhones = @AllPhones +
CASE WHEN P.ListThisNumber = 1
THEN P.PhoneNumber
ELSE '***********'
END
+ ' & '
FROM dbo.tblPhone P
WHERE codUser = 1 SELECT @AllPhones
GO
In this particular case, not only is the second code is simpler, but it is also much faster than the cod using a cursor. If you are curious enough, check the execution plan for each set of code. There you should look first for the Estimated Subtree Cost. (If you are not familiar to execution plans, keep in mind Estimated Subtree Cost represents the total cost for the query optimizer to execute the current "batch", combining CPU cost and also I/O cost).
You will see that, although we declare the cursor to be as fast as possible, that code has an Estimated Subtree Cost of 0.0032853 for each iteration, or in other words, each record in the cursor. In this example, we have 4 records, so we have to sum the cost of each iteration to know the cost of the whole loop.
In the other hand, for the second code, which does the exactly same output, the Estimated Subtree Cost is 0.0032859 , or roughly the same number we saw before. But notice this number represents the overall cost, as there is no loop! So, the second approach performed a lot better then the one using cursors. And it will be even better as the table we are querying grows bigger.
One important remark here: depending on the complexity of the business logic, it might turn out to be not possible to put it all together within a single SELECT statement. In those situations, we might have to keep the cursors in our code. This should not sound as a complaint. That's what cursors are for. We should be glad T-SQL has this feature so we can use it wisely.
CASE 2 - When using DML statements within Cursors.
DML (or Data Modification Language) statements include INSERT, UPDATE and DELETE statements. Some RDBMS vendors recommend using cursors together with DML statements to make sure they will run one record at a time. You can easily find examples of this when browsing the online documentation of many RDBMS's. As safe as it might sound, this can be tremendously slow. There are many situations where you can change those statements to handle several records at a time with no effort.
Let's see another example. We will use cursor to handle the data insertion into table "dbo.tblPhone". Consider the table "SouthHemisphere.tblPhone" has exactly the same structure as "dbo.tblPhone", although it collects data from customers who live in the South Hemisphere. We need to move this data to "dbo.tblPhone", which stores information for all customers through out the world. In this example, I tested the code moving 8 new records from "SouthHemisphere.tblPhone" to "dbo.tblPhone". Below you see the code:
DECLARE @codUser INT
DECLARE @codPhoneType TINYINT
DECLARE @PhoneNumber VARCHAR(20)
DECLARE @ListThisNumber TINYINT DECLARE curInsertion CURSOR FAST_FORWARD FOR
SELECT codUser, codPhoneType, PhoneNumber, ListThisNumber
FROM SouthHemisphere.tblPhone
OPEN curInsertion
FETCH NEXT FROM curInsertion
INTO @codUser, @codPhoneType, @PhoneNumber, @ListThisNumber
WHILE @@FETCH_STATUS = 0
BEGIN INSERT INTO dbo.tblPhone
SELECT @codUser, @codPhoneType, @PhoneNumber, @ListThisNumber
WHERE @PhoneNumber NOT IN (SELECT PhoneNumber FROM dbo.tblPhone) FETCH NEXT FROM curInsertion
INTO @codUser, @codPhoneType, @PhoneNumber, @ListThisNumber
END
CLOSE curInsertion
DEALLOCATE curInsertion
GO
I did not include a transaction in this piece of code to keep it as simple as possible. But, of course, inserting one record at a time will give you more control over the process, as you can commit or rollback each individual row. But analyzing its execution plan, you see there will be a cost for each fetch (Estimated Subtree Cost = 0.0032908) plus the cost for each insert (0.0132976). This cost will exist either we insert the new record or not. Finally, we will have a final fetch, which will define the end of the loop. This also have a cost.
Doing the math, we will have the estimated cost of for each record times the number of 8 records plus the cost of the final fetch. The overall estimate cost is 0.1460048. The other approach is to insert all records at once. So, if you have transaction in the code, either you commit all records or you rollback them all. The INSERT statement will look like this:
INSERT INTO dbo.tblPhone
SELECT S.codUser, S.codPhoneType, S.PhoneNumber, S.ListThisNumber
FROM SouthHemisphere.tblPhone S
WHERE S.PhoneNumber NOT IN (SELECT PhoneNumber FROM dbo.tblPhone)
Besides being a lot simpler, the cost for this statement to insert the same 8 records will be 0.0298601, much faster than the previous one. As pointed out, the cons in this approach is that we do not have any control in the record level. It works as a batch: either all records are successfully inserted or they all fail.
Putting it all together.
Well, I showed a lot of numbers and I do agree this get a bit confusing. Those are real numbers (in the mathematical sense also) and it is hard to believe we humans should be able to handle such a thing.
Below I show a table with the Estimated Subtree Cost for all 4 scripts. Numbers represent the total cost for the whole script, as I added the cost of each statement within the script.
c1 | c2 | c3 | c4 | c5 |
---|---|---|---|---|
Script | # of Affected Rows | Total Estimated Cost with Cursor | Total Estimated Cost without Cursor | Difference (c3-c4)/c4 % |
CASE 1: Concatenate string | 4 | 0.0131412 | 0.0032859 | 300% |
CASE 2: DML statements | 8 | 0.1460048 | 0.0298601 | 389% |
Please, remember you should not take those numbers for granted. They are valid solely for the scripts shown in this article. You should do the same kind of assessment for your own scripts and analyze the performance of those scripts.
Every Rule has an Exception.
OK, from the syntax standpoint, replacing cursors looks nice and neat. From the performance standpoint, replacing cursors may boost your performance. But when replacing cursors, we are letting go some important features that only cursors can provide to your code.
Besides, remember every rule has an exception. You should not take the avoid-cursors advice as rule, but as a hint. Each situation deserves attention, as you can have a fast and well-designed code using cursors. And, in the other hand, you can also find poor-performance codes specially created to avoid cursors.
Whenever you suspect your code is not running as expected, no matter it uses cursors or not, you should check your code's execution plan. This is the ultimate information you have to assess your code's performance.