Introduction
This article is for people who have just become a DBA. Almost all senior DBA's will and should know about this, but I thought it was worth sharing.
For the past nine years I have been working in industries where the number of transactions per second was quite high. The worst part was that I had to work with a DB that was badly designed for such a high amount of transactions per second. My challenge was to use the same DB Design but use it as efficiently as possible. (I could not change the DB design since all the applications too have to be re-designed)
To succeed, I had to do lots of research. One thing I found out was that fewer reads you require of the DB, the better the performance. So each additional read done by DB was very costly for us, and likely for you, too.
Tuning Transactions
As I was working on this issue, a new DBA came to our company. His name is Bill Wunder. He was a SQL Server MVP for the years 2004, 2005 and 2006. I, along with Bill and Christian (our production DBA), started looking at different approaches, and one of these was to do UPSERTS.
For a database to perform at its optimum, the main thing you can do is remove any unnecessary reads on any table. As I mention above our database gets a high number of transactions per second. So we decided to reduce the reads by doing UPSERTs wherever it could be applied.
An UPSERT is nothing more than two DMLs combined together: an Update and an Insert. Let us say that you have to write a stored proc that has to check if the row exists or not based on the key and if it exists, update some column. If it does not exist, then it inserts a new row. The best way to do it is by doing an UPSERT.
The common approach by the programmers is to do an IF EXISTS statement to find out if an Update or an Insert is necessary. There is actually no need to execute a select to determine if an update or insert is necessary. This will guarantee two I/Os for each call of the stored procedure. One to check if it exists and then the next to either update or insert.
Till now I have blabbed about UPSERT. Now we check to see if the UPSERT actually works. As a first step let us create a table.
CREATE TABLE [dbo].[tbTicket] ( [TicketNumber] [int] NOT NULL , [Customer] [varchar] (10) , [TicketDate] [datetime] NULL , [TicketFrom] [varchar] (50), [Comments] [varchar] (2500), [DiscrepancyFlag] [char] (1), [Status] [int] NULL ) GO ALTER TABLE [dbo].[tbTicketing] ADD CONSTRAINT [PK_tbTicketing] PRIMARY KEY CLUSTERED ( [TicketNumber] ) ON [PRIMARY] GO
Now let us populate the table with some rows of test data.
Insert into dbo.tbTicketing (TicketNumber , Customer, TicketDate, TicketFrom, Comments, DiscrepancyFlag, Status) select 1, 'Test1','12-Dec-2007 01:25:00', 'Service Dept', 'Problem using Internet','N',0 UNION ALL select 2, 'Test2','12-Dec-2007 01:35:00', 'Service Dept', 'Problem using Application','N',0 UNION ALL select 3, 'Test2','12-Dec-2007 01:45:00', 'Sales Dept', 'Need new Mouse','N',0
We now have table with data in it. We will do both UPSERT and IF EXISTS cases to find out which is more efficient. Let us start with UPSERT with a case for Update.
Set statistics io on GO declare @parm1 int, @parm2 char(1), @parm3 int set @parm1 = 1 set @parm2 = 'Y' set @parm3 = 3 UPDATE dbo.tbTicketing - Always Schema Qualify SET DiscrepancyFlag = @Parm2 , Status = @Parm1 WHERE TicketNumber = @Parm3 IF @@rowcount = 0 - Make sure you do this before executing any other statement INSERT dbo.tbTicketing ( TicketNumber ,Customer ,TicketDate ,TicketFrom ,Comments ,DiscrepancyFlag ,Status ) VALUES ( @parm3 ,'Test1' ,getDate() ,'Marketing' ,'Testing' , @parm2 , @parm1) Set statistics io off GO
Now let us take a look at the Messages tab to find the number of reads done on the table tbTicketing. You will see one statement like this.
Table 'tbTicketing'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0.
This means that it hit the table once to do an Update when using UPSERT.
Now let us do the same test using an IF EXISTS statement.
Set statistics io on GO declare @parm1 int, @parm2 char(1), @parm3 int set @parm1 = 5 set @parm2 = 'X' set @parm3 = 3 if EXISTS (SELECT * FROM tbTicketing WITH (READUNCOMMITTED) Where TicketNumber = @Parm3) Begin UPDATE dbo.tbTicketing SET DiscrepancyFlag = @Parm2 , Status = @Parm1 WHERE TicketNumber = @Parm3 END ELSE BEGIN INSERT dbo.tbTicketing ( TicketNumber ,Customer ,TicketDate ,TicketFrom ,Comments ,DiscrepancyFlag ,Status ) VALUES ( @parm3 ,'Test5' ,getDate() ,'Service Dept' ,'Testing' , @parm2 , @parm1) END Set statistics io off GO
If you look at the message tab of your SSMS execution window, you will see that the Query Engine had to do two hits on the table tbTicketing. One to find out if the rows exist and the second one to actually Update the data.
Table 'tbTicketing'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0.
(1 row(s) affected)
Table 'tbTicketing'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0.
Conclusion
For every update in the case mentioned above we are removing one additional read from the table if we use the UPSERT instead of EXISTS. Unfortunately for an Insert, both the UPSERT and IF EXISTS methods use the same number of reads on the table. Therefore the check for existence should only be done when there is a very valid reason to justify the additional I/O. The optimized way to do things is to make sure that you have little reads as possible on the DB.
The best strategy is to attempt the update. If no rows are affected by the update then insert. In most circumstances, the row will already exist and only one I/O will be required.
I hope you learned something new today. 75% of my knowledge was passed on to me by Bill, and now I want to pass it on to others what I have learned. I will talk about the advantages of Schema Qualification at another time.
Roy Ernest
A Newbee DBA