A fairly common problem is the need to manage changes to data stored in a table.
Sometimes you simply need to be able to see that data has changed; other times
you need to know when it changed, how many times it has changed, and so on. This
article will discuss a couple of ways of handling issues like this.
Requirements and functionality
First of all you must establish what the requirements are. Which data type to
use, and how to use it, depends on the requirements and on how you want to
implement the necessary functionality. The simplest, and probably most common,
requirement is to be able to check whether the data has changed since you last
read it (i.e. when the data was read into an application). With this
functionality you are able to avoid having different users overwrite each
other's data changes. Script 1 below shows an example of how things can go wrong
if this is not handled.
-- Script 1
USE tempdb
GO

SET NOCOUNT ON

IF EXISTS (SELECT 1 FROM sysobjects WHERE [name] = 'Messages')
BEGIN
  DROP TABLE Messages
END

CREATE TABLE Messages
(
  MessageId int NOT NULL
  , Title varchar(50) NOT NULL
  , Body varchar(255) NOT NULL
)
GO

INSERT INTO Messages (MessageId, Title, Body)
VALUES (1, 'A message', 'A message text')

-- User A reads the data
DECLARE @bodyA varchar(255)
SELECT @bodyA = Body FROM Messages WHERE MessageId = 1

-- User B also reads the data
DECLARE @bodyB varchar(255)
SELECT @bodyB = Body FROM Messages WHERE MessageId = 1

-- After changing the text User A now updates with his new data
SET @bodyA = 'Another text, written by User A'
UPDATE Messages SET Body = @bodyA

-- After this User B saves his changed version of the data,
-- thereby overwriting the change that User A did earlier
SET @bodyB = 'A third text, written by User B'
UPDATE Messages SET Body = @bodyB

SELECT * FROM Messages

SET NOCOUNT OFF
Another way of handling this requirement is of course to use pessimistic
concurrency. This means that data is locked as soon as it is read, so that no
other user can read that data with the intention of changing it until the
locking user releases it. The SQL Server way of doing this is to take Update
Locks instead of Shared Locks when reading the data. Normally though, you don't
want to lock data like this until you are actually going to update it, so that
other users don't have to wait for you to release the data, especially when it
is not even certain that you will change it. The alternative is called
optimistic concurrency, which means that all users can read the same data. To
avoid overwriting each other's changes, you instead check before updating that
the data in the database hasn't been changed by someone else since you read it.
There are several ways to do this, and the simplest way is also the most
automatic one, but we'll save that for last. Instead we'll start with some other
variants.
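As a minimal sketch of what the pessimistic variant could look like against the
Messages table from Script 1, the UPDLOCK table hint takes an update lock
already when the row is read and holds it until the transaction completes:

-- Sketch of pessimistic concurrency: the update lock taken by the
-- UPDLOCK hint is held until the transaction completes, so no other
-- user can read the row with the intention of updating it.
BEGIN TRANSACTION

DECLARE @body varchar(255)

-- Read the row and take an update lock on it at the same time
SELECT @body = Body
FROM Messages WITH (UPDLOCK)
WHERE MessageId = 1

-- The user changes the text while everyone else who wants to update
-- the same row has to wait
SET @body = 'A text changed under an update lock'

UPDATE Messages
SET Body = @body
WHERE MessageId = 1

COMMIT TRANSACTION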
Last changed
A common solution is to use a column of some date type. This column will hold
the date and time when the data on a row was last changed (i.e. either INSERTed
or UPDATEd). The actual updating of this datetime info is either handled
'manually' by the application that updates the database, or automatically by
triggers in the database. Script 2 shows an example of how to handle it with
triggers, which is normally the way I would handle it, mainly because as a DBA I
want to make sure that the data in the database is always correct and up to
date. If the application is left to handle this, there is always a possibility
that some other application (present or future) will not handle it, and then you
will not have correct data.
-- Script 2: Demonstration of trigger for handling last updated column
USE tempdb
GO

IF EXISTS (SELECT 1 FROM sysobjects WHERE [name] = 'Messages')
BEGIN
  DROP TABLE Messages
END

CREATE TABLE Messages
(
  MessageId int NOT NULL CONSTRAINT pkMessages PRIMARY KEY CLUSTERED
  , Title varchar(50) NOT NULL
  , Body varchar(255) NOT NULL
  , ChangeDate datetime NOT NULL DEFAULT GETDATE()
)
GO

CREATE TRIGGER TrU_Messages ON Messages FOR UPDATE
AS
BEGIN
  UPDATE Messages
  SET ChangeDate = GETDATE()
  WHERE MessageId IN (SELECT MessageId FROM INSERTED)
END
GO

SET NOCOUNT ON

INSERT INTO Messages (MessageId, Title, Body)
VALUES (1, 'A message', 'A message text')

SELECT * FROM Messages

-- Wait 5 seconds to see the difference in ChangeDate
WAITFOR DELAY '00:00:05'

UPDATE Messages SET Body = 'Another text' WHERE MessageId = 1

SELECT * FROM Messages

SET NOCOUNT OFF
Timestamp
With the trigger in Script 2 we are now able to check when a row was last
changed. With this information we can now implement functionality to make sure
that User B is not able to overwrite the data that User A has already changed.
Again, the functionality can either be placed in the application where the
changes are done, or we can do it in the database with either a trigger or a
procedure. Since I like to restrict users from having direct access to tables
and instead give them procedures to use, Script 3 shows an example of a
procedure that updates the data only if it has not changed since it was read by
the user.
-- Script 3
USE tempdb
GO

-- Make sure that the table Messages and trigger TrU_Messages
-- created in Script 2 do still exist, otherwise recreate them
IF EXISTS (SELECT 1 FROM sysobjects WHERE [name] = 'MessagesUpd')
BEGIN
  DROP PROCEDURE MessagesUpd
END
GO

CREATE PROCEDURE MessagesUpd
  @messageId int
  , @body varchar(255)
  , @readDate datetime
AS
BEGIN
  SET NOCOUNT ON

  UPDATE Messages
  SET Body = @body
  WHERE MessageId = @messageId
    AND ChangeDate = @readDate

  IF (@@ROWCOUNT = 0)
  BEGIN
    PRINT 'The data had already been changed since read date.'
  END

  SET NOCOUNT OFF
END
GO

DECLARE @body varchar(255), @readDate datetime, @otherDate datetime
DECLARE @messageId int
SET @messageId = 1

SELECT @body = Body, @readDate = ChangeDate
FROM Messages
WHERE MessageId = @messageId

SET @otherDate = DATEADD(d, -1, @readDate)
SET @body = 'A changed text'

-- This will not work because the date is not the same as the one stored
-- in the table
EXEC MessagesUpd @messageId, @body, @otherDate
SELECT * FROM Messages

-- This works
EXEC MessagesUpd @messageId, @body, @readDate
SELECT * FROM Messages
This technique is often referred to as using a timestamp. By always retrieving
the current timestamp together with the data when SELECTing it, it is possible
to check whether the data has changed prior to updating it by comparing the
timestamps, thereby applying optimistic concurrency. If the row has a different
timestamp than it had when it was retrieved, the user will have to SELECT the
new version of the data (including the new timestamp that is now the current
one), decide whether or not he still wants to make the change and, if so, try to
update again.
A simpler stamp
If the requirements don't state that you actually need to see the specific date
and time when a row was changed but just need to see that it has been changed it
might be better to use an integer as a kind of simple stamp. When the data is
first INSERTed into the database it gets a default value (0 or 1 or whatever)
and then the trigger adds 1 to this each time the row is changed. In order to
update a row the user must supply the value that the stamp had when the data was
read, and then the procedure checks that the row does not have a different stamp
before updating it. The effect will be the same as using a datetime stamp, but
instead of being able to see when the row was changed, you can see how many
times it has been changed. It is only a matter of what the
requirements are. Also, note that an integer takes less space than a datetime
and it is impossible to have a 'collision' (two updates at almost the same time
could perhaps get the same timestamp), but you can of course run out of values
with the integer. I doubt that the same row will ever be changed more than
2,147,483,647 times though.
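A rough sketch of how this could look, mirroring Scripts 2 and 3 (the table,
trigger and procedure names below are made up for the example):

-- Sketch only: an integer stamp instead of the datetime stamp.
-- All object names here are just examples.
CREATE TABLE MessagesInt
(
  MessageId int NOT NULL CONSTRAINT pkMessagesInt PRIMARY KEY CLUSTERED
  , Title varchar(50) NOT NULL
  , Body varchar(255) NOT NULL
  , ChangeCount int NOT NULL DEFAULT 0
)
GO

-- The trigger adds 1 to the stamp each time a row is changed
CREATE TRIGGER TrU_MessagesInt ON MessagesInt FOR UPDATE
AS
BEGIN
  UPDATE MessagesInt
  SET ChangeCount = ChangeCount + 1
  WHERE MessageId IN (SELECT MessageId FROM INSERTED)
END
GO

-- The procedure only updates the row if the stamp still has the value
-- that the user read
CREATE PROCEDURE MessagesIntUpd
  @messageId int
  , @body varchar(255)
  , @changeCount int
AS
BEGIN
  SET NOCOUNT ON

  UPDATE MessagesInt
  SET Body = @body
  WHERE MessageId = @messageId
    AND ChangeCount = @changeCount

  IF (@@ROWCOUNT = 0)
  BEGIN
    PRINT 'The data had already been changed since it was read.'
  END

  SET NOCOUNT OFF
END
GO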
Rowversion
Very often though you don't need to know when the data was changed nor how many
times it's been changed. The only requirement is to make sure that the users
don't overwrite each other's data changes. The simplest way to handle this is to
use the functionality that SQL Server has for exactly this requirement, namely
the rowversion data type. Unfortunately, this data type was earlier called
timestamp in SQL Server, even though it is not at all the same thing as the
timestamp that is described in the ANSI SQL standard. The reason it was called
timestamp is because it existed in SQL Server before timestamp existed in the
standard. For backwards compatibility reasons it is still possible to use the
name timestamp for it, but, as with most other things left in SQL Server only
for backwards compatibility, it is recommended to use the new name rowversion,
because the definition of timestamp might be changed in a future version of SQL
Server to adhere to the ANSI definition.
A rowversion in SQL Server is actually a binary(8) (unless it allows NULLs, in
which case it is a varbinary(8)), which is automatically changed when a row is
INSERTed or UPDATEd. A table can only have one rowversion column, and it is not
possible to update it manually; it is always handled by SQL Server
automatically.
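A small sketch of this automatic behaviour (the table and column names are only
examples):

-- Sketch: a rowversion column is given a value and changed entirely
-- by SQL Server; the column is never mentioned in the INSERT or UPDATE.
CREATE TABLE MessagesRv
(
  MessageId int NOT NULL CONSTRAINT pkMessagesRv PRIMARY KEY CLUSTERED
  , Title varchar(50) NOT NULL
  , Body varchar(255) NOT NULL
  , RowVer rowversion NOT NULL
)
GO

INSERT INTO MessagesRv (MessageId, Title, Body)
VALUES (1, 'A message', 'A message text')

-- The column already has a value
SELECT MessageId, RowVer FROM MessagesRv

UPDATE MessagesRv SET Body = 'Another text' WHERE MessageId = 1

-- The value has changed after the update
SELECT MessageId, RowVer FROM MessagesRv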
Just like when using a datetime or integer stamp you use it by retrieving the
current value when SELECTing data and then comparing it against the one stored
in the database when you want to update the row. With a column of data type
rowversion in a table it is also possible to use SQL Server's built-in
functionality for optimistic concurrency. This means that you can SELECT data to
a recordset in a client application and handle it locally using for example a
data grid. When you're finished changing the data in the grid you connect the
recordset to the database again and use Update or UpdateBatch (to update several
rows at once). SQL Server will now automatically check if the row(s) have been
changed between the time when they were read and the update by comparing the
values in the rowversion column. It is actually possible to use automatic
optimistic concurrency control without a rowversion column but in that case SQL
Server needs to compare all the current data of all columns to the old values
they had when the recordset was originally opened. In that case the recordset
must always store both the original data and the new data that has been changed
locally.
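The same check can of course also be made explicitly in a procedure, just like
in Script 3. A sketch against the example table above (the procedure name is
again made up, and a binary(8) parameter is used to carry the rowversion value):

-- Sketch: the same optimistic check as in Script 3, but comparing the
-- rowversion column instead of a datetime.
CREATE PROCEDURE MessagesRvUpd
  @messageId int
  , @body varchar(255)
  , @rowVer binary(8)
AS
BEGIN
  SET NOCOUNT ON

  UPDATE MessagesRv
  SET Body = @body
  WHERE MessageId = @messageId
    AND RowVer = @rowVer

  IF (@@ROWCOUNT = 0)
  BEGIN
    PRINT 'The data had already been changed since it was read.'
  END

  SET NOCOUNT OFF
END
GO

-- Usage: read the current rowversion together with the data...
DECLARE @body varchar(255), @rowVer binary(8)
SELECT @body = Body, @rowVer = RowVer FROM MessagesRv WHERE MessageId = 1

-- ...and pass it back when updating
SET @body = 'A changed text'
EXEC MessagesRvUpd 1, @body, @rowVer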
Summary
Managing changes to data in SQL Server is, as you can see, not very complex, but
it can be very important. In the multi-user web applications of today it is
normally not possible to lock data while working with it, so instead you must
use optimistic concurrency. If it is not handled, users risk losing work and
time, and as you know, time is money.