How does the Indentity property work?
Introduction
If you are like most DBAs, you have probably implemented some type of hash function for some
of your tables at one time or another. The most common type of hash is a sequential integer type
sequence that functions as a primary key, without having to use some character value. I'm not debating
whether this is a good idea or not, but there is often a need for it. Usually for invoice numbers, order numbers,
etc. one is looking for a sequence of numbers.
The Problem
When I had first arrived at the this comapny, the programmers had implemented a scheme where the next
key value was stored in a table. Actually a number of key values for different items were stored in
this table. Whenever a process needed a new key bvalue, it ran a stored procedure that retreived the next
key and incremented the value in the table. As the system was beign developed, this was an easy way to
manage the incremental values.
This was not, however, one of my favorite methods of handling this. Instead, I have always preferred using
the IDENTITY property for each table that needs this. I immediately started to convert our system to use
the identity property where needed.
Of course, one of the developers pointed out that SQL Server must be performing the same process that
their system was doing, namely storing the "next" value somewhere, retrieving it, and then incrementing the
stored value. We proceeeded to debate which method was in fact better. As we argued the merits of each
method, I wondered how SQL server's identity property would perform under load.
The Solution
When in doubt, test.
This was exactly what I decided to do. My first step was to setup a table:
create table MyTest2 ( source char( 1) , MyID bigint identity( 1, 1) )
Then I setup some data files. These were simple files I generated in Excel and then saved
as text files. Each contains a single character representing the source file and a numeric field
that functions as a placeholder for the bcp utility. These are available for download here:
and loadc.txt (not very
original names, but they served the purpose). Once I had these data files, I decided to
conduct my identity test. Since I was going to use BCP and not include an identity value (since I
was testing the identity process), I included placeholder values to avoid generating format files.
I opened three command
windows and entered the following:
bcp "iqd_dev..MyTest2" in c:\loadc.txt -STank -T -c
substituting the correct file name in each window. Then I pressed <Enter> in one
window, quickly switched to the next, pressed <Enter> again, and repeated the process in the
third window. Here is a shot of my desktop after I ran these:
Of course, this took a few tries to get working correctly. The timing of the mouse clicks and <
Enter> presses took some practice and I dropped and recreated my table more than a few times.
The Possible Results
Before I reveal what happened, there were a couple of possible outcomes that I had anticipated. Either
the server would treat each BCP load as a transaction, in which case I would expect to find rows 1-
30,000 with a single source, rows 30,001-60,000 with a second source and 60,001-90,000 with the third
course. If I were implementing my own identity, this is likely what would happen (if each load were
its own transaction). The other alternative would be that despite each load being a transaction, the
server had some internal optimizations and I would receive a spread of sources among all identity values.
What I found in my results, after successfully having simultaneous loads (more or less), was that SQL
Server was able to generate these identity values extremely quickly. The load of these 90,000 rows
completed in less than 5 seconds on numerous occassions. Looking at a sample result set of data, you can
see my results:
Conclusions
I don't necesarily recommend identity values for all tables. There are many places that I have found
where a sequential, numeric key can be helpful, however, and for these situations, I do highly recommend
the identity property over any other type of sequence generator.
As always I welcome feedback on this article using the "Your Opinion" button below. Please also
rate this article.
Steve Jones
©dkRanch.net October 2001