December 29, 2004 at 8:36 am
Hi,
I am a programmer (VB) and a beginner when it comes to optimal database design, but I am always looking for improving my database design skills. I read in numerous posts that every table should have a field that uniquely identifies each record. I understand the underlying logic behind this, what I don't understand is what this unique identifer should be. Many people wrote that an autonumber field is a bad way of creating a unique field and I would like to understand why this is the case? Why is a composite key better than an autonumber field? I would like to design the optimal databases, so if you could point me in the right direction in regards to the unique identifer or any other design considerations I would greatly appreciate it.
December 29, 2004 at 8:52 am
Lots of good info here...
http://www.sql-server-performance.com/database_design.asp
I use this site a lot as well as SQL Server Central. Enjoy!!
--Lori
December 29, 2004 at 9:11 am
Beware of Large composite keys, that's why surrogates exists (the speed of joins will be very noticeable as the table grow large).
Like I said there is no ABSOLUTE in design it will always be: "depends"
HTH
* Noel
December 30, 2004 at 12:49 pm
Journeyman, perhaps you could shed light on the Identity column issue, or tell me where to find your "rants".
We have tables that need 3 or four fields to define identity. Example: iBatch (SHORT INT), jBatchDate (DATETIME), iEnvelope (SHORT INT), iPage (SHORT INT), iSeq (SHORT INT). Since we do a tremendous amount of joins, we opted to use an identity column so we only join on one column. Example: Envelope INNER JOIN Page on Envelope.idEnvelope = Page.idEnvelope.
Is our guess that a join on 5 fields has worse performance as a join on 1 field correct?
- John
December 30, 2004 at 2:05 pm
Joe, thanks for printing the rant, I see where you are comming from. I don't want to get into a debate, about the purity of compuer generated id fields vs combinations of existing fields for unique keys. Lets just say that I'm a DBA who is interested in performance over purity in many cases.
I have a question about performance. I just want to know if a multiple field key used in a join is lower performance than a single field key used in a join. To me it seems like using a single id field for a record would be a large performance gain over even two fields.
A INNER JOIN B ON A.id = B.id
vs
A INNER JOIN B ON A.x = B.x and A.y = B.y
Our database is queried millions of times in a 6 hour day and performance is critical. Sales of our system depend partially upon it not requiring a hugely expensive backend system. Unfortunately that trumps the purity of our DB. You could say I am an one of the id-iot$
- John
December 30, 2004 at 2:13 pm
Dragan, you might want to read a good on DB design that can help you. I found Mike Hernandez "Database design for mere mortals" an easy-to-read introduction to this topic. When you advance, there are other books like Joe's "Databases and concepts" (hope, I got the title somehow right) and/or Chris Date's "Introduction to database systems". The discussion about IDENTITY pro and contra is almost as old as this feature exists. Do a search here or in the Google groups, read them closely and make up your own mind. Using the IDENTITY property is better than having no Primary Key at all.
Joe, wow, that was quite a long post, but did it get truncated? Hey, my company is helping you paying your mortgage. They mistakenly ordered "Trees and hierarchies" when I wanted "Advanced SQL programming". Nice trick to cover all under "SQL for Smarties". Nevermind, they also ordered "Advanced SQL programming", too.
John, you can find the "Best of Joe Celko" on the Google Groups. Though personally I think some of the best did Joe on the Yahoo Groups.
--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]
December 31, 2004 at 9:04 am
2 integer fields vs 1 integer field. I know a DBA without a job who was using GUID's on each record! Sorry without identifying the fields it is not clear.
A INNER JOIN B ON A.id = B.id -- id is an INT
Vs
A INNER JOIN B ON A.x = B.x and A.y = B.y -- x,y are INT
My guess is that join 2 is slower, but not by much. The index would be twice as large on the disk, but the traversal would require approximately the same amount of reads. However we hit this type of query millions of times a day. We have ones with up to 5 unique keys which we hit the same amount of times.
We have another unique key that is 2 integers and 1 date. Because we believed it was faster to join using a single int and because it was impossible to communicate to each other and to our clients which row/record we were talking about, a single Id was a much better solution. Ex: a row/record represents an envelope that was scanned in the mail room. Each envelope has a batch sequence, a date, and an envelope sequence. Instead of having to always communicate by saying on day 1/23/2003, batch 234, envelope 45 we just say idEnvelope 189456. Sometimes we move an envelope to another batch, but it keeps all its pages, notes, etc, because they are tied to idEnvelope. I agree with Noeld - "it depends".
If you have a unique number that is part of the data and makes a handy way to reference the record/row, then use it as the key. This way you minimize DB size and maximize performace, because adding another id field would be redundant. if you have rows/records with multiple fields that form an id, you need to consider if you want to join the records always using multiple fields and if that is a performance hit you can accept. I don't think multiple INTs is a problem, but dates and strings as Ids could be a problem. If the key fields in record A can change, then you have problems because your links to other records/rows have been broken. I always think about minimizing the size of the DB too. If a unique ID is to be used in a join, it will have to be indexed. If it were not going to be indexed normally, then you just ate up a chunk of disk space and lost performance when adding rows/records.
Good luck. In any event, if you have doubts, do some tests.
- john
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply