January 15, 2004 at 2:01 am
Hello all,
At this moment my collegues and I are having a discussion (or is fight a better word ) about joining tables.
Suppose I have a country table with the names of all countries in the world and I also have a 2 letter abbreviation. In a contact table I want to add the country as a foreign key. I have to options:
- Create Unique ID of type int as primary key in the country table
- Use the 2 letter abreviation as primary key
Now, the discussion is if joining on an int datatype is better than joining on a string datatype. I suppose this all depends on the size of the string and the database type (varchar, char). I would say, if I use a char(4) instead of an int, this would have the same performance, since they both are 4 bytes long.
There is also a issue. I personally prefer to use the 2 letter abbreviation, because it makes the contacts table more readable. Of cource, this principle can be used in many other places and it would help make the database design (relations etc) more clear and 'readable'
I am very curious about the idea's of other database designers on this.
January 15, 2004 at 10:22 am
Even if the same number of bytes, char values will be slightly slower than int values when used like this (because of collation). The reasons for using them, including the one you give, far outweigh this performance issue, particularly as the joins may be unnecessary if the country is obvious from the code. Another excellent reason to use char(2) is that this is the international standard:
http://www.iso.org/iso/en/prods-services/iso3166ma/02iso-3166-code-lists/list-en1.html
--Jonathan
January 16, 2004 at 12:48 am
Jonathan, thanks for your reply. I already knew it was an iso standard. But I was wondering what other developers think about this from a design and/or performance view.
So what you are saying is that a char datatype is slower. If i'm sure that the collation of both fields are the same, does it still mean a (slight) loss in performance?
January 16, 2004 at 2:26 am
If you are considering purely the performance then I would say that an integer value is faster then a string value (char(n) or even wors varchar(n) or nchar, nvarchar)
The reason for this is that the integer value is one computational unit as a character string is an array of characters (or better said an array of bytes).
Therefore to make a join the RDBMS must compare the two strings character by character (making a loop) and if every characters are matching then it is considered as equal. So logically seen a join on a char(2) column is taking two machine cycles for comparison while on an integer (or smallint or tinyint) only one.
Where you would not see a difference is when you have a char(1) bacause internally it as a byte/tinyint, so no loop is required
Bye
Gabor
January 16, 2004 at 7:09 am
I was using the term collation more generally than just the meaning to SQL Server. Nyulg described how strings must be collated, as the sort orders for each character must be found and compared. This is relevant even to equijoins because 'aa' will equal 'AA' if the sort order is case insensitive.
The best design is not always that with the highest performance in one specific area; as I earlier wrote, if you use mnemonic abbreviations (like the ISO), then you may not even need to join in some of the queries, and that will definitely perform better than a join on integers.
--Jonathan
January 16, 2004 at 8:21 am
I don't remember the source immediately...so take this with a grain of salt.
The performance of string comparisons has been greatly enhanced to the point that there would be no significant advantage of the integer over the 2 character code. The document I read implied that strings even up to 10 characters are comparable. Yes - there may be some difference. But unless you are running millions of entries per day, you probably won't be hindered.
The only time I use an internal ID is when we allow the users to change the code. If that wouldn't be the case here, my preference would be to use the 2 character.
Guarddata-
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply