June 27, 2004 at 6:20 pm
I have heard that one will get better performance joining an int column to another int column versus joining, let's say, a char(5) column to another char(5) column.
If this is true can someone answer why?
Thanks
June 29, 2004 at 2:09 am
I couldn't see any reason why that would be the case (unless you were joining on a varchar(2000) column for example).
A join is fastest when the column is a primary key clustered index. SQL Server is able to join on a clustered index really quickly, because the rows are ordered (usually ascending) so it can jump ahead and join the rows without needing to read all the data in a table. But this performance would probably be the same for int or char(5).
Most normalised, relational databases use Primary Key -> Foreign Key relationships so that table data is related. Joining on those relationships is fast as the relationships are indexed.
Julian Kuiters
juliankuiters.id.au
June 29, 2004 at 4:54 am
June 29, 2004 at 5:36 am
But GUIDs are alphanumeric. The key for speed is the fixed length and size of data. Ultimately they are compared at the binary level.
June 29, 2004 at 6:42 am
Hi
First off, check if you need to index your joining cols.
ints are supposed to be faster in joins because int comparisons are quicker than text comparisons (less to compare, no collation to check / convert), especially across massive amounts of data.
More importantly make sure your data types match, ie the data is stored in cols of the same type and collation (if text) - joining a string to an int can be done but you will be doing tablescans even if fields are indexed.
June 29, 2004 at 6:59 am
Something else to consider when designing your joins and indexes (you should preferably be joining on indexed fields), is that an Int takes 4 bytes of physical space, thus your indexes will be smaller than the char(5) field you mentioned. Smaller indexes can offer better performance gains.
See below, and you will find that you can store many rows in your 4 bytes Int. If you use a char, you will need a char(10) to hold the same number of rows. This will result in a much bigger index.
Integer (whole number) data from -2^31 (-2,147,483,648) through 2^31 - 1 (2,147,483,647).
June 29, 2004 at 7:49 am
A char(5) consumes 5 bytes, an int 4 bytes. A 32bit processor would be better at comparing 32 bit data than 40 bit data ?
If you have smaller data the data would also be spread among fewer pages in memory, so theoretically SQL Server would have to retrieve fewer pages from storage for comparing the data.
With a clustered index, more keys can be stored on one page if you use int contra say guid...
Having said this,
I say that:
I would never choose between Int or Char(5) for performance reasons.
Use the datatype that is appropriate for storing the data at hand.
/rockmoose
You must unlearn what You have learnt
June 29, 2004 at 7:56 am
... you will find that you can store many rows in your 4 bytes Int. If you use a char, you will need a char(10) to hold the same number of rows...
Ooops,
Wayne: isn't the number of possible values for char(5) something like:
no_of_permissible_characters^5
/rockmoose
You must unlearn what You have learnt
June 29, 2004 at 8:35 am
rockmoose: 🙂 I'm not so hot at maths, so you win! I was thinking along the lines of storing numeric keys in the char data. Thinking further, if you use hex values (00 - FF) in the char(10) the possible rows is huge (I won't pretend to know how to calculate that!).
June 29, 2004 at 9:14 am
Cheers Wayne,
select convert(bigint,0xFFFFFFFFFF)
-- 1099511627775
-- only a-z characters (26) char(10)
select power(convert(bigint,26),10)
-- 141167095653376
/rockmoose
You must unlearn what You have learnt
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply