Joins - Best Practice

Question

Joins - Best Practice

Mustang Sally

Newbie

Points: 5
More actions
June 27, 2004 at 6:20 pm

#86018

I have heard that one will get better performance joining an int column to another int column versus joining, let's say, a char(5) column to another char(5) column.
If this is true can someone answer why?
Thanks

Viewing 10 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

Julian Kuiters SSCertifiable Points: 6585 More actions · Answer 1

I couldn't see any reason why that would be the case (unless you were joining on a varchar(2000) column for example).

A join is fastest when the column is a primary key clustered index. SQL Server is able to join on a clustered index really quickly, because the rows are ordered (usually ascending) so it can jump ahead and join the rows without needing to read all the data in a table. But this performance would probably be the same for int or char(5).

Most normalised, relational databases use Primary Key -> Foreign Key relationships so that table data is related. Joining on those relationships is fast as the relationships are indexed.

Julian Kuiters
juliankuiters.id.au

Wayne-153714 Hall of Fame Points: 3536 More actions · Answer 2

I was told on a SQL cource I attended, that SQL Server 2000 is optimised for use of Int and GUID as Primary Keys, and for joining.

I haven't been able to verify this though. When in doubt, test, test, test!

When in doubt - test, test, test!

Wayne

Antares686 SSC Guru Points: 125444 More actions · Answer 3

But GUIDs are alphanumeric. The key for speed is the fixed length and size of data. Ultimately they are compared at the binary level.

Mark Cohen Old Hand Points: 321 More actions · Answer 4

Hi

First off, check if you need to index your joining cols.

ints are supposed to be faster in joins because int comparisons are quicker than text comparisons (less to compare, no collation to check / convert), especially across massive amounts of data.

More importantly make sure your data types match, ie the data is stored in cols of the same type and collation (if text) - joining a string to an int can be done but you will be doing tablescans even if fields are indexed.

Mark

Mark Cohen

Wayne-153714 Hall of Fame Points: 3536 More actions · Answer 5

Something else to consider when designing your joins and indexes (you should preferably be joining on indexed fields), is that an Int takes 4 bytes of physical space, thus your indexes will be smaller than the char(5) field you mentioned. Smaller indexes can offer better performance gains.

See below, and you will find that you can store many rows in your 4 bytes Int. If you use a char, you will need a char(10) to hold the same number of rows. This will result in a much bigger index.

Integer (whole number) data from -2^31 (-2,147,483,648) through 2^31 - 1 (2,147,483,647).

When in doubt - test, test, test!

Wayne

rockmoose SSCarpal Tunnel Points: 4601 More actions · Answer 6

A char(5) consumes 5 bytes, an int 4 bytes. A 32bit processor would be better at comparing 32 bit data than 40 bit data ?

If you have smaller data the data would also be spread among fewer pages in memory, so theoretically SQL Server would have to retrieve fewer pages from storage for comparing the data.

With a clustered index, more keys can be stored on one page if you use int contra say guid...

Having said this,

I say that:

I would never choose between Int or Char(5) for performance reasons.

Use the datatype that is appropriate for storing the data at hand.

/rockmoose

You must unlearn what You have learnt

rockmoose SSCarpal Tunnel Points: 4601 More actions · Answer 7

... you will find that you can store many rows in your 4 bytes Int. If you use a char, you will need a char(10) to hold the same number of rows...

Ooops,

Wayne: isn't the number of possible values for char(5) something like:

no_of_permissible_characters^5

/rockmoose

You must unlearn what You have learnt

Wayne-153714 Hall of Fame Points: 3536 More actions · Answer 8

rockmoose: 🙂 I'm not so hot at maths, so you win! I was thinking along the lines of storing numeric keys in the char data. Thinking further, if you use hex values (00 - FF) in the char(10) the possible rows is huge (I won't pretend to know how to calculate that!).

When in doubt - test, test, test!

Wayne

rockmoose SSCarpal Tunnel Points: 4601 More actions · Answer 9

Cheers Wayne,

select convert(bigint,0xFFFFFFFFFF)

-- 1099511627775

-- only a-z characters (26) char(10)

select power(convert(bigint,26),10)

-- 141167095653376

/rockmoose

You must unlearn what You have learnt