Foreign Keys Question

  • Quick question on foreign keys: I have five primary tables and a dozen or so reference tables. The reference tables use varchar fields as a primary key--ranging in size from varchar(2) to varchar(25) .

    For example, my country reference table looks like this:

    CountryCode, varchar(5)

    CountryName, varchar(35)

    with the CountryCode field as the PK.

    In the address table (one of my primary tables), the "Country" field, varchar(5), references the CountryCode as a foreign key.

    Is there any advantage to adding a int field to each of the reference tables and setting that as the primary key and then updating the five primary tables to store the int field as as opposed to the varchar field as the foreign key?

    The database is ~1.4 billion records and growing at a rate of 300,000 new rows per day. I don't have any control over how I get the data, but I can alter it as it comes into my system. We would like to create a OLAP Cube using SQL Analysis server. For the Cube, will it matter if the foreign key is a varchar or Int? Is there any performance improvement? Or does it really not matter?

    Thanks

  • I don't think you well get any significant improvements, but what you will get is a higher level of maintanability.

    When you change the Country code, for example, you will need to change it in the referenced table also.

    If you use an identity in the Country table, then simply linking with this ID will pinpoint directly to the country code, and name you have asked.

    It's basically the same, but it's a lot harder to maintain using varchar columns, especially if the varchar columns can eventually change.

    And don't suppose some will not change, as sooner or later, someone will ask you to change them.

    Hope that answer suits you,

    Cheers,

    J-F

  • You're stepping into a minefield of sorts here. This is the long-standing discussion between natural keys and artificial keys. I don't want to kick up too much mud, so I'll just say this, when you're talking about a VARCHAR field, you need it to be somewhat selective so that the index that you're going to get on the PK and any you create on the FK's will work well. Really small variable length fields don't work well because there isn't enough data to get a really selective index and really long fields make an index key that's just too big. It really is easier to index an integer field with some degree of expectation of consistent performance.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • I gathered from what I read that this issue is open for debt. I think I'll test it on one of the smaller primary tables in development to see if there is a performance increase.

    A sort of related follow-up question: should datetime fields be indexed?

    Thanks for the help.

  • I'm a fan of artificial keys for reasons already mentioned. Remember that SQL server defaults the pk to also be the clustered index so that data is also contained in any nc indexes. Not a problem with small character columns but when you get larger ones you quickly add space. Also as the clustered index you need to be aware that you could introduce more fragmentation.

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply