To ID, or Not to ID, That is the Question...

  • I'm screwing around with a few different data sources, and started normalizing the data by country/State/County/City/Zipcode. Eventually, I need to tie detailed items to counties and cities, so I wanted a nice global hierarchy...

    I started with a Country Table, with CountryID,CountryCode and CountryName.

    Then started making a State table which are sub divisions of a Country, with StateID,CountryID,StateCode,StateName.

    On to County, which are sub divisions of State,with CountyID,StateID,CountyCode,CountyName.

    When i got that far, I started thinking, why should The Id's be abstractions of the data? why not drop the ID's and just put a PK on the CountryCode/StateCode/CountyCodes instead? If I did it the other way with ID's, I'd probably end up with a bunch of views that pulled everything together the same way anyway.

    if you looked at child data, it might look like CountryCode,StateCode,CountyName,CityName,

    so for typical US data, it might be 'US','FL','BROWARD','Fort Lauderdale' as a unique constraint at the city level. much easier from the application end to display than a buch of ID's that you either need to look up or get a view that pulls it together, I think.

    So the question: I seem to see advantages to PK's on varchar fields in this case, so what benefit am i missing If I abstract every description to ID's in the hierarchy of foreign keys? If you were fiddling with data like this, how would you do it?

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • I've done it both ways. The one recommendation I can make is to make this consistent with the rest of the system. If the system is using artificial and invisible keys (identity fields or GUID's) then do the same thing with these lookup tables. If the rest of the system is using artificial or natural keys as actual key values defined in the Primary & Foriegn key constraints of the tables, then follow that path. There are short comings and issues with either approach you take, but if the approach you take is different than the rest of the design, you won't be happy.

    One other consideration, we've built most of our larger systems using the ID approach (for good or for ill, not something I want to argue). So we generally push that same design down to smaller projects or new projects because we've already built the mechanisms for managing the data in that fashion. Again, consistency has rewards and inconsistency causes problems.

    Neither approach is a total win or a total loss. Hopefully I won't get too much mud tossed at me for almost taking a side in this religious war.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • I agree with Grant. Make it consistent. Either way can work.

  • If you have the chance then use international/national standards for your geographic codes. You would be surprised by how many different every day entities DO have an ISO standard.

    Obviously surrogate keys work but the use of standards means (as you have pointed out) the codes themselves have meaning in their own right.

    If you have to exchange data with other parties then everyone knows the DE is Germany, but CountryID 23 only means something to your own internal audience.

    The other thing is that many of the codes are actually fixed lengths and therefore you can use CHAR rather than VARCHAR. You won't notice the difference until you really stress the system but CHARs do perform better than VARCHAR.

  • It is better to go with the ID approach. If one of the codes changes, you only have to update a single row in one table. For example, Canada made changes not that long ago to some of the province codes.

  • It's about weighing the benefits versus the cost, in two separate areas (compound keys vs single-column keys, and natural vs surrogate keys). It's all a matter of how that data plays for you.

    Natural keys change in my world, so I tend not to be a big fan. But - again you may find it works for you.

    On the other hand - having the compound, multi-level keys allows for pretty efficient "by-pass" queries, even if it can also have a bigger maintenance component if say, a town is transferred from one county to another.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • If it's gotta be a hierarchy and it's gotta be fast, use Joe Celko's "Nested Sets"... Cool thing is, it fits all in one nice table...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • The first reply was spot on - keep it consistent with the rest of the system. I suppose you should also think of how beneficial a list of countries, states, etc would be to your application. For example..

    1. Do you need to show a list of countries, states, etc for a user to choose from.

    2. Should the list be hardcoded in your app or will you add more countries, states, etc over time

    3. Do you have several other databases with which you'll interface? If so, what do they do.

  • Lowell...

    You don't need to worry about other tables, FK's, and all of those concerns it you use the Celko's nested set model... you also don't have to worry about the time it takes to do recurssion, etc, etc. Take a look at it...

    http://www.intelligententerprise.com/001020/celko.jhtml?_requestid=1266295

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply