Ranges as a component of a primary key

  • Say that you want to create a table for tracking where individuals were living at different times. I can see a couple of ways to design such a table, and I was wondering if either (or something else) was generally considered better.

    Method #1: Periods defined by start date alone

    CREATE TABLE Directory

    (

    personID INT,

    startDate DATE,

    cityName VARCHAR(64)

    );

    personID/startDate makes an obvious candidate key. The problem with this design is that lookups are more tedious than they seem like they should be. If you want to know where everyone lived on a particular date, you need to use either a NOT EXISTS clause or partitioning to make sure that in addition to startDate being before the target date, that there isn't a later startDate for that individual that is still before the target date.

    Method #2: Periods defined by Start Date and End Date

    CREATE TABLE Directory

    (

    personID INT,

    startDate DATE,

    endDate DATE,

    cityName VARCHAR(64)

    );

    Lookups are much easier with this design, since you only have to check that the target date is between startDate and endDate. However, while personID/startDate and personID/endDate are both candidate keys, in order to enforce consistency, you need to add a check constraint on startDate and endDate to ensure that there are no overlapping timespans for any individual.

    Is either of these approaches generally considered better form? Or is there another way?

    Thanks!

  • Say that you want to create a table for tracking where individuals were living at different times. I can see a couple of ways to design such a table, and I was wondering if either (or something else) was generally considered better.

    Method #1: Periods defined by start date alone

    CREATE TABLE Directory( personID INT, startDate DATE, cityName VARCHAR(64));

    personID/startDate makes an obvious candidate key. The problem with this design is that lookups are more tedious than they seem like they should be. If you want to know where everyone lived on a particular date, you need to use either a NOT EXISTS clause or partitioning to make sure that in addition to startDate being before the target date, that there isn't a later startDate for that individual that is still before the target date.

    Method #2: Periods defined by Start Date and End Date

    CREATE TABLE Directory( personID INT, startDate DATE, endDate DATE, cityName VARCHAR(64));

    Lookups are much easier with this design, since you only have to check that the target date is between startDate and endDate. However, while personID/startDate and personID/endDate are both candidate keys, in order to enforce consistency, you need to add a check constraint on startDate and endDate to ensure that there are no overlapping timespans for any individual.

    Is either of these approaches generally considered better form? Or is there another way?

    Thanks!

    Because I am not complete sure what you want to accomplish here. I would do something more like this statement:

    CREATE TABLE Directory(

    personID int IDENTITY(1,1) not null, -- use as a PK(primary Key)

    startDate DATE,

    endDate DATE,

    cityName VARCHAR(64));

    This will give a unique id for everyone beyond that create the proper indexes depending on how you decide to query against the table.

    :cool:

    "There are no problems! Only solutions that have yet to be discovered!"

  • bopeavy (7/14/2011)Because I am not complete sure what you want to accomplish here. I would do something more like this statement:

    CREATE TABLE Directory(

    personID int IDENTITY(1,1) not null, -- use as a PK(primary Key)

    startDate DATE,

    endDate DATE,

    cityName VARCHAR(64));

    This will give a unique id for everyone beyond that create the proper indexes depending on how you decide to query against the table.

    Careful, now... 🙂 The code above will only allow each person to live in one place... ever.

    😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • IMHO, Method #2 is definitely the better way go for a lot of the reasons you stated. It's called a "Type 2 Slowly Changing Dimension".

    When I do such things, I also add a trigger to automatically end-date old rows for each person being inserted into the table. This is also one of the very few places where I'll use a '9999-12-31' date for an EndDate for whatever the current row is for each person. It makes queries a whole lot easier to write and easily allows for Index Seeks should lookups by date-range ever crop up.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff

    Careful, now... The code above will only allow each person to live in one place... ever.

    So true Jeff thats why I stated not sure what he really wanted to do. I agree if you really wanted to get in to it the City would be a dimension of the fact. So possibly would be the people too. The fact would be the date which would have a location dimention and a person dimention and so on..... + 3 tables so far.

    :cool:

    "There are no problems! Only solutions that have yet to be discovered!"

  • Jeff Moden (7/14/2011)


    IMHO, Method #2 is definitely the better way go for a lot of the reasons you stated. It's called a "Type 2 Slowly Changing Dimension".

    When I do such things, I also add a trigger to automatically end-date old rows for each person being inserted into the table. This is also one of the very few places where I'll use a '9999-12-31' date for an EndDate for whatever the current row is for each person. It makes queries a whole lot easier to write and easily allows for Index Seeks should lookups by date-range ever crop up.

    Yes, I think I will end up doing that. Thanks for pointing out the "slowly changing dimension" term, I was able to learn more based on that.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply