Data Model Advice Needed

  • Hey everyone,

    I have a collection of integer values I need to maintain within a table designed to store said values for a given [Date] over a range of business hours for each [LocationID] (I’ll shorten the example for clarity)…

    Option #1:

    [Date] [LocationID] [8to9am] [9to10am] ...

    05/15/09 100 5 7

    05/15/09 110 9 35

    This is how it looks at present, but gathering this data is part of a large, nightly ETL process and it currently has no dependencies at all, so I can store it in whatever format I want.

    I’ve debated breaking down the metadata and using a FK on the HourRange…

    Option #2:

    [Date] [LocationID] [HourRangeID] [Count]

    05/15/09 100 1 5

    05/15/09 100 2 7

    05/15/09 110 1 9

    05/15/09 110 2 35

    And also timestamping the rows with DATETIME columns…

    Option #3:

    [Date] [LocationID] [PeriodStart] [PeriodEnd] [Count]

    05/15/09 100 2009-05-16 08:00:00.000 2009-05-16 08:59:59.999 5

    05/15/09 100 2009-05-16 09:00:00.000 2009-05-16 09:59:59.999 7

    05/15/09 110 2009-05-16 08:00:00.000 2009-05-16 08:59:59.999 9

    05/15/09 110 2009-05-16 09:00:00.000 2009-05-16 09:59:59.999 35

    So my ultimate question is, which way should I go with this? I’d lean towards the first option, just because it simplifies access to the info, but I know it’s not a very “SQL” solution and may cause problems down the road depending on the dataset format required by dependent reports.

    I know Option #2 is the truly "SQL" way, and therefore probably the way to go, but I thought I’d pass this by the – highly respected – folks here and just get a bit of feedback. I understand how it should be done, but I'd be hard pressed to make a solid argument if someone decided Option #1 or Option #3 was the way it should be.

    Thanks in advance for all replies!

    Nathan

  • Personally, I'd be inclined towards option 3.

    With the first one, all it takes is someone saying, "We need to change the first period so it starts at 6 AM", and the whole scheme breaks. The same is true, to a lesser extent, with the second option.

    The third one, the data is in the column, which is where it really should be per relational theory. Plus, in this case, it means if you need to add or modify a period, you don't have to rebuild any tables, do any weird date/time math, or anything else, you just change the data being inserted into the table.

    It also makes reports and such very easy to generate. If someone needs to know all the totals for mornings vs totals for afternoons, for example, you can query "where PeriodEnd <= 12" kind of stuff, instead of "Col1 + Col2 + Col3" kind of stuff. It's much more flexible for that kind of thing.

    All things considered, I'd actually be somewhat inclined to drop the PeriodStart column, depending on what you'd use that for. I'd have to know more of the use-case and business rules before I'd make that call.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Hey G, thanks a lot for the reply.

    Thinking about it from the perspective you related really makes sense: the only necessary predicates for this data are the date, the location, and the hour of the day the number relates to, so Option #3 (minus [PeriodStart]) really does fit the bill. Why make it more complicated than necessary? Kind of where I was going with Option #1 (I wasn't really concerned with the long-term viability of the model (years of policy backing the ranges)), but why introduce that risk? Good point.

    I need to focus somewhat on not assuming the structure of every query I write needs a join or it isn't good SQL.

    Thanks again for the advice. Any recommendations for good books on relational theory?

  • On relational theory, start with Wikipedia, work your way up from there. As usual, it's a good starting place. (As always, just don't use it as a single-source-of-information.)

    Another thing to look up there is the definitions of the various Normal Forms. Start researching those, you'll see why I suggested what I did. Most of the time, Third Normal Form (3NF) helps in a lot of aspects of database design and use.

    If you get bogged down in any of that, ask here. Plenty of people who can help out on that kind of thing.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • The only other change I would make to Option #3 is to make the EndPeriod an "open end" on the range. Example:

    StartPeriod EndPeriod

    2009-09-15 08:00:00.000 2009-09-15 09:00:00.000

    2009-09-15 09:00:00.000 2009-09-15 10:00:00.000

    Your queries would use the following:

    StartPeriod @ATime

    Make sense?

  • Ah, but normalization is what made me suggest Option #2... I'll keep reading 8)

  • It does make sense, but I'm going simpler still. I'm just throwing away both [PeriodStart] and [PeriodEnd] and DATEADDing the source [Date] timestamp (00:00:00.000) to the relevant hour.

    I can't think of any reason this model isn’t perfectly sufficient:

    [Date] [LocationID] [Count]

    2009-05-15 08:00:00.000 100 5

    2009-05-15 09:00:00.000 100 7

    2009-05-15 08:00:00.000 110 9

    2009-05-15 09:00:00.000 110 35

    Hopefully I have plenty of time for a fix if I'm eventually proven wrong...

    Thanks to both you guys for the advice! I really appreciate it.

  • That sounds like a good way to do this. You could probably make the Date and LocationID columns into a composite primary key for the table.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Exactly, I just wish there was some way to enforce a FK constraint on a column serving as part of a composite PK (LocationID), though a check constraint might work...

    Thanks again for all your help with this, G. This site is such a great resource to have access to.

    Nathan

  • Looks like you are going the right direction.

    As far as I know you can make LocationID the child portion of a FK even though it is part of the composite primary key.

  • You can, actually; I spoke too soon (misinterpreted an error I received on FK creation).

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply