May 18, 2009 at 12:32 pm
Hey everyone,
I have a collection of integer values I need to maintain within a table designed to store said values for a given [Date] over a range of business hours for each [LocationID] (I’ll shorten the example for clarity)…
Option #1:
[Date] [LocationID] [8to9am] [9to10am] ...
05/15/09 100 5 7
05/15/09 110 9 35
This is how it looks at present, but gathering this data is part of a large, nightly ETL process and it currently has no dependencies at all, so I can store it in whatever format I want.
I’ve debated breaking down the metadata and using a FK on the HourRange…
Option #2:
[Date] [LocationID] [HourRangeID] [Count]
05/15/09 100 1 5
05/15/09 100 2 7
05/15/09 110 1 9
05/15/09 110 2 35
And also timestamping the rows with DATETIME columns…
Option #3:
[Date] [LocationID] [PeriodStart] [PeriodEnd] [Count]
05/15/09 100 2009-05-16 08:00:00.000 2009-05-16 08:59:59.999 5
05/15/09 100 2009-05-16 09:00:00.000 2009-05-16 09:59:59.999 7
05/15/09 110 2009-05-16 08:00:00.000 2009-05-16 08:59:59.999 9
05/15/09 110 2009-05-16 09:00:00.000 2009-05-16 09:59:59.999 35
So my ultimate question is, which way should I go with this? I’d lean towards the first option, just because it simplifies access to the info, but I know it’s not a very “SQL” solution and may cause problems down the road depending on the dataset format required by dependent reports.
I know Option #2 is the truly "SQL" way, and therefore probably the way to go, but I thought I’d pass this by the – highly respected – folks here and just get a bit of feedback. I understand how it should be done, but I'd be hard pressed to make a solid argument if someone decided Option #1 or Option #3 was the way it should be.
Thanks in advance for all replies!
Nathan
May 18, 2009 at 12:52 pm
Personally, I'd be inclined towards option 3.
With the first one, all it takes is someone saying, "We need to change the first period so it starts at 6 AM", and the whole scheme breaks. The same is true, to a lesser extent, with the second option.
The third one, the data is in the column, which is where it really should be per relational theory. Plus, in this case, it means if you need to add or modify a period, you don't have to rebuild any tables, do any weird date/time math, or anything else, you just change the data being inserted into the table.
It also makes reports and such very easy to generate. If someone needs to know all the totals for mornings vs totals for afternoons, for example, you can query "where PeriodEnd <= 12" kind of stuff, instead of "Col1 + Col2 + Col3" kind of stuff. It's much more flexible for that kind of thing.
All things considered, I'd actually be somewhat inclined to drop the PeriodStart column, depending on what you'd use that for. I'd have to know more of the use-case and business rules before I'd make that call.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
May 18, 2009 at 1:08 pm
Hey G, thanks a lot for the reply.
Thinking about it from the perspective you related really makes sense: the only necessary predicates for this data are the date, the location, and the hour of the day the number relates to, so Option #3 (minus [PeriodStart]) really does fit the bill. Why make it more complicated than necessary? Kind of where I was going with Option #1 (I wasn't really concerned with the long-term viability of the model (years of policy backing the ranges)), but why introduce that risk? Good point.
I need to focus somewhat on not assuming the structure of every query I write needs a join or it isn't good SQL.
Thanks again for the advice. Any recommendations for good books on relational theory?
May 18, 2009 at 1:36 pm
On relational theory, start with Wikipedia, work your way up from there. As usual, it's a good starting place. (As always, just don't use it as a single-source-of-information.)
Another thing to look up there is the definitions of the various Normal Forms. Start researching those, you'll see why I suggested what I did. Most of the time, Third Normal Form (3NF) helps in a lot of aspects of database design and use.
If you get bogged down in any of that, ask here. Plenty of people who can help out on that kind of thing.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
May 18, 2009 at 2:07 pm
The only other change I would make to Option #3 is to make the EndPeriod an "open end" on the range. Example:
StartPeriod EndPeriod
2009-09-15 08:00:00.000 2009-09-15 09:00:00.000
2009-09-15 09:00:00.000 2009-09-15 10:00:00.000
Your queries would use the following:
StartPeriod @ATime
Make sense?
May 18, 2009 at 3:21 pm
Ah, but normalization is what made me suggest Option #2... I'll keep reading 8)
May 18, 2009 at 3:32 pm
It does make sense, but I'm going simpler still. I'm just throwing away both [PeriodStart] and [PeriodEnd] and DATEADDing the source [Date] timestamp (00:00:00.000) to the relevant hour.
I can't think of any reason this model isn’t perfectly sufficient:
[Date] [LocationID] [Count]
2009-05-15 08:00:00.000 100 5
2009-05-15 09:00:00.000 100 7
2009-05-15 08:00:00.000 110 9
2009-05-15 09:00:00.000 110 35
Hopefully I have plenty of time for a fix if I'm eventually proven wrong...
Thanks to both you guys for the advice! I really appreciate it.
May 19, 2009 at 6:53 am
That sounds like a good way to do this. You could probably make the Date and LocationID columns into a composite primary key for the table.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
May 19, 2009 at 7:36 am
Exactly, I just wish there was some way to enforce a FK constraint on a column serving as part of a composite PK (LocationID), though a check constraint might work...
Thanks again for all your help with this, G. This site is such a great resource to have access to.
Nathan
May 19, 2009 at 7:47 am
Looks like you are going the right direction.
As far as I know you can make LocationID the child portion of a FK even though it is part of the composite primary key.
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
May 19, 2009 at 8:03 am
You can, actually; I spoke too soon (misinterpreted an error I received on FK creation).
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply