May 31, 2017 at 6:27 am
RonKyle - Tuesday, May 30, 2017 5:49 PM
I have a data warehouse with some accumulating snapshot type fact tables, all of which have dates and several of which have times at different levels of granularity, with minute being the smallest.
First, use a smallint as the PK, even with the introduction of the time data type (which is nice). As mine goes down to minutes, key 0 is midnight, with the numbers incrementing by 1 for each minute. Each row is represented by the number of minutes past midnight that it is. That helps with determining elapsed time, because I just subtract the keys to get the number of minutes.
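A minimal sketch of that numbering scheme, with hypothetical object names (dbo.DimTime and dbo.FactOrder are illustrations, not the actual tables being described):

-- One row per minute, keyed 0..1439, where 0 = midnight; smallint is plenty
CREATE TABLE dbo.DimTime
(
    TimeKey   smallint NOT NULL PRIMARY KEY,   -- minutes past midnight
    TimeOfDay time(0)  NOT NULL
);

INSERT INTO dbo.DimTime (TimeKey, TimeOfDay)
SELECT n, DATEADD(MINUTE, n, CAST('00:00' AS time(0)))
FROM (SELECT TOP (1440)
             ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS n
      FROM sys.all_objects AS a CROSS JOIN sys.all_objects AS b) AS m;

-- Elapsed minutes between two events becomes simple key subtraction
SELECT f.CompleteTimeKey - f.CreateTimeKey AS ElapsedMinutes
FROM dbo.FactOrder AS f;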
Second, keep the dates and times in separate dimensions; both time dimensions have a date counterpart. Combining them would be far too unwieldy. We had some consultants do that on work that had to be done while I was away on military service, and I had to rip that portion of the work up. Keeping them separate also makes for a very easy pivot-like table, with days as the columns and times as the rows.
The time dimension also has columns for the quarter hours and hours. So all the entries from midnight to 00:14:59 carry a 1 in the quarter-hour column (for 00:00-00:14:59), and so on. I create views on these columns, and these hook to the fact tables whose granularities are quarter hours and hours.
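One way to sketch those bucket columns and views (again with hypothetical names, and 0-based bucket numbers rather than the 1-based numbering described above):

-- Derive hour and quarter-hour buckets from the minute key
ALTER TABLE dbo.DimTime ADD
    HourOfDay      AS (TimeKey / 60),   -- 0..23
    QuarterHourKey AS (TimeKey / 15);   -- 0..95
GO

-- A quarter-hour rollup view for fact tables stored at that grain
CREATE VIEW dbo.DimQuarterHour
AS
SELECT QuarterHourKey,
       MIN(TimeOfDay) AS QuarterHourStart,
       MIN(HourOfDay) AS HourOfDay
FROM dbo.DimTime
GROUP BY QuarterHourKey;
GO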
Been using this system for many years, mostly in SQL 2005, and am in the middle of upgrading everything to SQL 2016. Love the time type, but will not give up the Kimball system of independent keys. I also use these for my date dimension; it makes date range calculations easier, too. Besides, some of the dimensions are Type 2 SCDs, and the independent keys are indispensable.
Agree 100% that surrogate keys are indispensable -- I'm a Kimball model follower for my DW designs as well.
I'm curious if anyone has worked with a DW where the data is "high speed" -- maybe a stock exchange, for example, where you might need to slice the data down to milliseconds. I've just never had that need in my professional career.
Thanks,
Rob
May 31, 2017 at 7:40 am
robert.gerald.taylor - Wednesday, May 31, 2017 6:24 AM
Jason, I agree with you about using date/time in an OLTP system. In a star schema DW, I'm of the same opinion as Ron. I was just curious as to the arguments for using a Time data type as the primary key for a Time dimension in a data warehouse. So far, I've seen nothing that sways my opinion in that direction.
Thanks,
Rob
The advantage doesn't necessarily come from the time dimension itself having a date/time key. The advantage comes from all of the foreign keys that would relate to it.
Let's say we have a fact table that has several date attributes and we need to know how much time has elapsed between each of those dates...
If all the dates in the fact table are stored as INTs, you're forced to make multiple joins to dimDate (once for every "DateKey" column you have in that table), and then perform your date calculations against the various joined dimDate tables.
If, on the other hand, the dates in your fact tables are actually stored as dates, you can do all of your date calculations directly against the fact table data w/o making unnecessary joins.
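To make the contrast concrete (table and column names below are hypothetical), here is the same "days to ship" question written both ways:

-- INT-key design: one join to dimDate per date column before any date math
SELECT DATEDIFF(DAY, od.FullDate, sd.FullDate) AS DaysToShip
FROM dbo.FactOrder AS f
JOIN dbo.DimDate   AS od ON od.DateKey = f.OrderDateKey
JOIN dbo.DimDate   AS sd ON sd.DateKey = f.ShipDateKey;

-- DATE-column design: the same calculation straight off the fact table
SELECT DATEDIFF(DAY, f.OrderDate, f.ShipDate) AS DaysToShip
FROM dbo.FactOrder AS f;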
So... There are some clear advantages to having date/time data stored as date/time data, at least in the fact tables...
I know, I know... This is an OLAP db, not an OLTP system, and star schemas & snowflakes and things & stuff... You have to join to the date & time dimension tables anyway... Right?
Okay, fine... Please tell me where you see an advantage of joining INT "date/time keys" to other "date/time keys" as opposed to joining actual date/time data to other actual date/time data?
I'll be honest, I've had this conversation several times in the past, and not one single person has EVER been able to demonstrate ANY performance advantage to joining on INTs as opposed to joining on DATEs or TIMEs... And no one has been able to demonstrate that the use of INT date/time keys provides any functionality that isn't supplied by the use of actual date/time keys... None, zip, zilch...
At the end of the day, the only argument that the Kimball method adherents seem to be left with is, "Kimball said so"... Sorry, I don't think that's a good enough answer...
Hopefully, I've adequately answered your question and provided some evidence that should "sway" you... If you are still unswayed, please let me know where you're seeing an advantage to using INT keys.
May 31, 2017 at 7:52 am
No one has been able to demonstrate that the use of INT date/time keys provides any functionality that isn't supplied by the use of actual date/time keys... None, zip, zilch...
At the end of the day, the only argument that the Kimball method adherents seem to be left with is, "Kimball said so"... Sorry, I don't think that's a good enough answer...
The fact tables are not meant to be human readable. I really don't understand the issue with joins. Anyone who can design an OLAP model should be able to make a simple join. That said, when I need to determine the gap between create and complete dates or create and ship dates, I simply need to subtract the integers. Much easier than doing DATEDIFF calculations, especially in MDX.
This may not sway you, but at least don't say that all you ever hear in the end is that Kimball said so. I started that way because Kimball said so. I continue that way because, after more than a decade of experience in BI work, I remain convinced it is the better way. I won't go so far as to say that the other is wrong, but my view is that it's less than ideal.
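For comparison, the key-subtraction version described above only touches the fact row, provided the surrogate keys are independent and sequential, one key per day with no gaps (names hypothetical):

-- Works only with sequential surrogate keys, not YYYYMMDD-style "smart" keys
SELECT f.ShipDateKey - f.OrderDateKey AS DaysToShip
FROM dbo.FactOrder AS f;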
May 31, 2017 at 8:29 am
RonKyle - Wednesday, May 31, 2017 7:52 AM
The fact tables are not meant to be human readable. I really don't understand the issue with joins. Anyone who can design an OLAP model should be able to make a simple join. That said, when I need to determine the gap between create and complete dates or create and ship dates, I simply need to subtract the integers. Much easier than doing DATEDIFF calculations, especially in MDX.
This may not sway you, but at least don't say that all you ever hear in the end is that Kimball said so. I started that way because Kimball said so. I continue that way because, after more than a decade of experience in BI work, I remain convinced it is the better way. I won't go so far as to say that the other is wrong, but my view is that it's less than ideal.
No tables in any type of schema are intended to be "human readable". If anything, the flattened structures of most OLAP models are far closer to being human readable than a 3rd or 4th NF OLTP modeled database. Human readability has nothing to do with it.
I'm not saying that writing extra joins is, in any way, difficult from a code writing perspective. I'm saying that they are an unneeded expense. After all, I thought the whole point of using flattened/denormalized table structures was to avoid/reduce the number of joins needed.
I don't write MDX queries so I won't comment on what is easy v/s not easy but performing DATEADD/DATEDIFF calculations based on INT representations seems sketchy at best.
Also, I'm not saying that the Kimball method doesn't work. Of course it works. If it didn't work it would have been abandoned long ago and we wouldn't be having a conversation about it. I'm saying it doesn't offer any advantage over using actual date/time data types BUT it does lack the advantages of being able to use the supplied (and incredibly useful) date/time functions.
You say "better" and "ideal" but you don't provide any evidence to support those claims. I'm simply asking for an example, that I can test myself, where using an INT provides better performance results than using a date/time data type.
May 31, 2017 at 8:38 am
I'm simply asking for an example, that I can test myself, where using an INT provides better performance results than using a date/time data type.
Try adding a linear regression formula to your cube. I don't think it can be done with dates. The only reason I could get it to work is because I used integers and could use a particular date as an anchor point, and it was hard enough with integers. The Grace Hopper quote is not relevant, because I didn't say it's always been done that way. I told you the MDX was easier, a point you said you couldn't address.
I could quote my gunnery instructor. The last slide of his ballistics class said, "Those who know how something works will always work for those who know why something works." MDX is the beating heart of the cube; it's the place where you can do some great things, and that is the reason for using integers as surrogate keys across the board. I'd be interested to see the calc scripts of those who don't use integers in their designs. My guess is that they're not as extensive. I could be wrong, as I've never worked in a shop where the permanent designs didn't use surrogate keys. For flush and fill systems that might not be possible, but for systems using incremental loads, I stand by Kimball not out of misplaced ignorance but because of tried and true results.
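A rough relational sketch of the regression point, under the assumption that OrderDateKey is a sequential integer surrogate key (all names hypothetical): the key is already a usable numeric x for an ordinary least-squares slope, whereas a date column would first have to be turned into a number relative to some anchor date. In the cube itself, the MDX LinRegSlope / LinRegPoint functions play the same role.

-- Least-squares slope of daily sales against a sequential integer date key
WITH daily AS
(
    SELECT f.OrderDateKey     AS x,
           SUM(f.SalesAmount) AS y
    FROM dbo.FactOrder AS f
    GROUP BY f.OrderDateKey
)
SELECT (COUNT(*) * SUM(1.0 * x * y) - SUM(1.0 * x) * SUM(y))
     / (COUNT(*) * SUM(1.0 * x * x) - SUM(1.0 * x) * SUM(1.0 * x)) AS Slope
FROM daily;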