May 31, 2010 at 9:24 pm
Comments posted to this topic are about the item Sins of SQL: The Time Table
June 1, 2010 at 12:16 am
When I executed this statement
SELECT * FROM [MyDB].[dbo].[fn_GetTimestampRange] (DateAdd(YY,-10,GETDATE()), GETDATE(), 1)
it returned 8,726,399 rows in 2.29 minutes and then failed with the error:
"The statement terminated. The maximum recursion 100 has been exhausted before statement completion."
Any idea why this error cropped up?
June 1, 2010 at 12:18 am
Just a brief note.
This would be OK until you hit the "special" cases we find every day with clients, such as:
- wanting weekday/nonworking day (including national holidays) per country,
- end of month days being decided by the company not actual calendar,
- requiring more than 100 days (granted, the MAXRECURSION option covers this),
- being able to change the information for a day easily using values, not code.
On the maintenance side it's not really a problem if you design the cube with two dimension tables (date and time): all dates in one table, all times in the other.
Fitz
June 1, 2010 at 12:26 am
If you use a "logical" surrogate key in the fact table, for example 20100601, you can still have the time table AND fast searching and simple SQL queries against the fact table. (This method also has its drawbacks, but it works for me)
Furthermore, in some companies, the time table contains much more than just the time, but also holiday information, fiscal information et cetera.
So, I'm not convinced to drop my time tables 🙂
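A sketch of that "logical" key in use (hypothetical table and column names): because the key encodes yyyymmdd as an integer, range predicates on the fact table stay simple and sargable even before joining to the time table:

```sql
-- DateKey is an INT like 20100601; a date-range scan needs no join
-- and no function call, so an index on DateKey can be used directly.
SELECT SUM(f.SalesAmount) AS TotalSales
FROM dbo.FactSales AS f
WHERE f.DateKey BETWEEN 20100101 AND 20100131; -- January 2010
```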
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
June 1, 2010 at 1:05 am
Using a recursive CTE to generate what is basically a dynamic tally table (here returning a dynamic calendar/time table instead) is not scalable.
For an article on dynamic tally tables, please read this one: The Dynamic Tally or Numbers Table.
You can easily modify it to meet your needs, as detailed in the article, and it will be much more scalable because it avoids the RBAR approach inherent in recursive CTEs.
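As a rough sketch of the set-based alternative (cross-joining a system catalog view to build an ad-hoc tally; the linked article covers the technique in depth), the whole date range can be produced without any recursion:

```sql
-- Generate 10 years of dates without a recursive CTE: build a zero-based
-- tally on the fly, then offset each number from the start date.
DECLARE @Start DATE = DATEADD(YEAR, -10, GETDATE());

SELECT DATEADD(DAY, n.Number, @Start) AS CalendarDate
FROM (
    SELECT TOP (DATEDIFF(DAY, @Start, GETDATE()) + 1)
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS Number
    FROM sys.all_columns AS a
    CROSS JOIN sys.all_columns AS b
) AS n;
```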
June 1, 2010 at 1:06 am
eshandeepak (6/1/2010)
When I executed this statement
SELECT * FROM [MyDB].[dbo].[fn_GetTimestampRange] (DateAdd(YY,-10,GETDATE()), GETDATE(), 1)
it returned 8,726,399 rows in 2.29 minutes and then failed with the error:
"The statement terminated. The maximum recursion 100 has been exhausted before statement completion."
Any idea why this error cropped up?
It does say in the article: "(Note that if you want to produce more than 100 distinct days in a single query, you'll need to append OPTION (MAXRECURSION numberOfDays) to your query, where numberOfDays is the number of days you wish to loop through. This can be hidden away in the stored procedures, and SQL Server allows you to specify a MAXRECURSION of 0, which does not limit the number of days that can be processed.)"
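For illustration, the failing query from the earlier post should run to completion once the hint is appended (a sketch; fn_GetTimestampRange is the article's function):

```sql
-- The hint goes on the outer statement that calls the function;
-- MAXRECURSION cannot be specified inside the function body itself.
SELECT *
FROM [MyDB].[dbo].[fn_GetTimestampRange](DATEADD(YY, -10, GETDATE()), GETDATE(), 1)
OPTION (MAXRECURSION 0); -- 0 = no recursion limit; the default is 100
```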
June 1, 2010 at 1:15 am
Like others here, I'd need a bit more to convince me to remove my table or use this as an alternative. Like most, I also use the time table to build a time dimension and provide multiple hierarchies to users: 12-hour view, 24-hour view, short and long time descriptions, etc. Without further coding work on descriptions, this wouldn't meet all my needs. It seems like a good option for a data retrieval query if you don't already have a time table and your result set is small to medium, but as a replacement? Not for me.
June 1, 2010 at 1:17 am
Its lack of scalability is its biggest problem. A recursive CTE is still RBAR (a "Modenism" for "Row By Agonizing Row").
June 1, 2010 at 1:24 am
da-zero (6/1/2010)
If you use a "logical" surrogate key in the fact table, for example 20100601, you can still have the time table AND fast searching and simple SQL queries against the fact table. (This method also has its drawbacks, but it works for me)
And me 🙂
So, I'm not convinced to drop my time tables 🙂
The SQL Server team added all sorts of cool optimisations to the database engine for star-schema processing.
Do I want to defeat that by replacing a small, efficient table with a recursive CTE? Not really!
One star.
June 1, 2010 at 2:01 am
Some remarks:
1) Dimensional modelling means denormalized dimensions and normalized facts, so the duplicate-data argument is not valid.
2) Treat Date and Time as separate dimensions. The Date dimension (day up to year, plus any extra values based on them, such as holidays) has a Date_key that needs no lookup when a date is included in your facts. The Time dimension (for example, seconds up to hour, plus any extra values based on them, such as specific time frames) has a Time_key that likewise needs no lookup: from a time (for instance a timestamp) in your fact, you can calculate a unique key in advance (14:32:21 gives 14 x 3600 + 32 x 60 + 21 = 52341) and store it in the fact table. The Time dimension itself can be precalculated by the same method (86,400 rows, one per second in a day).
Since dimensions are in principle used for analysis, the specific second in a time dimension is usually not interesting; rather, one wants to discover predictive patterns based on intra-day values. If one needs to know what happened in the last 45 minutes (how many transactions for a specific call option on some stock with an exercise price of X and an expiration of Y), one uses the timestamp of the transaction, which is included in the fact table.
3) Using stored procedures can sometimes be necessary (say, to calculate a value based on spot prices in the stock market during a day), but since dimensional modelling involves ETL and a data model, one should avoid as much complexity as possible on the query side. Introducing stored procedures there as well decreases maintainability.
4) Query performance is only an issue when you query on the fly. Using a cube, you can precalculate the values, so this should not be an issue in the first place.
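The key derivation from remark 2 can be sketched in T-SQL (hypothetical column aliases; the time key for 14:32:21 works out to 14 x 3600 + 32 x 60 + 21 = 52341):

```sql
-- Derive Date_key (yyyymmdd) and Time_key (seconds since midnight)
-- directly from a fact's timestamp, so no dimension lookup is needed.
DECLARE @ts DATETIME = '2010-06-01T14:32:21';

SELECT CONVERT(INT, CONVERT(CHAR(8), @ts, 112)) AS Date_key, -- 20100601
       DATEPART(HOUR,   @ts) * 3600
     + DATEPART(MINUTE, @ts) * 60
     + DATEPART(SECOND, @ts)                    AS Time_key;  -- 52341
```

Style 112 of CONVERT yields the ISO yyyymmdd string, which casts cleanly to the integer surrogate key discussed earlier in the thread.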
June 1, 2010 at 3:32 am
I couldn't disagree with an article more. The author should read Chapter 2 of Kimball's The Data Warehouse Toolkit for a proper way to handle dates and times in a dimensional model. If I could give zero stars, I would.
June 1, 2010 at 5:20 am
Thank you! Just what I was telling myself while reading this (CTE/Scalability).
Also, when you design a warehouse for a customer that is not necessarily well versed in T-SQL, CTEs, etc., being able to tell them there is a dimension that describes the specifics of a date, used as a foreign key, can be very helpful. Analysing seasonal activity, weekday/weekend activity, etc. is, I think, much easier with such a concept. I try to have the best of both worlds: I keep the full datetime in my fact table and I also link it to a date dimension and a time dimension. This way it's easy to query either way 🙂
June 1, 2010 at 5:31 am
I agree with the comment that the date and time dimensions need to be separated. I've seen the combined design, and it rapidly becomes unwieldy. It also makes it harder to cross join date and time to produce a very readable pivot of times as the days go along. I would not recommend following this method.
June 1, 2010 at 6:49 am
The advantage of this method is also that the dimensions stay very small (I generate 15 years of days, and the time dimension, at hourly grain, has just 24 rows 😉).
June 1, 2010 at 7:55 am
I'm glad I'm not the only one to balk at this article.
I tend to use the two-dimension approach: one for date and another for time (when needed), and I never run into the unwieldy queries he's describing.
Still, it's nice to know the approach if a problem requiring that solution is ever called for.
Viewing 15 posts - 1 through 15 (of 83 total)