June 4, 2010 at 11:49 am
Is there any value to change date and time fields into integers.
June 4, 2010 at 12:09 pm
what do you mean by a value?? Are you looking for something similar to this SQL statement below in ssis.
Select Convert(Varchar,Getdate(),109)
June 4, 2010 at 12:12 pm
depends on what "kind" of integer you want:
select convert(int,getdate())
--40332 days since 01-01-1990, SQL's internal start date
select YEAR(getdate()) * 10000
+ MONTH(getdate()) * 100
+ DAY(getdate())
--an integer "20100604"
Lowell
June 4, 2010 at 12:13 pm
advantage. Is it easier to use? Take up less space? Create better indexes?
so instead of storing '06/04/2010 14:06' as a date time or small date time or date and time.
converting the 06/04/2010 to 20100604 and putting it into an int column and 14:06 to 1406 and placing it in a int column.
June 4, 2010 at 12:15 pm
lowell the second one 20100604 as an integer over just a datetime column
June 4, 2010 at 12:16 pm
there is absolutely no advantage to storing dates as anything except dates.
never store datetime as any other datetype except datetime.
what is the date of the integer 20105404?
internally, SQL stores it as numbers anyway...how it is displayed is up to your regional settings.
if you save it as anything except datetime, you'll have to convert back and forth, and it will not be easy to use the builtin tools like DATEADD DATEDIFF and the BETWEEN operator without converting stuff.
Lowell
June 4, 2010 at 12:17 pm
You really don't need to turn you dates into integer "date ids". SQL Server is highly optimized to use dates as indexes. Internally, they're stored as integers anyways, so the comparison is very quick. Any conversion you add is just extra manipulation that will have to occur.
--J
June 4, 2010 at 12:18 pm
our bi team insists on converting all dates to integers and they say they are easier to use. I as the DBA thought what you just wrote but I wondered if I was missing something.
Our bi is based on Kimball Method.
June 4, 2010 at 12:30 pm
The Kimball method is great (and it's probably what most of us use), but it's general purpose and system independant. SQL Server has been optimized to the point where this is not necessary. Trust me, there's nothing easy about using date ids. The fact that you're on a forum trying to figure out how to do it supports this.
Personally, I used date ids in a data warehouse I set up some time ago, and I regret it. Not only is it a hassle on the ETL side, it's also a hassle to use. I'm sure it can't possibly be more efficient to have to join just to get a date. Also, I do sometime query the warehouse directly. When I do, I have to use conversion functions on my dates before making comparisons (and pretty much every query against a data warehouse has a date in the WHERE clause). If I had it to do over again, I'd use smalldatetime.
As we said, dates are stored as integers any ways. Furthermore, they're stored as sequential numbers; so tomorrow will be stored as a larger number than today. Internally, SQL Server can perform comparison operations and BETWEEN without converting the data in any way.
I think you should make the case that they're not necessary. Do a little research and I think you'll find a lot of information out there that supports what we are saying. You and your team will be much happier if you just use the datetime or smalldatetime fields.
--J
June 4, 2010 at 3:53 pm
It's real simple.... a thousand professional opinions aren't worth a hoot. A single well written test will tell all. Have the BI "experts" prove their hypothesis using a small million row example. Only the code knows the truth. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
June 7, 2010 at 12:02 pm
That was my first reaction - show me the benchmark.
😎
June 8, 2010 at 2:23 am
jvanderberg (6/4/2010)
The Kimball method is great (and it's probably what most of us use), but it's general purpose and system independant. SQL Server has been optimized to the point where this is not necessary. Trust me, there's nothing easy about using date ids. The fact that you're on a forum trying to figure out how to do it supports this.Personally, I used date ids in a data warehouse I set up some time ago, and I regret it. Not only is it a hassle on the ETL side, it's also a hassle to use. I'm sure it can't possibly be more efficient to have to join just to get a date. Also, I do sometime query the warehouse directly. When I do, I have to use conversion functions on my dates before making comparisons (and pretty much every query against a data warehouse has a date in the WHERE clause). If I had it to do over again, I'd use smalldatetime.
As we said, dates are stored as integers any ways. Furthermore, they're stored as sequential numbers; so tomorrow will be stored as a larger number than today. Internally, SQL Server can perform comparison operations and BETWEEN without converting the data in any way.
I think you should make the case that they're not necessary. Do a little research and I think you'll find a lot of information out there that supports what we are saying. You and your team will be much happier if you just use the datetime or smalldatetime fields.
--J
What if your date is actually a surrogate key (for example, a surrogate key in a fact table that links with the time dimension). Do you still use the datetime datatype? Or just a random integer? (Or a meaningful integer, which is the topic of this thread)
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
June 9, 2010 at 3:31 pm
[font="Courier New"]Hello,
I have found storing the date as an integer to be useful as an integer surrogate key to a date dimension table with attributes you can not easily get from tsql, for example business events tied to specific dates. (ex. a HolidayFlag or something descriptive text about the day) (Kimball Method as mentioned by others above)
I concede that the date storage is as efficient as an integer, but propose that converting to an integer (Year(getDate())*100000+month(getDate()*100+day(getdate()) is not any more difficult than truncating the date to reflect a consistent date key surrogate value (dateadd(dd,0, datediff(dd,0, getDate())).
The consistency of always using integer surrogate keys for dimension tables seems beneficial...
One practice is the use of -1 as a key for an 'Unknown' date...
There is additional value in using a smart surrogate key for a date dimension as documented by Kimball et al.
Are there any opposing views on this usage?
Thank you
[/font]
June 9, 2010 at 5:40 pm
Patrick Seegers (6/9/2010)
[font="Courier New"]Are there any opposing views on this usage?Thank you
[/font]
Sure... read the rest of the posts on this thread.
Hmmmm.... "unknown date".... sounds like the perfect use for NULL unless it's an EndDate in which case the "unknown date" should be 9999-12-31. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
June 18, 2010 at 8:40 am
There is additional value in using a smart surrogate key for a date dimension as documented by Kimball et al.
Are there any opposing views on this usage?
Patrick, on the internet there are always opposing views, but I'm with you on the integer surrogate key practice. All of the benefits that the native functions bestow are pretty much irrelevant when you're talking about a key field, and in a date dimension you'll also have date fields where those functions can be used.
Cris E
[font="Arial"]Are you lost daddy? I asked tenderly.
Shut up he explained.[/font]
- Ring Lardner
Viewing 15 posts - 1 through 15 (of 16 total)
You must be logged in to reply to this topic. Login to reply