Separating SSIS Datetime

  • Jeff Moden - Wednesday, September 5, 2018 11:36 AM

    RonKyle - Wednesday, September 5, 2018 11:17 AM

    Jeff Moden - Wednesday, September 5, 2018 10:46 AM

    RonKyle - Wednesday, September 5, 2018 7:45 AM

    Jeff Moden - Monday, September 3, 2018 7:29 PM

    Don't do it.  Don't split date from time and put them in separate columns.  It will cause you more problems than you can shake a stick at.  If you absolutely must, create computed columns in the table to do the separation but keep that original column together.

    There have been several posts after this, but there can be good reasons to split this.  If it's going into a DW environment you have to.  Date and time information need to be in separate dimensions.  Otherwise the whole thing becomes unwieldy.  You may have run into situations where this caused issues.  But the addition of the date and time data types has been a huge help for me.  Having this as an absolute rule is not a good idea.  In this case (unlike all too many normalization issues) it really does depend.

    If you do need to have it split, don't do it at the expense of not having the complete date and time.  Import the whole date/time and let a persisted computed column do the split so that when you need it, you actually end up with something that's both SARGable and is capable of temporal accuracy then it comes to "wrap arounds" with dates.  For example, things like shifts (an other similar things) that straddle a day play hell on the separated columns.

    I don't run into any of these issues when splitting it.  Both the date and time have value in themselves and the two columns are easily combined.  I understand why the source database combines them.  Normally that makes sense in an operational database.  But without further questioning, I don't think it's a good idea to say it's automatically a bad idea.

     Both the date and time have value in themselves and the two columns are easily combined

    Ok.  What does the code look like when you do combine the columns?  Also, do you ever have to combine the columns in a WHERE clause?

    Cast both values into the new data type.  I don't need to ever use them in a WHERE clause.  As it's part of a data warehouse ETL, the only value in the WHERE clause is the lineage value or key IS Null in the case of new records.  That is an interesting comment, however.  As I do use the date added date updated columns in the extraction WHERE, it's good that they are not separated there.  That said, it gives me no issues in the staging area and in a DW the date and time dimensions absolutely should not be combined.

  • RonKyle - Wednesday, September 5, 2018 11:43 AM

    Jeff Moden - Wednesday, September 5, 2018 11:36 AM

    RonKyle - Wednesday, September 5, 2018 11:17 AM

    Jeff Moden - Wednesday, September 5, 2018 10:46 AM

    RonKyle - Wednesday, September 5, 2018 7:45 AM

    Jeff Moden - Monday, September 3, 2018 7:29 PM

    Don't do it.  Don't split date from time and put them in separate columns.  It will cause you more problems than you can shake a stick at.  If you absolutely must, create computed columns in the table to do the separation but keep that original column together.

    There have been several posts after this, but there can be good reasons to split this.  If it's going into a DW environment you have to.  Date and time information need to be in separate dimensions.  Otherwise the whole thing becomes unwieldy.  You may have run into situations where this caused issues.  But the addition of the date and time data types has been a huge help for me.  Having this as an absolute rule is not a good idea.  In this case (unlike all too many normalization issues) it really does depend.

    If you do need to have it split, don't do it at the expense of not having the complete date and time.  Import the whole date/time and let a persisted computed column do the split so that when you need it, you actually end up with something that's both SARGable and is capable of temporal accuracy then it comes to "wrap arounds" with dates.  For example, things like shifts (an other similar things) that straddle a day play hell on the separated columns.

    I don't run into any of these issues when splitting it.  Both the date and time have value in themselves and the two columns are easily combined.  I understand why the source database combines them.  Normally that makes sense in an operational database.  But without further questioning, I don't think it's a good idea to say it's automatically a bad idea.

     Both the date and time have value in themselves and the two columns are easily combined

    Ok.  What does the code look like when you do combine the columns?  Also, do you ever have to combine the columns in a WHERE clause?

    Cast both values into the new data type.  I don't need to ever use them in a WHERE clause.  As it's part of a data warehouse ETL, the only value in the WHERE clause is the lineage value or key IS Null in the case of new records.  That is an interesting comment, however.  As I do use the date added date updated columns in the extraction WHERE, it's good that they are not separated there.  That said, it gives me no issues in the staging area and in a DW the date and time dimensions absolutely should not be combined.

    Cast both values into what datatype and then what?  Are you talking about for display purposes only? 

    Also, you just spoke in absolutes yourself.  You said that date and time dimensions absolutely should not be combined. What if you're working with data that is of the datetime datatype?  Are you going to actually convert that (in all the places that it could be used) to an INT (or some such) as the classic Date_Dim column rather than just using a datetime column that has dates with a midnight time? 

    I've worked with such things before and find that it's a real PITA, especially when troubleshooting, to go find out what date some Date_Dim integer actually is.  If they based in on the actual underlying datetime serial number, it wouldn't be so bad.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Wednesday, September 5, 2018 12:19 PM

    RonKyle - Wednesday, September 5, 2018 11:43 AM

    Jeff Moden - Wednesday, September 5, 2018 11:36 AM

    RonKyle - Wednesday, September 5, 2018 11:17 AM

    Jeff Moden - Wednesday, September 5, 2018 10:46 AM

    RonKyle - Wednesday, September 5, 2018 7:45 AM

    Jeff Moden - Monday, September 3, 2018 7:29 PM

    Don't do it.  Don't split date from time and put them in separate columns.  It will cause you more problems than you can shake a stick at.  If you absolutely must, create computed columns in the table to do the separation but keep that original column together.

    There have been several posts after this, but there can be good reasons to split this.  If it's going into a DW environment you have to.  Date and time information need to be in separate dimensions.  Otherwise the whole thing becomes unwieldy.  You may have run into situations where this caused issues.  But the addition of the date and time data types has been a huge help for me.  Having this as an absolute rule is not a good idea.  In this case (unlike all too many normalization issues) it really does depend.

    If you do need to have it split, don't do it at the expense of not having the complete date and time.  Import the whole date/time and let a persisted computed column do the split so that when you need it, you actually end up with something that's both SARGable and is capable of temporal accuracy then it comes to "wrap arounds" with dates.  For example, things like shifts (an other similar things) that straddle a day play hell on the separated columns.

    I don't run into any of these issues when splitting it.  Both the date and time have value in themselves and the two columns are easily combined.  I understand why the source database combines them.  Normally that makes sense in an operational database.  But without further questioning, I don't think it's a good idea to say it's automatically a bad idea.

     Both the date and time have value in themselves and the two columns are easily combined

    Ok.  What does the code look like when you do combine the columns?  Also, do you ever have to combine the columns in a WHERE clause?

    Cast both values into the new data type.  I don't need to ever use them in a WHERE clause.  As it's part of a data warehouse ETL, the only value in the WHERE clause is the lineage value or key IS Null in the case of new records.  That is an interesting comment, however.  As I do use the date added date updated columns in the extraction WHERE, it's good that they are not separated there.  That said, it gives me no issues in the staging area and in a DW the date and time dimensions absolutely should not be combined.

    Cast both values into what datatype and then what?  Are you talking about for display purposes only? 

    Also, you just spoke in absolutes yourself.  You said that date and time dimensions absolutely should not be combined. What if you're working with data that is of the datetime datatype?  Are you going to actually convert that (in all the places that it could be used) to an INT (or some such) as the classic Date_Dim column rather than just using a datetime column that has dates with a midnight time? 

    I've worked with such things before and find that it's a real PITA, especially when troubleshooting, to go find out what date some Date_Dim integer actually is.  If they based in on the actual underlying datetime serial number, it wouldn't be so bad.

    I would not say there are no absolutes.  Kimball says they should be combined, and I've never seen anything to remotely suggest otherwise.  The dimension tables would quickly get unwieldy.  The two dimensions also work very well opposite each other.  The dates are the columns and the times the rows.  It's very handy for seeing the day to day changes (e.g. in my case work order creation date and times).  Regarding the combining, you asked what I'd do.  But I've never had to recombine them. 

    While even Kimball relented on the date from his original work, he also didn't do much work with accumulated snapshot tables.  Yet that is my bread and butter.  Measuring days between dates in that kind of system is crucial.  I can guarantee that 4687-4623 produces a more useable result that 180905-180205.  The join to the date dimension table is simply too easy to make and saves a lot of other issues.

  • RonKyle - Wednesday, September 5, 2018 12:32 PM

    ... Kimball says they should be combined, and I've never seen anything to remotely suggest otherwise...

    ... I can guarantee that 4687-4623 produces a more useable result that 180905-180205.  The join to the date dimension table is simply too easy to make and saves a lot of other issues.

    So I'm a little confused here. First you say you absolutely should not combine them, but then you state Kimball says they should be combined as if that supports your argument. Could you clarify that?

    Second... Are you subtracting these values or is this a X date to Y date kind of thing? Because if it's the later, I don't know any code that that works backwards from September 5th 2018 to February 5th 2018 very well. Usually people write the dates forward such as 180205 to 180905 instead of 180905 to 180205. Could you clarify how you're using them if you're doing this backwards?

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • RonKyle - Wednesday, September 5, 2018 12:32 PM

    Jeff Moden - Wednesday, September 5, 2018 12:19 PM

    RonKyle - Wednesday, September 5, 2018 11:43 AM

    Jeff Moden - Wednesday, September 5, 2018 11:36 AM

    RonKyle - Wednesday, September 5, 2018 11:17 AM

    Jeff Moden - Wednesday, September 5, 2018 10:46 AM

    RonKyle - Wednesday, September 5, 2018 7:45 AM

    Jeff Moden - Monday, September 3, 2018 7:29 PM

    Don't do it.  Don't split date from time and put them in separate columns.  It will cause you more problems than you can shake a stick at.  If you absolutely must, create computed columns in the table to do the separation but keep that original column together.

    There have been several posts after this, but there can be good reasons to split this.  If it's going into a DW environment you have to.  Date and time information need to be in separate dimensions.  Otherwise the whole thing becomes unwieldy.  You may have run into situations where this caused issues.  But the addition of the date and time data types has been a huge help for me.  Having this as an absolute rule is not a good idea.  In this case (unlike all too many normalization issues) it really does depend.

    If you do need to have it split, don't do it at the expense of not having the complete date and time.  Import the whole date/time and let a persisted computed column do the split so that when you need it, you actually end up with something that's both SARGable and is capable of temporal accuracy then it comes to "wrap arounds" with dates.  For example, things like shifts (an other similar things) that straddle a day play hell on the separated columns.

    I don't run into any of these issues when splitting it.  Both the date and time have value in themselves and the two columns are easily combined.  I understand why the source database combines them.  Normally that makes sense in an operational database.  But without further questioning, I don't think it's a good idea to say it's automatically a bad idea.

     Both the date and time have value in themselves and the two columns are easily combined

    Ok.  What does the code look like when you do combine the columns?  Also, do you ever have to combine the columns in a WHERE clause?

    Cast both values into the new data type.  I don't need to ever use them in a WHERE clause.  As it's part of a data warehouse ETL, the only value in the WHERE clause is the lineage value or key IS Null in the case of new records.  That is an interesting comment, however.  As I do use the date added date updated columns in the extraction WHERE, it's good that they are not separated there.  That said, it gives me no issues in the staging area and in a DW the date and time dimensions absolutely should not be combined.

    Cast both values into what datatype and then what?  Are you talking about for display purposes only? 

    Also, you just spoke in absolutes yourself.  You said that date and time dimensions absolutely should not be combined. What if you're working with data that is of the datetime datatype?  Are you going to actually convert that (in all the places that it could be used) to an INT (or some such) as the classic Date_Dim column rather than just using a datetime column that has dates with a midnight time? 

    I've worked with such things before and find that it's a real PITA, especially when troubleshooting, to go find out what date some Date_Dim integer actually is.  If they based in on the actual underlying datetime serial number, it wouldn't be so bad.

    I would not say there are no absolutes.  Kimball says they should be combined, and I've never seen anything to remotely suggest otherwise.  The dimension tables would quickly get unwieldy.  The two dimensions also work very well opposite each other.  The dates are the columns and the times the rows.  It's very handy for seeing the day to day changes (e.g. in my case work order creation date and times).  Regarding the combining, you asked what I'd do.  But I've never had to recombine them. 

    While even Kimball relented on the date from his original work, he also didn't do much work with accumulated snapshot tables.  Yet that is my bread and butter.  Measuring days between dates in that kind of system is crucial.  I can guarantee that 4687-4623 produces a more useable result that 180905-180205.  The join to the date dimension table is simply too easy to make and saves a lot of other issues.

    If the dates are stored as dates rather than integers, a simple DATEDIFF(dd,StartDate,EndDate) works just fine and saves on many other issues above and beyond what an abstracted integer will.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • If the dates are stored as dates rather than integers, a simple DATEDIFF(dd,StartDate,EndDate) works just fine and saves on many other issues above and beyond what an abstracted integer will.                       

    Have you tried that in MDX?  If so, how did that work?

Viewing 6 posts - 16 through 20 (of 20 total)

You must be logged in to reply to this topic. Login to reply