Dates provide as string YYYYMM: Is best practice for storing as date (adding 01 as day) or as varchar(6) then manipulate to do date math?

  • I am in the enviable position (no sarcasm) of being able to start from scratch with designing tables for SSRS charts for all our company's departments (they used to just use Excel to display the data and email it around or put the workbooks on a shared drive).  The date values they most often provide in Excel are for a month with year, without any day granularity needed. If I store them as varchar(6), then I need to do a lot of manipulation if I want to use date functions to, for example, display the months as "Mar-2018" on a chart.  Googling led me to an example of someone adding "01" to the date like this 201806 becoming 20180601 stored as a date datatype  but I have a bad feeling about that because it implies a precision that isn't real even though it's not shown in the chart.

    I am hoping someone else can tell me how they handle this. Is it bad practice to store the string they provide as a date by adding "01" as the day?

  • pharmkittie - Wednesday, June 13, 2018 12:08 PM

    I am in the enviable position (no sarcasm) of being able to start from scratch with designing tables for SSRS charts for all our company's departments (they used to just use Excel to display the data and email it around or put the workbooks on a shared drive).  The date values they most often provide in Excel are for a month with year, without any day granularity needed. If I store them as varchar(6), then I need to do a lot of manipulation if I want to use date functions to, for example, display the months as "Mar-2018" on a chart.  Googling led me to an example of someone adding "01" to the date like this 201806 becoming 20180601 stored as a date datatype  but I have a bad feeling about that because it implies a precision that isn't real even though it's not shown in the chart.

    I am hoping someone else can tell me how they handle this. Is it bad practice to store the string they provide as a date by adding "01" as the day?

    Storing the values as dates is good practice, despite your (understandable) reservations. The benefits outweigh the negatives.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Phil Parkin - Wednesday, June 13, 2018 12:16 PM

    Storing the values as dates is good practice, despite your (understandable) reservations. The benefits outweigh the negatives.

    Unless the requirement is only to report based on monthly reporting periods - and no date functionality is needed.

    I would store the period as a CHAR(6) - and persist a derived column as: datefromparts(left(MonthPeriod, 4), right(MonthPeriod, 2), 1).  This would give you the date column to be used when you need date functionality and the period column for grouping when needed.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Jeffrey Williams 3188 - Wednesday, June 13, 2018 1:35 PM

    Phil Parkin - Wednesday, June 13, 2018 12:16 PM

    Storing the values as dates is good practice, despite your (understandable) reservations. The benefits outweigh the negatives.

    Unless the requirement is only to report based on monthly reporting periods - and no date functionality is needed.

    I would store the period as a CHAR(6) - and persist a derived column as: datefromparts(left(MonthPeriod, 4), right(MonthPeriod, 2), 1).  This would give you the date column to be used when you need date functionality and the period column for grouping when needed.

    Well, the original post does contain this

    ... I need to do a lot of manipulation if I want to use date functions to, for example, display the months as "Mar-2018" on a chart


    which heavily implies the need for date functionality, in my opinion.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • I'm with Phil.  Better to store the dates as dates.  If you store them as character data (whether CHAR or VARCHAR) it's too easy for "bad" data to slip in, say 201813 or 2019O1.  That being said, I'd also suggest having a chat with the various people involved in this, with a list of the pros and cons of both options.

    After all, if you start showing the dates as "01-Mar-2018" (CONVERT(date, '20180301', 106)) instead of what they're expecting to see (Mar-2018) they may get a touch upset.

  • jasona.work - Wednesday, June 13, 2018 1:47 PM

    I'm with Phil.  Better to store the dates as dates.  If you store them as character data (whether CHAR or VARCHAR) it's too easy for "bad" data to slip in, say 201813 or 2019O1.  That being said, I'd also suggest having a chat with the various people involved in this, with a list of the pros and cons of both options.

    After all, if you start showing the dates as "01-Mar-2018" (CONVERT(date, '20180301', 106)) instead of what they're expecting to see (Mar-2018) they may get a touch upset.

    I agree that storing dates as dates is correct - but YYYYMM in reporting is not a date.  It is a period - which could indicate the posting period (this invoice was posted in period 201804) which is not necessarily 2018-04-nn.  In fact - in some cases the posting period could be 201804 and the created date could be 2018-03-25.

    You can insure a valid period with a simple check constraint - so I wouldn't be too worried about 'invalid' data in a period column.  Since this is all related to reporting - I would assume that the period value is going to be generated anyways and if you are creating tables for this (instead of just creating queries) - then having a period column and a date column would make sense.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • jasona.work - Wednesday, June 13, 2018 1:47 PM

    I'm with Phil.  Better to store the dates as dates.  If you store them as character data (whether CHAR or VARCHAR) it's too easy for "bad" data to slip in, say 201813 or 2019O1.  That being said, I'd also suggest having a chat with the various people involved in this, with a list of the pros and cons of both options.

    After all, if you start showing the dates as "01-Mar-2018" (CONVERT(date, '20180301', 106)) instead of what they're expecting to see (Mar-2018) they may get a touch upset.

    To be honest the people who'll see the charts will never have any interest in seeing how it's stored.  If I started talking to them about that they might start avoiding me 😉 

    I will display the data exactly how they want to see it using the date functions.

  • Jeffrey Williams 3188 - Wednesday, June 13, 2018 1:35 PM

    Phil Parkin - Wednesday, June 13, 2018 12:16 PM

    Storing the values as dates is good practice, despite your (understandable) reservations. The benefits outweigh the negatives.

    Unless the requirement is only to report based on monthly reporting periods - and no date functionality is needed.

    I would store the period as a CHAR(6) - and persist a derived column as: datefromparts(left(MonthPeriod, 4), right(MonthPeriod, 2), 1).  This would give you the date column to be used when you need date functionality and the period column for grouping when needed.

    I have to disagree with that because there's always a call to do some date math somewhere.  If not now, then later.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • It's a good point, however, that a company's financial periods don't necessarily align exactly with months. In some companies there is a period 13, for example.

  • You need to consider what it means to convert a YYYYMM to a date.  Do you consider the "equivalent date" of the posting period to be the first of the month, last of the month, or something else?

    The data says the posting period is YYYYMM = 201806 this month.  But that doesn't tell you a thing about when an item was posted.  It could be today, yesterday, or 2 weeks ago.

    I suspect that when an item is posted, whatever that means in the business, there is a specific day that it happened.  Maybe that day (and possibly time of day as well) was stored somewhere.  Now you have been given some transformation of the data that says the posting period is YYYYMM.  Can you go back to the source and find out if the actual posting date exists?

  • I have been in discussions' with developers in the past, who have used character columns to store dates and it is a pain beyond belief. Their argument was that it was in the format that needed to be displayed. My argument was that there is no possibility of data manipulation, using standard SQL functions and the SQL code has to have conversions throughout to be of any use. They had never heard of non-SARGable conditions and didn't really care - they wanted it to be easy from their side, even though I could show it wouldn't be in the long run.
    Anyway, I lost that fight.
    So, when the report fell over because within the data there was a month of '62' I just referred to the email chain and left them to it.
    Store data in the type best suited to that data. The 'front-end' display of the data shouldn't be a concern of the database.

    Bought myself a t-shirt to remind myself of this.

  • BrainDonor - Thursday, June 14, 2018 5:46 AM

    I have been in discussions' with developers in the past, who have used character columns to store dates and it is a pain beyond belief. Their argument was that it was in the format that needed to be displayed. My argument was that there is no possibility of data manipulation, using standard SQL functions and the SQL code has to have conversions throughout to be of any use. They had never heard of non-SARGable conditions and didn't really care - they wanted it to be easy from their side, even though I could show it wouldn't be in the long run.
    Anyway, I lost that fight.
    So, when the report fell over because within the data there was a month of '62' I just referred to the email chain and left them to it.
    Store data in the type best suited to that data. The 'front-end' display of the data shouldn't be a concern of the database.

    Bought myself a t-shirt to remind myself of this.

    So they used a hammer to design the system, eh? 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Wednesday, June 13, 2018 3:25 PM

    I have to disagree with that because there's always a call to do some date math somewhere.  If not now, then later.

    The problem is that we don't know whether or not the users are using YYYYMM to represent a date - or a period.  If the users are using financial periods - and you convert that to a date just so you can display the data as MMM-YYYY (May-2018) then you have altered the meaning of the data.  

    I am not saying you should not have a date value available for date math - if needed.  I am saying that you should not change the values without understanding what those values actually mean to the users.  As someone earlier stated - what if the period was 201813?  This can be a valid period but it will never be a valid date...in some systems this period would be used for adjustments during the process of closing the fiscal/calendar year.  In other systems they have 16 periods - where periods 13 through 16 are special periods for closing the fiscal/calendar year.

    If these are financial periods - and there can and will be periods greater than 12 then converting the values to a date is going to run into problems when you hit that 13th period.

    You have to confirm with the users the source of the data...and what it actually means.  Is it an arbitrary value the users assign - or the posting period - or the service/sale date or something else?  Can a previous period be assigned after that month is passed?  If so - what business process is driving the assignment of that value?

    With that said - if the users are using these as dates, then you store them as dates - of course.  But you also verify with the business owners what is valid...is it the 1st, 15th or last day of the month?  It may be a better option to store the value as the last day of the month so your date math can represent the closing of the month - or it could be storing it as the 15th for GL related processes - or it may even be better to store both the start and end dates of the month.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Jeffrey Williams 3188 - Thursday, June 14, 2018 10:54 AM

    Jeff Moden - Wednesday, June 13, 2018 3:25 PM

    I have to disagree with that because there's always a call to do some date math somewhere.  If not now, then later.

    The problem is that we don't know whether or not the users are using YYYYMM to represent a date - or a period.  If the users are using financial periods - and you convert that to a date just so you can display the data as MMM-YYYY (May-2018) then you have altered the meaning of the data.  

    I am not saying you should not have a date value available for date math - if needed.  I am saying that you should not change the values without understanding what those values actually mean to the users.  As someone earlier stated - what if the period was 201813?  This can be a valid period but it will never be a valid date...in some systems this period would be used for adjustments during the process of closing the fiscal/calendar year.  In other systems they have 16 periods - where periods 13 through 16 are special periods for closing the fiscal/calendar year.

    If these are financial periods - and there can and will be periods greater than 12 then converting the values to a date is going to run into problems when you hit that 13th period.

    You have to confirm with the users the source of the data...and what it actually means.  Is it an arbitrary value the users assign - or the posting period - or the service/sale date or something else?  Can a previous period be assigned after that month is passed?  If so - what business process is driving the assignment of that value?

    With that said - if the users are using these as dates, then you store them as dates - of course.  But you also verify with the business owners what is valid...is it the 1st, 15th or last day of the month?  It may be a better option to store the value as the last day of the month so your date math can represent the closing of the month - or it could be storing it as the 15th for GL related processes - or it may even be better to store both the start and end dates of the month.

    If you reread the original post, you will see that the OP refers to 'month with year', not 'period with year'. It seems you doubt that the OP knows their own requirement.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Phil Parkin - Thursday, June 14, 2018 11:22 AM

    If you reread the original post, you will see that the OP refers to 'month with year', not 'period with year'. It seems you doubt that the OP knows their own requirement.

    Not doubting the OP - but he did state that he did not want to follow up with the users...in fact, his response was something to the effect of 'if i started talking to them about that - they would probably avoid me'.

    Personally - I would never assume that a value in an Excel spreadsheet that is YYYYMM was in fact a date for all departments in the organization.  For financial departments - that could really cause a lot of problems for the users, especially if they are using financial reporting periods and FY dates.

    What I originally stated was that storing the value as a date is preferred - unless these are actually periods.  I also stated that I would store the original value - and create a persisted derived column as a date.  This way I maintain the actual data the users are providing and also have the ability to use a date math - if needed.

    I would never advocate storing dates as character data - and that is not what I have stated...even though it appears that is how it was received.  All I have stated is that you need to validate that these values are in fact dates before assuming you can convert them to a date.  If they are dates and the users really don't care what day of the month - then by all means - convert them to a date, but if you are just assuming they are dates because they 'look' like a date then you haven't done your due diligence.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply