explicite alternative to FORMAT() concerning DATE

  • Hi all,

    I have been using FORMAT()

    FORMAT(Tab.date, 'yyyy-MM-dd') AS 'date_begin',

    but been told it would be slow when it comes to big queries.

    How can I still want the result to be a date (NOT STRING)

    But I want to 'discard' the Days, Hours, etc (2020-11-01 00:00:00.0000000)

    Leaving a year/month only date field (2020-11)

    Thanks

  • SQL does not have a Year-Month only data type

  • 2020-11 is not a date. 202011 as an INT is, however, easy enough.

    You say that you don't want to return a string, but that is exactly what FORMAT returns, which you appear to be somewhat fond of.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • replacement for format to return just yyyy-mm-dd (or yyyy-mm) is easy - convert(char(10), datefield, 121) - change the char(10) to char(7) to return just yyyy-mm - but as mentioned all these functions return a string.

    or if you just wish to remove the time portion of a datetime field you can do convert(date, datefield) - this will give you just the date portion as a date datatype (instead of string as the other examples, including FORMAT)

  • DesNorton wrote:

    SQL does not have a Year-Month only data type

     

    ok

  • Phil Parkin wrote:

    2020-11 is not a date. 202011 as an INT is, however, easy enough.

    You say that you don't want to return a string, but that is exactly what FORMAT returns, which you appear to be somewhat fond of.

    alright

  • frederico_fonseca wrote:

    replacement for format to return just yyyy-mm-dd (or yyyy-mm) is easy - convert(char(10), datefield, 121) - change the char(10) to char(7) to return just yyyy-mm - but as mentioned all these functions return a string.

    or if you just wish to remove the time portion of a datetime field you can do convert(date, datefield) - this will give you just the date portion as a date datatype (instead of string as the other examples, including FORMAT)

     

    Thanks

  • You could just use the 1st for the day, which would mean it was still a valid date:

    CONVERT(varchar(10), DATEADD(MONTH, DATEDIFF(MONTH, 0, Tab.date), 0), 120)

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • And, just to make you feel better about abandoning FORMAT, I've not yet seen it where it's not at least 20 times slower than code like what Frederico suggested even on modern machines with nasty fast SSDs and a shedload of memory, etc.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • The test is simple and straight forward

    set statistics time , io on
    go

    select FORMAT(datecol, 'yyyy-MM-dd') as Datetext
    from #TallyDate
    --order by DateCol

    /*
    (10000 rows affected)
    Table '#TallyDate__________________________________________________________________________________________________________000000000008'. Scan count 1, logical reads 22, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

    SQL Server Execution Times:
    CPU time = 156 ms, elapsed time = 226 ms.
    SQL Server parse and compile time:
    CPU time = 0 ms, elapsed time = 0 ms.
    */
    select convert(char(10), dateadd(dd, datediff(dd,0,datecol), 0), 121 ) as Datetext
    from #TallyDate

    /*

    (10000 rows affected)
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
    Table '#TallyDate__________________________________________________________________________________________________________000000000008'. Scan count 1, logical reads 22, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

    SQL Server Execution Times:
    CPU time = 16 ms, elapsed time = 136 ms.

    */

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • @Johan ... Do you have the code where you created and populated the #TallyDate table so that newbies can run your test?

    Also, your good code doesn't need the DATEADD/DATEDIFF thing.  And change the CHAR(10) to CHAR(7) to give the op the YYYY-MM format he wanted.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • TheNewbie wrote:

    Hi all,

    I have been using FORMAT()

    FORMAT(Tab.date, 'yyyy-MM-dd') AS 'date_begin',

    but been told it would be slow when it comes to big queries.

    How can I still want the result to be a date (NOT STRING) But I want to 'discard' the Days, Hours, etc (2020-11-01 00:00:00.0000000) Leaving a year/month only date field (2020-11)

    Thanks

    Just to remind people of what the OP is really asking for...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • TheNewbie wrote:

    Hi all,

    I have been using FORMAT()

    FORMAT(Tab.date, 'yyyy-MM-dd') AS 'date_begin',

    but been told it would be slow when it comes to BIG QUERIES.

    How can I still want the result to be a date (NOT STRING) But I want to 'discard' the Days, Hours, etc (2020-11-01 00:00:00.0000000) Leaving a year/month only date field (2020-11)

    Thanks

    As the others have said, there is no temporal data type to hold just the Year and Month.

    Considering the "big queries" you speak of, if you're trying to do aggregates by year and month, the usual recommendation is to reduce each date down to being the first of the month (year included) as a DATE datatype for the given date of each row OR do a "Range Grouping".  I've even made separate columns on some tables that keep both the original date and the first of the month date if the reporting requirements demand such an optimization.

    "Formatted" dates should never be used to form such aggregations because of all the bloody implicit casts (performance) and the fact that it will certainly make the criteria for you queries "Non-SARGable" resulting in full scans all the time.

    So, what I recommend is posting one of the queries that is "slow because of FORMAT".  Yes, FORMAT absolutely sucks for performance but so does converting things to the character based YYYY-MM format using ANY method if you use that method in the wrong place.  What's the "wrong place"?  Anything (like aggregations) that should be done purely in the "data layer".  Only the results should be formatted and there are many that will tell you that shouldn't EVER be done in T-SQL.  While I agree with the premise, I don't agree with the absolute nature of that recommendation.

    So let's see one of the queries where you used FORMAT before and let's see if we can show you a way to "double-prong" the performance using techniques like "pre-aggregation" to do the calculations and then format them for the output you're directed to provide.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    @Johan ... Do you have the code where you created and populated the #TallyDate table so that newbies can run your test?

    Also, your good code doesn't need the DATEADD/DATEDIFF thing.  And change the CHAR(10) to CHAR(7) to give the op the YYYY-MM format he wanted.

    ref: "The "Numbers" or "Tally" Table: What it is and how it replaces a loop"

    Formatting "yyyy-MM" must be done front end. Non date data cannot be stored in date/datetime(2) data types.

    Date values must be stored in date/datetime(2) data types with a single exception: your proprietary DateSeries table.

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • Jeff Moden wrote:

    TheNewbie wrote:

    Hi all,

    I have been using FORMAT()

    FORMAT(Tab.date, 'yyyy-MM-dd') AS 'date_begin',

    but been told it would be slow when it comes to BIG QUERIES.

    How can I still want the result to be a date (NOT STRING) But I want to 'discard' the Days, Hours, etc (2020-11-01 00:00:00.0000000) Leaving a year/month only date field (2020-11)

    Thanks

    As the others have said, there is no temporal data type to hold just the Year and Month.

    Considering the "big queries" you speak of, if you're trying to do aggregates by year and month, the usual recommendation is to reduce each date down to being the first of the month (year included) as a DATE datatype for the given date of each row OR do a "Range Grouping".  I've even made separate columns on some tables that keep both the original date and the first of the month date if the reporting requirements demand such an optimization.

    "Formatted" dates should never be used to form such aggregations because of all the bloody implicit casts (performance) and the fact that it will certainly make the criteria for you queries "Non-SARGable" resulting in full scans all the time.

    So, what I recommend is posting one of the queries that is "slow because of FORMAT".  Yes, FORMAT absolutely sucks for performance but so does converting things to the character based YYYY-MM format using ANY method if you use that method in the wrong place.  What's the "wrong place"?  Anything (like aggregations) that should be done purely in the "data layer".  Only the results should be formatted and there are many that will tell you that shouldn't EVER be done in T-SQL.  While I agree with the premise, I don't agree with the absolute nature of that recommendation.

    So let's see one of the queries where you used FORMAT before and let's see if we can show you a way to "double-prong" the performance using techniques like "pre-aggregation" to do the calculations and then format them for the output you're directed to provide.

     

    Hi,

    Thanks for all this information. I really appreciate and found that helpful.

    here is an (simplified) example below

    SELECT tab1.a      
    tab1.b,
    tab2.c,
    tab3.d,
    tab3.e,
    tab1.f,
    FORMAT(tab1.g, 'yyyy-MM-dd'), -- date
    FORMAT(tab1.h, 'yyyy-MM-dd'), -- date
    FORMAT(tab1.i, 'yyyy-MM-dd'), -- date
    FORMAT(tab2.j, 'yyyy-MM-dd'), -- date
    FORMAT(tab2.k, 'yyyy-MM-dd'), -- date
    FORMAT(tab2.l, 'yyyy-MM-dd'), -- date
    tab1.m,
    tab1.n,
    tab1.o,
    tab1.p,
    tab1.q,
    tab1.r,
    tab1.s,
    tab1.t

    FROM table1 tab1
    RIGHT JOIN table2 tab2
    ON tab1.a = tab2.c
    INNER JOIN table3 tab3
    ON tab1.b= tab3.e

    WHERE tab1.g >= '2012-04-01' -- or '2012-04'

    and I have to convert them all into date format in PowerBI

    • This reply was modified 2 years, 5 months ago by  TheNewbie.

Viewing 15 posts - 1 through 15 (of 33 total)

You must be logged in to reply to this topic. Login to reply