explicite alternative to FORMAT() concerning DATE

  • The FORMAT function return type is NVARCHAR , which is a string.

    https://docs.microsoft.com/en-us/sql/t-sql/functions/format-transact-sql?view=sql-server-ver16

    You can get the same result by replacing FORMAT with

    CONVERT(varchar(10), [YourDateField], 120)
  • Syntax like FORMAT(tab1.g, 'yyyy-MM-dd') suggests that you are "rounding" a date and time value to just the date; you would therefore be much better off just casting/converting the value to a date, as a date has no time portion: CONVERT(date, tab1.g).

    Those JOINs seem odd too. You RIGHT JOIN to table2 but then INNER JOIN to table3 from table1. This means that all the joins are implicit INNER JOINs. Especially when you then add the requirement in the WHERE that tab1.g must have a non-NULL value.

    This is what you have at the moment:

    FROM table1 tab1
    RIGHT JOIN table2 tab2 ON tab1.a = tab2.c
    INNER JOIN table3 tab3 ON tab1.b = tab3.e
    WHERE tab1.g >= '2012-04-01';

    If we rewrite it as a LEFT JOIN this might be a bit more obvious:

    FROM table2 tab2
    LEFT JOIN table1 tab1 ON tab2.c = tab1.A
    INNER JOIN table3 tab3 ON tab1.b = tab3.E
    WHERE tab1.g >= '2012-04-01'

    So you "start" at table2 and then if there isn't a matching row in table1 want the row from table2 to still be returned. You then have both an INNER JOIN that references tab1 and your WHERE does too, without handling NULL values. This means that both tab1.b and tab1.g must have a non-NULL value, which is impossible if not related row in table1 was found (as the columns will have the value NULL).

     

    • This reply was modified 2 years, 5 months ago by  Thom A.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • DesNorton wrote:

    The FORMAT function return type is NVARCHAR , which is a string.

    https://docs.microsoft.com/en-us/sql/t-sql/functions/format-transact-sql?view=sql-server-ver16

    You can get the same result by replacing FORMAT with

    CONVERT(varchar(10), [YourDateField], 120)

    Again, though, if you look again at the Op's original post near the end, you'll need to change the VARCHAR(10) to VARCHAR(7).  Since it will ALWAYS return just 7 characters, CHAR(7) seems like it might be the better choice, though.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • TheNewbie wrote:

    Jeff Moden wrote:

    TheNewbie wrote:

    Hi all,

    I have been using FORMAT()

    FORMAT(Tab.date, 'yyyy-MM-dd') AS 'date_begin',

    but been told it would be slow when it comes to BIG QUERIES.

    How can I still want the result to be a date (NOT STRING) But I want to 'discard' the Days, Hours, etc (2020-11-01 00:00:00.0000000) Leaving a year/month only date field (2020-11)

    Thanks

    As the others have said, there is no temporal data type to hold just the Year and Month.

    Considering the "big queries" you speak of, if you're trying to do aggregates by year and month, the usual recommendation is to reduce each date down to being the first of the month (year included) as a DATE datatype for the given date of each row OR do a "Range Grouping".  I've even made separate columns on some tables that keep both the original date and the first of the month date if the reporting requirements demand such an optimization.

    "Formatted" dates should never be used to form such aggregations because of all the bloody implicit casts (performance) and the fact that it will certainly make the criteria for you queries "Non-SARGable" resulting in full scans all the time.

    So, what I recommend is posting one of the queries that is "slow because of FORMAT".  Yes, FORMAT absolutely sucks for performance but so does converting things to the character based YYYY-MM format using ANY method if you use that method in the wrong place.  What's the "wrong place"?  Anything (like aggregations) that should be done purely in the "data layer".  Only the results should be formatted and there are many that will tell you that shouldn't EVER be done in T-SQL.  While I agree with the premise, I don't agree with the absolute nature of that recommendation.

    So let's see one of the queries where you used FORMAT before and let's see if we can show you a way to "double-prong" the performance using techniques like "pre-aggregation" to do the calculations and then format them for the output you're directed to provide.

    Hi,

    Thanks for all this information. I really appreciate and found that helpful.

    here is an (simplified) example below

    SELECT tab1.a      
    tab1.b,
    tab2.c,
    tab3.d,
    tab3.e,
    tab1.f,
    FORMAT(tab1.g, 'yyyy-MM-dd'), -- date
    FORMAT(tab1.h, 'yyyy-MM-dd'), -- date
    FORMAT(tab1.i, 'yyyy-MM-dd'), -- date
    FORMAT(tab2.j, 'yyyy-MM-dd'), -- date
    FORMAT(tab2.k, 'yyyy-MM-dd'), -- date
    FORMAT(tab2.l, 'yyyy-MM-dd'), -- date
    tab1.m,
    tab1.n,
    tab1.o,
    tab1.p,
    tab1.q,
    tab1.r,
    tab1.s,
    tab1.t

    FROM table1 tab1
    RIGHT JOIN table2 tab2
    ON tab1.a = tab2.c
    INNER JOIN table3 tab3
    ON tab1.b= tab3.e

    WHERE tab1.g >= '2012-04-01' -- or '2012-04'

    and I have to convert them all into date format in PowerBI

    What happened to "I want to discard the days" in that query, which includes the days?

    I agree... FORMAT is definitely NOT the think to use in the presence if if the many rows you stated.  If you actually do want the days as the code seems to indicate, then do like the others have suggest... change the FORMAT.  Before you do that, though, please run the following code and let us know what the output is.  In fact, this may actually contain a much better answer especially since the target consumer will actually be Power BI.

     SELECT  LangName = @@LANGUAGE
    ,DateFormat = CONVERT(DATE,GETDATE())
    ;

    The other thing is, where is the output of your query being directed to?  If it has millions of rows and it's being directed to ANY screen output, there's about 80% of your performance issue.  We can help with the FORMAT problem (especially if you do what I asked above) but returning millions of rows to a screen will always be a slow evolution.

    If to a table, we can help there.  Or is this query going to be executed by Power BI?

    If you REALLY want some performance, the don't let Power BI have all the fun.  Do some of the aggregations you actually want in T-SQL.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • What is the data type of "tab1.g, h, i, j, k, l?

    If it is a datetime, then CONVERT(date, tab1.g) is all you might need, depending upon the regional setting of your servers

    Also, in the WHERE clause, you should not be using something like '2012-04'. Use the date for the first of the month, again assuming it's a datetime data type

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • Like I said...

    In fact, this may actually contain a much better answer especially since the target consumer will actually be Power BI.

    SELECT LangName = @@LANGUAGE

    ,DateFormat = CONVERT(DATE,GETDATE())

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Since Tally Table of dates was mentioned, here is how to construct it:

    1. create fnTally - table of numbers, by Jeff Moden, here  https://www.sqlservercentral.com/scripts/create-a-tally-function-fntally
    2. use the function to generate table of dates
    ; WITH TallyDates
    AS (SELECT
    t.N
    , TallyDate = Cast(Dateadd(dd, t.N, '1900-01-01') AS DATE)
    FROM dbo.Fntally(0, 73550) t)
    SELECT
    SeqNumber = TD.N
    , TallyDAte
    , Year(TAllyDate) AS [Year]
    , Month(TallyDAte) AS [Month]
    , Day(tallydate) AS [DAy]
    , DateName (dw,TallyDAte) AS WekDay
    FROM TallyDates AS TD
    WHERE TallyDate BETWEEN '19500101' AND '21001231'
    ;

    Table of dates simplifies date calculations big time.

    🙂

    Zidar's Theorem: The best code is no code at all...

  • Zidar wrote:

    Since Tally Table of dates was mentioned, here is how to construct it:

    1. create fnTally - table of numbers, by Jeff Moden, here  https://www.sqlservercentral.com/scripts/create-a-tally-function-fntally
    2. use the function to generate table of dates
    ; WITH TallyDates
    AS (SELECT
    t.N
    , TallyDate = Cast(Dateadd(dd, t.N, '1900-01-01') AS DATE)
    FROM dbo.Fntally(0, 73550) t)
    SELECT
    SeqNumber = TD.N
    , TallyDAte
    , Year(TAllyDate) AS [Year]
    , Month(TallyDAte) AS [Month]
    , Day(tallydate) AS [DAy]
    , DateName (dw,TallyDAte) AS WekDay
    FROM TallyDates AS TD
    WHERE TallyDate BETWEEN '19500101' AND '21001231'
    ;

    Table of dates simplifies date calculations big time.

    🙂

    Sorry, but I'm not seeing it.  How would such a thing be used to improve the performance of the query the OP posted?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Q: Sorry, but I'm not seeing it.  How would such a thing be used to improve the performance of the query the OP posted?

    Valid point, chances are good it does not help with performance. To my defense, I was not trying to help with performance. I offered a solution to help with formatting date to "YYMM".

    For me there is not enough information about actual circumstances to provide performance targeted advice. I myself am not very good in playing with execution plans. A few people here are very good at that, so I leave those things for more knowledgeable and experienced folks.  My goal was to help present dates in form "YYMM". Existing functions such as Convert and Format apparently are tricky to use to achieve requested format.

    I assumed that something like this

    SELECT  <Whatever columns from Sourcetable>, TableOfDates.Year, TableOfDates.Month
    FROM SourceTAble
    JOIN TableOfDates ON SourceTable.DateCoumn = TableOfDates.TallyDate

    would at least solve the problem of formatting. Nothing more than that. Requester may have mentioned problem with speed of execution, but I spotted another one, just as annoying as a slow query.

     

    Zidar's Theorem: The best code is no code at all...

  • Zidar wrote:

    My goal was to help present dates in form "YYMM". Existing functions such as Convert and Format apparently are tricky to use to achieve requested format.

    FORMAT is not 'tricky', just awfully slow.

    Here is a code sample which gets to YYMM format in both character and integer forms. I don't think it's particularly tricky, either:

    DECLARE @Somedate DATETIME = GETDATE();

    SELECT YYMMChar = CONVERT(CHAR(4), @Somedate, 12)
    ,YYMMInt = (YEAR(@Somedate) % 100) * 100 + MONTH(@Somedate);

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Zidar wrote:

    Q: Sorry, but I'm not seeing it.  How would such a thing be used to improve the performance of the query the OP posted?

    Valid point, chances are good it does not help with performance. To my defense, I was not trying to help with performance. I offered a solution to help with formatting date to "YYMM".

    For me there is not enough information about actual circumstances to provide performance targeted advice. I myself am not very good in playing with execution plans. A few people here are very good at that, so I leave those things for more knowledgeable and experienced folks.  My goal was to help present dates in form "YYMM". Existing functions such as Convert and Format apparently are tricky to use to achieve requested format.

    I assumed that something like this

    SELECT  <Whatever columns from Sourcetable>, TableOfDates.Year, TableOfDates.Month
    FROM SourceTAble
    JOIN TableOfDates ON SourceTable.DateCoumn = TableOfDates.TallyDate

    would at least solve the problem of formatting. Nothing more than that. Requester may have mentioned problem with speed of execution, but I spotted another one, just as annoying as a slow query.

    Totally understood on all of that and thanks for trying to help.

    Even the original post for the OP talked about slowness but that kind of got lost in the "need" for formatting as Leaving a year/month only date field (2020-11) and then also saying they wanted "Leaving a year/month only date field (2020-11)" .

     

    Eventually, they coughed up an example query at the following URL.

    https://www.sqlservercentral.com/forums/topic/explicite-alternative-to-format-concerning-date#post-4053128

    That made me think (and I still think it) we're chasing the "wong wabbit" (as Elmer Fudd would say). 😀  The Op needs whole dates and FORMAT is one of the worst ways in the world to do such a thing but the Op also mention millions of rows in the source table and the query they posted could have issues with indexes to support the joins and the mountain of columns to be returned, etc, to be used for "BI" of one form or another.  I think the OP used FORMAT because they didn't know of any other way to convert a date/time to a whole date any other way.

    The correct answer is that no formatting is required.  Just use CONVERT(DATE,somecolumn).  There's nothing faster for such a thing but all of that appears to have gotten lost because the Op made it sound like some sort of character based representation was necessary.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • TheNewbie wrote:

    here is an (simplified) example below

    SELECT tab1.a      
    tab1.b,
    tab2.c,
    tab3.d,
    tab3.e,
    tab1.f,
    FORMAT(tab1.g, 'yyyy-MM-dd'), -- date
    FORMAT(tab1.h, 'yyyy-MM-dd'), -- date
    FORMAT(tab1.i, 'yyyy-MM-dd'), -- date
    FORMAT(tab2.j, 'yyyy-MM-dd'), -- date
    FORMAT(tab2.k, 'yyyy-MM-dd'), -- date
    FORMAT(tab2.l, 'yyyy-MM-dd'), -- date
    tab1.m,
    tab1.n,
    tab1.o,
    tab1.p,
    tab1.q,
    tab1.r,
    tab1.s,
    tab1.t

    FROM table1 tab1
    RIGHT JOIN table2 tab2
    ON tab1.a = tab2.c
    INNER JOIN table3 tab3
    ON tab1.b= tab3.e

    WHERE tab1.g >= '2012-04-01' -- or '2012-04'

    and I have to convert them all into date format in PowerBI

    You never came back with an answer to my question but let's forget that for now.  Here's what I believe your first step should be according to what you posted above.

     SELECT tab1.a,
    tab1.b,
    tab2.c,
    tab3.d,
    tab3.e,
    tab1.f,
    Date_tab1g = CONVERT(DATE,tab1.g), --Of course, you need to change the alias column name
    Date_tab1h = CONVERT(DATE,tab1.h), --Of course, you need to change the alias column name
    Date_tab1i = CONVERT(DATE,tab1.i), --Of course, you need to change the alias column name
    Date_tab2j = CONVERT(DATE,tab2.j), --Of course, you need to change the alias column name
    Date_tab2k = CONVERT(DATE,tab2.k), --Of course, you need to change the alias column name
    Date_tab2l = CONVERT(DATE,tab2.l), --Of course, you need to change the alias column name
    tab1.m,
    tab1.n,
    tab1.o,
    tab1.p,
    tab1.q,
    tab1.r,
    tab1.s,
    tab1.t
    FROM dbo.table1 tab1 --Note the use of the 2 part naming convention.
    RIGHT JOIN dbo.table2 tab2 ON tab1.a = tab2.c --Note the use of the 2 part naming convention.
    INNER JOIN dbo.table3 tab3 ON tab1.b = tab3.e --Note the use of the 2 part naming convention.
    WHERE tab1.g >= '2012-04-01' -- or '2012-04'
    ;

    The real issues are...

    If you need 10 years of data for analysis/reporting, does it REALLY need to be at the single row level of detail?  Surely, they can tell you what aggregates they need to analyze and do that pre-aggregation for them.  That would save a  whole lot on how much data you have to dump somewhere else for them to use AND it would seriously speed up their analysis/reporting efforts.

    Another concern is the indexing to support the query.  Have you done any analysis there to see if you're scanning 10 years of all the data in a Clustered Index or maybe the whole of the 3 tables because the optimizer threw up its hands because it recognized that it was going to have to do individual row lookups even if non-clustered indexes that supported the criteria for both the WHERE and the JOINs existed?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    TheNewbie wrote:

    Jeff Moden wrote:

    TheNewbie wrote:

    Hi all,

    I have been using FORMAT()

    FORMAT(Tab.date, 'yyyy-MM-dd') AS 'date_begin',

    but been told it would be slow when it comes to BIG QUERIES.

    How can I still want the result to be a date (NOT STRING) But I want to 'discard' the Days, Hours, etc (2020-11-01 00:00:00.0000000) Leaving a year/month only date field (2020-11)

    Thanks

    As the others have said, there is no temporal data type to hold just the Year and Month.

    Considering the "big queries" you speak of, if you're trying to do aggregates by year and month, the usual recommendation is to reduce each date down to being the first of the month (year included) as a DATE datatype for the given date of each row OR do a "Range Grouping".  I've even made separate columns on some tables that keep both the original date and the first of the month date if the reporting requirements demand such an optimization.

    "Formatted" dates should never be used to form such aggregations because of all the bloody implicit casts (performance) and the fact that it will certainly make the criteria for you queries "Non-SARGable" resulting in full scans all the time.

    So, what I recommend is posting one of the queries that is "slow because of FORMAT".  Yes, FORMAT absolutely sucks for performance but so does converting things to the character based YYYY-MM format using ANY method if you use that method in the wrong place.  What's the "wrong place"?  Anything (like aggregations) that should be done purely in the "data layer".  Only the results should be formatted and there are many that will tell you that shouldn't EVER be done in T-SQL.  While I agree with the premise, I don't agree with the absolute nature of that recommendation.

    So let's see one of the queries where you used FORMAT before and let's see if we can show you a way to "double-prong" the performance using techniques like "pre-aggregation" to do the calculations and then format them for the output you're directed to provide.

    Hi,

    Thanks for all this information. I really appreciate and found that helpful.

    here is an (simplified) example below

    SELECT tab1.a      
    tab1.b,
    tab2.c,
    tab3.d,
    tab3.e,
    tab1.f,
    FORMAT(tab1.g, 'yyyy-MM-dd'), -- date
    FORMAT(tab1.h, 'yyyy-MM-dd'), -- date
    FORMAT(tab1.i, 'yyyy-MM-dd'), -- date
    FORMAT(tab2.j, 'yyyy-MM-dd'), -- date
    FORMAT(tab2.k, 'yyyy-MM-dd'), -- date
    FORMAT(tab2.l, 'yyyy-MM-dd'), -- date
    tab1.m,
    tab1.n,
    tab1.o,
    tab1.p,
    tab1.q,
    tab1.r,
    tab1.s,
    tab1.t

    FROM table1 tab1
    RIGHT JOIN table2 tab2
    ON tab1.a = tab2.c
    INNER JOIN table3 tab3
    ON tab1.b= tab3.e

    WHERE tab1.g >= '2012-04-01' -- or '2012-04'

    and I have to convert them all into date format in PowerBI

    What happened to "I want to discard the days" in that query, which includes the days?

    I agree... FORMAT is definitely NOT the think to use in the presence if if the many rows you stated.  If you actually do want the days as the code seems to indicate, then do like the others have suggest... change the FORMAT.  Before you do that, though, please run the following code and let us know what the output is.  In fact, this may actually contain a much better answer especially since the target consumer will actually be Power BI.

     SELECT  LangName = @@LANGUAGE
    ,DateFormat = CONVERT(DATE,GETDATE())
    ;

    The other thing is, where is the output of your query being directed to?  If it has millions of rows and it's being directed to ANY screen output, there's about 80% of your performance issue.  We can help with the FORMAT problem (especially if you do what I asked above) but returning millions of rows to a screen will always be a slow evolution.

    If to a table, we can help there.  Or is this query going to be executed by Power BI?

    If you REALLY want some performance, the don't let Power BI have all the fun.  Do some of the aggregations you actually want in T-SQL.

    Hi,

    sorry for my late reply

    "What happened to "I want to discard the days" in that query, which includes the days?" -> using FORMAT, I dont need to write them down again but as I said the rsult will be string not DATE anymore so I had to convert them again to DATE format.

    code result -> LangName= us_english, DateFormat= 2022-06-27

    "If to a table, we can help there.  Or is this query going to be executed by Power BI?" -> both but PowerBI should just "receive" finished/transformed data

    And yes I might have millions of rows

  • Michael L John wrote:

    What is the data type of "tab1.g, h, i, j, k, l?

    If it is a datetime, then CONVERT(date, tab1.g) is all you might need, depending upon the regional setting of your servers

    Also, in the WHERE clause, you should not be using something like '2012-04'. Use the date for the first of the month, again assuming it's a datetime data type

    data type is datetime

  • Phil Parkin wrote:

    Zidar wrote:

    My goal was to help present dates in form "YYMM". Existing functions such as Convert and Format apparently are tricky to use to achieve requested format.

    FORMAT is not 'tricky', just awfully slow.

    Here is a code sample which gets to YYMM format in both character and integer forms. I don't think it's particularly tricky, either:

    DECLARE @Somedate DATETIME = GETDATE();

    SELECT YYMMChar = CONVERT(CHAR(4), @Somedate, 12)
    ,YYMMInt = (YEAR(@Somedate) % 100) * 100 + MONTH(@Somedate);

    interesting but the results are 2206 and not 202206

    • This reply was modified 2 years, 5 months ago by  TheNewbie.

Viewing 15 posts - 16 through 30 (of 33 total)

You must be logged in to reply to this topic. Login to reply