explicite alternative to FORMAT() concerning DATE

  • Jeff Moden wrote:

    Zidar wrote:

    Q: Sorry, but I'm not seeing it.  How would such a thing be used to improve the performance of the query the OP posted?

    Valid point, chances are good it does not help with performance. To my defense, I was not trying to help with performance. I offered a solution to help with formatting date to "YYMM".

    For me there is not enough information about actual circumstances to provide performance targeted advice. I myself am not very good in playing with execution plans. A few people here are very good at that, so I leave those things for more knowledgeable and experienced folks.  My goal was to help present dates in form "YYMM". Existing functions such as Convert and Format apparently are tricky to use to achieve requested format.

    I assumed that something like this

    SELECT  <Whatever columns from Sourcetable>, TableOfDates.Year, TableOfDates.Month
    FROM SourceTAble
    JOIN TableOfDates ON SourceTable.DateCoumn = TableOfDates.TallyDate

    would at least solve the problem of formatting. Nothing more than that. Requester may have mentioned problem with speed of execution, but I spotted another one, just as annoying as a slow query.

    Totally understood on all of that and thanks for trying to help.

    Even the original post for the OP talked about slowness but that kind of got lost in the "need" for formatting as Leaving a year/month only date field (2020-11) and then also saying they wanted "Leaving a year/month only date field (2020-11)" .

    Eventually, they coughed up an example query at the following URL.

    https://www.sqlservercentral.com/forums/topic/explicite-alternative-to-format-concerning-date#post-4053128

    That made me think (and I still think it) we're chasing the "wong wabbit" (as Elmer Fudd would say). 😀  The Op needs whole dates and FORMAT is one of the worst ways in the world to do such a thing but the Op also mention millions of rows in the source table and the query they posted could have issues with indexes to support the joins and the mountain of columns to be returned, etc, to be used for "BI" of one form or another.  I think the OP used FORMAT because they didn't know of any other way to convert a date/time to a whole date any other way.

    The correct answer is that no formatting is required.  Just use CONVERT(DATE,somecolumn).  There's nothing faster for such a thing but all of that appears to have gotten lost because the Op made it sound like some sort of character based representation was necessary.

    Thanks for this resume 🙂

    you are right  when saying

    "I think the OP used FORMAT because they didn't know of any other way to convert a date/time to a whole date any other way."

    but I mentionned the performance because I was using FORMAT AND the alternative I was looking for was to still have a date type but with the possibility to split it into 2022-06 or 2022-06-01 (without hours etc)

    "The correct answer is that no formatting is required.  Just use CONVERT(DATE,somecolumn)." -> OK

  • Jeff Moden wrote:

    TheNewbie wrote:

    here is an (simplified) example below

    SELECT tab1.a      
    tab1.b,
    tab2.c,
    tab3.d,
    tab3.e,
    tab1.f,
    FORMAT(tab1.g, 'yyyy-MM-dd'), -- date
    FORMAT(tab1.h, 'yyyy-MM-dd'), -- date
    FORMAT(tab1.i, 'yyyy-MM-dd'), -- date
    FORMAT(tab2.j, 'yyyy-MM-dd'), -- date
    FORMAT(tab2.k, 'yyyy-MM-dd'), -- date
    FORMAT(tab2.l, 'yyyy-MM-dd'), -- date
    tab1.m,
    tab1.n,
    tab1.o,
    tab1.p,
    tab1.q,
    tab1.r,
    tab1.s,
    tab1.t

    FROM table1 tab1
    RIGHT JOIN table2 tab2
    ON tab1.a = tab2.c
    INNER JOIN table3 tab3
    ON tab1.b= tab3.e

    WHERE tab1.g >= '2012-04-01' -- or '2012-04'

    and I have to convert them all into date format in PowerBI

    You never came back with an answer to my question but let's forget that for now.  Here's what I believe your first step should be according to what you posted above.

     SELECT tab1.a,
    tab1.b,
    tab2.c,
    tab3.d,
    tab3.e,
    tab1.f,
    Date_tab1g = CONVERT(DATE,tab1.g), --Of course, you need to change the alias column name
    Date_tab1h = CONVERT(DATE,tab1.h), --Of course, you need to change the alias column name
    Date_tab1i = CONVERT(DATE,tab1.i), --Of course, you need to change the alias column name
    Date_tab2j = CONVERT(DATE,tab2.j), --Of course, you need to change the alias column name
    Date_tab2k = CONVERT(DATE,tab2.k), --Of course, you need to change the alias column name
    Date_tab2l = CONVERT(DATE,tab2.l), --Of course, you need to change the alias column name
    tab1.m,
    tab1.n,
    tab1.o,
    tab1.p,
    tab1.q,
    tab1.r,
    tab1.s,
    tab1.t
    FROM dbo.table1 tab1 --Note the use of the 2 part naming convention.
    RIGHT JOIN dbo.table2 tab2 ON tab1.a = tab2.c --Note the use of the 2 part naming convention.
    INNER JOIN dbo.table3 tab3 ON tab1.b = tab3.e --Note the use of the 2 part naming convention.
    WHERE tab1.g >= '2012-04-01' -- or '2012-04'
    ;

    The real issues are...

    If you need 10 years of data for analysis/reporting, does it REALLY need to be at the single row level of detail?  Surely, they can tell you what aggregates they need to analyze and do that pre-aggregation for them.  That would save a  whole lot on how much data you have to dump somewhere else for them to use AND it would seriously speed up their analysis/reporting efforts.

    Another concern is the indexing to support the query.  Have you done any analysis there to see if you're scanning 10 years of all the data in a Clustered Index or maybe the whole of the 3 tables because the optimizer threw up its hands because it recognized that it was going to have to do individual row lookups even if non-clustered indexes that supported the criteria for both the WHERE and the JOINs existed?

    "does it REALLY need to be at the single row level of detail?" not necessarily

    "they can tell you what aggregates they need to analyze and do that pre-aggregation for them" of course

    "Have you done any analysis there to see if you're scanning 10 years of all the data in a Clustered Index or maybe the whole of the 3 tables" yes sir and its the whole of 3 tables/views

  • Thom A wrote:

    Syntax like FORMAT(tab1.g, 'yyyy-MM-dd') suggests that you are "rounding" a date and time value to just the date; you would therefore be much better off just casting/converting the value to a date, as a date has no time portion: CONVERT(date, tab1.g).

    Those JOINs seem odd too. You RIGHT JOIN to table2 but then INNER JOIN to table3 from table1. This means that all the joins are implicit INNER JOINs. Especially when you then add the requirement in the WHERE that tab1.g must have a non-NULL value.

    This is what you have at the moment:

    FROM table1 tab1
    RIGHT JOIN table2 tab2 ON tab1.a = tab2.c
    INNER JOIN table3 tab3 ON tab1.b = tab3.e
    WHERE tab1.g >= '2012-04-01';

    If we rewrite it as a LEFT JOIN this might be a bit more obvious:

    FROM table2 tab2
    LEFT JOIN table1 tab1 ON tab2.c = tab1.A
    INNER JOIN table3 tab3 ON tab1.b = tab3.E
    WHERE tab1.g >= '2012-04-01'

    So you "start" at table2 and then if there isn't a matching row in table1 want the row from table2 to still be returned. You then have both an INNER JOIN that references tab1 and your WHERE does too, without handling NULL values. This means that both tab1.b and tab1.g must have a non-NULL value, which is impossible if not related row in table1 was found (as the columns will have the value NULL).

    Thanks for this explanation

  • TheNewbie wrote:

    "does it REALLY need to be at the single row level of detail?" not necessarily

    "they can tell you what aggregates they need to analyze and do that pre-aggregation for them" of course

    "Have you done any analysis there to see if you're scanning 10 years of all the data in a Clustered Index or maybe the whole of the 3 tables" yes sir and its the whole of 3 tables/views

    On the first 2 parts, those were, of course, suggestions only because you said you were a newbie.

    On that last part, someone needs to do some analysis of the code (like the DBA's that correctly spoke of FORMAT as being bad) and add some indexes.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 4 posts - 31 through 33 (of 33 total)

You must be logged in to reply to this topic. Login to reply