June 27, 2022 at 7:27 am
Zidar wrote:Q: Sorry, but I'm not seeing it. How would such a thing be used to improve the performance of the query the OP posted?
Valid point, chances are good it does not help with performance. To my defense, I was not trying to help with performance. I offered a solution to help with formatting date to "YYMM".
For me there is not enough information about actual circumstances to provide performance targeted advice. I myself am not very good in playing with execution plans. A few people here are very good at that, so I leave those things for more knowledgeable and experienced folks. My goal was to help present dates in form "YYMM". Existing functions such as Convert and Format apparently are tricky to use to achieve requested format.
I assumed that something like this
SELECT <Whatever columns from Sourcetable>, TableOfDates.Year, TableOfDates.Month
FROM SourceTAble
JOIN TableOfDates ON SourceTable.DateCoumn = TableOfDates.TallyDatewould at least solve the problem of formatting. Nothing more than that. Requester may have mentioned problem with speed of execution, but I spotted another one, just as annoying as a slow query.
Totally understood on all of that and thanks for trying to help.
Even the original post for the OP talked about slowness but that kind of got lost in the "need" for formatting as Leaving a year/month only date field (2020-11) and then also saying they wanted "Leaving a year/month only date field (2020-11)" .
Eventually, they coughed up an example query at the following URL.
That made me think (and I still think it) we're chasing the "wong wabbit" (as Elmer Fudd would say). 😀 The Op needs whole dates and FORMAT is one of the worst ways in the world to do such a thing but the Op also mention millions of rows in the source table and the query they posted could have issues with indexes to support the joins and the mountain of columns to be returned, etc, to be used for "BI" of one form or another. I think the OP used FORMAT because they didn't know of any other way to convert a date/time to a whole date any other way.
The correct answer is that no formatting is required. Just use CONVERT(DATE,somecolumn). There's nothing faster for such a thing but all of that appears to have gotten lost because the Op made it sound like some sort of character based representation was necessary.
Thanks for this resume 🙂
you are right when saying
"I think the OP used FORMAT because they didn't know of any other way to convert a date/time to a whole date any other way."
but I mentionned the performance because I was using FORMAT AND the alternative I was looking for was to still have a date type but with the possibility to split it into 2022-06 or 2022-06-01 (without hours etc)
"The correct answer is that no formatting is required. Just use CONVERT(DATE,somecolumn)." -> OK
June 27, 2022 at 7:48 am
TheNewbie wrote:here is an (simplified) example below
SELECT tab1.a
tab1.b,
tab2.c,
tab3.d,
tab3.e,
tab1.f,
FORMAT(tab1.g, 'yyyy-MM-dd'), -- date
FORMAT(tab1.h, 'yyyy-MM-dd'), -- date
FORMAT(tab1.i, 'yyyy-MM-dd'), -- date
FORMAT(tab2.j, 'yyyy-MM-dd'), -- date
FORMAT(tab2.k, 'yyyy-MM-dd'), -- date
FORMAT(tab2.l, 'yyyy-MM-dd'), -- date
tab1.m,
tab1.n,
tab1.o,
tab1.p,
tab1.q,
tab1.r,
tab1.s,
tab1.t
FROM table1 tab1
RIGHT JOIN table2 tab2
ON tab1.a = tab2.c
INNER JOIN table3 tab3
ON tab1.b= tab3.e
WHERE tab1.g >= '2012-04-01' -- or '2012-04'and I have to convert them all into date format in PowerBI
You never came back with an answer to my question but let's forget that for now. Here's what I believe your first step should be according to what you posted above.
SELECT tab1.a,
tab1.b,
tab2.c,
tab3.d,
tab3.e,
tab1.f,
Date_tab1g = CONVERT(DATE,tab1.g), --Of course, you need to change the alias column name
Date_tab1h = CONVERT(DATE,tab1.h), --Of course, you need to change the alias column name
Date_tab1i = CONVERT(DATE,tab1.i), --Of course, you need to change the alias column name
Date_tab2j = CONVERT(DATE,tab2.j), --Of course, you need to change the alias column name
Date_tab2k = CONVERT(DATE,tab2.k), --Of course, you need to change the alias column name
Date_tab2l = CONVERT(DATE,tab2.l), --Of course, you need to change the alias column name
tab1.m,
tab1.n,
tab1.o,
tab1.p,
tab1.q,
tab1.r,
tab1.s,
tab1.t
FROM dbo.table1 tab1 --Note the use of the 2 part naming convention.
RIGHT JOIN dbo.table2 tab2 ON tab1.a = tab2.c --Note the use of the 2 part naming convention.
INNER JOIN dbo.table3 tab3 ON tab1.b = tab3.e --Note the use of the 2 part naming convention.
WHERE tab1.g >= '2012-04-01' -- or '2012-04'
;The real issues are...
If you need 10 years of data for analysis/reporting, does it REALLY need to be at the single row level of detail? Surely, they can tell you what aggregates they need to analyze and do that pre-aggregation for them. That would save a whole lot on how much data you have to dump somewhere else for them to use AND it would seriously speed up their analysis/reporting efforts.
Another concern is the indexing to support the query. Have you done any analysis there to see if you're scanning 10 years of all the data in a Clustered Index or maybe the whole of the 3 tables because the optimizer threw up its hands because it recognized that it was going to have to do individual row lookups even if non-clustered indexes that supported the criteria for both the WHERE and the JOINs existed?
"does it REALLY need to be at the single row level of detail?" not necessarily
"they can tell you what aggregates they need to analyze and do that pre-aggregation for them" of course
"Have you done any analysis there to see if you're scanning 10 years of all the data in a Clustered Index or maybe the whole of the 3 tables" yes sir and its the whole of 3 tables/views
June 27, 2022 at 8:00 am
Syntax like
FORMAT(tab1.g, 'yyyy-MM-dd')
suggests that you are "rounding" a date and time value to just the date; you would therefore be much better off just casting/converting the value to adate
, as adate
has no time portion:CONVERT(date, tab1.g)
.Those
JOIN
s seem odd too. YouRIGHT JOIN
totable2
but thenINNER JOIN
totable3
fromtable1
. This means that all the joins are implicitINNER JOIN
s. Especially when you then add the requirement in theWHERE
thattab1.g
must have a non-NULL
value.This is what you have at the moment:
FROM table1 tab1
RIGHT JOIN table2 tab2 ON tab1.a = tab2.c
INNER JOIN table3 tab3 ON tab1.b = tab3.e
WHERE tab1.g >= '2012-04-01';If we rewrite it as a
LEFT JOIN
this might be a bit more obvious:FROM table2 tab2
LEFT JOIN table1 tab1 ON tab2.c = tab1.A
INNER JOIN table3 tab3 ON tab1.b = tab3.E
WHERE tab1.g >= '2012-04-01'So you "start" at
table2
and then if there isn't a matching row intable1
want the row fromtable2
to still be returned. You then have both anINNER JOIN
that referencestab1
and yourWHERE
does too, without handlingNULL
values. This means that bothtab1.b
andtab1.g
must have a non-NULL
value, which is impossible if not related row intable1
was found (as the columns will have the valueNULL
).
Thanks for this explanation
June 27, 2022 at 12:59 pm
"does it REALLY need to be at the single row level of detail?" not necessarily
"they can tell you what aggregates they need to analyze and do that pre-aggregation for them" of course
"Have you done any analysis there to see if you're scanning 10 years of all the data in a Clustered Index or maybe the whole of the 3 tables" yes sir and its the whole of 3 tables/views
On the first 2 parts, those were, of course, suggestions only because you said you were a newbie.
On that last part, someone needs to do some analysis of the code (like the DBA's that correctly spoke of FORMAT as being bad) and add some indexes.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 4 posts - 31 through 33 (of 33 total)
You must be logged in to reply to this topic. Login to reply