June 22, 2022 at 7:52 am
The FORMAT function return type is NVARCHAR , which is a string.
https://docs.microsoft.com/en-us/sql/t-sql/functions/format-transact-sql?view=sql-server-ver16
You can get the same result by replacing FORMAT with
CONVERT(varchar(10), [YourDateField], 120)
June 22, 2022 at 8:22 am
Syntax like FORMAT(tab1.g, 'yyyy-MM-dd')
suggests that you are "rounding" a date and time value to just the date; you would therefore be much better off just casting/converting the value to a date
, as a date
has no time portion: CONVERT(date, tab1.g)
.
Those JOIN
s seem odd too. You RIGHT JOIN
to table2
but then INNER JOIN
to table3
from table1
. This means that all the joins are implicit INNER JOIN
s. Especially when you then add the requirement in the WHERE
that tab1.g
must have a non-NULL
value.
This is what you have at the moment:
FROM table1 tab1
RIGHT JOIN table2 tab2 ON tab1.a = tab2.c
INNER JOIN table3 tab3 ON tab1.b = tab3.e
WHERE tab1.g >= '2012-04-01';
If we rewrite it as a LEFT JOIN
this might be a bit more obvious:
FROM table2 tab2
LEFT JOIN table1 tab1 ON tab2.c = tab1.A
INNER JOIN table3 tab3 ON tab1.b = tab3.E
WHERE tab1.g >= '2012-04-01'
So you "start" at table2
and then if there isn't a matching row in table1
want the row from table2
to still be returned. You then have both an INNER JOIN
that references tab1
and your WHERE
does too, without handling NULL
values. This means that both tab1.b
and tab1.g
must have a non-NULL
value, which is impossible if not related row in table1
was found (as the columns will have the value NULL
).
Thom~
Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk
June 22, 2022 at 5:11 pm
The FORMAT function return type is NVARCHAR , which is a string.
https://docs.microsoft.com/en-us/sql/t-sql/functions/format-transact-sql?view=sql-server-ver16
You can get the same result by replacing FORMAT with
CONVERT(varchar(10), [YourDateField], 120)
Again, though, if you look again at the Op's original post near the end, you'll need to change the VARCHAR(10) to VARCHAR(7). Since it will ALWAYS return just 7 characters, CHAR(7) seems like it might be the better choice, though.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 22, 2022 at 5:28 pm
Jeff Moden wrote:TheNewbie wrote:Hi all,
I have been using FORMAT()
FORMAT(Tab.date, 'yyyy-MM-dd') AS 'date_begin',but been told it would be slow when it comes to BIG QUERIES.
How can I still want the result to be a date (NOT STRING) But I want to 'discard' the Days, Hours, etc (2020-11-01 00:00:00.0000000) Leaving a year/month only date field (2020-11)
Thanks
As the others have said, there is no temporal data type to hold just the Year and Month.
Considering the "big queries" you speak of, if you're trying to do aggregates by year and month, the usual recommendation is to reduce each date down to being the first of the month (year included) as a DATE datatype for the given date of each row OR do a "Range Grouping". I've even made separate columns on some tables that keep both the original date and the first of the month date if the reporting requirements demand such an optimization.
"Formatted" dates should never be used to form such aggregations because of all the bloody implicit casts (performance) and the fact that it will certainly make the criteria for you queries "Non-SARGable" resulting in full scans all the time.
So, what I recommend is posting one of the queries that is "slow because of FORMAT". Yes, FORMAT absolutely sucks for performance but so does converting things to the character based YYYY-MM format using ANY method if you use that method in the wrong place. What's the "wrong place"? Anything (like aggregations) that should be done purely in the "data layer". Only the results should be formatted and there are many that will tell you that shouldn't EVER be done in T-SQL. While I agree with the premise, I don't agree with the absolute nature of that recommendation.
So let's see one of the queries where you used FORMAT before and let's see if we can show you a way to "double-prong" the performance using techniques like "pre-aggregation" to do the calculations and then format them for the output you're directed to provide.
Hi,
Thanks for all this information. I really appreciate and found that helpful.
here is an (simplified) example below
SELECT tab1.a
tab1.b,
tab2.c,
tab3.d,
tab3.e,
tab1.f,
FORMAT(tab1.g, 'yyyy-MM-dd'), -- date
FORMAT(tab1.h, 'yyyy-MM-dd'), -- date
FORMAT(tab1.i, 'yyyy-MM-dd'), -- date
FORMAT(tab2.j, 'yyyy-MM-dd'), -- date
FORMAT(tab2.k, 'yyyy-MM-dd'), -- date
FORMAT(tab2.l, 'yyyy-MM-dd'), -- date
tab1.m,
tab1.n,
tab1.o,
tab1.p,
tab1.q,
tab1.r,
tab1.s,
tab1.t
FROM table1 tab1
RIGHT JOIN table2 tab2
ON tab1.a = tab2.c
INNER JOIN table3 tab3
ON tab1.b= tab3.e
WHERE tab1.g >= '2012-04-01' -- or '2012-04'and I have to convert them all into date format in PowerBI
What happened to "I want to discard the days" in that query, which includes the days?
I agree... FORMAT is definitely NOT the think to use in the presence if if the many rows you stated. If you actually do want the days as the code seems to indicate, then do like the others have suggest... change the FORMAT. Before you do that, though, please run the following code and let us know what the output is. In fact, this may actually contain a much better answer especially since the target consumer will actually be Power BI.
SELECT LangName = @@LANGUAGE
,DateFormat = CONVERT(DATE,GETDATE())
;
The other thing is, where is the output of your query being directed to? If it has millions of rows and it's being directed to ANY screen output, there's about 80% of your performance issue. We can help with the FORMAT problem (especially if you do what I asked above) but returning millions of rows to a screen will always be a slow evolution.
If to a table, we can help there. Or is this query going to be executed by Power BI?
If you REALLY want some performance, the don't let Power BI have all the fun. Do some of the aggregations you actually want in T-SQL.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 22, 2022 at 6:43 pm
What is the data type of "tab1.g, h, i, j, k, l?
If it is a datetime, then CONVERT(date, tab1.g) is all you might need, depending upon the regional setting of your servers
Also, in the WHERE clause, you should not be using something like '2012-04'. Use the date for the first of the month, again assuming it's a datetime data type
Michael L John
If you assassinate a DBA, would you pull a trigger?
To properly post on a forum:
http://www.sqlservercentral.com/articles/61537/
June 22, 2022 at 8:08 pm
Like I said...
In fact, this may actually contain a much better answer especially since the target consumer will actually be Power BI.
SELECT LangName = @@LANGUAGE
,DateFormat = CONVERT(DATE,GETDATE())
--Jeff Moden
Change is inevitable... Change for the better is not.
June 23, 2022 at 10:43 pm
Since Tally Table of dates was mentioned, here is how to construct it:
; WITH TallyDates
AS (SELECT
t.N
, TallyDate = Cast(Dateadd(dd, t.N, '1900-01-01') AS DATE)
FROM dbo.Fntally(0, 73550) t)
SELECT
SeqNumber = TD.N
, TallyDAte
, Year(TAllyDate) AS [Year]
, Month(TallyDAte) AS [Month]
, Day(tallydate) AS [DAy]
, DateName (dw,TallyDAte) AS WekDay
FROM TallyDates AS TD
WHERE TallyDate BETWEEN '19500101' AND '21001231'
;
Table of dates simplifies date calculations big time.
🙂
Zidar's Theorem: The best code is no code at all...
June 24, 2022 at 1:17 am
Since Tally Table of dates was mentioned, here is how to construct it:
- create fnTally - table of numbers, by Jeff Moden, here https://www.sqlservercentral.com/scripts/create-a-tally-function-fntally
- use the function to generate table of dates
; WITH TallyDates
AS (SELECT
t.N
, TallyDate = Cast(Dateadd(dd, t.N, '1900-01-01') AS DATE)
FROM dbo.Fntally(0, 73550) t)
SELECT
SeqNumber = TD.N
, TallyDAte
, Year(TAllyDate) AS [Year]
, Month(TallyDAte) AS [Month]
, Day(tallydate) AS [DAy]
, DateName (dw,TallyDAte) AS WekDay
FROM TallyDates AS TD
WHERE TallyDate BETWEEN '19500101' AND '21001231'
;Table of dates simplifies date calculations big time.
🙂
Sorry, but I'm not seeing it. How would such a thing be used to improve the performance of the query the OP posted?
--Jeff Moden
Change is inevitable... Change for the better is not.
June 26, 2022 at 5:48 pm
Q: Sorry, but I'm not seeing it. How would such a thing be used to improve the performance of the query the OP posted?
Valid point, chances are good it does not help with performance. To my defense, I was not trying to help with performance. I offered a solution to help with formatting date to "YYMM".
For me there is not enough information about actual circumstances to provide performance targeted advice. I myself am not very good in playing with execution plans. A few people here are very good at that, so I leave those things for more knowledgeable and experienced folks. My goal was to help present dates in form "YYMM". Existing functions such as Convert and Format apparently are tricky to use to achieve requested format.
I assumed that something like this
SELECT <Whatever columns from Sourcetable>, TableOfDates.Year, TableOfDates.Month
FROM SourceTAble
JOIN TableOfDates ON SourceTable.DateCoumn = TableOfDates.TallyDate
would at least solve the problem of formatting. Nothing more than that. Requester may have mentioned problem with speed of execution, but I spotted another one, just as annoying as a slow query.
Zidar's Theorem: The best code is no code at all...
June 26, 2022 at 6:24 pm
My goal was to help present dates in form "YYMM". Existing functions such as Convert and Format apparently are tricky to use to achieve requested format.
FORMAT is not 'tricky', just awfully slow.
Here is a code sample which gets to YYMM format in both character and integer forms. I don't think it's particularly tricky, either:
DECLARE @Somedate DATETIME = GETDATE();
SELECT YYMMChar = CONVERT(CHAR(4), @Somedate, 12)
,YYMMInt = (YEAR(@Somedate) % 100) * 100 + MONTH(@Somedate);
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
June 26, 2022 at 9:23 pm
Q: Sorry, but I'm not seeing it. How would such a thing be used to improve the performance of the query the OP posted?
Valid point, chances are good it does not help with performance. To my defense, I was not trying to help with performance. I offered a solution to help with formatting date to "YYMM".
For me there is not enough information about actual circumstances to provide performance targeted advice. I myself am not very good in playing with execution plans. A few people here are very good at that, so I leave those things for more knowledgeable and experienced folks. My goal was to help present dates in form "YYMM". Existing functions such as Convert and Format apparently are tricky to use to achieve requested format.
I assumed that something like this
SELECT <Whatever columns from Sourcetable>, TableOfDates.Year, TableOfDates.Month
FROM SourceTAble
JOIN TableOfDates ON SourceTable.DateCoumn = TableOfDates.TallyDatewould at least solve the problem of formatting. Nothing more than that. Requester may have mentioned problem with speed of execution, but I spotted another one, just as annoying as a slow query.
Totally understood on all of that and thanks for trying to help.
Even the original post for the OP talked about slowness but that kind of got lost in the "need" for formatting as Leaving a year/month only date field (2020-11) and then also saying they wanted "Leaving a year/month only date field (2020-11)" .
Eventually, they coughed up an example query at the following URL.
That made me think (and I still think it) we're chasing the "wong wabbit" (as Elmer Fudd would say). 😀 The Op needs whole dates and FORMAT is one of the worst ways in the world to do such a thing but the Op also mention millions of rows in the source table and the query they posted could have issues with indexes to support the joins and the mountain of columns to be returned, etc, to be used for "BI" of one form or another. I think the OP used FORMAT because they didn't know of any other way to convert a date/time to a whole date any other way.
The correct answer is that no formatting is required. Just use CONVERT(DATE,somecolumn). There's nothing faster for such a thing but all of that appears to have gotten lost because the Op made it sound like some sort of character based representation was necessary.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 26, 2022 at 9:40 pm
here is an (simplified) example below
SELECT tab1.a
tab1.b,
tab2.c,
tab3.d,
tab3.e,
tab1.f,
FORMAT(tab1.g, 'yyyy-MM-dd'), -- date
FORMAT(tab1.h, 'yyyy-MM-dd'), -- date
FORMAT(tab1.i, 'yyyy-MM-dd'), -- date
FORMAT(tab2.j, 'yyyy-MM-dd'), -- date
FORMAT(tab2.k, 'yyyy-MM-dd'), -- date
FORMAT(tab2.l, 'yyyy-MM-dd'), -- date
tab1.m,
tab1.n,
tab1.o,
tab1.p,
tab1.q,
tab1.r,
tab1.s,
tab1.t
FROM table1 tab1
RIGHT JOIN table2 tab2
ON tab1.a = tab2.c
INNER JOIN table3 tab3
ON tab1.b= tab3.e
WHERE tab1.g >= '2012-04-01' -- or '2012-04'and I have to convert them all into date format in PowerBI
You never came back with an answer to my question but let's forget that for now. Here's what I believe your first step should be according to what you posted above.
SELECT tab1.a,
tab1.b,
tab2.c,
tab3.d,
tab3.e,
tab1.f,
Date_tab1g = CONVERT(DATE,tab1.g), --Of course, you need to change the alias column name
Date_tab1h = CONVERT(DATE,tab1.h), --Of course, you need to change the alias column name
Date_tab1i = CONVERT(DATE,tab1.i), --Of course, you need to change the alias column name
Date_tab2j = CONVERT(DATE,tab2.j), --Of course, you need to change the alias column name
Date_tab2k = CONVERT(DATE,tab2.k), --Of course, you need to change the alias column name
Date_tab2l = CONVERT(DATE,tab2.l), --Of course, you need to change the alias column name
tab1.m,
tab1.n,
tab1.o,
tab1.p,
tab1.q,
tab1.r,
tab1.s,
tab1.t
FROM dbo.table1 tab1 --Note the use of the 2 part naming convention.
RIGHT JOIN dbo.table2 tab2 ON tab1.a = tab2.c --Note the use of the 2 part naming convention.
INNER JOIN dbo.table3 tab3 ON tab1.b = tab3.e --Note the use of the 2 part naming convention.
WHERE tab1.g >= '2012-04-01' -- or '2012-04'
;
The real issues are...
If you need 10 years of data for analysis/reporting, does it REALLY need to be at the single row level of detail? Surely, they can tell you what aggregates they need to analyze and do that pre-aggregation for them. That would save a whole lot on how much data you have to dump somewhere else for them to use AND it would seriously speed up their analysis/reporting efforts.
Another concern is the indexing to support the query. Have you done any analysis there to see if you're scanning 10 years of all the data in a Clustered Index or maybe the whole of the 3 tables because the optimizer threw up its hands because it recognized that it was going to have to do individual row lookups even if non-clustered indexes that supported the criteria for both the WHERE and the JOINs existed?
--Jeff Moden
Change is inevitable... Change for the better is not.
June 27, 2022 at 7:06 am
TheNewbie wrote:Jeff Moden wrote:TheNewbie wrote:Hi all,
I have been using FORMAT()
FORMAT(Tab.date, 'yyyy-MM-dd') AS 'date_begin',but been told it would be slow when it comes to BIG QUERIES.
How can I still want the result to be a date (NOT STRING) But I want to 'discard' the Days, Hours, etc (2020-11-01 00:00:00.0000000) Leaving a year/month only date field (2020-11)
Thanks
As the others have said, there is no temporal data type to hold just the Year and Month.
Considering the "big queries" you speak of, if you're trying to do aggregates by year and month, the usual recommendation is to reduce each date down to being the first of the month (year included) as a DATE datatype for the given date of each row OR do a "Range Grouping". I've even made separate columns on some tables that keep both the original date and the first of the month date if the reporting requirements demand such an optimization.
"Formatted" dates should never be used to form such aggregations because of all the bloody implicit casts (performance) and the fact that it will certainly make the criteria for you queries "Non-SARGable" resulting in full scans all the time.
So, what I recommend is posting one of the queries that is "slow because of FORMAT". Yes, FORMAT absolutely sucks for performance but so does converting things to the character based YYYY-MM format using ANY method if you use that method in the wrong place. What's the "wrong place"? Anything (like aggregations) that should be done purely in the "data layer". Only the results should be formatted and there are many that will tell you that shouldn't EVER be done in T-SQL. While I agree with the premise, I don't agree with the absolute nature of that recommendation.
So let's see one of the queries where you used FORMAT before and let's see if we can show you a way to "double-prong" the performance using techniques like "pre-aggregation" to do the calculations and then format them for the output you're directed to provide.
Hi,
Thanks for all this information. I really appreciate and found that helpful.
here is an (simplified) example below
SELECT tab1.a
tab1.b,
tab2.c,
tab3.d,
tab3.e,
tab1.f,
FORMAT(tab1.g, 'yyyy-MM-dd'), -- date
FORMAT(tab1.h, 'yyyy-MM-dd'), -- date
FORMAT(tab1.i, 'yyyy-MM-dd'), -- date
FORMAT(tab2.j, 'yyyy-MM-dd'), -- date
FORMAT(tab2.k, 'yyyy-MM-dd'), -- date
FORMAT(tab2.l, 'yyyy-MM-dd'), -- date
tab1.m,
tab1.n,
tab1.o,
tab1.p,
tab1.q,
tab1.r,
tab1.s,
tab1.t
FROM table1 tab1
RIGHT JOIN table2 tab2
ON tab1.a = tab2.c
INNER JOIN table3 tab3
ON tab1.b= tab3.e
WHERE tab1.g >= '2012-04-01' -- or '2012-04'and I have to convert them all into date format in PowerBI
What happened to "I want to discard the days" in that query, which includes the days?
I agree... FORMAT is definitely NOT the think to use in the presence if if the many rows you stated. If you actually do want the days as the code seems to indicate, then do like the others have suggest... change the FORMAT. Before you do that, though, please run the following code and let us know what the output is. In fact, this may actually contain a much better answer especially since the target consumer will actually be Power BI.
SELECT LangName = @@LANGUAGE
,DateFormat = CONVERT(DATE,GETDATE())
;The other thing is, where is the output of your query being directed to? If it has millions of rows and it's being directed to ANY screen output, there's about 80% of your performance issue. We can help with the FORMAT problem (especially if you do what I asked above) but returning millions of rows to a screen will always be a slow evolution.
If to a table, we can help there. Or is this query going to be executed by Power BI?
If you REALLY want some performance, the don't let Power BI have all the fun. Do some of the aggregations you actually want in T-SQL.
Hi,
sorry for my late reply
"What happened to "I want to discard the days" in that query, which includes the days?" -> using FORMAT, I dont need to write them down again but as I said the rsult will be string not DATE anymore so I had to convert them again to DATE format.
code result -> LangName= us_english, DateFormat= 2022-06-27
"If to a table, we can help there. Or is this query going to be executed by Power BI?" -> both but PowerBI should just "receive" finished/transformed data
And yes I might have millions of rows
June 27, 2022 at 7:14 am
What is the data type of "tab1.g, h, i, j, k, l?
If it is a datetime, then CONVERT(date, tab1.g) is all you might need, depending upon the regional setting of your servers
Also, in the WHERE clause, you should not be using something like '2012-04'. Use the date for the first of the month, again assuming it's a datetime data type
data type is datetime
June 27, 2022 at 7:17 am
Zidar wrote:My goal was to help present dates in form "YYMM". Existing functions such as Convert and Format apparently are tricky to use to achieve requested format.
FORMAT is not 'tricky', just awfully slow.
Here is a code sample which gets to YYMM format in both character and integer forms. I don't think it's particularly tricky, either:
DECLARE @Somedate DATETIME = GETDATE();
SELECT YYMMChar = CONVERT(CHAR(4), @Somedate, 12)
,YYMMInt = (YEAR(@Somedate) % 100) * 100 + MONTH(@Somedate);
interesting but the results are 2206 and not 202206
Viewing 15 posts - 16 through 30 (of 33 total)
You must be logged in to reply to this topic. Login to reply