May 11, 2016 at 11:58 am
drew.allen (5/11/2016)
Eric M Russell (5/11/2016)
For what it's worth, this one can be paramaterized in terms of the number of dates returned, and it doesn't require a pre-existing tally table, which might be useful in a scenario where you are querying a database for which you arn't the owner. However, leveraging a system table and ranking function to perform ad-hoc the job of a tally table is a performance hit and makes the code confusing to read.Which is why Luis' solution uses the table value constructor to create the sequence.
Drew
But there may be a need to paramaterize the number of dates returned. For example, it may not always be 12.
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
May 11, 2016 at 12:51 pm
Eric M Russell (5/11/2016)
drew.allen (5/11/2016)
Eric M Russell (5/11/2016)
For what it's worth, this one can be paramaterized in terms of the number of dates returned, and it doesn't require a pre-existing tally table, which might be useful in a scenario where you are querying a database for which you arn't the owner. However, leveraging a system table and ranking function to perform ad-hoc the job of a tally table is a performance hit and makes the code confusing to read.Which is why Luis' solution uses the table value constructor to create the sequence.
Drew
But there may be a need to paramaterize the number of dates returned. For example, it may not always be 12.
uh, WHERE n something??
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
May 11, 2016 at 1:04 pm
Now that I'm at a computer....
Not knocking Luis' solution but calendar tables are perfect for this kind of thing as they are highly reusable and even more simple than a tally table (for those that find tally tables overly complex 😉 )
Here's a simplified one indexed for this kind of requirement:
USE tempdb
GO
-- sample calendar table (with only the required columns for this requirement)
-- a datekey is the prefered cluster key but I'm keeping it as simple as possible
IF OBJECT_ID('tempdb..#dim_date') IS NOT NULL DROP TABLE #dim_date;
CREATE TABLE #dim_date
(
calDate date primary key,
calDay tinyint not null
);
INSERT #dim_date
SELECT TOP (800)
CAST(DATEADD(DAY,number,'20150101') AS date),
DAY(DATEADD(DAY,number,'20150101'))
FROM master..spt_values
WHERE type = 'p';
-- filtered index for first of the month
CREATE NONCLUSTERED INDEX nc_FirstOfMonth ON #dim_date(calDate,calDay)
WHERE calDay = 1;
With a calendar table in place it's this easy...
-- a scalable parameterized solution resolved using a seek against a nonclustered index
DECLARE
@months tinyint = 12, -- how many months back would you like?
@inputdate date = '20160510'; -- whats the date to start with
SELECT TOP (@months) calDate
FROM #dim_date
WHERE calDay = 1 AND calDate <= @inputdate
ORDER BY calDate DESC;
-- Itzik Ben-Gan 2001
May 11, 2016 at 1:45 pm
TheSQLGuru (5/11/2016)
Eric M Russell (5/11/2016)
drew.allen (5/11/2016)
Eric M Russell (5/11/2016)
For what it's worth, this one can be paramaterized in terms of the number of dates returned, and it doesn't require a pre-existing tally table, which might be useful in a scenario where you are querying a database for which you arn't the owner. However, leveraging a system table and ranking function to perform ad-hoc the job of a tally table is a performance hit and makes the code confusing to read.Which is why Luis' solution uses the table value constructor to create the sequence.
Drew
But there may be a need to paramaterize the number of dates returned. For example, it may not always be 12.
uh, WHERE n something??
Sure, the TOP clause accepts a parameter, just in case someone wants to make this a table valued function for returning 3, 12, or 300 dates. BTW, I do think that it would be best to have a tally table driving this rather than a system table / ranking hack. One thing for the SQL Server wish list would be a system provided tally table for just such a purpose, so it's available globally.
declare @n int = 3;
select top (@n) cast(convert( char(7), dateadd(month,(-row_number()
over (order by msglangid, error))+1, getdate() ), 121 )+'-01' as date) d
from master.sys.sysmessages
order by msglangid, error;
2016-05-01
2016-04-01
2016-03-01
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
May 11, 2016 at 2:10 pm
Alan.B: a calendar table is a poor match here, and would perform significantly worse as well. What if it didn't have the proper date range available? What if it DID have a sufficient date range (i.e. LOTs of rows to cover all ranges --> even worse performance)? If it is by day, you would be touching N days per month just to get out the individual month entries the OP asked for and having to do something to wheedle them down to distinct months, etc.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
May 11, 2016 at 2:20 pm
Eric M Russell (5/11/2016)
TheSQLGuru (5/11/2016)
Eric M Russell (5/11/2016)
drew.allen (5/11/2016)
Eric M Russell (5/11/2016)
For what it's worth, this one can be paramaterized in terms of the number of dates returned, and it doesn't require a pre-existing tally table, which might be useful in a scenario where you are querying a database for which you arn't the owner. However, leveraging a system table and ranking function to perform ad-hoc the job of a tally table is a performance hit and makes the code confusing to read.Which is why Luis' solution uses the table value constructor to create the sequence.
Drew
But there may be a need to paramaterize the number of dates returned. For example, it may not always be 12.
uh, WHERE n something??
Sure, the TOP clause accepts a parameter, just in case someone wants to make this a table valued function for returning 3, 12, or 300 dates. BTW, I do think that it would be best to have a tally table driving this rather than a system table / ranking hack. One thing for the SQL Server wish list would be a system provided tally table for just such a purpose, so it's available globally.
declare @n int = 3;
select top (@n) cast(convert( char(7), dateadd(month,(-row_number()
over (order by msglangid, error))+1, getdate() ), 121 )+'-01' as date) d
from master.sys.sysmessages
order by msglangid, error;
2016-05-01
2016-04-01
2016-03-01
Actually, an inline table-valued function based on a table value constructor (like in Luis' script above) is blazing fast. Much faster than any physical table, because it requires no reads (not even logical reads).
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
May 11, 2016 at 2:23 pm
Sure, the TOP clause accepts a parameter, just in case someone wants to make this a table valued function for returning 3, 12, or 300 dates. BTW, I do think that it would be best to have a tally table driving this rather than a system table / ranking hack. One thing for the SQL Server wish list would be a system provided tally table for just such a purpose, so it's available globally.
I would confidently argue the exact opposite. Ranking functions are built into T-sql and are thus not a hack. A Tally table would require custom creation on every sql server you choose to use it on and would be a dependency in whatever SELECT you're doing. So far in 26 years I haven't found a situation in which to use a Tally and suspect I never will.
Windowing functions are blurring the purity of SET processing already; to ask Microsoft to build in a Tally table would be crossing the line into having them creating end-user solutions instead of creating a pure development environment.
May 11, 2016 at 2:40 pm
Bill Talada (5/11/2016)
Sure, the TOP clause accepts a parameter, just in case someone wants to make this a table valued function for returning 3, 12, or 300 dates. BTW, I do think that it would be best to have a tally table driving this rather than a system table / ranking hack. One thing for the SQL Server wish list would be a system provided tally table for just such a purpose, so it's available globally.
I would confidently argue the exact opposite. Ranking functions are built into T-sql and are thus not a hack. A Tally table would require custom creation on every sql server you choose to use it on and would be a dependency in whatever SELECT you're doing. So far in 26 years I haven't found a situation in which to use a Tally and suspect I never will.
Windowing functions are blurring the purity of SET processing already; to ask Microsoft to build in a Tally table would be crossing the line into having them creating end-user solutions instead of creating a pure development environment.
Although I agree that this kind of feature is something you are not likely to see packaged with SQL I don't believe it should be discounted. In certain situations a Tally table works really well and I for one am glad to have learned the technique.
May 11, 2016 at 2:42 pm
drew.allen (5/11/2016)
Eric M Russell (5/11/2016)
TheSQLGuru (5/11/2016)
Eric M Russell (5/11/2016)
drew.allen (5/11/2016)
Eric M Russell (5/11/2016)
For what it's worth, this one can be paramaterized in terms of the number of dates returned, and it doesn't require a pre-existing tally table, which might be useful in a scenario where you are querying a database for which you arn't the owner. However, leveraging a system table and ranking function to perform ad-hoc the job of a tally table is a performance hit and makes the code confusing to read.Which is why Luis' solution uses the table value constructor to create the sequence.
Drew
But there may be a need to paramaterize the number of dates returned. For example, it may not always be 12.
uh, WHERE n something??
Sure, the TOP clause accepts a parameter, just in case someone wants to make this a table valued function for returning 3, 12, or 300 dates. BTW, I do think that it would be best to have a tally table driving this rather than a system table / ranking hack. One thing for the SQL Server wish list would be a system provided tally table for just such a purpose, so it's available globally.
declare @n int = 3;
select top (@n) cast(convert( char(7), dateadd(month,(-row_number()
over (order by msglangid, error))+1, getdate() ), 121 )+'-01' as date) d
from master.sys.sysmessages
order by msglangid, error;
2016-05-01
2016-04-01
2016-03-01
Actually, an inline table-valued function based on a table value constructor (like in Luis' script above) is blazing fast. Much faster than any physical table, because it requires no reads (not even logical reads).
Drew
Performance, Purity, Simplicity, Flexibility: It looks like we've collectively presented the OP with a handful of approaches to choose from.
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
May 11, 2016 at 3:17 pm
drew.allen (5/11/2016)
Eric M Russell (5/11/2016)
TheSQLGuru (5/11/2016)
Eric M Russell (5/11/2016)
drew.allen (5/11/2016)
Eric M Russell (5/11/2016)
For what it's worth, this one can be paramaterized in terms of the number of dates returned, and it doesn't require a pre-existing tally table, which might be useful in a scenario where you are querying a database for which you arn't the owner. However, leveraging a system table and ranking function to perform ad-hoc the job of a tally table is a performance hit and makes the code confusing to read.Which is why Luis' solution uses the table value constructor to create the sequence.
Drew
But there may be a need to paramaterize the number of dates returned. For example, it may not always be 12.
uh, WHERE n something??
Sure, the TOP clause accepts a parameter, just in case someone wants to make this a table valued function for returning 3, 12, or 300 dates. BTW, I do think that it would be best to have a tally table driving this rather than a system table / ranking hack. One thing for the SQL Server wish list would be a system provided tally table for just such a purpose, so it's available globally.
declare @n int = 3;
select top (@n) cast(convert( char(7), dateadd(month,(-row_number()
over (order by msglangid, error))+1, getdate() ), 121 )+'-01' as date) d
from master.sys.sysmessages
order by msglangid, error;
2016-05-01
2016-04-01
2016-03-01
Actually, an inline table-valued function based on a table value constructor (like in Luis' script above) is blazing fast. Much faster than any physical table, because it requires no reads (not even logical reads).
Drew
That sounds good in theory but do you have a performance test you want to post. Calendar table vs in line tally for dates?
-- Itzik Ben-Gan 2001
May 11, 2016 at 3:59 pm
Alan.B (5/11/2016)
drew.allen (5/11/2016)
Actually, an inline table-valued function based on a table value constructor (like in Luis' script above) is blazing fast. Much faster than any physical table, because it requires no reads (not even logical reads).Drew
That sounds good in theory but do you have a performance test you want to post. Calendar table vs in line tally for dates?
Sure. Here is the code for the tally function:
CREATE FUNCTION dbo.Tally(@n INT)
RETURNS TABLE
AS
RETURN
WITH e(n) AS (
SELECT n
FROM ( VALUES(1), (1), (1), (1), (1), (1), (1), (1), (1), (1) ) t(n)
)
SELECT TOP(@n) ROW_NUMBER()OVER(ORDER BY ( SELECT NULL) ) n
FROM e a, e b, e c, e d, e e, e f;
Here is the code for the Tally table:
CREATE TABLE TallyT(
n INT NOT NULL PRIMARY KEY
)
INSERT TallyT(n)
SELECT *
FROM dbo.Tally(1000000)
Here is the code for the test procedure. The costs of calculating the dates will be the same regardless of source, so I'm just selecting the top n results and writing them to a temp table.
DECLARE @n INT = 1,
@start DATETIME2,
@fn_runtime_ns INT,
@tbl_runtime_ns INT;
DECLARE @results TABLE(
n int,
fn_runtime_ns INT,
tbl_runtime_ns INT);
WHILE @n < 1000001
BEGIN
SET @start = SYSDATETIME();
SELECT *
INTO #temp
FROM dbo.Tally(@n) t;
SET @fn_runtime_ns = DATEDIFF(ns, @start, SYSDATETIME());
DROP TABLE #temp;
SET @start = SYSDATETIME();
SELECT TOP(@n) *
INTO #temp2
FROM TallyT tt;
SET @tbl_runtime_ns = DATEDIFF(ns, @start, SYSDATETIME());
DROP TABLE #temp2;
INSERT @results(n, fn_runtime_ns, tbl_runtime_ns)
VALUES(@n, @fn_runtime_ns, @tbl_runtime_ns);
SET @n = @n * 10;
END
SELECT *
FROM @results r
And here are the results (of course much of the cost is writing the values to the temp table):
n function table
1 0 0
10 0 0
100 0 0
1000 0 0
10000 0 15,626,000
100000 31,251,900 46,877,400
1000000 312,518,800 531,628,900
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
May 11, 2016 at 5:03 pm
Bill Talada (5/11/2016)
Sure, the TOP clause accepts a parameter, just in case someone wants to make this a table valued function for returning 3, 12, or 300 dates. BTW, I do think that it would be best to have a tally table driving this rather than a system table / ranking hack. One thing for the SQL Server wish list would be a system provided tally table for just such a purpose, so it's available globally.
... So far in 26 years I haven't found a situation in which to use a Tally and suspect I never will.
Windowing functions are blurring the purity of SET processing already...
Wow. Just wow.
For the first part, you are either exceptionally lucky in that you get to do simple stuff all the time (and get paid for it), have no clue about the power of a series from 1-n for efficient data processing or are exceptionally close-minded. I suppose for one of the most common use cases you use a CLR function to split a delimited string into a table, so I could spot you that one.
As for windowing functions and set processing, they are both part of the ANSI Standard and allow for often EXCEPTIONALLY EFFICIENT SET-BASED SOLUTIONS to a WIDE RANGE of data processing needs!!
Again I say wow ...
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
May 11, 2016 at 5:06 pm
Drew.Allen: if you want to better test server performance on a given query that does lots of rows SELECT, just put the column(s) into variable(s). Voila - no more timing temp table operations or spooling large numbers of rows to a client when that timing isn't germane to the test.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
May 11, 2016 at 8:07 pm
Eric M Russell (5/11/2016)
TheSQLGuru (5/11/2016)
Eric M Russell (5/11/2016)
drew.allen (5/11/2016)
Eric M Russell (5/11/2016)
For what it's worth, this one can be paramaterized in terms of the number of dates returned, and it doesn't require a pre-existing tally table, which might be useful in a scenario where you are querying a database for which you arn't the owner. However, leveraging a system table and ranking function to perform ad-hoc the job of a tally table is a performance hit and makes the code confusing to read.Which is why Luis' solution uses the table value constructor to create the sequence.
Drew
But there may be a need to paramaterize the number of dates returned. For example, it may not always be 12.
uh, WHERE n something??
Sure, the TOP clause accepts a parameter, just in case someone wants to make this a table valued function for returning 3, 12, or 300 dates. BTW, I do think that it would be best to have a tally table driving this rather than a system table / ranking hack. One thing for the SQL Server wish list would be a system provided tally table for just such a purpose, so it's available globally.
declare @n int = 3;
select top (@n) cast(convert( char(7), dateadd(month,(-row_number()
over (order by msglangid, error))+1, getdate() ), 121 )+'-01' as date) d
from master.sys.sysmessages
order by msglangid, error;
2016-05-01
2016-04-01
2016-03-01
I would be careful here, the internal SYSERRORS table value function is producing 1/4 Million rows with a sort spilling into tempdb. Much better to skip / bypass the ordering of the row_number and only sort the output if needed.
😎
declare @ND int = 12;
;WITH BASE_DATA AS
(
select top (@ND) cast(convert( char(7), dateadd(month,(-row_number()
over (order by (select null)))+1, getdate() ), 121 )+'-01' as date) d
from master.sys.sysmessages
)
SELECT
BD.d
FROM BASE_DATA BD
order by BD.d;
May 11, 2016 at 8:34 pm
drew.allen (5/11/2016)
Alan.B (5/11/2016)
drew.allen (5/11/2016)
Actually, an inline table-valued function based on a table value constructor (like in Luis' script above) is blazing fast. Much faster than any physical table, because it requires no reads (not even logical reads).Drew
That sounds good in theory but do you have a performance test you want to post. Calendar table vs in line tally for dates?
Sure. Here is the code for the tally function:
CREATE FUNCTION dbo.Tally(@n INT)
RETURNS TABLE
AS
RETURN
WITH e(n) AS (
SELECT n
FROM ( VALUES(1), (1), (1), (1), (1), (1), (1), (1), (1), (1) ) t(n)
)
SELECT TOP(@n) ROW_NUMBER()OVER(ORDER BY ( SELECT NULL) ) n
FROM e a, e b, e c, e d, e e, e f;
Here is the code for the Tally table:
CREATE TABLE TallyT(
n INT NOT NULL PRIMARY KEY
)
INSERT TallyT(n)
SELECT *
FROM dbo.Tally(1000000)
Here is the code for the test procedure. The costs of calculating the dates will be the same regardless of source, so I'm just selecting the top n results and writing them to a temp table.
DECLARE @n INT = 1,
@start DATETIME2,
@fn_runtime_ns INT,
@tbl_runtime_ns INT;
DECLARE @results TABLE(
n int,
fn_runtime_ns INT,
tbl_runtime_ns INT);
WHILE @n < 1000001
BEGIN
SET @start = SYSDATETIME();
SELECT *
INTO #temp
FROM dbo.Tally(@n) t;
SET @fn_runtime_ns = DATEDIFF(ns, @start, SYSDATETIME());
DROP TABLE #temp;
SET @start = SYSDATETIME();
SELECT TOP(@n) *
INTO #temp2
FROM TallyT tt;
SET @tbl_runtime_ns = DATEDIFF(ns, @start, SYSDATETIME());
DROP TABLE #temp2;
INSERT @results(n, fn_runtime_ns, tbl_runtime_ns)
VALUES(@n, @fn_runtime_ns, @tbl_runtime_ns);
SET @n = @n * 10;
END
SELECT *
FROM @results r
And here are the results (of course much of the cost is writing the values to the temp table):
n function table
1 0 0
10 0 0
100 0 0
1000 0 0
10000 0 15,626,000
100000 31,251,900 46,877,400
1000000 312,518,800 531,628,900
Drew
First, if you need a tally table function feel free to use mine[/url]. It will perform a little better than what you posted when used correctly (note the functions comment section).
Next, what I said was:
...do you have a performance test you want to post. Calendar table vs inline tally for dates?
I think all the tally table fans here already know that a cte tally will outperform a permanent tally table. (That said, always include a unique nonclustered index on your permanent tally table).
Okay, now for a test; note my comments.
/****************************************************************************************
1. Create and populate sample calendar table date range 2010 to 2017 (simplified)
****************************************************************************************/;
IF OBJECT_ID('tempdb.dbo.dim_date') IS NOT NULL DROP TABLE dbo.dim_date;
CREATE TABLE dbo.dim_date
(
calDate date primary key,
calDay tinyint not null
);
INSERT dbo.dim_date
SELECT TOP (2557) -- enough days to get me from 2010 to 2017
CAST(DATEADD(DAY,number,'20100101') AS date),
DAY(DATEADD(DAY,number,'20100101'))
FROM master..spt_values
WHERE type = 'p';
-- filtered covering index for first of the month
CREATE NONCLUSTERED INDEX nc_FirstOfMonth ON dbo.dim_date(calDate,calDay)
WHERE calDay = 1;
GO
/****************************************************************************************
2. Create the functions
****************************************************************************************/;
-- Function #1: Inline Tally table solution
CREATE FUNCTION dbo.LastNMonth_Luis (@inputDate date)
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT Mo = DATEADD(MM, DATEDIFF(MM, 0, @inputDate)-n, 0)
FROM (VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11))x(n);
GO
-- Function #2: Same functionality using a calendar table
CREATE FUNCTION dbo.LastNMonth_Alan (@inputDate date)
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT TOP (12) calDate AS mo
FROM dbo.dim_date
WHERE calDay = 1 AND calDate <= @inputdate
ORDER BY calDate DESC;
GO
--SELECT * FROM dbo.LastNMonth_Luis(getdate());
--SELECT * FROM dbo.LastNMonth_Alan(getdate());
;
/****************************************************************************************
3. Populate #tmp with sample data (and ID and a date)
****************************************************************************************/;
IF OBJECT_ID('tempdb..#tmp') IS NOT NULL DROP TABLE #tmp;
CREATE TABLE #tmp
(
id int primary key,
SomeDate date not null
);
INSERT #tmp
SELECT TOP (100000)
DateID = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
SomeDate = CAST(DATEADD(DAY,CHECKSUM(newid())%500,'20130702') AS date)
FROM sys.all_columns a, sys.all_columns b;
GO
/****************************************************************************************
4. Performance test - run the function for each ID
****************************************************************************************/;
PRINT 'inline tally:'+char(10)+REPLICATE('-',50)
GO
DECLARE @id int, @mo date, @st datetime = getdate();
SELECT @id = id, @mo = mo
FROM #tmp
CROSS APPLY dbo.LastNMonth_Luis(SomeDate);
PRINT DATEDIFF(MS,@st,getdate());
GO 5
PRINT char(10)+'Datedim:'+char(10)+REPLICATE('-',50)
GO
DECLARE @id int, @mo date, @st datetime = getdate();
SELECT @id = id, @mo = mo
FROM #tmp
CROSS APPLY dbo.LastNMonth_Alan(SomeDate);
PRINT DATEDIFF(MS,@st,getdate());
GO 5
inline tally:
--------------------------------------------------
Beginning execution loop
480
410
420
413
403
Batch execution completed 5 times.
Datedim:
--------------------------------------------------
Beginning execution loop
443
443
446
453
443
Batch execution completed 5 times.
So the cte tally outperformed the calendar table by a marginal amount; it got the job done in 400ms vs 440ms. "Much faster than any physical table" would not be a true statement. Marginally better is more like it. Lastly - don't hate on the permanent tally table. If you want me to show you some examples where it blows the doors off a cte tally I'll be happy to oblige. :-P:-P:-P
TheSQLGuru (5/11/2016)
Alan.B: a calendar table is a poor match here, and would perform significantly worse as well. What if it didn't have the proper date range available? What if it DID have a sufficient date range (i.e. LOTs of rows to cover all ranges --> even worse performance)? If it is by day, you would be touching N days per month just to get out the individual month entries the OP asked for and having to do something to wheedle them down to distinct months, etc.
I think my performance test proves that a calendar table is a viable high-performing alternative. In the end I would go with a tally table solution because of the proper date range situation. I presented a calendar table solution because it is also a viable option and a great tool for the set-based thinker.
-- Itzik Ben-Gan 2001
Viewing 15 posts - 16 through 30 (of 37 total)
You must be logged in to reply to this topic. Login to reply