June 18, 2013 at 12:27 pm
sagesmith (6/18/2013)
Please forgive my lack of etiquette.I am humbly submitting this code for review by the gurus here on this forum. I think it might show that the CURSOR might be unfairly cast as the villain for this problem. By using the CURSOR in a similar way and just changing the INNER WHILE loop I am able to halve the time it takes the rCTE on my box. Is the CURSOR, as it is used in the example, the bottleneck?
Yes, but I would like to point out that your cursor is only selecting 3 rows from a table with only 3 rows. So the overall processing time of the cursor is tiny.
Would your solution scale out if the number of rows in table abc were massively increased as in the real case given by Suresh Kumar?
June 18, 2013 at 12:48 pm
Yes, but I would like to point out that your cursor is only selecting 3 rows from a table with only 3 rows. So the overall processing time of the cursor is tiny.
Would your solution scale out if the number of rows in table abc were massively increased as in the real case given by Suresh Kumar?
You're absolutely right. I am working with the data as provided in the example (I've taken the liberty to up the Month count to a ridiculous level like ChrisM@Work). Now that we are learning more about the problem then what was in the original article, the analysis might make a little more sense. As written it seemed to be a confusing indictment of cursors.
Regardless, the 3 rows are the same for the rCTE being used, so on a relative basis I think the point stands.
June 18, 2013 at 1:55 pm
999,999 rows in abc to preserve the inline CTE method as is. Enforced the max 12 month count we now know as well as the insert.
results on my machine:
Inline Tally = 12.97 seconds
rCTE = 187.983 seconds
Cursor/Classic Tally= 69.66 seconds
on a relative basis the spreads aren't getting more compelling for rCTE as the row count goes up.
--If the table exists, drop it.
IF OBJECT_ID('tempdb..#Results') IS NOT NULL DROP TABLE #Results
go
--If the table exists, drop it.
IF OBJECT_ID('tempdb..#Numbers') IS NOT NULL DROP TABLE #Numbers
go
--If the table exists, drop it.
if exists (select * from sys.tables
where name = 'abc'
and Schema_Name(schema_id) = 'dbo'
and [type] = 'U')
drop table dbo.abc
go
--Populate and create the source table, dbo.abc
SELECT TOP (999999) IDENTITY(int,1,1) AS SeqNo,
CAST('20090101' AS SMALLDATETIME) AS Date_Field, 12 AS Month_Count, 100.00 AS Payment
INTO dbo.abc
FROM sys.objects s1
CROSS JOIN sys.objects s2
go
--If exists, drop the destination table dbo.def
if exists (select * from sys.tables
where name = 'def'
and Schema_NAME(schema_id) = 'dbo'
and [type] = 'U')
drop table dbo.def
go
--Create the destination table, dbo.def
create table dbo.def
(SeqNo smallint
,Date_Field smalldatetime
,Payment decimal(10,2))
go
------------ Inline Tally version ------------------
-- black hole variables:
DECLARE @SeqNo INT, @Date_Field DATETIME, @Payment DECIMAL (10,2)
DECLARE @StartTime DATETIME;
-- time store
SET @StartTime = GETDATE()
;WITH E1(n) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1), -- 10 rows
E2(n) AS (SELECT 1 FROM E1 a, E1 b),-- 10x10 rows
E3(n) AS (SELECT 1 FROM E2 a, E2 b, E2 c)-- 100x100x100 rows
SELECT
SeqNo,
DATEADD(MONTH,x.n,Date_Field) AS Date_Field,
Payment
INTO #Results
FROM dbo.abc
CROSS APPLY(SELECT TOP(Month_Count) n = -1+ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E3) x
SELECT 'Inline Tally = ' + CAST(DATEDIFF(MS,@StartTime,GETDATE()) AS VARCHAR) + ' milliseconds'
PRINT '-------------------------------------------------------------------------'
--If the table exists, drop it.
IF OBJECT_ID('tempdb..#Results') IS NOT NULL DROP TABLE #Results
go
-------------- rCTE version ------------------------------
DECLARE @StartTime DATETIME = GETDATE()
;with CTE_Base (SeqNo, Date_Field, Month_Count, Payment, Begin_Date, End_Date, Frequency)
as
(select SeqNo, Date_Field, Month_Count, Payment, Date_Field, dateadd(mm, Month_Count-1, Date_Field), 1 from dbo.abc
union all
select SeqNo, dateadd(mm, Frequency, Date_Field), Month_Count, Payment, Begin_Date, End_Date, Frequency
from CTE_Base
where dateadd(mm, Frequency, Date_Field) between Begin_Date and End_Date)
select
SeqNo,
Date_Field,
Payment
INTO #Results
from CTE_Base
where Date_Field between Begin_Date and End_Date
order by SeqNo, Date_Field
OPTION(MAXRECURSION 0)
SELECT 'rCTE = ' + CAST(DATEDIFF(MS,@StartTime,GETDATE()) AS VARCHAR) + ' milliseconds'
PRINT '-------------------------------------------------------------------------'
--If the table exists, drop it.
IF OBJECT_ID('tempdb..#Results') IS NOT NULL DROP TABLE #Results
go
-------------- CURSOR version ------------------------------
DECLARE @StartTime DATETIME = GETDATE()
declare @l_SeqNo int
,@l_date_field DATETIME
,@l_Month_Count int
,@l_Payment decimal(10, 2)
,@l_counter SMALLINT
,@l_max_month_count INT = (SELECT MAX(Month_Count) FROM dbo.abc)
CREATE TABLE #Results (SeqNo INT NOT NULL, Date_Field SMALLDATETIME, Payment DECIMAL(10,2));
SELECT TOP (@l_max_month_count) IDENTITY(int,1,1) AS number
into #Numbers
FROM sys.objects s1
CROSS JOIN sys.objects s2
select @l_counter = 0
set nocount on;
declare i_Cursor CURSOR FAST_FORWARD
FOR
select SeqNo, Date_Field, Month_Count, Payment from dbo.abc
open i_Cursor
fetch next from i_Cursor into
@l_SeqNo
,@l_date_field
,@l_Month_Count
,@l_Payment
while @@fetch_status = 0
BEGIN
INSERT INTO #Results
select number, dateadd(mm, number-1, @l_date_field), @l_Payment
FROM #Numbers
WHERE number<=@l_Month_Count
fetch next from i_Cursor into
@l_SeqNo
,@l_date_field
,@l_Month_Count
,@l_Payment
end
close i_Cursor
deallocate i_Cursor
set nocount off;
SELECT 'Cursor/Numbers = ' + CAST(DATEDIFF(MS,@StartTime,GETDATE()) AS VARCHAR) + ' milliseconds'
PRINT '-------------------------------------------------------------------------'
DROP TABLE #Results
GO
DROP TABLE #Numbers
GO
June 18, 2013 at 2:41 pm
sagesmith (6/18/2013)
on a relative basis the spreads aren't getting more compelling for rCTE as the row count goes up.
Yes, thinking about it, the runtime for the all of the methods (cursor, rCTE and in-line tally version) should all go up with a linear relationship to the number of rows in the table abc as it is just doing the same work multiplied by the number of rows in table abc. Computer scientists would use big O notation and say it goes up Order n {O(n)} with the number of rows on table abc.
June 19, 2013 at 2:17 am
Suresh Kumar Maganti
I agree the bulk insert method is the better way, especially with the One off lock on the destination table, rather than the 700 million lock and commits you were doing using a cursor.
I suppose the angle Chris and others, myself included) are coming from are one of simplicity, and making queries as simple as possible helps those who take over support.
Dont get me wrong its a creative solution, I just think the article could have been refined a little to show a better progression of SQL, such as Cursor vs Recursive CTE vs Tally.
_________________________________________________________________________
SSC Guide to Posting and Best Practices
June 19, 2013 at 9:03 am
Jason-299789 (6/19/2013)
Suresh Kumar MagantiI agree the bulk insert method is the better way, especially with the One off lock on the destination table, rather than the 700 million lock and commits you were doing using a cursor.
I suppose the angle Chris and others, myself included) are coming from are one of simplicity, and making queries as simple as possible helps those who take over support.
Dont get me wrong its a creative solution, I just think the article could have been refined a little to show a better progression of SQL, such as Cursor vs Recursive CTE vs Tally.
Hi Jason,
Thanks for the constructive feedback. I really do appreciate the same. I certainly would keep the guidance in mind when writing next.
--Thanks and Best Regards,
Suresh.
June 20, 2013 at 10:31 am
nycdotnet (6/18/2013)
Interesting article. Regarding the paradox of semicolons on CTEs - the rule is "the statement before the CTE has to end with a semicolon". The reason is that the WITH keyword is overloaded in T-SQL. Without the semicolon, SQL can't tell if you're saying with as a statement start or qualifying something about the previous statement.People have gotten used to putting a semicolon in front of the WITH, but this is not required - you can just as easily put it at the end of your last statement.
This also explains why this causes an error in an inline Table-value function. Inline table-value functions must be only a single statement. Therefore if you use a semi-colon in it (which is the statement terminator symbol), then you are causing the TVF to have two statements and it breaks.
Thank you for your excellent explanation about the semi-colon and its purpose. (I have seen so many comments about putting the semi-colon at the beginning of a CTE expression.)
June 21, 2013 at 6:26 am
whenriksen (6/18/2013)
One other thing:Please note that I have used TOP 100 PERCENT in the SELECT statement in this function because I want to retain the ORDER BY clause, which is otherwise not allowed in InLine table-values functions.
Top 100 Percent... Order by no longer has any effect on the result. SQL 2008 ignores that combination. You can read more here.
Absolutely true. They changed the optimizer to think that ORDER BY isn't needed when TOP 100% is used as an "optimization" so it doesn't work anymore. However, TOP 2147483647 (Largest number for an integer) still works.
Even with that, I agree that if you need the final result set to be sorted, you cannot rely on the sort of a sub-query to do it for you.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 15, 2015 at 4:04 am
"Being a CTE, its code must be preceded by a semicolon."
That's not quite true. Would rather suggest:
Any preceding statement must be terminated by a semicolon.
In in inline CTE, there is no preceding statement, consequently no semicolon,
and no paradox either.
Anyway, appreciate the lot of thought you have given to the subject.
May 15, 2015 at 7:28 am
In this case I would use a simple tally table:
DECLARE @source TABLE (SegNo int, Date_Field smalldatetime, Month_Count int, Payment decimal(10,2))
DECLARE @tally TABLE (N int)
INSERT INTO @tally(N)
SELECT ROW_NUMBER() OVER (ORDER BY column_id) - 1
FROM sys.columns
INSERT INTO @source
values (1, '20090101', 10, 100)
,(2, '20100101', 7, 200)
,(3, '20110101', 5, 300)
SELECT ss.SegNo, ca.Date_Field, ss.Payment
FROM @source ss
CROSS APPLY (SELECT DATEADD(MONTH, N, ss.Date_Field) FROM @tally WHERE N < ss.Month_Count) ca(Date_Field)
ORDER BY ss.SegNo, ca.Date_Field
May 15, 2015 at 7:42 am
Interesting Solution way back i was looking such solution for printing bar Code Labels for given no of item for given number of times.
some expert gentlemen on MSDN forum had helped me to solve it using CTE
here is the code
ALTER PROCEDURE [dbo].[sp_ItemLables]
(
@repeatlables numeric( 3,0),
@blanklables int,
@lbls2prn itemforlabels readonly
)
AS
WITH
Blanks_Lines(N)
AS
(
SELECT
1
WHERE
@blanklables >= 1
UNION ALL
SELECT
N + 1
FROM
Blanks_Lines
WHERE
N < @blanklables
),
Numbers(N)
AS
(
SELECT
1
UNION ALL
SELECT
N + 1
FROM
Numbers
WHERE
N < @repeatlables
)
SELECT
NULL as line1,
NULL as line2,
NULL as line3,
NULL as line4,
NULL as line5,
NULL as line6,
NULL as line7,
NULL as line8,
NULL as ui
FROM
Blanks_Lines
UNION ALL
SELECT
ItemList.Line1,
ItemList.Line2,
ItemList.Line3,
ItemList.Line4,
ItemList.Line5,
ItemList.Line6,
ItemList.Line7,
ItemList.Line8,
itemlist.id
FROM
@lbls2prn b,
dbo.Labels itemlist,
Numbers
WHERE
-- Join itemlist to get the labels for the passed list.
itemlist.id = b.id
OPTION
(MAXRECURSION 0);
May 15, 2015 at 10:01 am
asushil (5/15/2015)
Interesting Solution way back i was looking such solution for printing bar Code Labels for given no of item for given number of times.some expert gentlemen on MSDN forum had helped me to solve it using CTE
here is the code
ALTER PROCEDURE [dbo].[sp_ItemLables]
(
@repeatlables numeric( 3,0),
@blanklables int,
@lbls2prn itemforlabels readonly
)
AS
WITH
Blanks_Lines(N)
AS
(
SELECT
1
WHERE
@blanklables >= 1
UNION ALL
SELECT
N + 1
FROM
Blanks_Lines
WHERE
N < @blanklables
),
Numbers(N)
AS
(
SELECT
1
UNION ALL
SELECT
N + 1
FROM
Numbers
WHERE
N < @repeatlables
)
SELECT
NULL as line1,
NULL as line2,
NULL as line3,
NULL as line4,
NULL as line5,
NULL as line6,
NULL as line7,
NULL as line8,
NULL as ui
FROM
Blanks_Lines
UNION ALL
SELECT
ItemList.Line1,
ItemList.Line2,
ItemList.Line3,
ItemList.Line4,
ItemList.Line5,
ItemList.Line6,
ItemList.Line7,
ItemList.Line8,
itemlist.id
FROM
@lbls2prn b,
dbo.Labels itemlist,
Numbers
WHERE
-- Join itemlist to get the labels for the passed list.
itemlist.id = b.id
OPTION
(MAXRECURSION 0);
I'd be interested in seeing the DDL, some sample data, and expected results based on the sample data to see if the recursion could be eliminated.
May 15, 2015 at 10:21 am
asushil (5/15/2015)
Interesting Solution way back i was looking such solution for printing bar Code Labels for given no of item for given number of times.some expert gentlemen on MSDN forum had helped me to solve it using CTE
here is the code
The same result could have been achieved in much more straight forward manner if you'd just used a tally table:
ALTER PROCEDURE [dbo].[sp_ItemLables]
(
@repeatlables numeric(3,0),
@blanklables int,
@lbls2prn itemforlabels readonly
)
AS
BEGIN
SELECT NULL as line1,
NULL as line2,
NULL as line3,
NULL as line4,
NULL as line5,
NULL as line6,
NULL as line7,
NULL as line8,
NULL as ui
FROM dbo.Tally
WHERE N <= @blanklables
UNION ALL
SELECT ItemList.Line1,
ItemList.Line2,
ItemList.Line3,
ItemList.Line4,
ItemList.Line5,
ItemList.Line6,
ItemList.Line7,
ItemList.Line8,
itemlist.id
FROM @lbls2prn b
INNER JOIN dbo.Labels itemlist
ON itemlist.id = b.id -- Join itemlist to get the labels for the passed list.
CROSS APPLY (SELECT TOP(@repeatlables) * FROM dbo.Tally) AS A
END
GO
May 15, 2015 at 12:55 pm
Thomas Schutte (5/15/2015)
In this case I would use a simple tally table:
DECLARE @source TABLE (SegNo int, Date_Field smalldatetime, Month_Count int, Payment decimal(10,2))
DECLARE @tally TABLE (N int)
INSERT INTO @tally(N)
SELECT ROW_NUMBER() OVER (ORDER BY column_id) - 1
FROM sys.columns
INSERT INTO @source
values (1, '20090101', 10, 100)
,(2, '20100101', 7, 200)
,(3, '20110101', 5, 300)
SELECT ss.SegNo, ca.Date_Field, ss.Payment
FROM @source ss
CROSS APPLY (SELECT DATEADD(MONTH, N, ss.Date_Field) FROM @tally WHERE N < ss.Month_Count) ca(Date_Field)
ORDER BY ss.SegNo, ca.Date_Field
I love the way you think - a tally table-based solution will be much better, hands down. That was my first thought when I read this article. A couple things to point out though. ChrisM beat you to it a couple years ago (look at the first couple pages of this thread.) Second, there are a few issues with how you are using a tally table:
First, I would not use a temp table to for a tally table like you did, you are slowing things down by inserting rows into it and then you're doing a table scan against it for your solution. Ideally you want a well-indexed tally table in your schema or you can create one on the fly using a CTE. Compare the query plan for the queries below....
-- Your way:
DECLARE @source TABLE (SegNo int, Date_Field smalldatetime, Month_Count int, Payment decimal(10,2))
DECLARE @tally TABLE (N int)
INSERT INTO @tally(N)
SELECT ROW_NUMBER() OVER (ORDER BY column_id) - 1
FROM sys.columns;
SELECT TOP(12) N FROM @tally;
-- CTE Tally
WITH iTally(N) AS
(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1 -- No Sort will be performed
FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) x(N)
)
SELECT *
FROM iTally;
Your solution (Queries 1 & 2) are hugely expensive relative to the CTE tally. In this case, because we only need the numbers 0-12 rows, you don't even need to use ROW_NUMBER(); you could just so this:
SELECT * FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11)) x(N);
The performance improvement is minuscule but you'll get a cleaner query plan:
Lastly, the ORDER BY is not necessary and adds a lot overhead for no benefit. Below is your solution and an improved version.
DECLARE @source TABLE (SegNo int, Date_Field smalldatetime, Month_Count int, Payment decimal(10,2));
DECLARE @tally TABLE (N int);
INSERT INTO @tally(N)
SELECT ROW_NUMBER() OVER (ORDER BY column_id) - 1
FROM sys.columns;
INSERT INTO @source
values (1, '20090101', 10, 100)
,(2, '20100101', 7, 200)
,(3, '20110101', 5, 300)
-- ORIGINAL SOLUTION
SELECT ss.SegNo, ca.Date_Field, ss.Payment
FROM @source ss
CROSS APPLY (SELECT DATEADD(MONTH, N, ss.Date_Field) FROM @tally WHERE N < ss.Month_Count) ca(Date_Field)
ORDER BY ss.SegNo, ca.Date_Field;
-- IMPROVED SOLUTION: tally on-the-fly, no sort
SELECT ss.SegNo, ca.Date_Field, ss.Payment
FROM @source ss
CROSS APPLY
(
SELECT DATEADD(MONTH, N, ss.Date_Field)
FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11)) Tally(N) -- here's you tally table
WHERE N < ss.Month_Count
) ca(Date_Field)
--ORDER BY ss.SegNo, ca.Date_Field;
GO
Forgive my unsolicited advice, just showing how to make your tally table faster. 😉
-- Itzik Ben-Gan 2001
May 15, 2015 at 1:29 pm
The "TOP 100 PERCENT" seems to be ignored by the optimizer. The "TOP verylargernumber" can produce the recordset in the "expected" order. However, this is incidental. If the plan has to sort the records for a selection, then that just happens to be the order at that time. A plan that has other following steps or a plan that goes parallel will not add an additional sort to insure the final order matches the initial selective order. Another, thought - what's the final order of a query with two sub-queries each with a selective order by? The answers is in whatever order that the optimizer selects as most efficient. After data growth, the optimizer might select a different index and the result could be a different order.
RandyHelpdesk: Perhaps Im not the only one that does not know what you are doing. 😉
Viewing 15 posts - 31 through 45 (of 68 total)
You must be logged in to reply to this topic. Login to reply