November 28, 2008 at 8:10 am
I did some test myself and using the test tables Jeff Moden created. I made a function that is not RBAR and beats both compared solutions on my server bigtime. If others can confirm this on their systems, it will disprove the claim that it is RBAR that sped things up!
The Smileys in the code are a forum bug, and are actually ")" characters.
1. Add the following tally function to your Jeff's test suite:
-- Tally table function (24 bit, large enaugh for general purpose use)
--
create function dbo.tfnTally( @max-2 int ) returns table
as
return
(
with
nQ( N ) as
(
select 0 union all -- 0
select 0 union all -- 1
select 0 union all -- 2
select 0 union all -- 3
select 0 union all -- 4
select 0 union all -- 5
select 0 union all -- 6
select 0 union all -- 7
select 0 union all -- 8
select 0 union all -- 9
select 0 union all -- 10
select 0 union all -- 11
select 0 union all -- 12
select 0 union all -- 13
select 0 union all -- 14
select 0 -- 15
)
select top ( isnull( @max-2, 0 ) )
row_number() over ( order by anchor.constant ) as n
from
( select 0 as constant ) as anchor
cross join nQ n1 -- 16 ( 4 bit)
cross join nQ n2 -- 256 ( 8 bit)
cross join nQ n3 -- 4096 (12 bit)
cross join nQ n4 -- 65536 (16 bit)
cross join nQ n5 -- 1048576 (20 bit)
cross join nQ n6 -- 16777216 (24 bit)
)
;
go
2. Add the following RBAR improvement (drop in replacement) to your Jeff's test suite:
CREATE FUNCTION [dbo].[GenRefDates_Indexed]
(
@StartDate datetime,
@EndDate datetime
)
RETURNS @table TABLE (Date datetime not null primary key clustered with fillfactor = 100)
AS BEGIN
DECLARE @tmpDate datetime
SET @tmpDate = Convert(varchar, @StartDate, 112)
SET @EndDate = Convert(varchar, @EndDate, 112)
WHILE @tmpDate <= @EndDate
BEGIN
INSERT INTO @table VALUES (@tmpDate)
SET @tmpDate = DateAdd(dd, 1, @tmpDate)
END
RETURN
END
3. Add the following non RBAR function (drop in replacement) to your Jeff's test suite:
create function dbo.GenRefDates_noRBAR
(
@StartDate datetime
, @EndDate datetime
)
returns @table table
(
Date datetime not null
)
as begin
insert into @table( Date )
select
dateadd( dd, tally.N - 1, @StartDate )
from
dbo.tfnTally( datediff( dd, @StartDate, @EndDate ) + 1 ) as tally
;
return
end
go
4. Add the following non RBAR improved function (drop in replacement) to your Jeff's test suite:
create function dbo.GenRefDates_noRBAR_indexed
(
@StartDate datetime
, @EndDate datetime
)
returns @table table
(
Date datetime not null primary key clustered with fillfactor = 100
)
as begin
insert into @table( Date )
select
dateadd( dd, tally.N - 1, @StartDate )
from
dbo.tfnTally( datediff( dd, @StartDate, @EndDate ) + 1 ) as tally
;
return
end
go
Add the following tests at the end of the Jeff Moden's existing test script
--===== Method with RBAR looping function (indexed result this time)
PRINT '===== Method with RBAR looping function (indexed result this time) ====='
SET STATISTICS TIME ON
DECLARE @MonthYear DATETIME,
@MonthStart DATETIME,
@MonthEnd DATETIME,
@NextMonthStart DATETIME
SELECT @MonthYear = 'May 2008',
@MonthStart = @MonthYear,
@NextMonthStart = DATEADD(mm,1,@MonthStart),
@MonthEnd = @NextMonthStart-1
SELECT d.Date, e.SomeInt, SUM(e.SomeMoney) AS Total
FROM dbo.Event e
RIGHT OUTER JOIN
dbo.GenRefDates_Indexed(@MonthStart,@MonthEnd) d
ON e.SomeDate = d.Date
GROUP BY d.Date, e.SomeInt
ORDER BY d.Date, e.SomeInt
SET STATISTICS TIME OFF
GO
PRINT REPLICATE('=',100)
GO
--===== Method with temp table generation (non-indexed) using and numbers/tally function
PRINT '===== Method with NO-RBAR (non-indexed) function ====='
SET STATISTICS TIME ON
DECLARE @MonthYear DATETIME,
@MonthStart DATETIME,
@MonthEnd DATETIME,
@NextMonthStart DATETIME
SELECT @MonthYear = 'May 2008',
@MonthStart = @MonthYear,
@NextMonthStart = DATEADD(mm,1,@MonthStart),
@MonthEnd = @NextMonthStart-1
SELECT d.Date, e.SomeInt, SUM(e.SomeMoney) AS Total
FROM dbo.Event e
RIGHT OUTER JOIN
dbo.GenRefDates_noRBAR(@MonthStart,@MonthEnd) d
ON e.SomeDate = d.Date
GROUP BY d.Date, e.SomeInt
ORDER BY d.Date, e.SomeInt
SET STATISTICS TIME OFF
GO
PRINT REPLICATE('=',100)
GO
--===== Method with temp table generation (indexed) using and numbers/tally function
PRINT '===== Method with NO-RBAR (indexed) function ====='
SET STATISTICS TIME ON
DECLARE @MonthYear DATETIME,
@MonthStart DATETIME,
@MonthEnd DATETIME,
@NextMonthStart DATETIME
SELECT @MonthYear = 'May 2008',
@MonthStart = @MonthYear,
@NextMonthStart = DATEADD(mm,1,@MonthStart),
@MonthEnd = @NextMonthStart-1
SELECT d.Date, e.SomeInt, SUM(e.SomeMoney) AS Total
FROM dbo.Event e
RIGHT OUTER JOIN
dbo.GenRefDates_noRBAR_Indexed(@MonthStart,@MonthEnd) d
ON e.SomeDate = d.Date
GROUP BY d.Date, e.SomeInt
ORDER BY d.Date, e.SomeInt
SET STATISTICS TIME OFF
GO
PRINT REPLICATE('=',100)
GO
You will find that the RBAR version offers NO advantage over the NON-RBAR versions. You will also find that the indexed versions of both RBAR and NON-RBAR perform identical and MUCH faster then their non-indexed versions, take a good look at these functions. This proofs that it is not the RBAR property that is significant here, but indexing and query complexity is as an inline version as below performs worst.
Full inline TVF (performs worst on test code due to complexity of query)
create function dbo.GenRefDates_noRBAR_inline
(
@StartDate datetime
, @EndDate datetime
)
returns table
as
return
(
select
dateadd( dd, tally.N - 1, @StartDate ) as Date
from
dbo.tfnTally( datediff( dd, @StartDate, @EndDate ) + 1 ) as tally
)
;
go
And the accompaning test script addition:
--===== Method with temp table generation (inline) using and numbers/tally function
PRINT '===== Method with NO-RBAR (inline) function ====='
SET STATISTICS TIME ON
DECLARE @MonthYear DATETIME,
@MonthStart DATETIME,
@MonthEnd DATETIME,
@NextMonthStart DATETIME
SELECT @MonthYear = 'May 2008',
@MonthStart = @MonthYear,
@NextMonthStart = DATEADD(mm,1,@MonthStart),
@MonthEnd = @NextMonthStart-1
SELECT d.Date, e.SomeInt, SUM(e.SomeMoney) AS Total
FROM dbo.Event e
RIGHT OUTER JOIN
dbo.GenRefDates_noRBAR_Inline(@MonthStart,@MonthEnd) d
ON e.SomeDate = d.Date
GROUP BY d.Date, e.SomeInt
ORDER BY d.Date, e.SomeInt
SET STATISTICS TIME OFF
GO
PRINT REPLICATE('=',100)
GO
November 28, 2008 at 8:16 am
Joe Celko (11/28/2008)
Let me make an appeal to simple logic.You build a look-up table once (and if you did a proper calendar table, you also benefit from other uses). It probably has a clustered index or key to take advantage of the sequential nature of time. It probably has other constraints to ensure data integrity.
You have a table-valued function that creates a new table every time, without an index or key. And certainly without constraints for the optimizer to use.
There are (n) sessions in the DB. The Calendar table is cached in main storage -- 20, 50 or even 100 years is small. The function is re-computed over and over.
So before you get to the query, the look-up table has won on both speed and integrity.
That's a good explanation, Joe. Thanks.
The key here is jacRoberts claims and he swears that the function code he is running is faster and has even offered output showing his timings. Either he's a consumate "troll" trying to see how long he can get this thread to last or he actually has something going on with his system that produce the results he claims. If it's the later of the two, then I'm very interested in what the difference between his system and the rest of the world's might be because it would be incredibly useful.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 28, 2008 at 8:22 am
jacRoberts,
I have SET NOCOUNT ON in the test code so I can't do the verification. Pardon me for asking stupid questions, but I have to check... Have you verified that both result sets are returning both identical rows and an identical number of rows?
There's something else going on here and the devil may be in the data...
Using the methods found in the link in my signature, would you post the table creation statements for the Event and RefDate tables that YOU have along with 10 rows for each. Be sure to include all indexes and keys on each, please. This is very interesting and I'm trying to get to the bottom of it because it could be a "fix" for other RBAR problems.
Also, it would be good to know more about the machine and the exact version of SQL Server that you're using to do this. Thanks.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 28, 2008 at 8:31 am
peter (11/28/2008)
Considdering that for me, just writing a post (let alone a focused one) takes a lot of time and then reading all the work you have put into your argument makes me feel "a bit" humble :).
Heh... thanks for the compliment, Peter. I'm just "dedicated", that's all.
Interesting sidebar... do you know what the difference between being "dedicated" to a cause and being "loyal" to a cause is? You have to think of "ham and eggs"... heh... the chicken is loyal... the pig was dedicated. 😛
Did you even sleep?
Ummmm.... ok... new word... guess I'll have to look that up in the dictionary when I get time. :hehe:
Either way, you made a strong case here that something else must have been wrong in the original solution. Be it the queries themselfs, the modeling/indexing or even not up to date statistics.
I didn't post it, but I tested in many different scenarios... no indexes, reversed indexes, silly indexes, no statistics, etc, etc. The set based solution always beats the RBAR with the data I built. I'm thinking that there's something in the data itself that is causing this. It could be a simple as he's getting smaller rowcounts because there's a "time" element in the dates in his table or something similar. Whatever it is, I'd like to find out because I'm just not seeing it.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 28, 2008 at 8:39 am
Play with my test code and it might help explain some things. But best ofcourse is to have a test database online somewhere that is accurate in modeling and in volume. The data problably needs to be made meaningless without altering distributions for obvious reasons!
Lets hear what he comes up with, so far I am sure we helped him shave off even more time (see my earlyer response).
November 28, 2008 at 8:41 am
Time to compare functions.
In the code below I create a 1,000,000 row Tally table starting at 1. The table has one column, n. This is also clustered primary key.
I then create two functions, the first is the RBAR TVF, the second is a NoRBAR TVF using the Tally table.
I then use these functions to load a Table variable with dates.
-- Create and load 1,000,000 row Tally table
create table dbo.Tally (
n int primary key
);
with TallyLoad as (
select
row_number() over (order by a.object_id) as rownum
from
sys.objects a
cross join sys.objects b
cross join sys.objects c
cross join sys.objects d
)
insert into dbo.Tally
select top (1000000) rownum
from TallyLoad
order by rownum;
go
-- create GenRefDates functions, one RBAR (while loop), one NoRBAR (Tally table)
CREATE FUNCTION [dbo].[GenRefDates_RBAR]
(
@StartDate datetime,
@EndDate datetime
)
RETURNS @table TABLE (ADate datetime)
AS BEGIN
DECLARE @tmpDate datetime
SET @tmpDate = Convert(varchar, @StartDate, 112)
SET @EndDate = Convert(varchar, @EndDate, 112)
WHILE @tmpDate <= @EndDate
BEGIN
INSERT INTO @table VALUES (@tmpDate)
SET @tmpDate = DateAdd(dd, 1, @tmpDate)
END
RETURN
END
go
CREATE FUNCTION [dbo].[GenRefDates_NoRBAR]
(
@StartDate datetime,
@EndDate datetime
)
RETURNS @table TABLE (ADate datetime)
AS BEGIN
DECLARE @tmpDate datetime
SET @tmpDate = Convert(varchar, @StartDate, 112)
SET @EndDate = Convert(varchar, @EndDate, 112)
insert into @table
select
dateadd(dd, n - 1, @StartDate)
from
dbo.Tally
where
n <= datediff(dd, @tmpDate, @EndDate) + 1;
RETURN
END
go
declare @StartTime datetime,
@EndTime datetime,
@StartDate datetime,
@EndDate datetime,
@StrData varchar(256);
declare @MyDates table (
MyDate datetime
);
print '========================================= RBAR TVF =======================================================';
set @StartDate = '2008-11-01';
set @EndDate = '2008-11-30';
set @StartTime = getdate();
set statistics io on;
set statistics time on;
insert into
@MyDates
select
*
from
dbo.GenRefDates_RBAR (@StartDate, @EndDate)
set statistics io off;
set statistics time off;
set @EndTime = getdate();
select @StrData = char(13) + char(10) + '===> ' + 'Start Time: ' + convert(varchar(30), @StartTime, 121) +
' EndTime: ' + convert(varchar(30), @EndTime, 121) + ' Elapsed Time: ' +
convert(varchar(10), datediff(ms,@StartTime, @EndTime)) + ' Record Count: ' +
convert(varchar(10), count(*))
from
@MyDates;
print @StrData;
print '========================================= RBAR TVF =======================================================';
print char(13) + char(10) + char(13) + char(10)
go
declare @StartTime datetime,
@EndTime datetime,
@StartDate datetime,
@EndDate datetime,
@StrData varchar(256);
declare @MyDates table (
MyDate datetime
);
print '========================================= NoRBAR TVF =======================================================';
set @StartDate = '2008-11-01';
set @EndDate = '2008-11-30';
set @StartTime = getdate();
set statistics io on;
set statistics time on;
insert into
@MyDates
select
*
from
dbo.GenRefDates_NoRBAR (@StartDate, @EndDate)
set statistics io off;
set statistics time off;
set @EndTime = getdate();
select @StrData = char(13) + char(10) + '===> ' + 'Start Time: ' + convert(varchar(30), @StartTime, 121) +
' EndTime: ' + convert(varchar(30), @EndTime, 121) + ' Elapsed Time: ' +
convert(varchar(10), datediff(ms,@StartTime, @EndTime)) + ' Record Count: ' +
convert(varchar(10), count(*))
from
@MyDates;
print @StrData;
print '========================================= NoRBAR TVF =======================================================';
print char(13) + char(10) + char(13) + char(10)
go
declare @StartTime datetime,
@EndTime datetime,
@StartDate datetime,
@EndDate datetime,
@StrData varchar(256);
declare @MyDates table (
MyDate datetime
);
print '========================================= RBAR TVF =======================================================';
set @StartDate = '2008-11-01';
set @EndDate = '2009-10-31';
set @StartTime = getdate();
set statistics io on;
set statistics time on;
insert into
@MyDates
select
*
from
dbo.GenRefDates_RBAR (@StartDate, @EndDate)
set statistics io off;
set statistics time off;
set @EndTime = getdate();
select @StrData = char(13) + char(10) + '===> ' + 'Start Time: ' + convert(varchar(30), @StartTime, 121) +
' EndTime: ' + convert(varchar(30), @EndTime, 121) + ' Elapsed Time: ' +
convert(varchar(10), datediff(ms,@StartTime, @EndTime)) + ' Record Count: ' +
convert(varchar(10), count(*))
from
@MyDates;
print @StrData;
print '========================================= RBAR TVF =======================================================';
print char(13) + char(10) + char(13) + char(10)
go
declare @StartTime datetime,
@EndTime datetime,
@StartDate datetime,
@EndDate datetime,
@StrData varchar(256);
declare @MyDates table (
MyDate datetime
);
print '========================================= NoRBAR TVF =======================================================';
set @StartDate = '2008-11-01';
set @EndDate = '2009-10-31';
set @StartTime = getdate();
set statistics io on;
set statistics time on;
insert into
@MyDates
select
*
from
dbo.GenRefDates_NoRBAR (@StartDate, @EndDate)
set statistics io off;
set statistics time off;
set @EndTime = getdate();
select @StrData = char(13) + char(10) + '===> ' + 'Start Time: ' + convert(varchar(30), @StartTime, 121) +
' EndTime: ' + convert(varchar(30), @EndTime, 121) + ' Elapsed Time: ' +
convert(varchar(10), datediff(ms,@StartTime, @EndTime)) + ' Record Count: ' +
convert(varchar(10), count(*))
from
@MyDates;
print @StrData;
print '========================================= NoRBAR TVF =======================================================';
print char(13) + char(10) + char(13) + char(10)
go
declare @StartTime datetime,
@EndTime datetime,
@StartDate datetime,
@EndDate datetime,
@StrData varchar(256);
declare @MyDates table (
MyDate datetime
);
print '========================================= RBAR TVF =======================================================';
set @StartDate = '2008-11-01';
set @EndDate = '2018-10-31';
set @StartTime = getdate();
set statistics io on;
set statistics time on;
insert into
@MyDates
select
*
from
dbo.GenRefDates_RBAR (@StartDate, @EndDate)
set statistics io off;
set statistics time off;
set @EndTime = getdate();
select @StrData = char(13) + char(10) + '===> ' + 'Start Time: ' + convert(varchar(30), @StartTime, 121) +
' EndTime: ' + convert(varchar(30), @EndTime, 121) + ' Elapsed Time: ' +
convert(varchar(10), datediff(ms,@StartTime, @EndTime)) + ' Record Count: ' +
convert(varchar(10), count(*))
from
@MyDates;
print @StrData;
print '========================================= RBAR TVF =======================================================';
print char(13) + char(10) + char(13) + char(10)
go
declare @StartTime datetime,
@EndTime datetime,
@StartDate datetime,
@EndDate datetime,
@StrData varchar(256);
declare @MyDates table (
MyDate datetime
);
print '========================================= NoRBAR TVF =======================================================';
set @StartDate = '2008-11-01';
set @EndDate = '2018-10-31';
set @StartTime = getdate();
set statistics io on;
set statistics time on;
insert into
@MyDates
select
*
from
dbo.GenRefDates_NoRBAR (@StartDate, @EndDate)
set statistics io off;
set statistics time off;
set @EndTime = getdate();
select @StrData = char(13) + char(10) + '===> ' + 'Start Time: ' + convert(varchar(30), @StartTime, 121) +
' EndTime: ' + convert(varchar(30), @EndTime, 121) + ' Elapsed Time: ' +
convert(varchar(10), datediff(ms,@StartTime, @EndTime)) + ' Record Count: ' +
convert(varchar(10), count(*))
from
@MyDates;
print @StrData;
print '========================================= NoRBAR TVF =======================================================';
print char(13) + char(10) + char(13) + char(10)
go
Here are the results of the runs. If you note, the first run was for 1 month, then 1 year, then 10 years. You will note that the NoRBAR function scales extremely well, while the RBAR one, not so well.
========================================= RBAR TVF =======================================================
Table '#793441A2'. Scan count 0, logical reads 30, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#5402BCF3'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 15 ms, elapsed time = 14 ms.
(30 row(s) affected)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
===> Start Time: 2008-11-28 08:25:57.660 EndTime: 2008-11-28 08:25:57.677 Elapsed Time: 16 Record Count: 30
========================================= RBAR TVF =======================================================
========================================= NoRBAR TVF =======================================================
Table '#7A2865DB'. Scan count 0, logical reads 30, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#55EB0565'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 2 ms.
(30 row(s) affected)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
===> Start Time: 2008-11-28 08:25:57.770 EndTime: 2008-11-28 08:25:57.770 Elapsed Time: 0 Record Count: 30
========================================= NoRBAR TVF =======================================================
========================================= RBAR TVF =======================================================
Table '#7B1C8A14'. Scan count 0, logical reads 365, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#5402BCF3'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 63 ms, elapsed time = 147 ms.
(365 row(s) affected)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
===> Start Time: 2008-11-28 08:25:57.770 EndTime: 2008-11-28 08:25:57.927 Elapsed Time: 156 Record Count: 365
========================================= RBAR TVF =======================================================
========================================= NoRBAR TVF =======================================================
Table '#7C10AE4D'. Scan count 0, logical reads 365, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#55EB0565'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 15 ms, elapsed time = 9 ms.
(365 row(s) affected)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
===> Start Time: 2008-11-28 08:25:57.957 EndTime: 2008-11-28 08:25:57.973 Elapsed Time: 16 Record Count: 365
========================================= NoRBAR TVF =======================================================
========================================= RBAR TVF =======================================================
Table '#7D04D286'. Scan count 0, logical reads 3659, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#5402BCF3'. Scan count 1, logical reads 9, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 656 ms, elapsed time = 1127 ms.
(3652 row(s) affected)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
===> Start Time: 2008-11-28 08:25:57.990 EndTime: 2008-11-28 08:25:59.113 Elapsed Time: 1123 Record Count: 3652
========================================= RBAR TVF =======================================================
========================================= NoRBAR TVF =======================================================
Table '#7DF8F6BF'. Scan count 0, logical reads 3659, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#55EB0565'. Scan count 1, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 78 ms, elapsed time = 77 ms.
(3652 row(s) affected)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
===> Start Time: 2008-11-28 08:25:59.113 EndTime: 2008-11-28 08:25:59.190 Elapsed Time: 76 Record Count: 3652
========================================= NoRBAR TVF =======================================================
So, does performance matter, yes. While the code used in the RBAR function works well for a small number of values, it does not work well as the number of values increases.
What will happen is one of two things. Some poor Joe will find the RBAR TVF and simply decide it does exactly what he needs, but he uses it to generate a large number of dates like in my third test for example. Or, some poor Joe will find the RBAR TVF, look at the code, and then decide that is just what he was looking for functionally, and use that code as a template in another stored procedure or TVF, and have if fail miserably due to the large number of values he is generating.
:smooooth:
November 28, 2008 at 8:43 am
jacroberts (11/28/2008)
Joe Celko (11/28/2008)
Let me make an appeal to simple logic.You build a look-up table once (and if you did a proper calendar table, you also benefit from other uses). It probably has a clustered index or key to take advantage of the sequential nature of time. It probably has other constraints to ensure data integrity.
You have a table-valued function that creates a new table every time, without an index or key. And certainly without constraints for the optimizer to use.
There are (n) sessions in the DB. The Calendar table is cached in main storage -- 20, 50 or even 100 years is small. The function is re-computed over and over.
So before you get to the query, the look-up table has won on both speed and integrity.
There is more than one way to skin a cat.
Absolutely correct! However, until I see proof in the form of demonstrable code, I still question the very existance of the cat. 😛
Like I said, there's something else going on here... let's see if we can get to the bottom of it together. Let's keep the rhetoric down to a minimum because this thread already got close to flaming a couple of times.
All these hypothesis and claims have already been laid out over and over... the ONLY thing that matters now is demonstrable proof in the form of code and data with repeatable results.
So far, none of us has been able to duplicate your claims. In fact, most code offered, thus far, seems to repudiate your claims. You posted a run result that seems to support your claims but it looks like you may have used different data (because ot the table name change you made) and that MAY be the key.
Everybody stick to the subject, eh? Only the code and the data mean anything for this thread now.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 28, 2008 at 8:45 am
Jeff Moden (11/28/2008)
I'm not sure what's going on... I ran the code you modified and here's what I got...===== Set based method with Date table =====
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
Warning: Null value is eliminated by an aggregate or other SET operation.
SQL Server Execution Times:
CPU time = 375 ms, elapsed time = 931 ms.
====================================================================================================
===== Method with RBAR looping function =====
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
Warning: Null value is eliminated by an aggregate or other SET operation.
SQL Server Execution Times:
CPU time = 10531 ms, elapsed time = 11474 ms.
In other words, on my machine, your changes helped the RBAR solution a bit, but the set bsed solution still ate it up.
I also noticed that you changed my table reference from RefDate to RefDate[font="Arial Black"]s[/font] which means you didn't use the same demo table as I which, of course, means another difference that I can't see. This is going to be a tough one to resolve.
It would be interesting to see what other folks get as a result from "our" test code.
Anyone care to assist?
I changed the code to use RefDate table
--===== Set based method with Date table
PRINT '===== Set based method with Date table ====='
SET STATISTICS TIME ON
DECLARE @MonthYear DATETIME,
@MonthStart DATETIME,
@MonthEnd DATETIME,
@NextMonthStart DATETIME
SELECT @MonthYear = 'May 2008',
@MonthStart = @MonthYear,
@NextMonthStart = DATEADD(mm,1,@MonthStart),
@MonthEnd = @NextMonthStart-1
SET NOCOUNT ON
SELECT d.Date, e.SomeInt, SUM(e.SomeMoney) AS Total
FROM dbo.Event e
RIGHT OUTER JOIN
dbo.RefDate d
ON e.SomeDate = d.Date
WHERE d.Date >= @MonthStart
AND d.Date < @NextMonthStart
GROUP BY d.Date, e.SomeInt
ORDER BY d.Date, e.SomeInt
SET STATISTICS TIME OFF
GO
PRINT REPLICATE('=',100)
GO
--===== Method with RBAR looping function
PRINT '===== Method with RBAR looping function ====='
SET STATISTICS TIME ON
DECLARE @MonthYear DATETIME,
@MonthStart DATETIME,
@MonthEnd DATETIME,
@NextMonthStart DATETIME
SELECT @MonthYear = 'May 2008',
@MonthStart = @MonthYear,
@NextMonthStart = DATEADD(mm,1,@MonthStart),
@MonthEnd = @NextMonthStart-1
SET NOCOUNT ON
SELECT d.Date, e.SomeInt, SUM(e.SomeMoney) AS Total
FROM dbo.Event e
RIGHT OUTER JOIN
dbo.GenRefDates(@MonthStart,@MonthEnd) d
ON e.SomeDate = d.Date
WHERE d.Date >= @MonthStart
AND d.Date < @NextMonthStart
GROUP BY d.Date, e.SomeInt
ORDER BY d.Date, e.SomeInt
SET STATISTICS TIME OFF
GO
The stats I got were:
===== Set based method with Date table =====
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
SQL Server Execution Times:
CPU time = 296 ms, elapsed time = 375 ms.
====================================================================================================
===== Method with RBAR looping function =====
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
SQL Server Execution Times:
CPU time = 282 ms, elapsed time = 333 ms.
There is a slight variation each time I run it and there really appears to be little difference in the two queries.
The real difference should be noticed with just:
SELECT * FROM GenRefDates('2008-01-01','2008-02-01') d
and
SELECT * FROM RefDate WHERE DATE BETWEEN '2008-01-01' AND '2008-02-01'
The difference in the run time of these two queries is just a couple of milliseconds in favour of table based query.
But my point was that it makes an insignificant difference in this case.
November 28, 2008 at 8:53 am
Wow... now that's interesting. We need to find out what the differences are...
Let's start out with one of my previous requests:
1. What is the CPU, Ram, Harddrive, etc, for your box.
2. What version, sp, and cummulative update for SQL Server are you using?
As a side bar, the one thing that this proves, so far, is that the previous claims of a lot of rows in the RefDate table wasn't the problem and neither was the set based solution. I'd say, we've shifted gears... on every system I've ever seen, the RBAR solution looses very badly. Yet, on your system, it's as good as and, apparently, sometimes better than the set based solution. WHAT is the difference that makes this possible?
--Jeff Moden
Change is inevitable... Change for the better is not.
November 28, 2008 at 8:57 am
Lynn Pettis (11/28/2008)
Time to compare functions.In the code below I create a 1,000,000 row Tally table starting at 1. The table has one column, n. This is also clustered primary key.
I then create two functions, the first is the RBAR TVF, the second is a NoRBAR TVF using the Tally table.
I then use these functions to load a Table variable with dates.
Can you please look at the test code I posted earlyer and add RBAR and non-RBAR functions based on the tally FUNCTION I provided? I do not intend to cloud the issue, but a lot of the time a tally function is better then a tally table.
And most non-inline functions need an indexed table result to perform optimal, your test code had none atm. In my tests that alone halved the time of the queries, be it an RBAR or a non-RBAR function being used.
November 28, 2008 at 9:01 am
Jeff Moden (11/28/2008)
Wow... now that's interesting. We need to find out what the differences are...Let's start out with one of my previous requests:
1. What is the CPU, Ram, Harddrive, etc, for your box.
2. What version, sp, and cummulative update for SQL Server are you using?
As a side bar, the one thing that this proves, so far, is that the previous claims of a lot of rows in the RefDate table wasn't the problem and neither was the set based solution. I'd say, we've shifted gears... on every system I've ever seen, the RBAR solution looses very badly. Yet, on your system, it's as good as and, apparently, sometimes better than the set based solution. WHAT is the difference that makes this possible?
Oh, yeah... I almost forgot. It's silly, but we need to know for sure... please also verify that the row counts and the data being returned for both are identical. Thanks.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 28, 2008 at 9:03 am
Hey folks... I've got outside "obligations"... I probably won't be able to return to this for a while. Keep focused... we need to find out why the system jacRoberts is running on handles the RBAR solution so very well. Heh... no feather fluffing, name calling, or stating the "obvious", eh? Stick to the subject... it's a big sandbox with room enough for us all. 😛
--Jeff Moden
Change is inevitable... Change for the better is not.
November 28, 2008 at 9:15 am
Jeff Moden (11/28/2008)
jacRoberts,I have SET NOCOUNT ON in the test code so I can't do the verification. Pardon me for asking stupid questions, but I have to check... Have you verified that both result sets are returning both identical rows and an identical number of rows?
There's something else going on here and the devil may be in the data...
Using the methods found in the link in my signature, would you post the table creation statements for the Event and RefDate tables that YOU have along with 10 rows for each. Be sure to include all indexes and keys on each, please. This is very interesting and I'm trying to get to the bottom of it because it could be a "fix" for other RBAR problems.
Also, it would be good to know more about the machine and the exact version of SQL Server that you're using to do this. Thanks.
The machine is
Windows 2003 SP1
3584 MB
3.60 GHz Intel Xeon (Hyper-threaded)
Microsoft SQL Server 2005 - 9.00.3042.00 (Intel X86) Feb 9 2007 22:47:07 Copyright (c) 1988-2005 Microsoft Corporation Standard Edition on Windows NT 5.2 (Build 3790: Service Pack 1)
The table is:
/****** Object: Table [dbo].[Event] Script Date: 11/28/2008 16:08:09 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[Event](
[EventID] [int] IDENTITY(1,1) NOT NULL,
[SomeInt] [int] NULL,
[SomeLetters2] [char](2) NULL,
[SomeCSV] [varchar](80) NULL,
[SomeMoney] [money] NULL,
[SomeDate] [datetime] NULL,
[SomeHex12] [varchar](12) NULL,
PRIMARY KEY CLUSTERED
(
[EventID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
Event Rows
134148ALPart01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part1059.512002-09-07 00:00:00.000C2E0CE2DB52A
248444AHPart01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part1043.922009-12-19 00:00:00.00043C6CBC359A5
343988FGPart01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part1038.342001-02-24 00:00:00.00020ED33B58595
47548IGPart01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part1046.252004-01-02 00:00:00.000E4AA25010C59
540033UEPart01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part1089.172005-07-31 00:00:00.0001A6C3EDF8349
624973XLPart01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part102.862008-04-26 00:00:00.00067611BAE6164
743800HXPart01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part1041.122005-01-02 00:00:00.000AFA9625FE532
810951FFPart01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part1050.072005-06-07 00:00:00.00013847D7D89A7
922175TKPart01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part1045.052003-08-04 00:00:00.00083B9227B2A84
1047765GZPart01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part108.172000-10-26 00:00:00.000819F6E858E39
RefDate:
CREATE TABLE [dbo].[RefDate](
[Date] [datetime] NOT NULL,
CONSTRAINT [PK_Date_Date] PRIMARY KEY CLUSTERED
(
[Date] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 100) ON [PRIMARY]
) ON [PRIMARY]
2000-01-02 00:00:00.000
2000-01-03 00:00:00.000
2000-01-04 00:00:00.000
2000-01-05 00:00:00.000
2000-01-06 00:00:00.000
2000-01-07 00:00:00.000
2000-01-08 00:00:00.000
2000-01-09 00:00:00.000
2000-01-10 00:00:00.000
2000-01-11 00:00:00.000
Both queries return the same rows.
November 28, 2008 at 9:22 am
We might also need to make sure all tests are performed on a database with the same recovery modal (tempdb uses 'simple') and a comparable load if we are going into tiny differences.
November 28, 2008 at 10:47 am
My original actual query was a lot different and maybe I've over simplified it for want of a simple example. I used to have two tables, RefDates which was just a table of dates and RefTimePeriods which had a row for each 15 minute period in a day (96 rows in total). I then had a view which was a cross product of the RefDates and RefTimePeriods tables.
CREATE TABLE [dbo].[RefDates]
(
[Date] datetime NOT NULL,
CONSTRAINT [PK_RefDates] PRIMARY KEY CLUSTERED
(
[Date] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
CREATE TABLE [dbo].[RefTimePeriods]
(
[StartTime] datetime NOT NULL,
[EndTime] datetime NOT NULL,
[Hour] tinyint NOT NULL,
[Quarter] tinyint NOT NULL,
CONSTRAINT [PK_RefTimePeriods] PRIMARY KEY CLUSTERED
(
[Hour] ASC,
[Quarter] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
CREATE VIEW [dbo].[RefDateTimePeriods] AS
(
SELECT RD.Date Date,
RT.Hour,
RT.Quarter
FROM RefDates RD,
RefTimePeriods RT
)
I changed the RefDateTimePeriods view to be a TVF that used the GenRefDates TVF.
CREATE FUNCTION dbo.RefDateTimePeriods
(
@StartDateTime smalldatetime,
@EndDateTime smalldatetime
)
RETURNS @Table TABLE
(
Date datetime,
Hour tinyint,
Quarter tinyint
)
AS
BEGIN
INSERT INTO @Table(Date, Hour, Quarter)
SELECT RD.Date Date,
RT.Hour,
RT.Quarter
FROM dbo.GenRefDates(@StartDateTime, @EndDateTime) RD
CROSS JOIN dbo.RefTimePeriods RT
RETURN
END
GO
The RefDateTimePeriods TVF was then used in another view
CREATE VIEW Downtime AS
(
SELECT *
FROM RefDateTimePeriods R
LEFT JOIN myTable1 S
ON S.Date = R.Date
AND S.Hour = R.Hour
AND S.Quarter = R.Quarter
LEFT JOIN myTable2 A
ON A.Date = S.Date
AND A.Hour = S.Hour
AND A.Quarter = S.Quarter
AND A.myColumn1 IN (2, 3, 4, 5, 6, 7, 8, 9, 10)
)
This Downtime view is used in reports with outer joins and these ran slowly but sped up when a RBAR TVF was added that only returned the required rows. The difference in the run time of the two methods was orders of magnitude. If I had used a set based query in the TVF instead of an RBAR then I would potentially save a couple of milliseconds but this would be totally insignificant to the total run time of the report.
Viewing 15 posts - 91 through 105 (of 106 total)
You must be logged in to reply to this topic. Login to reply