April 13, 2009 at 10:00 am
Steve Jones - Editor (4/13/2009)
OK, I'm lost again. I'll pick up the Flo story when someone posts a link.
Here's the link:
http://www.sqlservercentral.com/Forums/FindPost695508.aspx
Edit: link added to url.
April 13, 2009 at 11:51 am
RBarryYoung (4/13/2009)
Jeff Moden (4/13/2009)
Heh... crud... I just sent Tony an email saying that I may have to cancel the book. Flo just beat the hell out of a Tally table split using a While Loop, no less. I'm still doing some testing but things aren't looking real good for the Tally table on this one. With the possible exception of the "quirky update", not looking so hot for pseudo-cursors either.BWAA-HAAA... I may also have to learn "C". She also just wrote a CLR that does a dandy split... 6 times faster than even her old/new SQL method. 'Bout time someone actually took the time to do that.
Hmmm... maybe it's finally time to retire. 🙂
Do you have a pointer to that thread Jeff?
http://www.sqlservercentral.com/Forums/FindPost695508.aspx
--Jeff Moden
Change is inevitable... Change for the better is not.
April 13, 2009 at 12:00 pm
CLR does a faster running total than the quirky update too.
As we've known for quite some time, SQL just plain isn't good at what it isn't designed for. It's really good at its standard stuff, but when you try to trick it into things like complex string manipulation, running totals, et al, other things are often better.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
April 13, 2009 at 12:20 pm
GSquared (4/13/2009)
CLR does a faster running total than the quirky update too.As we've known for quite some time, SQL just plain isn't good at what it isn't designed for. It's really good at its standard stuff, but when you try to trick it into things like complex string manipulation, running totals, et al, other things are often better.
I don't know how to program in "C" (yet... gotta learn it for "self defense" 😉 )... I sure do wish someone would post some code and some test results for the running total example.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 13, 2009 at 12:45 pm
Jeff Moden (4/13/2009)
GSquared (4/13/2009)
CLR does a faster running total than the quirky update too.As we've known for quite some time, SQL just plain isn't good at what it isn't designed for. It's really good at its standard stuff, but when you try to trick it into things like complex string manipulation, running totals, et al, other things are often better.
I don't know how to program in "C" (yet... gotta learn it for "self defense" 😉 )... I sure do wish someone would post some code and some test results for the running total example.
Well, I'm not sure I'd want to tackle this in C either. I haven't coded in C in quite some time, and never enough to be good at it.
I found a sample CLR version here. The guy's an idiot when it comes to T-SQL, and it shows, but the CLR proc should be an okay place to start.
A version that I tested before was much simpler, and in VB.NET, not C#. I don't have that available, but I'll see if I can reconstruct it.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
April 13, 2009 at 3:43 pm
Jeff Moden (4/13/2009)
RBarryYoung (4/13/2009)
Jeff Moden (4/13/2009)
Heh... crud... I just sent Tony an email saying that I may have to cancel the book. Flo just beat the hell out of a Tally table split using a While Loop, no less. I'm still doing some testing but things aren't looking real good for the Tally table on this one. With the possible exception of the "quirky update", not looking so hot for pseudo-cursors either.BWAA-HAAA... I may also have to learn "C". She also just wrote a CLR that does a dandy split... 6 times faster than even her old/new SQL method. 'Bout time someone actually took the time to do that.
Hmmm... maybe it's finally time to retire. 🙂
Do you have a pointer to that thread Jeff?
I think I found the reason why. If you check the code, the definition field is nvarchar(max). I made some changes to the code, and only selected those procedures where the definition was less than or equal to 4000 characters. then I ran two different versions of the code. Check it out yourself. Here is the code:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE function [dbo].[ufn_Tally2](
@pStartValue bigint = 1,
@pEndValue bigint = 1000000,
@pIncrement bigint = 1
)
returns table
as
return(
with BaseNum (
N
) as (
select 1 union all
select 1 union all
select 1 union all
select 1 union all
select 1 union all
select 1 union all
select 1 union all
select 1 union all
select 1 union all
select 1
),
L1 (
N
) as (
select
bn1.N
from
BaseNum bn1
cross join BaseNum bn2
cross join BaseNum bn3
),
L2 (
N
) as (
select
a1.N
from
L1 a1
cross join L1 a2
),
L3 (
N
) as (
select top ((abs(case when @pStartValue < @pEndValue
then @pEndValue
else @pStartValue
end -
case when @pStartValue < @pEndValue
then @pStartValue
else @pEndValue
end))/abs(@pIncrement) + 1)
a1.N
from
L2 a1
cross join L2 a2
),
Tally (
N
) as (
select
row_number() over (order by a1.N)
from
L3 a1
)
select
((N - 1) * @pIncrement) + @pStartValue as N
from
Tally
);
create table #TestData(
[name] sysname,
[definition] nvarchar(4000),
deflen int
)
;
go
set statistics io on;
set statistics time on;
with TestData as (
SELECT TOP(200)
o.name,
m.definition,
len(m.definition) as deflen
FROM
master.sys.all_objects o
INNER JOIN master.sys.all_sql_modules m
ON (o.object_id = m.object_id)
WHERE
[type] = 'P'
and len(m.definition) <= 4000
)
insert into #TestData
select
[name],
cast(definition as nvarchar(4000)),
deflen
from
TestData
;
go
DECLARE @result TABLE (line nvarchar(max));
set statistics io on;
set statistics time on;
with Lines
(
Item,
ProcName,
Line
)
as
(
select
row_number()
over
(partition by
td.name
order by
td.name
),
td.name,
substring(nchar(1) + replace(rtrim(td.definition), nchar(13) + nchar(10), nchar(1)) + nchar(1),t.N+1,charindex(nchar(1),nchar(1) + replace(rtrim(td.definition), nchar(13) + nchar(10), nchar(1)) + nchar(1), t.N+1) - t.N - 1)
from
-- dbo.ufn_Tally2(1,10000,1) t
-- dbo.ufn_Tally2(1,(select max(td1.deflen) from TestData td1),1) t
-- dbo.Tally t
-- cross join TestData td
#TestData td
cross apply dbo.ufn_Tally2(1,td.deflen,1) t
where
t.N < len(nchar(1) + replace(rtrim(td.definition), nchar(13) + nchar(10), nchar(1)) + nchar(1))
and substring(nchar(1) + replace(rtrim(td.definition), nchar(13) + nchar(10), nchar(1)) + nchar(1),t.N,1) = nchar(1)
)
insert into @result
select
Line
from
Lines
;
set statistics time off;
set statistics io off;
select
*
from
@result;
go
DECLARE @result TABLE (line nvarchar(max));
set statistics io on;
set statistics time on;
with Lines
(
Item,
ProcName,
Line
)
as
(
select
row_number()
over
(partition by
td.name
order by
td.name
),
td.name,
substring(nchar(1) + replace(rtrim(td.definition), nchar(13) + nchar(10), nchar(1)) + nchar(1),t.N+1,charindex(nchar(1),nchar(1) + replace(rtrim(td.definition), nchar(13) + nchar(10), nchar(1)) + nchar(1), t.N+1) - t.N - 1)
from
dbo.Tally t
cross join #TestData td
where
t.N < len(nchar(1) + replace(rtrim(td.definition), nchar(13) + nchar(10), nchar(1)) + nchar(1))
and substring(nchar(1) + replace(rtrim(td.definition), nchar(13) + nchar(10), nchar(1)) + nchar(1),t.N,1) = nchar(1)
)
insert into @result
select
Line
from
Lines
;
set statistics time off;
set statistics io off;
select
*
from
@result;
go
select * from #TestData;
go
drop table #TestData;
go
April 13, 2009 at 7:29 pm
Lynn Pettis (4/13/2009)
I think I found the reason why. If you check the code, the definition field is nvarchar(max). I made some changes to the code, and only selected those procedures where the definition was less than or equal to 4000 characters. then I ran two different versions of the code. Check it out yourself. Here is the code:
I'll give the code a try right now and sure do appreciate the help, but would you mind summarizing what you found? I'm thinking that limiting something to 4k probably isn't what most people had in mind for a split in 2k5. 😀
--Jeff Moden
Change is inevitable... Change for the better is not.
April 13, 2009 at 8:08 pm
Ok... I give up... the Tally table part of the code took 22 seconds to do the split. I stopped the function table split after 8 minutes.
I do see that the Tally table solution you have creates a working table with half a million rows in it. Is that what you were trying to show?
--Jeff Moden
Change is inevitable... Change for the better is not.
April 13, 2009 at 10:05 pm
I guess I didn't do a good job of explaining what I found. I rewrite the code some, and it is posted below along with the statistics time and io from a run on my system here at home.
If you notice the second run ran much longer than the first. The difference being the data type of the difference column. In the first, it is limited to data the is 4000 characters or shorter. The second using the nvarchar(max) data type.
This is what I think is causing the difference in the Tally table version of the split. The blob column causes more work for SQL Server.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Table '#2B3F6F97'. Scan count 0, logical reads 10559, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Tally'. Scan count 200, logical reads 1201, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#TestData1__________________________________________________________________________________________________________00000000002A'. Scan count 1, logical reads 100, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 11157 ms, elapsed time = 11973 ms.
(10468 row(s) affected)
(10468 row(s) affected)
Table '#2C3393D0'. Scan count 0, logical reads 14647, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Tally'. Scan count 200, logical reads 1472, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#TestData2__________________________________________________________________________________________________________00000000002B'. Scan count 1, logical reads 93, physical reads 0, read-ahead reads 0, lob logical reads 29709670, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 524860 ms, elapsed time = 533627 ms.
[/qoute]
create table #TestData1(
[name] sysname,
[definition] nvarchar(4000),
deflen int
)
;
go
create table #TestData2(
[name] sysname,
[definition] nvarchar(max),
deflen int
)
;
go
with TestData as (
SELECT TOP(200)
o.name,
m.definition,
len(m.definition) as deflen
FROM
master.sys.all_objects o
INNER JOIN master.sys.all_sql_modules m
ON (o.object_id = m.object_id)
WHERE
[type] = 'P'
and len(m.definition) <= 4000
)
insert into #TestData1
select
[name],
cast(definition as nvarchar(4000)),
deflen
from
TestData
;
go
with TestData as (
SELECT TOP(200)
o.name,
m.definition,
len(m.definition) as deflen
FROM
master.sys.all_objects o
INNER JOIN master.sys.all_sql_modules m
ON (o.object_id = m.object_id)
WHERE
[type] = 'P'
)
insert into #TestData2
select
[name],
cast(definition as nvarchar(4000)),
deflen
from
TestData
;
go
DECLARE @result TABLE (line nvarchar(max));
set statistics io on;
set statistics time on;
with Lines
(
Item,
ProcName,
Line
)
as
(
select
row_number() over (partition by td.name order by td.name),
td.name,
substring(nchar(1) + replace(rtrim(td.definition), nchar(13) + nchar(10), nchar(1)) + nchar(1),t.N+1,charindex(nchar(1),nchar(1) + replace(rtrim(td.definition), nchar(13) + nchar(10), nchar(1)) + nchar(1), t.N+1) - t.N - 1)
from
dbo.Tally t
cross join #TestData1 td
where
t.N < len(nchar(1) + replace(rtrim(td.definition), nchar(13) + nchar(10), nchar(1)) + nchar(1))
and substring(nchar(1) + replace(rtrim(td.definition), nchar(13) + nchar(10), nchar(1)) + nchar(1),t.N,1) = nchar(1)
)
insert into @result
select
Line
from
Lines
;
set statistics time off;
set statistics io off;
select
*
from
@result;
go
DECLARE @result TABLE (line nvarchar(max));
set statistics io on;
set statistics time on;
with Lines
(
Item,
ProcName,
Line
)
as
(
select
row_number() over (partition by td.name order by td.name),
td.name,
substring(nchar(1) + replace(rtrim(td.definition), nchar(13) + nchar(10), nchar(1)) + nchar(1),t.N+1,charindex(nchar(1),nchar(1) + replace(rtrim(td.definition), nchar(13) + nchar(10), nchar(1)) + nchar(1), t.N+1) - t.N - 1)
from
dbo.Tally t
cross join #TestData2 td
where
t.N < len(nchar(1) + replace(rtrim(td.definition), nchar(13) + nchar(10), nchar(1)) + nchar(1))
and substring(nchar(1) + replace(rtrim(td.definition), nchar(13) + nchar(10), nchar(1)) + nchar(1),t.N,1) = nchar(1)
)
insert into @result
select
Line
from
Lines
;
set statistics time off;
set statistics io off;
select
*
from
@result;
go
select * from #TestData1;
go
drop table #TestData1;
go
select * from #TestData2;
go
drop table #TestData2;
go
April 13, 2009 at 10:17 pm
I made a slight change to the data loaded. I kept the data the same in both tables this time and here are the statistics io and time for both runs:
Table '#2F10007B'. Scan count 0, logical reads 10559, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Tally'. Scan count 200, logical reads 1201, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#TestData1__________________________________________________________________________________________________________00000000002C'. Scan count 1, logical reads 100, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 11187 ms, elapsed time = 11275 ms.
(10468 row(s) affected)
(10468 row(s) affected)
Table '#300424B4'. Scan count 0, logical reads 10559, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Tally'. Scan count 200, logical reads 1201, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#TestData2__________________________________________________________________________________________________________00000000002D'. Scan count 1, logical reads 100, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 204765 ms, elapsed time = 206609 ms.
(10468 row(s) affected)
(10468 row(s) affected)
April 14, 2009 at 8:19 am
Lynn Pettis (4/13/2009)
I made a slight change to the data loaded. I kept the data the same in both tables this time and here are the statistics io and time for both runs:
Not sure what you mean by you "kept the data the same in both tables this time". Why wouldn't you use the same data for each test?
Anyway, it's been known for a while that many solutions (not just Tally table splits) have some performance issues with the MAX datatypes. With that in mind, what are your conclusions about your latest test?
--Jeff Moden
Change is inevitable... Change for the better is not.
April 14, 2009 at 8:37 am
Jeff Moden (4/14/2009)
Lynn Pettis (4/13/2009)
I made a slight change to the data loaded. I kept the data the same in both tables this time and here are the statistics io and time for both runs:Not sure what you mean by you "kept the data the same in both tables this time". Why wouldn't you use the same data for each test?
Anyway, it's been known for a while that many solutions (not just Tally table splits) have some performance issues with the MAX datatypes. With that in mind, what are your conclusions about your latest test?
The first time, I allowed any length data to be allowed in the definition field in the table defined as nvarchar(max). Realized after the run that this is apples to oranges, but still worthwhile to post the stats for the run. If you compare the times between the two runs where the data could exceed 4000 characters (nchar's nvarchar(max)) with the run time where the data was held to <= 4000 characters, the second run was faster, but still significantly slower than when the data type of the column was declared nvarchar(4000).
This tells me that when you move to varchar(max)/nvarchar(max) data types, you really need to look closer at how you are doing things. Is this point where a CLR procedure begins to make more sense? Is a while loop more efficient? The old adage, test, test, and test some more.
April 14, 2009 at 10:04 am
I have a lot of catching up to do! :w00t:
Just want to say how great it is to see so much hard discussion and comparison going on in so many threads on here. It's like attending a black belt seminar just watching you all work.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
April 14, 2009 at 10:18 am
This tells me that when you move to varchar(max)/nvarchar(max) data types, you really need to look closer at how you are doing things
And since CLR (.NET) strings are unicode, I wonder how this would affect any possible solution.
Far away is close at hand in the images of elsewhere.
Anon.
April 14, 2009 at 12:57 pm
<Sigh>
http://www.sqlservercentral.com/Forums/Topic696113-92-1.aspx
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
Viewing 15 posts - 3,361 through 3,375 (of 66,742 total)
You must be logged in to reply to this topic. Login to reply