January 24, 2009 at 6:25 pm
RBarryYoung (1/24/2009)
Just to clarify my earlier response to Hemin's question: I was assuming that he was asking about using SELECT pseudocursors to do this. There are of course lots of ways to accomplish the same result, many of which use SELECT at one point or another.
Yep, I agree...
--Jeff Moden
Change is inevitable... Change for the better is not.
January 24, 2009 at 9:53 pm
OK, I decided that I wanted to test this with a substantial sized dataset. Here is the script to generate apprx 1 million rows:
Drop Table TestPCursors
go
CREATE TABLE TestPCursors
(OrderNo INT NOT NULL, SeqNo INT NOT NULL, Type CHAR(2))
ALTER TABLE TestPCursors
ADD PRIMARY KEY CLUSTERED (OrderNo, SeqNo)
go
;With cteNumbers as (Select TOP 300
ROW_NUMBER() Over(Order by object_id) as Num
From master.sys.system_columns)
INSERT into TestPCursors
Select (Num+OBJECT_ID) as OrderNo
, column_id
MAX((Num+Object_Id+column_id)%7)
, MAX(Case When column_id=1 then Left(name,2)
When (Object_Id+column_id)%5=0
Then Left(name,2)
When ((Num+Object_Id+column_id)%7)=0 Then Left(name,2)
Else '' End)
From master.sys.system_columns
Cross Join cteNumbers
Where (Num+Object_Id) NOT IN(Select object_id From master.sys.system_columns)
Group By (Num+OBJECT_ID), column_id
This takes a couple of minutes to load the data.
I ran Jeff's code against this and it took 7 seconds.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
January 24, 2009 at 10:04 pm
So here's my code using the SELECT pseudocursor hot-wired to act like an UPDATE pseudocursor, plus every trick I could think of to remedy the problem of SQL string aggregation:
Set NoCount ON
Select convert(varchar(24), GETDATE(), 121)
--======
DECLARE @PrevType CHAR(2)
Declare @buffer varbinary(MAX)
Select @buffer = Cast('' as varbinary)
Declare @b6 varbinary(2000)
, @b8 varbinary(8000)
, @b10 varbinary(MAX)
, @b12 varbinary(MAX)
, @b14 varbinary(MAX)
, @b16 varbinary(MAX)
, @b18 varbinary(MAX)
, @bFinal varbinary(MAX)
Select @b6 =Cast('' as varbinary)
, @b8 =Cast('' as varbinary)
, @b10 =Cast('' as varbinary)
, @b12 =Cast('' as varbinary)
, @b14 =Cast('' as varbinary)
, @b16 =Cast('' as varbinary)
, @b18 =Cast('' as varbinary)
, @bFinal =Cast('' as varbinary)
Select TOP 1000000
@buffer = @buffer + Cast(OrderNo as varbinary(4))
+ Cast(SeqNo as varbinary(4))
+ Cast(Case When [Type]=' ' Then @PrevType Else [Type] End as BINARY(2))
, @PrevType = CASE When [Type]=' ' Then @PrevType Else [Type] End
, @b6 = CASE When RowNum%64=0 Then @b6 + @buffer Else @b6 End
, @buffer = CASE When RowNum%64=0 Then Cast('' as binary(2)) Else @buffer End
, @b8 = CASE When RowNum%256=0 Then @b8 + @b6 Else @b8 End
, @b6 = CASE When RowNum%256=0 Then Cast('' as varbinary) Else @b6 End
, @b10 = CASE When RowNum%1024=0 Then @b10 + @b8 Else @b10 End
, @b8 = CASE When RowNum%1024=0 Then Cast('' as varbinary) Else @b8 End
, @b12 = CASE When RowNum%4096=0 Then @b12 + @b10 Else @b12 End
, @b10 = CASE When RowNum%4096=0 Then Cast('' as varbinary) Else @b10 End
, @b14 = CASE When RowNum%16384=0 Then @b14 + @b12 Else @b14 End
, @b12 = CASE When RowNum%16384=0 Then Cast('' as varbinary) Else @b12 End
, @b16 = CASE When RowNum%65768=0 Then @b16 + @b14 Else @b16 End
, @b14 = CASE When RowNum%65768=0 Then Cast('' as varbinary) Else @b14 End
, @b18 = CASE When RowNum%263072=0 Then @b18 + @b16 Else @b18 End
, @b16 = CASE When RowNum%263072=0 Then Cast('' as varbinary) Else @b16 End
From (Select *, (ROW_NUMBER() Over(Order By OrderNo, SeqNo)) as RowNum
From TestPCursors) A
Order By OrderNo, SeqNo
Select @bFinal =@b18 + @b16 + @b14 + @b12 + @b10 + @b8 + @b6 + @buffer
--======
Select convert(varchar(24), GETDATE(), 121)
;With cteNumbers as (Select TOP 10000--00
ROW_NUMBER() Over(Order By c1.object_id) as RowNum
From master.sys.system_columns c1, master.sys.system_columns c2)
Select Cast(SUBSTRING(@bFinal, (RowNum-1)*10+1, 4) as int) as OrderNo
, Cast(SUBSTRING(@bFinal, (RowNum-1)*10+5, 4) as int) as SeqNo
, Cast(SUBSTRING(@bFinal, (RowNum-1)*10+9, 2) as CHAR(2)) as [Type]
Into #tempOut --dump into temp table, to factor out display timing differences
From cteNumbers
--======
Select convert(varchar(24), GETDATE(), 121)
Drop table #tempOut
This code runs in 10-11 seconds on my system, which I think is about as fast as this trick can be made to run.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
January 24, 2009 at 10:15 pm
Thank you so much Jeff and Young. I guess as a newbie its worth it for me to go for a TempTable, it works for me. And by the way Jeff I do read that article, its awesome. I guess it will take a little while to shift higher gear.:)
January 25, 2009 at 9:47 am
RBarryYoung (1/24/2009)
So here's my code using the SELECT pseudocursor hot-wired to act like an UPDATE pseudocursor, plus every trick I could think of to remedy the problem of SQL string aggregation:
Set NoCount ON
Select convert(varchar(24), GETDATE(), 121)
--======
DECLARE @PrevType CHAR(2)
Declare @buffer varbinary(MAX)
Select @buffer = Cast('' as varbinary)
Declare @b6 varbinary(2000)
, @b8 varbinary(8000)
, @b10 varbinary(MAX)
, @b12 varbinary(MAX)
, @b14 varbinary(MAX)
, @b16 varbinary(MAX)
, @b18 varbinary(MAX)
, @bFinal varbinary(MAX)
Select @b6 =Cast('' as varbinary)
, @b8 =Cast('' as varbinary)
, @b10 =Cast('' as varbinary)
, @b12 =Cast('' as varbinary)
, @b14 =Cast('' as varbinary)
, @b16 =Cast('' as varbinary)
, @b18 =Cast('' as varbinary)
, @bFinal =Cast('' as varbinary)
Select TOP 1000000
@buffer = @buffer + Cast(OrderNo as varbinary(4))
+ Cast(SeqNo as varbinary(4))
+ Cast(Case When [Type]=' ' Then @PrevType Else [Type] End as BINARY(2))
, @PrevType = CASE When [Type]=' ' Then @PrevType Else [Type] End
, @b6 = CASE When RowNum%64=0 Then @b6 + @buffer Else @b6 End
, @buffer = CASE When RowNum%64=0 Then Cast('' as binary(2)) Else @buffer End
, @b8 = CASE When RowNum%256=0 Then @b8 + @b6 Else @b8 End
, @b6 = CASE When RowNum%256=0 Then Cast('' as varbinary) Else @b6 End
, @b10 = CASE When RowNum%1024=0 Then @b10 + @b8 Else @b10 End
, @b8 = CASE When RowNum%1024=0 Then Cast('' as varbinary) Else @b8 End
, @b12 = CASE When RowNum%4096=0 Then @b12 + @b10 Else @b12 End
, @b10 = CASE When RowNum%4096=0 Then Cast('' as varbinary) Else @b10 End
, @b14 = CASE When RowNum%16384=0 Then @b14 + @b12 Else @b14 End
, @b12 = CASE When RowNum%16384=0 Then Cast('' as varbinary) Else @b12 End
, @b16 = CASE When RowNum%65768=0 Then @b16 + @b14 Else @b16 End
, @b14 = CASE When RowNum%65768=0 Then Cast('' as varbinary) Else @b14 End
, @b18 = CASE When RowNum%263072=0 Then @b18 + @b16 Else @b18 End
, @b16 = CASE When RowNum%263072=0 Then Cast('' as varbinary) Else @b16 End
From (Select *, (ROW_NUMBER() Over(Order By OrderNo, SeqNo)) as RowNum
From TestPCursors) A
Order By OrderNo, SeqNo
Select @bFinal =@b18 + @b16 + @b14 + @b12 + @b10 + @b8 + @b6 + @buffer
--======
Select convert(varchar(24), GETDATE(), 121)
;With cteNumbers as (Select TOP 10000--00
ROW_NUMBER() Over(Order By c1.object_id) as RowNum
From master.sys.system_columns c1, master.sys.system_columns c2)
Select Cast(SUBSTRING(@bFinal, (RowNum-1)*10+1, 4) as int) as OrderNo
, Cast(SUBSTRING(@bFinal, (RowNum-1)*10+5, 4) as int) as SeqNo
, Cast(SUBSTRING(@bFinal, (RowNum-1)*10+9, 2) as CHAR(2)) as [Type]
Into #tempOut --dump into temp table, to factor out display timing differences
From cteNumbers
--======
Select convert(varchar(24), GETDATE(), 121)
Drop table #tempOut
This code runs in 10-11 seconds on my system, which I think is about as fast as this trick can be made to run.
OK, I'll bite. What the heck is goin on there? I've looked at it a few times, and it's not making any more sense than the first time I read it. Tried to run it and see, but get errors creating the sample data table. I'm sure it's quite the trick, and definitely not one I've seen before.
(Msg 102, Level 15, State 1, Line 7
Incorrect syntax near 'Num'.
Msg 156, Level 15, State 1, Line 16
Incorrect syntax near the keyword 'Group'.)
January 25, 2009 at 10:55 am
Garadin (1/25/2009)
OK, I'll bite. What the heck is goin on there? I've looked at it a few times, and it's not making any more sense than the first time I read it. Tried to run it and see, but get errors creating the sample data table. I'm sure it's quite the trick, and definitely not one I've seen before.(Msg 102, Level 15, State 1, Line 7
Incorrect syntax near 'Num'.
Msg 156, Level 15, State 1, Line 16
Incorrect syntax near the keyword 'Group'.)
Arrgh! My bad, sorry. I've been writing my queries on SQL 2008 lately to get more used to it and I forgot to test it on SQL 2005. Hang on, I'll get a corrected version...
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
January 25, 2009 at 11:10 am
Actually, I think that it was just a typo from a left over diagnostic column.
Anyway, here is a slightly reqwritten version of the Load script:
Drop Table TestPCursors
go
CREATE TABLE TestPCursors
(OrderNo INT NOT NULL, SeqNo INT NOT NULL, Type CHAR(2))
ALTER TABLE TestPCursors
ADD PRIMARY KEY CLUSTERED (OrderNo, SeqNo)
go
;With cteNumbers as (Select TOP 300
ROW_NUMBER() Over(Order by object_id) as Num
From master.sys.system_columns)
, cteCol300 as (Select Object_id
, Num
, column_id
, name
, Num+object_id as OrderNo
, Num+object_id+column_id as ShiftStart1
, object_id+column_id as ShiftStart2
From master.sys.system_columns
Cross Join cteNumbers)
INSERT into TestPCursors
Select OrderNo
, column_id
, MAX(Case When column_id=1 Then Left(name,2)
When ShiftStart2%5=0 Then Left(name,2)
When (ShiftStart1%7)=0 Then Left(name,2)
Else '' End)
From cteCol300
Group By OrderNo, column_id
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
January 25, 2009 at 11:13 am
OK, I checked my solution code on 2005, and it works there too In fact its even faster, 8.5 seconds.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
January 25, 2009 at 11:28 am
Here is the vastly simpler non-optimized version:
DECLARE @PrevType CHAR(2)
Declare @buffer varchar(MAX)
Select @buffer = ''
--====== Load the buffer with all of the output values
Select TOP 10000--00
@buffer = @buffer + Cast(OrderNo as CHAR(11))
+ Cast(SeqNo as CHAR(3))
+ Cast(Case When [Type]=' ' Then @PrevType Else [Type] End as CHAR(2))
, @PrevType = CASE When [Type]=' ' Then @PrevType Else [Type] End
From (Select *, (ROW_NUMBER() Over(Order By OrderNo, SeqNo)) as RowNum
From TestPCursors) A
Order By OrderNo, SeqNo
--====== Now extract and display the output values for every row
;With cteNumbers as (Select TOP 10000--00
ROW_NUMBER() Over(Order By c1.object_id) as RowNum
From master.sys.system_columns c1, master.sys.system_columns c2)
Select Cast(SUBSTRING(@buffer, (RowNum-1)*16+1, 11) as int) as OrderNo
, Cast(SUBSTRING(@buffer, (RowNum-1)*16+12, 3) as int) as SeqNo
, Cast(SUBSTRING(@buffer, (RowNum-1)*16+15, 2) as CHAR(2)) as [Type]
From cteNumbers
Note that because it is so slow, i have restricted it to just 10,000 rows.
Logically, this is the same algorithm as the optimized version, so hopefully you can see where this one is going...
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
January 25, 2009 at 11:46 am
OK, how it works. First the unoptimized (simple) version:
The idea here is, just like an Update pseudocursor, use a Select pseudocursor to accumulate state across rows, to determine the handling of the output columns within each row. Two important differences, however. First, because SELECT cannot write to bot variables and output columns, I use a single string variable (@buffer) to hold all of my output columns and rows. The second difference is that because I am using ORDER BY on the outer select to control my order, this technique is completely supported.
Now, once this massive string (apprx. 16 MB) is created, then I use a Numbers/Tally table to enumerate all of the output rows that I want, and then extract the columns values from the giant string buffer, convert them to the correct data type and output them.
So, why is it so slow? Because simple/naive string accumulation is essentially a Triangular operation. Consider this:
row 001, add 'A': @b-2 = '' + 'A'; cost of this row is 1
row 002, add 'B': @b-2 = 'A' + 'B'; cost of this row is 2
row 003, add 'C': @b-2 = 'AB' + 'C'; cost of this row is 3
row 004, add 'D': @b-2 = 'ABC' + 'D'; cost of this row is 4
...
and the cost of the millionth rows is 1,000,000. Add up the cost of every row to get the total cost and you get 5,000,005,000,000, which is a very big number indeed (five trillion, five million). And this was the motivation for the much more complicated stuff that you see in the optimized version.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
January 25, 2009 at 12:07 pm
The second performance problem that the unoptimized version has is that the second select is much slower than I thought it would be. Because I know that the conversion from character string to INT is a high overhead operation, I figured to circumvent this problem by using binary strings instead of character strings.
Converting to or from binary is really just a memcopy operation with no conditions or calculations necessary, with a different datatype applied to the output, so it should be much more efficient. And that is why the optimized version uses binary strings everywhere. This does help with the efficiency of that step, but not nearly as much as I had hoped.
And this still does nothing for the Triangular nature of string accumulation...
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
January 25, 2009 at 3:27 pm
I get it now. Great explanation. So you've also found that it's quicker to dump repeatedly into the slightly bigger varbinary's than just dumping into one big one whenever you pass a certain number?
January 25, 2009 at 4:59 pm
Garadin (1/25/2009)
I get it now. Great explanation. So you've also found that it's quicker to dump repeatedly into the slightly bigger varbinary's than just dumping into one big one whenever you pass a certain number?
Exactly Seth. Consider the following simplified examples, first unoptimized:
Row Step Row Step Total
Cost Cost Cost
1 a 1 1 @b-2 =''+'A' 1
2 a 2 2 @b-2='A'+'B' 3
3 a 3 3 @b-2='AB'+'C' 6
4 a 4 4 @b-2='ABC'+'D' 10
5 a 5 5 @b-2='ABCD'+'E' 15
6 a 6 6 @b-2='ABCDE'+'F' 21
7 a 7 7 @b-2='ABCDEF'+'G' 28
8 a 8 8 @b-2='ABCDEFG'+'H' 36
Now a slightly optimized version:
Row Step Row Step Total
Cost Cost Cost
1 a 1 1 @b-2 =''+'A' 1
2 a 2 2 @b-2='A'+'B' 3
3 a 3 3 @b-2='AB'+'C' 6
4 a 4 @b-2='ABC'+'D'
b 4 @b2=''+'ABCD'
c 8 0 @b-2='' 14
5 a 1 1 @b-2=''+'E' 15
6 a 2 2 @b-2='E'+'F' 17
7 a 3 3 @b-2='EF'+'G' 20
8 a 4 4 @b-2='EFG'+'H'
b 8 @b2='ABCD'+'EFGH'
c 12 0 @b-2='' 32
Now this improvement may seem small (10%), but this is greatly magnified by larger numbers, and then multiplied by stacking them up. The optimizations that I implemented for this improved the overall string performance by 100 to 1000x.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
January 25, 2009 at 5:11 pm
I should also add, that my previous numbers were incorrect (left a debugging limiter in), the total time is 30 seconds, compared to Jeffs 7+ seconds.
Only 8 seconds are in the string accumulator, the other 22 seconds are (suprisingly) in the row enumerator SELECT. Using varbinary helped some, but not nearly enough and at this point, I cannot figure out what is taking it so long.
Any ideas, Jeff?
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
January 25, 2009 at 7:12 pm
This is basically gap finding, and it seems like you should be able to use a variation of Jeff's identity gap finding technique for this. I've been trying to adapt it to fit this, but keep having to make too many references to the table to do it. I feel like I'm overlooking something basic.
Viewing 15 posts - 16 through 30 (of 30 total)
You must be logged in to reply to this topic. Login to reply