September 22, 2009 at 11:16 pm
RBarryYoung (9/22/2009)
Anyhow, here it is, note it probably still needs that Clustered Key thing for optimal performance
Are you SUUUUURE? I wouldn't want to play any favorites!! 😉
Thanks for posting this Barry. I'm gonna take a look at this in depth tomorrow.
September 25, 2009 at 9:27 am
Seth: any updates on this?
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
September 27, 2009 at 7:58 am
Sorry, I just got back from a cruise to the Bahamas, have been completely unplugged from the world for 4 days now. May take today to 'recover' from my 'vacation', but tomorrow I'll definitely have some new updates.
September 27, 2009 at 3:01 pm
Garadin (9/27/2009)
Sorry, I just got back from a cruise to the Bahamas, have been completely unplugged from the world for 4 days now. May take today to 'recover' from my 'vacation', but tomorrow I'll definitely have some new updates.
Hrrm. I can't honestly say that you're eliciting a lot of sympathy from me here ...
😛
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
September 28, 2009 at 5:49 am
I concede that my guts can be wrong 😀
But just to add to the debate , this goes some way to resolving quirky update ordering issues.
I havent seen anything quite like this on the web.
Seems ok on 2005, further testing is appreciated.
Drop table mytab
go
create table mytab
(
Idx char(1),
Col1 integer,
Col2 integer
)
go
Create clustered index idx1 on mytab(idx)
go
drop view vw1
go
create view vw1
as
Select top 99999999999 * from mytab
order by col1
go
delete from mytab
go
insert into mytab(Idx,Col1) values('Z',1)
insert into mytab(Idx,Col1) values('Y',2)
insert into mytab(Idx,Col1) values('X',4)
insert into mytab(Idx,Col1) values('W',3)
insert into mytab(Idx,Col1) values('V',6)
go
declare @roll integer
Select @roll =0
update vw1
set @roll = col1+@Roll,
col2 =@roll
go
select * from vw1
select * from mytab
September 28, 2009 at 7:08 am
Or how about this if you cant add a column to the original table.
drop table #mytab2
go
create table #mytab2
(
Idx char(1),
roll integer)
go
insert into #mytab2 (idx,roll)
select idx,NULL
from mytab
go
declare @roll integer
Select @roll =0
;with cte(Idx , Col1 , Col2 , idxupd)
as
(
Select top 9999999 mytab.Idx , Col1 , roll,#mytab2.idx
from mytab join #mytab2
on mytab.idx = #mytab2.idx
order by col1
)
update cte
set @roll = col1+@Roll,
col2 =@roll,
Idxupd = idx
output inserted.Idxupd,inserted.col2
September 28, 2009 at 8:30 am
Dave Ballantyne (9/28/2009)
this goes some way to resolving quirky update ordering issues.
Heh... what ordering issues?
--Jeff Moden
Change is inevitable... Change for the better is not.
September 28, 2009 at 8:47 am
Jeff Moden (9/28/2009)
Heh... what ordering issues?
Well , just that on your base table , you dont ( disclaimer More testing needed) need a clustered index in the order you want to update the data.
September 28, 2009 at 9:31 am
Dave Ballantyne (9/28/2009)
Jeff Moden (9/28/2009)
Heh... what ordering issues?
Well , just that on your base table , you dont ( disclaimer More testing needed) need a clustered index in the order you want to update the data.
You have to trust me on this... if you don't have a clustered index in the order you want to update the data, it will fail in a most unpredictable manner. Further, if the clustered index engages during the update (and it frequently does despite your best attempts), it will still update in clustered index order.
If you don't have the correct clustered index on the original table, the only safe way to use the quirky update is to copy the data to a Temp Table and put the correct one on that.
I haven't quite finished the article yet (trying to trim it down from 26 MS Word pages), but I have the proof of the failures I'm talking about in the form of demonstrable code in the article.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 28, 2009 at 9:40 am
Dave Ballantyne (9/28/2009)
Jeff Moden (9/28/2009)
Heh... what ordering issues?
Well , just that on your base table , you dont ( disclaimer More testing needed) need a clustered index in the order you want to update the data.
Also, if you decide not to take Jeff's word for it and want to test this yourself, make sure you don't insert the data in the exact order it needs to be in.
September 28, 2009 at 10:08 am
Ok, so I had some time back at my machine to test your solution Barry. It comes very close to tying the quirky update, but doesn't *quite* manage it. (Once I slightly adjusted your script to match this scenario) I ran the time trials without the execution plan then re-ran to generate it. It is attached. One interesting thing I found is that yours improves slightly when re-running it, where as the quirky update stays approximately the same on repeated runs. It also returns the exact same number of rows. Very close second for those who do not "trust" the 3 part update. Thanks for generating this Barry, I'm going to see if I can get that varbinary code I created the other day working to see if it comes close to this.
Barry's Method
XML Built . Time Elapsed: 16 seconds
(64202 row(s) affected)
Select to Table Done. Total Time Elapsed: 18 seconds
Final Row Count, Barry Pseudo-XML Method:64202
Modified Code:
DECLARE @timerdatetime,
@FRCint
SET @timer = GETDATE()
--create clustered index ix_acct_ndt on MR (acct, ndt)
declare @xml varchar(MAX), @acct int, @nextdate DATETIME, @datstr as varchar(80)
select @xml = '', @acct = -1, @nextdate = '1900-01-01', @datstr = ''
Declare @i int, @z int, @rowstate int
Declare @S16 Varchar(MAX), @S14 Varchar(MAX), @S12 Varchar(MAX), @s10 Varchar(MAX)
Select @i=0, @S16='', @S14='', @S12='', @s10=''
select
@z = ROW_NUMBER() OVER(ORDER BY acct,ndt),
@rowstate = CASE WHEN @acct <> acct THEN 2 WHEN ndt >= @nextdate THEN 1 ELSE 0 END,
@nextdate = CASE @rowstate WHEN 0 THEN @nextdate ELSE DATEADD(day, 150, ndt) END,
@i = CASE @rowstate WHEN 0 THEN @i ELSE @i+1 END,
@datstr = CASE @rowstate WHEN 0 THEN '' ELSE '<r a="' + convert(varchar(20), acct) + '" t="' + convert(varchar(10), ndt, 121) + '"/>' END,
@S16 = CASE WHEN @rowstate=0 THEN @S16 WHEN @i%65536=0 THEN @S16+@S14+@S12+@S10+@xml+@datstr Else @S16 END,
@S14 = CASE WHEN @rowstate=0 THEN @S14 WHEN @i%65536=0 THEN '' WHEN @i%16384=0 THEN @S14+@S12+@S10+@xml+@datstr Else @S14 END,
@S12 = CASE WHEN @rowstate=0 THEN @S12 WHEN @i%16384=0 Then '' WHEN @i%4096=0 THEN @S12+@S10+@xml+@datstr Else @S12 END,
@s10 = CASE WHEN @rowstate=0 THEN @s10 WHEN @i%4096=0 Then '' WHEN @i%1024=0 THEN @s10+@xml+@datstr Else @s10 END,
@xml = CASE WHEN @rowstate=0 THEN @xml WHEN @i%1024=0 THEN '' ELSE @xml+@datstr END,
@acct = acct
from MR
ORDER BY acct,ndt
SELECT @xml = @S16+@S14+@S12+@S10+@xml
Declare @x as XML
select @x = '<d>'+@xml+'</d>'
PRINT 'XML Built . Time Elapsed: ' + CAST(DATEDIFF(s,@Timer,GETDATE()) AS varchar(10)) + ' seconds'
SELECT acct, ndt38
INTO #3
FROM (
SELECT
T.c.value('@a[1]', 'int') AS [acct],
T.c.value('@t[1]', 'varchar(24)') AS [ndt38]
FROM @x.nodes('/d/r') AS T(c)
) T
--where RIGHT(ndt38,1)='2'
PRINT 'Select to Table Done. Total Time Elapsed: ' + CAST(DATEDIFF(s,@Timer,GETDATE()) AS varchar(10)) + ' seconds'
SET @FRC = (SELECT COUNT(*) FROM #3)
PRINT 'Final Row Count, Barry Pseudo-XML Method:' + CAST(@FRC AS varchar(10))
[Edit] Issue posting the .sqlplan, will post when able.
[Edit2] Appears it is an issue with Chrome.
September 28, 2009 at 3:54 pm
I realized after the fact that it might have been a little unclear as to why I didn't believe Barry's solution didn't *beat* the quirky update, as the quirky takes 19 seconds when it has to create the temp table vs. Barry's 18.
The reason for this is that there are only 2 reasons why you need to make the temp table in the first place.
1. The big one. The clustered index. It's not realistic to assume that the clustered index on your table will always be the one needed for the quirky update.
2. The inability to add any fields to the base table. This is (in my opinion) a much smaller reason and one that won't come into play in most situations. The field required by the quirky update is a bit field which would either be a very minimal size increase or (depending on the other fields in the table) might not actually increase the size of the table at all.
The first reason seemed moot, as Barry's solution used the same clustered index to achieve those performance numbers. The second is minor, but it is a valid reason in some circumstances and his method would win by a few seconds in the situation if the clustered index was correct on the table but you were not allowed to add the field. When I said it almost beat it, I was comparing it to the quirky method sans temp table, where it was only 2-3 seconds behind.
Now, I decided to take my own advice and completely re-create test data in a new table and never index it. The results were... surprising. Note that because the data is now completely unindexed, a temp table is *required* for the quirky update. I'm honestly shocked at how fast Barry's method runs without any index available.
Quirky Method (w/Temp Table)
(2000000 row(s) affected)
Temp Table Insert Done. Time Elapsed: 12 seconds
(2000000 row(s) affected)
Update Done. Time Elapsed: 40 seconds
(64425 row(s) affected)
Select to Table Done. Total Time Elapsed: 42 seconds
Final Row Count, Quirky Method:64425
Barry Method: No Clustered Index.
(1 row(s) affected)
XML Built . Time Elapsed: 20 seconds
(64425 row(s) affected)
(1 row(s) affected)
Select to Table Done. Total Time Elapsed: 23 seconds
(1 row(s) affected)
Final Row Count, Barry Pseudo-XML Method:64425
Original CTE Method: No Clustered Index.
I re-ran this as well but cut it off at 2m 30s as it still wasn't a contender.
Barry's method appears to win hands down when the data is not ordered to begin with. What is confusing me (and why I can't figure out if I did something wrong or not) is that the quirky update itself seems to take significantly longer even after I've inserted the data into the temp table. (30 seconds for 2M rows? It seems like I did something wrong, I'm just not seeing it.)
Execution Plans and code attached. If anyone spots a mistake, please let me know.
September 28, 2009 at 4:14 pm
Garadin (9/28/2009)
Ok, so I had some time back at my machine to test your solution Barry. It comes very close to tying the quirky update, but doesn't *quite* manage it. (Once I slightly adjusted your script to match this scenario) I ran the time trials without the execution plan then re-ran to generate it. It is attached. One interesting thing I found is that yours improves slightly when re-running it, where as the quirky update stays approximately the same on repeated runs. It also returns the exact same number of rows. Very close second for those who do not "trust" the 3 part update. Thanks for generating this Barry, I'm going to see if I can get that varbinary code I created the other day working to see if it comes close to this...
I am actually quite surprised that it did as well as it did, and I have to give credit to Richard Fryar for his "string to XML" idea which seems to have fixed one of the major problems that my previous varbinary version had.
However, even though it may have worked well in this case, I still do not believe that it will scale as well as Jeff's solution. Even with all of the tricks that I employ in this approach, it is still basically filling up memory with a gigantic string, which works great, Until it runs out of memory. At that point it's performance will very quickly get 10x or even 100x worse, Not Good.
That's why I originally went from a Varchar(MAX)/NVarchar(MAX) approach to a Varbinary(MAX) one, it's probably the most efficient way to pack together are boatload of concatenated data. Unfortunately, the part of that that I though would be super-fast: extracting out the arrayed data using SUBSTRING and fixed-size matrix offset calculations, turned out to be terribly slow (I still don't know for sure why, but apparently big MAX-type strings cannot just be accessed with simple direct-memory offsets, duh (I should have realized that)).
Anyway, apparently, putting it into an XML-structured variable fixes that (so far) as the extracts times have been really good.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
September 29, 2009 at 2:57 am
Garadin (9/28/2009)
Also, if you decide not to take Jeff's word for it and want to test this yourself, make sure you don't insert the data in the exact order it needs to be in.
In my little test scripts i did post the data in a 'random' order and then created the clustered index so it was in a different order. 🙂
Anyway Jeff , im sure you've been there done it and got the T-Shirt 😀 and look forward to your completed document , but , i really am struggling to get incorrect results. Ive scaled up by test to be on adventureworks and no matter what order or filtering i use then my Cte-Quirky update returns the same as my 'control' cursor. Must get on with some 'real' work now 😉
Drop table #Balance
go
drop table #CurBalance
go
drop index Sales.SalesOrderHeader.idxOrderDate
go
create index idxOrderDate on Sales.SalesOrderHeader(Orderdate) include(SalesOrderId,SalesPersonId,subtotal,taxamt,freight)
go
create index idxSalesPerson on Sales.SalesOrderHeader(SalesPersonId) include(SalesOrderId,OrderDate,subtotal,taxamt,freight)
go
Create Table #Balance(
SalesOrderId integer,
RollingBalance money
)
Create Table #CurBalance(
SalesOrderId integer,
RollingBalance money
)
insert into #Balance(SalesOrderId,RollingBalance)
select SalesOrderID,NULL
from Sales.SalesOrderHeader SOH
join Sales.SalesPerson SP
on SP.SalesPersonID = SOH.SalesPersonID
join Sales.SalesTerritory ST
on St.TerritoryID = SP.TerritoryID
where OrderDate between '01jan03' and '01may03'
-- and SOH.SalesPersonID in(276 ,277)
-- CountryRegionCode = 'CA'
order by OrderDate, SOH.SalesPersonID
--order by SOH.SalesPersonID ,OrderDate
--Order by st.TerritoryID,OrderDate
--Order by st.CountryRegionCode,OrderDate
Declare @OrderYear integer,
@OrderMonth integer,
@SalesPersonId integer,
@TerritoryId integer,
@CountryRegionCode char(2),
@RollingBalance money
Select @OrderYear = 0
Select @OrderMonth = 0
Select @RollingBalance = 0
Select @SalesPersonId =0
;with cteValue(SalesOrderId,OrderDate,SalesPersonId, TotalDue ,RollingBalance,TerritoryID , CountryRegionCode)
as(
select top 99999999999 SOH.SalesOrderID,OrderDate,SOH.SalesPersonID,TotalDue,RollingBalance, SP.TerritoryID,CountryRegionCode
from Sales.SalesOrderHeader SOH
join Sales.SalesPerson SP
on SP.SalesPersonID = SOH.SalesPersonID
join Sales.SalesTerritory ST
on St.TerritoryID = SP.TerritoryID
join #Balance
on #Balance.SalesOrderId = SOH.SalesOrderID
where OrderDate between '01jan03' and '01may03'
-- and SOH.SalesPersonID in(276 ,277)
-- CountryRegionCode ='CA'
--order by OrderDate, SOH.SalesPersonID
order by SOH.SalesPersonID ,OrderDate
--Order by st.TerritoryID,OrderDate
-- Order by st.CountryRegionCode,OrderDate
)
update cteValue
set @RollingBalance = case when @SalesPersonId <> cteValue.SalesPersonId or
@OrderMonth <> DATEPART(mm,OrderDate) or
@OrderYear <> DATEPART(yy,OrderDate) or
@TerritoryId <> TerritoryId
then TotalDue
else @RollingBalance +TotalDue end,
RollingBalance = @RollingBalance,
@OrderYear = DATEPART(yy,OrderDate),
@OrderMonth = DATEPART(mm,OrderDate),
@SalesPersonId = cteValue.SalesPersonId,
@TerritoryId = cteValue.TerritoryID,
@CountryRegionCode = cteValue.CountryRegionCode
go
Declare @OrderYear integer,
@OrderMonth integer,
@SalesPersonId integer,
@TerritoryId integer,
@CountryRegionCode char(2),
@RollingBalance money,
@PrevOrderYear integer,
@PrevOrderMonth integer,
@PrevSalesPersonId integer,
@PrevTerritoryId integer,
@PrevCountryRegionCode char(2),
@SalesOrderId integer,
@OrderDate datetime,
@TotalDue money
Select @OrderYear = 0,
@OrderMonth = 0,
@RollingBalance = 0,
@SalesPersonId =0,
@PrevOrderYear = 0,
@PrevOrderMonth = 0,
@PrevSalesPersonId =0,
@PrevTerritoryId =0,
@PrevCountryRegionCode='XX'
declare balancecur cursor for
select SOH.SalesOrderID,OrderDate,SOH.SalesPersonID,TotalDue, SP.TerritoryID,CountryRegionCode
from Sales.SalesOrderHeader SOH
join Sales.SalesPerson SP
on SP.SalesPersonID = SOH.SalesPersonID
join Sales.SalesTerritory ST
on St.TerritoryID = SP.TerritoryID
where OrderDate between '01jan03' and '01may03'
--and SOH.SalesPersonID in(276 ,277)
--CountryRegionCode ='CA'
--order by OrderDate, SOH.SalesPersonID
order by SOH.SalesPersonID ,OrderDate
--Order by st.TerritoryID,OrderDate
--Order by st.CountryRegionCode,OrderDate
open balancecur
while(0=0) begin
fetch next from balancecur into @SalesOrderId,@OrderDate,@SalesPersonId,@TotalDue,@TerritoryId,@CountryRegionCode
if(@@FETCH_STATUS<>0) break
Select @OrderMonth = DATEPART(mm,@OrderDate),
@OrderYear = DATEPART(yy,@OrderDate)
if( @OrderMonth <> @PrevOrderMonth or
@OrderYear <> @PrevOrderYear or
@TerritoryId <> @PrevTerritoryId or
@CountryRegionCode <> @PrevCountryRegionCode or
@SalesPersonId <> @PrevSalesPersonId
) begin
Select @RollingBalance = 0,
@PrevOrderMonth = @OrderMonth,
@PrevOrderYear = @OrderYear,
@PrevSalesPersonId = @SalesPersonId,
@PrevCountryRegionCode = @CountryRegionCode ,
@PrevTerritoryId = @TerritoryId
end
Select @RollingBalance = @RollingBalance + @TotalDue
insert into #CurBalance(SalesOrderId,RollingBalance)
values(@SalesOrderId,@RollingBalance)
end
close balancecur
deallocate balancecur
go
select COUNT(*) from #CurBalance
go
select COUNT(*) from #Balance
go
select COUNT(*) from #CurBalance join #Balance
on #CurBalance.SalesOrderId = #Balance.SalesOrderId
and #CurBalance.RollingBalance= #Balance.RollingBalance
Viewing 15 posts - 46 through 60 (of 70 total)
You must be logged in to reply to this topic. Login to reply