September 17, 2009 at 10:06 pm
Greg Snidow (9/17/2009)
Ok, this might be a dumb question, but if I specify the physical order of the records on the insert, then why is the index so vital? If I select * from this table a billion times, is there ever a chance that it will return the records in any other order than the physical order, in the absence of an index? Is it that without the index, the update statement does not necessarily run by physical order? Sorry for asking so many questions on this one, but this little baby always intrigues me, and I always look for chances to use it. I can always manage to get it to work by trying everything until it does, but I still only "get it" just a little.
SELECT operates differently than UPDATE does and you cannot trust any natural or index order with SELECT. If you want to guarantee SELECT will return something in the correctly, you MUST use ORDER BY even if the rows were originally inserted using an ORDER BY.
The 3 part Quirky Update, when properly formed with the necessary clustered index and a couple of other things including the prevention of parallelism, will always do an update in the same order as the clustered index even if you fool around with CTE's to try to force an actual ORDER BY.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 17, 2009 at 10:52 pm
Jeff Moden (9/17/2009)
Another of those rules is that it must be a single table update. By that I mean, no joins... period.
That was going to be my next question. I tried doing it without the temp table, and failed miserably. Now I know. Thanks, Jeff.
Greg
_________________________________________________________________________________________________
The glass is at one half capacity: nothing more, nothing less.
September 17, 2009 at 10:56 pm
Heh... I'm actually worse... I just don't let people put CLR's in my systems. It's not that I'm against them... it's that they're not worth having to go through two different languages and maintaining source control for both the source code and the DLL especially since the T-SQL solutions frequently beat them for speed.
Yes, there are a few places where a CLR will beat T-SQL... RegEx replace is one of them and, if done correctly, splitting strings is another. But the differences in performance are so relatively small that I just can't justify the use of CLR's.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 18, 2009 at 1:45 am
Garadin (9/17/2009)
If anyone reading this knows of a high performance method other than CLR/Quirky Update/Cursor/Loop to do iterative logic like this, I'd love to hear about it.
How about a recursive CTE ? I think this has the correct output , if not let me know.
Also i would be very interested in how this performs against a large record set
using this index
CREATE CLUSTERED INDEX idxmultirecs ON multrecs (acct,ndt)
I see index seeks all the way through so it *should* be pretty damn fast
with cteFirstDates(acct,ndt)
as(
Select acct,min(ndt)
from multrecs
group by acct
)
,
CteRecurs(acct,ndt)
as
(
Select multrecs.acct,multrecs.ndt
from cteFirstDates
join multrecs
on multrecs.acct = cteFirstDates.acct
and multrecs.ndt = cteFirstDates.ndt
union all
Select acct,ndt
from (
Select multrecs.acct,multrecs.ndt,rown =row_number() over (partition by multrecs.acct order by multrecs.ndt)
from CteRecurs
join multrecs
on multrecs.acct = CteRecurs.acct
and multrecs.ndt >= CteRecurs.ndt + 38
) as x
where rown = 1
)
select * from CteRecurs
order by acct,ndt
September 18, 2009 at 6:54 am
That works but be careful... recursive CTE's are as slow as a While Loop in most cases and are actually a form of RBAR.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 18, 2009 at 8:22 am
Jeff Moden (9/18/2009)
That works but be careful... recursive CTE's are as slow as a While Loop in most cases and are actually a form of RBAR.
1st off apologies to the OP only just noticed what forum this is in :doze:
I agree that it isnt a perfect set based solution , but any set where you are wanting to include / exclude rows ( and to a certain extent the row data)
based upon data from previous rows will have to be RBAR somewhere.
My 'gut' feeling is it should be light on the system than a quirky update though.
September 18, 2009 at 8:47 am
Dave Ballantyne (9/18/2009)
My 'gut' feeling is it should be light on the system than a quirky update though.
My gut feeling is that you're wrong :-D. But that's what we have testing for. I'd test it, but first I have to make up a couple million rows of suitable test data. Got any scripts for that lying about by chance?
Also, thank you for posting another method for this. Although it is a 2005+ only method, and is technically RBAR, it's one more thing to keep in mind when coming to these iterative logic problems, and is another use(although whether or not it is a 'good' use remains to be seen) for a recursive CTE.
September 18, 2009 at 9:13 am
Your wish is my command 🙂
Happy to be proved wrong , evidence either way is good in my book.
You just need to manually update the acct value ,
and keep inserting values till happy 😀
also note that this will produce increments of between 1 and 11 days
so adjust your update to be +4 (or something else rather than the specified 38)
Create View vwnewId
as
Select new_id = newId()
go
drop table #datelist
go
Create table #datelist
(
numskip integer,
date datetime
)
go
delete from #datelist
;with ctenums(number)
as(
select number from
master..spt_values
where
TYPE='p' and number between 0 and 400
)
insert into #datelist(numskip)
Select abs(checksum(new_id))%10
from ctenums
cross join vwnewid
go
declare @Date datetime
Select @Date = '01jan1970'
update #datelist
set Date=@Date,
@Date = @Date+numskip+1
delete from multrecs where acct = 3
insert into multrecs (acct,cdt,val,ndt)
select 3,'01jan1970','test',date
from #datelist
September 18, 2009 at 9:35 am
Also, with the quirk update , although you have specified index(0) are the rows guaranteed to come back in the required order ? . If you joined out to another table , things could go wrong.
(No criticism , just observation) 🙂
September 18, 2009 at 9:40 am
Dave Ballantyne (9/18/2009)
Also, with the quirk update , although you have specified index(0) are the rows guaranteed to come back in the required order ? . If you joined out to another table , things could go wrong.(No criticism , just observation) 🙂
There's a few posts earlier in the thread from Jeff directly explaining the index(0) thing and why we won't use it anymore, but the gist of it is that you are guaranteed the row order (assuming you've used the clustered index to create this order) on a single table.
You're absolutely right that Joins present a problem, and AFAIK, you cannot use joins at all. I don't believe you can use a WHERE clause either, but I'm not sure on that. The logic for these updates should be in the case statement.
September 18, 2009 at 9:43 am
Dave Ballantyne (9/18/2009)
Your wish is my command 🙂Happy to be proved wrong , evidence either way is good in my book.
You just need to manually update the acct value ,
and keep inserting values till happy 😀
also note that this will produce increments of between 1 and 11 days
so adjust your update to be +4 (or something else rather than the specified 38)
Create View vwnewId
as
Select new_id = newId()
go
drop table #datelist
go
Create table #datelist
(
numskip integer,
date datetime
)
go
delete from #datelist
;with ctenums(number)
as(
select number from
master..spt_values
where
TYPE='p' and number between 0 and 400
)
insert into #datelist(numskip)
Select abs(checksum(new_id))%10
from ctenums
cross join vwnewid
go
declare @Date datetime
Select @Date = '01jan1970'
update #datelist
set Date=@Date,
@Date = @Date+numskip+1
delete from multrecs where acct = 3
insert into multrecs (acct,cdt,val,ndt)
select 3,'01jan1970','test',date
from #datelist
Awesome, I appreciate the code. Let the battle of the guts commence!
September 18, 2009 at 11:15 am
And the results are in. It probably wasn't a fair contest to begin with. You were going up against a gut accustomed to eating 2 day old Taco Bell burritos without flinching.
Quirky Method No DOP set, Messages:
(2000000 row(s) affected)
(1 row(s) affected)
Update Done. Time Elapsed: 28 seconds
(64202 row(s) affected)
(1 row(s) affected)
Select to Table Done. Total Time Elapsed: 30 seconds
(1 row(s) affected)
Final Row Count, Quirky Method: 64202
Quirky Method MAXDOP 1, Messages:
(2000000 row(s) affected)
(1 row(s) affected)
Update Done. Time Elapsed: 12 seconds
(64202 row(s) affected)
(1 row(s) affected)
Select to Table Done. Total Time Elapsed: 15 seconds
(1 row(s) affected)
Final Row Count, Quirky Method:64202
CTE Method Messages:
(71705 row(s) affected)
(1 row(s) affected)
Select to Table Done. Total Time Elapsed: 170 seconds
(1 row(s) affected)
Final Row Count, CTE Method: 71705
Notes:
1. I simplified your test data script a bit. I created 2 Million rows of test data.
2. In the event that there are two items with exactly the same date that meet the criteria, the Recursive CTE solution includes both, where as the Quirky update does not.
3. I picked a dumb name for the Clustered index (Quirky), because it's used by both methods. Because I'm including the .sqlplans, I didn't rename it post testing.
4. I select these into tables to take screen display times out of the equation.
5. I haven't completely researched the difference in rows returned. I'm sure some is due to the duplicates, but I'm not sure it all is.
6. It's possible that both methods could be optimized further. The Quirky update actually took a lot longer than I expected it to, which probably means I did something wrong.
7. I forgot the MAXDOP the first time around. The second time I added it and re-ran. 15 seconds instead of 30.
Test Data Setup
IF OBJECT_ID('multrecs','u') IS NOT NULL DROP TABLE multrecs
CREATE TABLE multrecs(
acctint,
ndtdatetime
)
DECLARE @i int,
@Acct int
SET @i = 0
WHILE @i 400 records due to duplication.
INSERT INTO multrecs(acct,ndt)
SELECT @Acct, DATEADD(D,ABS(CHECKSUM(NEWID())%10000),'01/01/1970') -- Generates a number 0-10000 and adds this number of days to 1970. This is unsequenced data.
FROM Tally
WHERE N <=400
SET @i = @i + 1
CONTINUE
END
ALTER TABLE multrecs ADD display bit
CREATE CLUSTERED INDEX Quirky ON multrecs (acct, ndt) -- Silly Name for the index.
Quirky Update Code Used:
-- Stop and Start SQL Server Service to completely clear memory.
DECLARE @PrevAcctbigint,
@LDNDT datetime, -- Last Displayed NDT
@Display bit, -- Even though you don't use it, this method won't function without the 3 part update
@timerdatetime,
@FRCint
SET @timer = GETDATE()
UPDATE multrecs
SET @Display = Display = CASE WHEN Acct = @PrevAcct
AND DATEDIFF(d,@LDNDT,NDT) < 150
THEN 0
ELSE 1
END,
@LDNDT =CASE WHEN Acct = @PrevAcct
AND DATEDIFF(d,@LDNDT,NDT) = CteRecurs.ndt + 150
) as x
where rown = 1
)
select *
INTO CTE
from CteRecurs
order by acct,ndt
PRINT 'Select to Table Done. Total Time Elapsed: ' + CAST(DATEDIFF(s,@Timer,GETDATE()) AS varchar(10)) + ' seconds'
SET @FRC = (SELECT COUNT(*) FROM CTE)
PRINT 'Final Row Count, CTE Method:' + CAST(@FRC AS varchar(10))
September 18, 2009 at 11:24 am
Taking this one step further, if anyone wants to supply the script for a CLR function(and possibly explain to me how to use it) or any other alternate method they think might beat this, I'd love to test them.
September 18, 2009 at 12:18 pm
Turn off the execution plan and see what you get.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 18, 2009 at 1:08 pm
Jeff Moden (9/18/2009)
Turn off the execution plan and see what you get.
Without the execution plan, 12 seconds for quirky with MAXDOP 1, 151 for the RCTE. I ran them a few times each with similar results. I also ran them without stopping/starting the server, which actually made almost no difference, probably because this is a workstation with only like 2 gigs of memory.
Viewing 15 posts - 16 through 30 (of 70 total)
You must be logged in to reply to this topic. Login to reply