September 28, 2007 at 9:01 am
I have the following deduping query which takes a long time to run and I'm sure there is a better way to write it. The query is run everyday after all of the new records have been entered for the day. There are 2 Million rows in the table and 3K-5K added each day, some through batch import and some through data entry. Any ideas are appreciated.
update tb_leadbox set invalidflag=6 where
uid2 not in
(select uid2 from (
select min(uid2)as uid2,fname,lname ,homephone from tb_leadbox where datediff(dd,entrydate,getdate()) <=365
group by fname,lname,homephone
) t2) and invalidflag <> 5 and datediff(dd,entrydate,getdate()) <=365
tb_leadbox has the following fields pertaining to this query
uid (uniqueidentifier, Not Null) PK
uid2 (numeric(18,0),Not Nill) identity
entrydate (datetime, Null)
fname (varchar(50), Null)
lname (varchar(50), Null)
homephone (varchar(12), Null)
invalidflag (int, Null)
tb_leadbox has the following indexes pertaining to this query
uid2 unique non clustered
uid2, invalidflag unique non clustered
homephone, workphone, invalid flag non clustered
lastname, invalidflag non clustered
homephone, uid2 non clustered
invlaidflag non clustered
fname,lname,homephone non clustered
fname non clustered
homephone non clustered
September 28, 2007 at 9:09 am
September 28, 2007 at 9:37 am
A couple thoughts on this.
First, if you can eliminate functions, you can really speed things up. Since you're looking for stuff compare to today, you might be better off calculating the date that matters, meaning if something's older than today - a year, then update it. That will get you a quick search on that field. The function reduces the optimizer using the index on date.
Second, how many flags are there? Can you do an IN(1, 2, 3,4 ) instead of 5? The usually results in a scan of all data.
Lastly, not sure of your other queries, but if you can combine any indexes into a compound index, then you can gain a couple efficiencies. First less indexes to update and second, they might make some queries faster. Need to be sure you aren't making others slower though.
September 28, 2007 at 10:13 am
ok now I have this.
update tb_leadbox set invalidflag=6 where
uid2 not in
(select uid2 from (
select min(uid2)as uid2,fname,lname ,homephone from tb_leadbox where entrydate >= dateadd(dd,-365,getdate())
group by fname,lname,homephone
) t2) and invalidflag in (0,1,2,3,4,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30) and entrydate >= dateadd(dd,-365,getdate())
when I run the execution plan on the old and new together the new takes 49.63% and the old takes 50.37%. There are 5 table spools that occur which each take 17%.
September 28, 2007 at 10:48 am
Get rid of the function.
declare @dt datetime
select @dt = dateadd( d, -365, getdate())
update
...
and entrydate < @dt
That gets you a scalar. It's possible that you can't get this to run better because of the distribution of data. It might be that all of the data is mostly not 5s, so it scans anyway.
Not sure what else to change. That min() clause is probably eating up resources, but not sure how to handle that one.
September 29, 2007 at 11:06 pm
Hey there PoleCat... long time no "see"...
Is it the goal of this bit of code to update all but the "first" row for each customer what has been entered within the last year having any other code than 5 to a code 6?
Also, what do you mean by "long time"? I just ran your code on a 2 million row test table... took 12 seconds to find and update 6400 rows... how many are you updating?
Last but not least, the column you are updating appears in 4 different indexs... those take time to update... you sure you can't knock a few of those out?
--Jeff Moden
Change is inevitable... Change for the better is not.
October 1, 2007 at 2:48 am
Hi Mr Polecat
Do you need columns other than uid2 in your inner derived table?
update tb_leadbox
set invalidflag=6
where uid2 not in
(select min(uid2) as uid2
from tb_leadbox
where entrydate >= dateadd(dd,-365,getdate())
group by fname,lname,homephone
)
and invalidflag in (0,1,2,3,4,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30)
and entrydate >= dateadd(dd,-365,getdate())
Cheers
ChrisM
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
October 1, 2007 at 10:15 am
Jeff Moden (9/29/2007)
Hey there PoleCat... long time no "see"...
compulsive personality. I see an article that interests me and stick around answering questions for a few days then find something else to do.
Is it the goal of this bit of code to update all but the "first" row for each customer what has been entered within the last year having any other code than 5 to a code 6?
Yes
Also, what do you mean by "long time"? I just ran your code on a 2 million row test table... took 12 seconds to find and update 6400 rows... how many are you updating?
1 hour + to update 11K but it is a really old server, 512MB ram and 2 300Mhz processors (yes that's an M not a G)
Last but not least, the column you are updating appears in 4 different indexs... those take time to update... you sure you can't knock a few of those out?
Everytime a new report/process is built a new index is created to speed up that report/process. Yes we could stand to lose a few but I'm not sure how it would effect other users. I am going to try rebuiolding the indices and see if that helps. I've always blamed the code but if it ran ok on your server maybe I need to look elsewhere.
October 1, 2007 at 2:04 pm
October 1, 2007 at 8:33 pm
really old server, 512MB ram and 2 300Mhz processors (yes that's an M not a G)
I'm thinking that you've squeezed just about all the speed you're gonna get out of that... might be a few tricks left but I'm not sure what they might be...
fyi... NOT IN is gonna be a tiny bit faster than an outer join because is has one less step in execution plan.
I think the big problem is all the reporting indexes you mentioned... and, any triggers on the table?
--Jeff Moden
Change is inevitable... Change for the better is not.
October 2, 2007 at 12:15 am
UPDATEy
SETy.InvalidFlag = 6
FROMtb_LeadBox AS y
LEFT JOIN(
SELECTMIN(UId2) AS UId2
FROMtb_LeadBox
WHEREEntryDate >= DATEADD(YEAR, -1, GETDATE())
GROUP BYFName,
LName,
HomePhone
) AS x ON x.UId2 = y.UId2
WHEREx.UId2 IS NULL
AND y.InvalidFlag IN (0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30)
AND y.EntryDate >= DATEADD(YEAR, -1, GETDATE())
N 56°04'39.16"
E 12°55'05.25"
October 2, 2007 at 7:19 am
I ran this test changing my update to a select
--old query with Steve's suggestion for the date parameter and Chris's suggestion to remove one of the nests.
begin
declare @dt datetime
select @dt = dateadd( d, -365, getdate())
select uid2,invalidflag from tb_leadbox where
uid2 not in
(
select min(uid2)as uid2 from tb_leadbox where entrydate >= @dt
group by fname,lname,homephone
) and invalidflag <> 5 and entrydate >= @dt
end
-- took 1 hr 50 minutes
--new query with join similar to Peter's
begin
declare @dt datetime
select @dt = dateadd( d, -365, getdate())
select t1.uid2,invalidflag
from tb_leadbox t1
left join
(select min(uid2)as uid2 from tb_leadbox where entrydate >= @dt
group by fname,lname,homephone) t2
on t1.uid2=t2.uid2
where t2.uid2 is null and invalidflag <> 5 and entrydate >= @dt
end
--took 1 minute 11 seconds
Quite a perfomance boost for the join
I will test the update next.
Jeff- yes there is one trigger on update.
October 2, 2007 at 7:33 am
mrpolecat (10/2/2007)
.
.
.
-- took 1 hr 50 minutes
--new query with join similar to Peter's
.
.
.
--took 1 minute 11 seconds
Quite a perfomance boost for the join
No kidding Mr Polecat, nice result.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
October 2, 2007 at 7:43 am
October 2, 2007 at 6:15 pm
Perfect... Nice "kick in" Peter... I'm gonna go back to sucking my thumb now... 😛
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 15 total)
You must be logged in to reply to this topic. Login to reply