May 19, 2004 at 10:31 am
I am attempting to update a table (21 million rows) with information from a much smaller table (250k rows).
When I run my query I end up with the process taking an inordinate amount of time, which leads me to believe that what I am doing is EXCEPTIONALLY inefficient.
Below is a sample of the update query that I wrote, and what I would like to know is if anyone can shed some light on how I might be better able to do it.
Information on the tables:
Q12004 is indexed on column COL035, which contains a phone number (CHAR(10)).
ADNH has its PK on column Phone (CHAR(10)).
Query :
update q12004
set q12004.batch = adnh.batch
from adnh, q12004
where adnh.phone = q12004.col035
This "runs" but is ridiculous in the execution time, anyone know of a better method to do this?
May 19, 2004 at 10:52 am
Run your query through the Estimated Execution Plan checker in Query Analyzer and then run this:
update q12004
set q12004.batch = adnh.batch
from adnh
join q12004
on adnh.phone = q12004.col035
See if there is any difference using an explicit JOIN...ON instead of the comma join with a WHERE clause.
-SQLBill
May 19, 2004 at 12:28 pm
Almost no difference at all that I could see.
I'm running the code that SQLBill wrote, and it doesn't seem noticeably different from what I had before.
My assumption is that if we join adnh (250k records) against Q12004 (21 million records) and touch ONLY the matching rows, then as long as the join columns are indexed the update should be reasonably quick.
Is there possibly a different way to write this particular query to force the joins in a more efficient manner?
May 19, 2004 at 12:32 pm
It should be, but look at the execution plan and see what the optimizer has decided the query as written would use for best performance.
May 19, 2004 at 5:32 pm
You never mentioned how long it was taking for the update...
My experience with this type of two-table, single-column update join says you'll get about 4 million rows per minute on a "quiet" server if both tables have the correct indexes. Longer if the server is "busy".
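As a quick sanity check that the indexes really are what you think, sp_helpindex will list what's actually on each table; if the index on col035 turns out to be missing, something like the last line here (the index name is made up) would be the one to build:
sp_helpindex 'q12004'
sp_helpindex 'adnh'
-- only if col035 turns out not to be indexed (index name is hypothetical):
CREATE INDEX ix_q12004_col035 ON q12004 (col035)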
--Jeff Moden
Change is inevitable... Change for the better is not.
May 20, 2004 at 3:41 am
Initial update time was 21 hours, on a screaming machine (P4 2.8E / 2 GB RAM / 1 TB of SATA storage), which is why I believed I didn't have something right. It does nothing else but service my requests.
I'd be ecstatic to have this thing done in under 10 minutes.
May 20, 2004 at 5:59 am
Test how long it takes to do the join without the update:
SELECT * from adnh
join q12004
on adnh.phone = q12004.col035
If that's efficient, check whether the q12004.batch column is contained in an index. If it is and you can remove that index (without affecting your production, apps, etc.), do so and recreate it after the update, as sketched below. If the above join is inefficient and you make changes to tune it, that should fix your update performance as well.
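A rough sketch of the drop-and-recreate approach, assuming a hypothetical index named ix_q12004_batch (substitute whatever index definition sp_helpindex actually reports):
-- drop the index containing batch so the update doesn't have to maintain it
DROP INDEX q12004.ix_q12004_batch
GO
UPDATE q12004
SET q12004.batch = adnh.batch
FROM adnh
JOIN q12004 ON adnh.phone = q12004.col035
GO
-- rebuild it once the update finishes
CREATE INDEX ix_q12004_batch ON q12004 (batch)
GO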
Definitely only the matched rows (250K or less from the inner join) will get updated.
Good luck!
Jeff
May 20, 2004 at 6:57 am
Check the fragmentation of the table using DBCC SHOWCONTIG.
In Scan Density [Best Count:Actual Count], Best Count and Actual Count should be almost equal...
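For example:
DBCC SHOWCONTIG ('q12004')
-- if Scan Density comes back well below 100%, rebuilding the indexes
-- defragments them (note: DBCC DBREINDEX locks the table while it runs)
DBCC DBREINDEX ('q12004')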
HTH...
Regards,
Sachin Dedhia
May 20, 2004 at 1:27 pm
A simple man's solution I use when indexes aren't speeding up the process, and which will run faster, is to do a left join as follows and create a new table. Since q12004 already has a batch column, the joined value needs a different alias (or list q12004's columns explicitly and leave its batch out):
select q12004.*, adnh.batch as adnh_batch
into new_table_name
from q12004
left join adnh on q12004.col035 = adnh.phone
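Once the new table is built and you've verified the counts, swapping it in is just a drop and a rename. Note that SELECT...INTO does not copy indexes, so they have to be rebuilt (the index name below is made up):
SELECT COUNT(*) FROM new_table_name  -- sanity check before dropping anything
GO
DROP TABLE q12004
GO
EXEC sp_rename 'new_table_name', 'q12004'
GO
CREATE INDEX ix_q12004_col035 ON q12004 (col035)
GO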
May 20, 2004 at 5:37 pm
Indexes are a must. Sometimes filtering the data, and the order of the tables, will speed things up. Col035 and Phone should both be indexed. You can add an index hint naming the index on each table, and that can really speed it up.
update q12004
set q12004.batch = adnh.batch
from q12004 inner join adnh
on q12004.col035 = adnh.phone
where q12004.col035 in (select phone from adnh)
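If you do want to force the index choice, the hints would look something like this (the index names here are made up; use whatever sp_helpindex reports for your tables):
update q12004
set q12004.batch = adnh.batch
from q12004 with (index(ix_q12004_col035))
inner join adnh with (index(pk_adnh_phone))
on q12004.col035 = adnh.phone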
May 21, 2004 at 12:51 am
Is your table a heap?
Could you post the estimated execution plan?
Have you monitored SQL Server activity, including the Freespace Scans/sec counter from the SQL Server:Access Methods object?
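If Performance Monitor isn't handy, the same counter can be read from Query Analyzer via master.dbo.sysperfinfo:
SELECT object_name, counter_name, cntr_value
FROM master.dbo.sysperfinfo
WHERE object_name LIKE '%Access Methods%'
AND counter_name LIKE 'FreeSpace Scans%'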
Salvor
May 21, 2004 at 1:32 am
What is the data type of the batch column?
May 27, 2004 at 12:44 pm
Please confirm your indexes on both tables:
Does each index contain just a phone number, or do the indexes contain multiple columns, and if so, what are the columns?
Are the indexes both primary keys?
Are both indexes clustered?
How much faster is the query without the update?
Since these phone number columns contain just numbers, why not make them integers? That will be faster than comparing character strings.
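One caveat if you try that: a 10-digit phone number can exceed the 4-byte INT range (2,147,483,647), so BIGINT is the safer type. A rough sketch (the new column name is made up, and the CAST assumes every value really is all digits):
ALTER TABLE adnh ADD phone_num BIGINT
GO
UPDATE adnh SET phone_num = CAST(phone AS BIGINT)  -- fails if any value is non-numeric
GO
CREATE INDEX ix_adnh_phone_num ON adnh (phone_num)
GO
-- do the same for q12004.col035, then join on the numeric columns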
Eaton
May 27, 2004 at 2:51 pm
Please run in query analyzer the following:
set showplan_text on
go
update q12004
set q12004.batch = adnh.batch
from adnh
join q12004
on adnh.phone = q12004.col035
go
And post the output back here. This will show us your execution plan and will help us to understand what it is doing.
Also, do the following
select count(*)
from adnh
join q12004
on adnh.phone = q12004.col035
and let us know the result and how many indexes are on the table being updated. In some cases (usually rare), if you are updating a lot of rows, you may be suffering a performance hit because of a large number of indexes and the page splits in them.
In some cases it may make more sense to drop some indexes before doing the update, then add them back.
Sometimes you may need to batch the query until complete, like this:
Declare @rows int
SET @rows = 1
-- process 10,000 rows per pass to keep each transaction (and the log) small
set rowcount 10000
WHILE @rows > 0  -- loop until a pass updates nothing
BEGIN
update q12004
set q12004.batch = adnh.batch
from adnh
inner join q12004
on adnh.phone = q12004.col035
-- the IS NULL test matters if batch starts out NULL; <> alone would skip those rows
where q12004.batch <> adnh.batch or q12004.batch is null
set @rows = @@rowcount
END
set rowcount 0  -- always reset, or later statements stay capped at 10,000 rows
Note: If the same adnh.phone exists multiple times with different adnh.batch values, then you cannot use that method. This method helps with log file growth due to the logged changes.
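A quick way to check whether that situation exists; if this returns no rows, the batching loop above is safe:
SELECT phone
FROM adnh
GROUP BY phone
HAVING COUNT(DISTINCT batch) > 1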
If you do have the same adnh.phone multiple times with different adnh.batch values, and there is a way to determine the last occurrence, then you may want to subquery first so you decrease the data changes to the same column.
EX.
update q12004
set q12004.batch = adnh.batch
from adnh
inner join q12004
on adnh.phone = q12004.col035
inner join
-- latest oper_date per phone, so each phone contributes only its most recent batch
(select X.phone, max(X.oper_date) maxdate from adnh X group by X.phone) Y
on adnh.phone = Y.phone
and adnh.oper_date = Y.maxdate
Not sure without more details what may be best.