UPDATE Query Question

  • I am attempting to update a table (21 million rows) with information from a much smaller table (250k rows).

    When I run my query I end up with the process taking an inordinate amount of time, which leads me to believe that what I am doing is EXCEPTIONALLY inefficient.

    Below is a sample of the update query that I wrote, and what I would like to know is if anyone can shed some light on how I might be better able to do it.

    Information on the tables:

    Q12004 is indexed on column COL035 (contains a phone number, char(10))

    ADNH has a PK on column Phone (char(10))

    Query :

    update q12004
    set q12004.batch = adnh.batch
    from adnh, q12004
    where adnh.phone = q12004.col035

    This "runs" but is ridiculous in the execution time, anyone know of a better method to do this?

  • Run your query through the Estimated Execution Plan checker in Query Analyzer and then run this:

    update q12004
    set q12004.batch = adnh.batch
    from adnh
    join q12004
    on adnh.phone = q12004.col035

    See if there is any difference in using a join/on instead of the join/where.

    -SQLBill

  • Almost no difference at all that I could see.

    I'm trying to run it using the code that SQLBill has written and it doesn't seem to be exceptionally different than what I had before.

    I'm making the assumption that if we do a join for adnh (250k records) and select ONLY those records which match in Q12004 (21 mill records), then as long as the columns are indexed and match, the update should be reasonably quick.

    Is there possibly a different way to write this particular query to force the joins in a more efficient manner?

  • Should be, but look at the execution plan and see what it has decided the query as written would use for best performance.

  • You never mentioned how long it was taking for the update...

    My experience for these two table type of single column update joins says you'll get about 4 million rows per minute on a "quiet" server if both tables have the correct indexes.  Longer if the server is "busy".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Initial update time was 21 hours.  On a screaming machine which is why I was under the belief that I really didn't have something right.  (p4 2.8E/2 GB RAM/1 TB storage in SATA drives)  And it does nothing else but service my requests.

    I'd be ecstatic to have this thing done in under 10 minutes.

  • Test how long it takes to do the join without the update:

    SELECT * from adnh
    join q12004
    on adnh.phone = q12004.col035

    If that's efficient, check to see if the q12004.batch column is contained in an index.  If it is and you can remove it (without affecting your production, apps, etc.), do that and recreate the index after the update.  If the join itself is inefficient, tuning it should fix your update performance as well.

    Definitely only the matched rows (250K or less from the inner join) will get updated. 
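    The drop-and-recreate idea above might look like the sketch below; the index name IX_q12004_batch is a placeholder, so run sp_helpindex 'q12004' to find the real one:

    ```sql
    -- Hypothetical index name; check sp_helpindex 'q12004' for the real one.
    DROP INDEX q12004.IX_q12004_batch

    UPDATE q12004
    SET q12004.batch = adnh.batch
    FROM adnh
    JOIN q12004 ON adnh.phone = q12004.col035

    -- Recreate the index once the update has finished.
    CREATE INDEX IX_q12004_batch ON q12004 (batch)
    ```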

    Good luck!

    Jeff

    Check the fragmentation of the table using DBCC SHOWCONTIG.

    In Scan Density [Best Count:Actual Count], Best Count and Actual count should be almost equal...
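    A minimal sketch of checking and then fixing fragmentation on the big table (SQL Server 2000 syntax; the index name IX_q12004_col035 is a placeholder):

    ```sql
    -- Report fragmentation; look at Scan Density in the output.
    DBCC SHOWCONTIG ('q12004')

    -- Online defragmentation of one index (0 = current database):
    DBCC INDEXDEFRAG (0, 'q12004', 'IX_q12004_col035')

    -- Or rebuild every index on the table (takes locks, but more thorough):
    DBCC DBREINDEX ('q12004')
    ```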

    HTH...

     


    Regards,
    Sachin Dedhia

    A simple man's solution that will run faster, which I use when indexes are not speeding up the process, is to do a left join as follows and create a new table:

    select q12004.*, adnh.batch as adnh_batch
    into new_table_name
    from q12004
    left join adnh on q12004.col035 = adnh.phone

    (The alias avoids a duplicate "batch" column name in new_table_name, since q12004 already has its own batch column.)

     

    Indexes are a must. Sometimes filtering the data will speed things up, as will the order of the tables. Col035 and phone should both be indexed. You can add a query hint with the index name for each table, and that would really speed it up.

    update q12004
    set q12004.batch = adnh.batch
    from q12004 inner join adnh
    on q12004.col035 = adnh.phone
    where q12004.col035 in (select phone from adnh where q12004.col035 = adnh.phone)
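    The index-hint idea could be sketched like this; IX_q12004_col035 and PK_adnh are hypothetical names standing in for whatever sp_helpindex reports on each table:

    ```sql
    UPDATE q12004
    SET q12004.batch = adnh.batch
    FROM q12004 WITH (INDEX(IX_q12004_col035))
    INNER JOIN adnh WITH (INDEX(PK_adnh))
        ON q12004.col035 = adnh.phone
    ```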

  • Is your table a heap?

    Could you post the estimated execution plan?

    Have you monitored Sql Server activity, including the Freespace scans/sec counter from the SQL Server:Access Methods?


    Salvor

  • What is the data type of the batch column?

  • Please confirm your indexes on both tables:

    Does each index contain just a phone number, or do the indexes contain multiple columns? If so, what are the columns?

    Are the indexes both primary keys?

    Are both indexes clustered?

    How much faster is the query without the update?

    Since these phone number columns contain just numbers, why not make them numeric? A 10-digit number is too large for int, so bigint would be needed; numeric comparisons are faster than char(10) comparisons.
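    A sketch of that conversion, assuming the columns hold only digits and leading zeros are not significant; phone_num is a hypothetical column name:

    ```sql
    -- Only safe if every value is purely numeric and leading zeros
    -- do not matter; otherwise keep the char(10) columns.
    ALTER TABLE adnh ADD phone_num bigint
    GO
    UPDATE adnh SET phone_num = CAST(phone AS bigint)
    ```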

    Eaton

     

     

  • Please run in query analyzer the following:

    set showplan_text on
    go
    update q12004
    set q12004.batch = adnh.batch
    from adnh
    join q12004
    on adnh.phone = q12004.col035
    go

     

    And post the output back here. This will show us your execution plan and will help us to understand what it is doing.

    Also, do the following

    select count(*)
    from adnh
    join q12004
    on adnh.phone = q12004.col035

    and let us know the result and how many indexes are on the table being updated. In some cases (usually rare), if you are updating a lot of rows, you may be suffering a performance hit because of a larger number of indexes and page splits in them.
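    To count the indexes on the table being updated on SQL Server 2000, something like this works (sp_helpindex 'q12004' shows the same information):

    ```sql
    -- indid 1 = clustered, 2-254 = nonclustered.
    -- Auto-created statistics (_WA_Sys...) also show up here; ignore those.
    SELECT name, indid
    FROM sysindexes
    WHERE id = OBJECT_ID('q12004')
      AND indid BETWEEN 1 AND 254
    ```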

    In some cases it may make more sense to drop some indexes before doing the update and then add them back.

    Sometimes you may need to batch the query until complete like this

    Declare @rows int
    SET @rows = 1
    set rowcount 10000
    WHILE @rows > 0
    BEGIN
     update q12004
     set q12004.batch = adnh.batch
     from adnh
     inner join q12004
     on adnh.phone = q12004.col035
     where q12004.batch != adnh.batch or q12004.batch is null
     set @rows = @@rowcount
    END
    set rowcount 0

    Note: If the same adnh.phone exists multiple times with different adnh.batch values then you cannot use that method. This method helps with log file growth due to logged changes.

    If you do have the same adnh.phone multiple times with different adnh.batch values, and there is a way to determine the last occurrence, then you may want to use a subquery first so you decrease the changes to the same column.

    EX.

    update q12004
    set q12004.batch = adnh.batch
    from adnh
    inner join q12004
    on adnh.phone = q12004.col035
    inner join
    (select X.phone, max(X.oper_date) maxdate from adnh X group by X.phone) Y
    on adnh.phone = Y.phone
    and adnh.oper_date = Y.maxdate

    Not sure without more details what may be best.
