Is there a more efficient way to run this query?

Question

Post reply

Is there a more efficient way to run this query?

xnl28

Ten Centuries

Points: 1094
More actions
June 8, 2005 at 6:35 am

#167366

Hello,
I have just cancelled a query that had taken 3 hours to run (it looked like it was goind to run all night!). Just wondering if any of you can suggest a more efficient way to run the query. Here is what I have:
TABLE_A (Thoroughfare varchar(50), ThoroughfareID int NULL)
has 26,000,000 rows
  has index on Thoroughfare
  ThoroughfareID is NULL in all rows
TABLE_B (id int PRIMARY KEY, Thorughfare varchar(50))
has 300,000 rows
  has index on Thoroughfare
I want to update ThoroughfareID in TABLE_A with the corresponding ID from TABLE_B where the Thoroughfare fields match. This is the query I used:
update TABLE_A
set ThoroughfareID = b.ID
from TABLE_B b
where b.Thoroughfare = TABLE_A.Thoroughfare
I would not have thought updating 26,000,000 rows would have taken anywhere near 3 hours (the server is high-spec and nothing else is running on it). Any suggestions welcome.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 1

Try updating less rows at the time. It'll take much less ressource to do this task.

if object_id('Test') > 0

DROP TABLE Test

GO

CREATE TABLE [Test] (

[id] [int] NOT NULL ,

[Name] [sysname] NOT NULL ,

[Colid] [smallint] NOT NULL ,

[Colid2] [int] NULL ,

[LastId] [int] NULL ,

CONSTRAINT [PK_Test] UNIQUE CLUSTERED

(

[id],

[Colid],

[Name]

) ON [PRIMARY]

GO

Insert into Test (id, name, Colid) Select id, name, Colid from dbo.SysColumns

GO

Update Test set Colid2 = null, LastId = null

SET rowcount 100 --use whatever fits your needs

select 'starting (init rowcount)'

while @@rowcount > 0

begin

print 'updating'

Update dbo.Test set LastId = Id where LastId is null

end

SET rowcount 0

Select * from Test

GO

DROP TABLE Test

xnl28 Ten Centuries Points: 1094 More actions · Answer 2

Remi,

That worked well, thank you. I ran

set rowcount 100000

select 'Populating value of ThoroughfareID...'

while @@rowcount > 0

update TABLE_A

set ThoroughfareID = b.ID

from TABLE_B b

where b.Thoroughfare = TABLE_A.Thoroughfare

and TABLE_A.ThoroughfareID is null

set rowcount 0

and all 26,000,000 rows were updated in 25 minutes.

Thanks again!

-Steve

David A. Long SSCrazy Eights Points: 8421 More actions · Answer 3

I like this way:

UPDATE TABLE_A

SET TABLE_A.ThoroughfareID = b.ID

FROM TABLE_A

INNER JOIN TABLE_B b ON TABLE_A.Thoroughfare = b.Thoroughfare

WHERE TABLE_A.ThoroughfareID IS NULL

Andy

xnl28 Ten Centuries Points: 1094 More actions · Answer 4

Andy,

Is there any performance difference doing it this way? My guess is that it is merely a syntactical variation, and the actual query run by SQL in both cases is identical. Am I right?

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 5

Yes it's syntaxe only... but I preffer the read something that looks like an inner join. That way I can't miss the 2nd table in the query... But that's just me.

Ian Yates SSCoach Points: 19738 More actions · Answer 6

Personally I thought you always had to have the inner join syntax - nice to know you didn't need it

Out of curiousity, were the indices on your varchar columns clustered? That would certainly help the operation as SQL could then just step through the tables row by row updating rather than going to the next row and peforming some index + bookmark lookup to get the update value.

Cheers

xnl28 Ten Centuries Points: 1094 More actions · Answer 7

Ian,

No, the index on TABLE_A.Thoroughfare is not clustered. There is no clustered index on the table - I wonder how long it would take SQL to create a clustered index on a 4GB, 26,000,000-row table...

Ian Yates SSCoach Points: 19738 More actions · Answer 8

If you're ever bored, please try it and let us know

I think having them clustered could certainly help if you were after further gains in performance?