June 8, 2005 at 6:35 am
Hello,
I have just cancelled a query that had taken 3 hours to run (it looked like it was goind to run all night!). Just wondering if any of you can suggest a more efficient way to run the query. Here is what I have:
TABLE_A (Thoroughfare varchar(50), ThoroughfareID int NULL)
has 26,000,000 rows
has index on Thoroughfare
ThoroughfareID is NULL in all rows
TABLE_B (id int PRIMARY KEY, Thorughfare varchar(50))
has 300,000 rows
has index on Thoroughfare
I want to update ThoroughfareID in TABLE_A with the corresponding ID from TABLE_B where the Thoroughfare fields match. This is the query I used:
update TABLE_A
set ThoroughfareID = b.ID
from TABLE_B b
where b.Thoroughfare = TABLE_A.Thoroughfare
I would not have thought updating 26,000,000 rows would have taken anywhere near 3 hours (the server is high-spec and nothing else is running on it). Any suggestions welcome.
June 8, 2005 at 6:53 am
Try updating less rows at the time. It'll take much less ressource to do this task.
if object_id('Test') > 0
DROP TABLE Test
GO
CREATE TABLE [Test] (
[id] [int] NOT NULL ,
[Name] [sysname] NOT NULL ,
[Colid] [smallint] NOT NULL ,
[Colid2] [int] NULL ,
[LastId] [int] NULL ,
CONSTRAINT [PK_Test] UNIQUE CLUSTERED
(
[id],
[Colid],
[Name]
) ON [PRIMARY]
) ON [PRIMARY]
GO
Insert into Test (id, name, Colid) Select id, name, Colid from dbo.SysColumns
GO
Update Test set Colid2 = null, LastId = null
SET rowcount 100 --use whatever fits your needs
select 'starting (init rowcount)'
while @@rowcount > 0
begin
print 'updating'
Update dbo.Test set LastId = Id where LastId is null
end
SET rowcount 0
Select * from Test
GO
DROP TABLE Test
June 9, 2005 at 2:03 am
Remi,
That worked well, thank you. I ran
set rowcount 100000
select 'Populating value of ThoroughfareID...'
while @@rowcount > 0
update TABLE_A
set ThoroughfareID = b.ID
from TABLE_B b
where b.Thoroughfare = TABLE_A.Thoroughfare
and TABLE_A.ThoroughfareID is null
set rowcount 0
and all 26,000,000 rows were updated in 25 minutes.
Thanks again!
-Steve
June 9, 2005 at 3:14 am
I like this way:
UPDATE TABLE_A
SET TABLE_A.ThoroughfareID = b.ID
FROM TABLE_A
INNER JOIN TABLE_B b ON TABLE_A.Thoroughfare = b.Thoroughfare
WHERE TABLE_A.ThoroughfareID IS NULL
Andy
June 9, 2005 at 3:22 am
Andy,
Is there any performance difference doing it this way? My guess is that it is merely a syntactical variation, and the actual query run by SQL in both cases is identical. Am I right?
June 9, 2005 at 6:46 am
Yes it's syntaxe only... but I preffer the read something that looks like an inner join. That way I can't miss the 2nd table in the query... But that's just me.
June 9, 2005 at 6:13 pm
Personally I thought you always had to have the inner join syntax - nice to know you didn't need it
Out of curiousity, were the indices on your varchar columns clustered? That would certainly help the operation as SQL could then just step through the tables row by row updating rather than going to the next row and peforming some index + bookmark lookup to get the update value.
Cheers
June 10, 2005 at 1:42 am
Ian,
No, the index on TABLE_A.Thoroughfare is not clustered. There is no clustered index on the table - I wonder how long it would take SQL to create a clustered index on a 4GB, 26,000,000-row table...
June 10, 2005 at 2:29 am
If you're ever bored, please try it and let us know
I think having them clustered could certainly help if you were after further gains in performance?
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply