Is there a more efficient way to run this query?

  • Hello,

    I have just cancelled a query that had taken 3 hours to run (it looked like it was goind to run all night!). Just wondering if any of you can suggest a more efficient way to run the query. Here is what I have:

    TABLE_A (Thoroughfare varchar(50), ThoroughfareID int NULL)

      has 26,000,000 rows

      has index on Thoroughfare   

      ThoroughfareID is NULL in all rows

    TABLE_B (id int PRIMARY KEY, Thorughfare varchar(50))

      has 300,000 rows

      has index on Thoroughfare

    I want to update ThoroughfareID in TABLE_A with the corresponding ID from TABLE_B where the Thoroughfare fields match. This is the query I used:

      update TABLE_A

      set ThoroughfareID = b.ID

      from TABLE_B b 

      where b.Thoroughfare = TABLE_A.Thoroughfare

    I would not have thought updating 26,000,000 rows would have taken anywhere near 3 hours (the server is high-spec and nothing else is running on it). Any suggestions welcome.

  • Try updating less rows at the time. It'll take much less ressource to do this task.

    if object_id('Test') > 0

    DROP TABLE Test

    GO

    CREATE TABLE [Test] (

    [id] [int] NOT NULL ,

    [Name] [sysname] NOT NULL ,

    [Colid] [smallint] NOT NULL ,

    [Colid2] [int] NULL ,

    [LastId] [int] NULL ,

    CONSTRAINT [PK_Test] UNIQUE CLUSTERED

    (

    [id],

    [Colid],

    [Name]

    ) ON [PRIMARY]

    ) ON [PRIMARY]

    GO

    Insert into Test (id, name, Colid) Select id, name, Colid from dbo.SysColumns

    GO

    Update Test set Colid2 = null, LastId = null

    SET rowcount 100 --use whatever fits your needs

    select 'starting (init rowcount)'

    while @@rowcount > 0

    begin

    print 'updating'

    Update dbo.Test set LastId = Id where LastId is null

    end

    SET rowcount 0

    Select * from Test

    GO

    DROP TABLE Test

  • Remi,

    That worked well, thank you. I ran

    set rowcount 100000

    select 'Populating value of ThoroughfareID...'

    while @@rowcount > 0

     update TABLE_A

     set ThoroughfareID = b.ID

     from TABLE_B b

     where b.Thoroughfare = TABLE_A.Thoroughfare

     and TABLE_A.ThoroughfareID is null

    set rowcount 0

    and all 26,000,000 rows were updated in 25 minutes.

    Thanks again!

    -Steve

  • I like this way:

    UPDATE TABLE_A

      SET TABLE_A.ThoroughfareID = b.ID

    FROM TABLE_A

      INNER JOIN TABLE_B b ON TABLE_A.Thoroughfare = b.Thoroughfare

    WHERE TABLE_A.ThoroughfareID IS NULL

    Andy

  • Andy,

    Is there any performance difference doing it this way? My guess is that it is merely a syntactical variation, and the actual query run by SQL in both cases is identical. Am I right?

  • Yes it's syntaxe only... but I preffer the read something that looks like an inner join. That way I can't miss the 2nd table in the query... But that's just me.

  • Personally I thought you always had to have the inner join syntax - nice to know you didn't need it

    Out of curiousity, were the indices on your varchar columns clustered?  That would certainly help the operation as SQL could then just step through the tables row by row updating rather than going to the next row and peforming some index + bookmark lookup to get the update value.

     

    Cheers

  • Ian,

    No, the index on TABLE_A.Thoroughfare is not clustered. There is no clustered index on the table - I wonder how long it would take SQL to create a clustered index on a 4GB, 26,000,000-row table...

  • If you're ever bored, please try it and let us know

    I think having them clustered could certainly help if you were after further gains in performance?

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply