How can I improve this code?

  • Hi,

    I'm having a bit of a headache with some code I'm running on a database.

    I have had to import a lot of data from Oracle into SQL Server. On the Oracle platform there were two tables for the data. One which contained all of the index information, and one which contained keywords for the data.

    The keywords tables contains multiple entries for each record in the indexes table and is linked by the index_id.

    On sql server, the data has to be merged into one table, which means there is now a keyword column in the index table, and all of the keywords need to be merged into a space separated list in that column.

    I have imported the indexes information successfully, and have decided to import the keywords into a temporary (not a sql temporary table as such - just one that I will remove once the keywords are successfully imported at the go live point) table and then update the indexes table with the correct information as working with Oracle gives me a much worse headache.

    I have written a UDF which takes a keyword and index type, and generates the list of keywords for that record from the temp keywords table. I have also written some code which works (using a cursor), but is taking forever to run. There are 88000 indexes, and over 500000 keywords. I would really appreciate it if someone could run through my code, and let me know what I can do to make it work faster and better. I know there has to be a better way, I'm just not finding it.

    This is the function which generates the keyword list:

    CREATE FUNCTION KeywordString

    (@id int, @type char(1))

    returns varchar(5000)

    AS

    begin

    declare @KeywordList varchar(5000)

    select @KeywordList = coalesce(@KeywordList+' ','') + Keyword

    from TempKeywords

    where Index_ID = @id and SourceType = @type

    order by rowid

    return @csv

    end

    This is the cursor which loops through the indexes and gets the keywords from the temporary table and updates the index records with them:

    DECLARE @id int, @type char(1)

    DECLARE k_cursor CURSOR FOR

     SELECT Index_OldID,Index_Type

     FROM Tbl_Indexes

    OPEN k_cursor

    FETCH NEXT FROM k_cursor INTO @id, @type

    WHILE @@FETCH_STATUS = 0

     BEGIN

      UPDATE Tbl_Indexes SET Index_Desc = SFI.dbo.KeywordString(@id, @type) WHERE Index_OldID = @id AND Index_Type = @type

      FETCH NEXT FROM k_cursor INTO @id, @type

     END

    CLOSE k_cursor

    DEALLOCATE k_cursor

  • My first recomendation is not to denormalize it.

    The second is

    I don't see why you need a cursor though

    UPDATE Tbl_Indexes SET Index_Desc = SFI.dbo.KeywordString(Index_OldID , Index_Type )

     


    * Noel

  • I did try that. For some reason it just sits running for ages and ages and nothing happens. No rows get updated. I don't know why.

  • this update is gonna table a while... you have 500 000 keywords about 88 000 indexes. the udf will be called once for each index. This is gonna take a while. How about just doing the select of the update and see how long it will take to run?

  • Well, if you stop the update even if it runs for ages then all changes will be rolledback there for no updates.

    Now, let's concentrate on the rest.

    1. Does the TempKeywords table exists already independently of the Tbl_Indexes table ?

    2. Do you have indexes on  Index_ID and SourceType  on the TempKeyWords Table

    3. Does it works if you specify a particular set of values like:

    UPDATE Tbl_Indexes SET Index_Desc = SFI.dbo.KeywordString(Index_OldID , Index_Type ) Where Index_OldID = n and Index_type ='xx'

    4. This is supposed  to be a very slow operation because you are scanning the WHOLE index table!

     


    * Noel

  • 1. It does exist yes. I have already created it and populated with the keyword data.

    2. No indexes yet. Will add them in.

    3. It does work with a particular set of values.

    4. I know it's going to take a while, but I don't want to have to leave it running for ages. Surely I should be able get it done in under an hour?

    Also, running a select version of the query also takes forever - I haven't let it run all the way yet. It takes 3 minutes and 16 seconds to return 1778 rows.

  • I would check the execution plan of the function and check for scans and try to turn then into seeks... it's the most likely reason for your slowlyness.

  • It is all good now. It turns out putting the indexes on the table worked miracles. It now takes 1 minute and 16 seconds. Brilliant!!!

    Thanks for you help guys. I really apprecaite it. This was an excellent response to my first post.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply